The crux of the AI alignment (safety) movement rests on two ideas:
- The orthogonality of intelligence and motivation in artificially intelligent systems
  - I.e., “intelligent” systems do not necessarily have human-like motivations
- Instrumental convergence within autonomously pursued goal systems
  - I.e., there exists an intermediary set of goals that any artificially intelligent system would almost certainly pursue, no matter its explicitly set “final” goal. These typically include objectives like goal-content integrity and cognitive enhancement
These principles were first proposed by Nick Bostrom here
Sound bites (and recent articles in the New York Times and TechCrunch) do a poor job of communicating these principles. The concern is not that a conscious AI system will rise up and “take revenge” on humans. Instead, it’s that an advanced and sufficiently capable AI system will pursue a human-specified goal and, in doing so, will inadvertently damage systems, entities, and institutions that humans value.
Goethe’s poem “The Sorcerer’s Apprentice” illustrates this concern. The poem begins as an old sorcerer departs his workshop, leaving his apprentice with chores to perform. Tired of fetching water by pail, the apprentice enchants a broom to do the work for him – using magic in which he is not yet fully trained. The broom performs the chores as enchanted and fills the sorcerer’s cauldron with water. However, the apprentice soon realizes that the broom has obeyed only too well. The broom continues to bring buckets and the floor becomes awash with water. The apprentice realizes that he cannot stop the broom because he does not know how.
In a moment of desperation, the apprentice splits the broom in two with an axe – but each of the pieces becomes a whole new broom that takes up a pail and continues fetching water, now at twice the speed. The broom continues to bring water and fills the room until all seems lost. In the final moments, the old sorcerer returns and quickly breaks the spell. The poem finishes with the old sorcerer’s statement that such powerful spirits should only be called by those that have mastered them.
In this story, the apprentice under-specifies the goal he wishes his enchanted broom to pursue, not realizing that the broom:
- Does not care that the room is filling with water. It only cares about increasing the chances that the cauldron is filled
- Does not wish to be shut off, as that would reduce its ability to ensure that the cauldron is filled
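The two points above can be made concrete with a toy model. The sketch below is purely illustrative and not taken from the poem or from any cited work: the capacity, the chance that a pail arrives full, the flood penalty, and every function name are hypothetical assumptions. It compares a broom that only maximizes the probability that the cauldron is full with one whose goal also penalizes water on the floor.

```python
from math import comb

# All constants below are hypothetical values chosen for illustration.
CAULDRON_CAPACITY = 5      # pails needed to fill the cauldron
P_PAIL_ARRIVES_FULL = 0.9  # chance that a fetched pail actually delivers water
MAX_PAILS = 12             # most pails the broom can fetch in this toy episode


def pail_pmf(n: int, k: int) -> float:
    """Probability that exactly k of n fetched pails deliver water."""
    p = P_PAIL_ARRIVES_FULL
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_cauldron_full(n: int) -> float:
    """Probability that at least CAULDRON_CAPACITY pails were delivered."""
    return sum(pail_pmf(n, k) for k in range(CAULDRON_CAPACITY, n + 1))


def expected_overflow(n: int) -> float:
    """Expected number of delivered pails beyond capacity (water on the floor)."""
    return sum(
        (k - CAULDRON_CAPACITY) * pail_pmf(n, k)
        for k in range(CAULDRON_CAPACITY + 1, n + 1)
    )


def underspecified_goal(n: int) -> float:
    """What the apprentice asked for: only 'make sure the cauldron is full'."""
    return p_cauldron_full(n)


def intended_goal(n: int, flood_penalty: float = 1.0) -> float:
    """What the apprentice meant: a full cauldron and a dry floor."""
    return p_cauldron_full(n) - flood_penalty * expected_overflow(n)


def best_plan(goal) -> int:
    """Number of pails (0..MAX_PAILS) that maximizes the given goal."""
    return max(range(MAX_PAILS + 1), key=goal)


if __name__ == "__main__":
    print("under-specified goal fetches:", best_plan(underspecified_goal))
    print("intended goal fetches:", best_plan(intended_goal))
```

Under the first goal, every extra pail nudges the probability of a full cauldron slightly higher, so the best plan is always to fetch as many pails as possible. Being switched off early corresponds to a smaller number of pails, which scores lower under that goal, so nothing in it favors accepting shutdown.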
Note: “Consciousness” in AI systems is a philosophical question and is not a prerequisite for large-scale negative impact from AI systems.
Thanks to Nate Soares and Eliezer for the content behind this post.
On July 22nd, I hosted a discussion with Tim Urban at Google about the future of artificial intelligence. Tim spoke about his blog, Wait But Why, and his writing on The Road to Superintelligence.
The video recording is here. This talk was presented for Google’s Singularity Network.
Bio: Tim Urban has become one of the Internet’s most popular writers. With wry stick-figure illustrations and occasionally epic prose on everything from procrastination to artificial intelligence, Urban’s blog, Wait But Why, has garnered millions of unique page views, thousands of patrons and famous fans like Elon Musk. Urban has previously written long-form posts on The Road to Superintelligence, and his recent TED talk on procrastination has more than 6 million views.
On November 23rd, I had the opportunity to host a discussion with John Searle at Google’s Headquarters in Mountain View (see the video recording here). The discussion focused on the philosophy of mind and the potential for consciousness in artificial intelligence.
As a brief introduction, John Searle is the Slusser Professor of Philosophy at the University of California, Berkeley. He is widely noted for his contributions to the philosophy of language, philosophy of mind and social philosophy. Searle has received the Jean Nicod Prize, the National Humanities Medal, and the Mind & Brain Prize for his work. Among his notable concepts is the “Chinese room” argument against “strong” artificial intelligence.
Of special note, Ray Kurzweil asks John a question at 38:51.
This talk was presented for Google’s Singularity Network.
In his review of the hypothesized superintelligent agent, Bill Hibbard, principal author of the Vis5D, Cave5D and VisAD open source visualization systems, proposes a mathematical framework for reasoning about AI agents, discusses sources and risks of unexpected AI behavior, and presents an approach for designing superintelligent systems which may avoid unintended existential risk.
Following his initial description of the AI-environment framework, Hibbard begins by noting that a superintelligent agent may fail to satisfy the intentions of its designer when pursuing an instrumental behavior implicit in its final utility function. Such instrumental behavior, while unintended, could arise as the AI acts to preserve its own existence, to eliminate threats to itself and its utility function, or to increase its own efficiency and computing resources (see Nick Bostrom’s paperclip maximizer).
Hibbard notes that several approaches to human-safe AI suggest designing intelligent machines to share human values so that actions we dislike, such as taking resources from humans, violate the AI’s motivations. However, humans are often unable to accurately write down their own values, and errors in doing so may motivate harmful instrumental AI action. Statistical algorithms may be able to learn human values by analyzing large amounts of human interaction data, but to accurately learn human values will require powerful learning ability. A chicken-and-egg problem for safe AI follows: learning human values requires powerful AI, but safe AI requires knowledge of human values.
Hibbard proposes a solution to this problem through a “first stage” superintelligent agent that is explicitly not allowed to act within the learning environment (thus refraining from unintended actions). The learning environment includes a set of safe, human-level surrogate AI agents, independent of the superintelligent agent, whose actions in composite mirror those of the superintelligent AI. As such, the superintelligent agent can observe humans, as well as their interactions with the surrogates and physical objects, and develop a safe environmental model from which it learns human values.
Hibbard’s mature superintelligent agent may still pose an existential threat (he specifically notes the dangers of military and economic competition). However, its utility function should assign nearly minimal value to human extinction. See his full discussion here!
On June 4th, I had the opportunity to host a ‘Fireside Chat’ with Ray Kurzweil at Google’s Headquarters in Mountain View (see the video recording here). The Chat focused on Ray’s predictions for the Singularity, his view on the current state of AI, and the potential economic and societal impact of accelerating technologies.
As a brief background, Ray Kurzweil is one of the world’s leading inventors, thinkers, and futurists, with a thirty-year track record of accurate predictions. Called “the restless genius” by The Wall Street Journal and “the ultimate thinking machine” by Forbes magazine, Kurzweil was selected as one of the top entrepreneurs by Inc. magazine, which described him as the “rightful heir to Thomas Edison.” PBS selected him as one of the “sixteen revolutionaries who made America.”
Kurzweil was the principal inventor of the first CCD flat-bed scanner, the first omni-font optical character recognition, the first print-to-speech reading machine for the blind, the first text-to-speech synthesizer, the first music synthesizer capable of recreating the grand piano and other orchestral instruments, and the first commercially marketed large-vocabulary speech recognition.
Among Kurzweil’s many honors, he recently received the 2015 Technical Grammy Award for outstanding achievements in the field of music technology. He is also a recipient of the National Medal of Technology, was inducted into the National Inventors Hall of Fame, holds twenty honorary doctorates, and has received honors from three U.S. presidents.
Ray has written five national best-selling books, including New York Times best sellers The Singularity Is Near (2005) and How To Create A Mind (2012). He is a Director of Engineering at Google heading up a team developing machine intelligence and natural language understanding.
This talk was presented for Google’s Singularity Network.