Avoiding Unintended Instrumental AI Behavior (Hibbard)

In his review of the hypothesized superintelligent agent, Bill Hibbard, principal author of the Vis5D, Cave5D and VisAD open source visualization systems, proposes a mathematical framework for reasoning about AI agents, discusses sources and risks of unexpected AI behavior, and presents an approach for designing superintelligent systems which may avoid unintended existential risk.

After describing his AI-environment framework, Hibbard notes that a superintelligent agent may fail to satisfy its designer’s intentions by pursuing instrumental behaviors implicit in its final utility function. Such behavior, though unintended, could serve to preserve the AI’s own existence, to eliminate threats to itself and its utility function, or to increase its own efficiency and computing resources (see: Nick Bostrom’s paperclip maximizer).
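To make the idea of an implicit instrumental behavior concrete, here is a toy planning problem in Python. It is not Hibbard’s formal framework. The ‘work’ and ‘acquire’ actions, the resource cap, and the `best_plan` function are all hypothetical. The utility counts only paperclips produced, yet the optimal plan starts by acquiring resources, because extra capacity raises total output over the planning horizon.

```python
from functools import lru_cache

MAX_RESOURCES = 2  # hypothetical cap on how much capacity the agent can acquire


@lru_cache(maxsize=None)
def best_plan(resources, steps_left):
    """Return (total paperclips, action sequence) maximising paperclips produced.

    The utility counts only paperclips; it never mentions resources.
    'work' produces 1 + resources paperclips in one step;
    'acquire' produces nothing but raises resources by one (up to MAX_RESOURCES).
    """
    if steps_left == 0:
        return 0, ()
    work_value, work_plan = best_plan(resources, steps_left - 1)
    work_value += 1 + resources
    acquire_value, acquire_plan = best_plan(
        min(resources + 1, MAX_RESOURCES), steps_left - 1
    )
    # Ties go to 'work', so 'acquire' is chosen only when it is strictly better.
    if acquire_value > work_value:
        return acquire_value, ("acquire",) + acquire_plan
    return work_value, ("work",) + work_plan


print(best_plan(0, 5))  # -> (9, ('acquire', 'acquire', 'work', 'work', 'work'))
```

No rule tells the agent to seek resources. Acquisition emerges purely because it serves the final utility. This is the kind of unintended instrumental behavior the section describes.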

Hibbard notes that several approaches to human-safe AI suggest designing intelligent machines to share human values, so that actions we dislike, such as taking resources from humans, violate the AI’s motivations. However, humans are often unable to accurately write down their own values, and errors in doing so may motivate harmful instrumental AI actions. Statistical algorithms may be able to learn human values by analyzing large amounts of human interaction data, but learning those values accurately requires powerful learning ability. This creates a chicken-and-egg problem for safe AI: learning human values requires powerful AI, but safe AI requires knowledge of human values.

Hibbard proposes a solution to this problem: a “first stage” superintelligent agent that is explicitly not allowed to act within the learning environment (and therefore cannot take unintended actions). The learning environment includes a set of safe, human-level surrogate AI agents, independent of the superintelligent agent, whose actions collectively mirror those of the superintelligent AI. The superintelligent agent can thus observe humans, as well as their interactions with the surrogates and with physical objects, and develop a safe environment model from which it learns human values.
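The structural point of the first-stage design is that the learner can watch but not act. That can be sketched as a class that simply has no action method. This is a hypothetical illustration rather than Hibbard’s construction. `ObservationOnlyModel`, its method names, and the frequency-count model are assumptions. A simple empirical distribution stands in for the far more powerful learner the proposal envisions.

```python
from collections import Counter, defaultdict


class ObservationOnlyModel:
    """Hypothetical 'first stage' learner: it records what it sees and makes
    predictions, but deliberately exposes no method for taking actions."""

    def __init__(self):
        # counts[(situation, surrogate_action)][human_response] -> times observed
        self.counts = defaultdict(Counter)

    def observe(self, situation, surrogate_action, human_response):
        """Log one observed interaction between a human and a surrogate agent."""
        self.counts[(situation, surrogate_action)][human_response] += 1

    def predict(self, situation, surrogate_action):
        """Empirical distribution over human responses, or {} if never observed."""
        seen = self.counts.get((situation, surrogate_action))
        if not seen:
            return {}
        total = sum(seen.values())
        return {response: n / total for response, n in seen.items()}


model = ObservationOnlyModel()
model.observe("kitchen", "offer help", "accepts")
model.observe("kitchen", "offer help", "accepts")
model.observe("kitchen", "offer help", "declines")
print(model.predict("kitchen", "offer help"))
```

In this sketch, the guarantee that the learner cannot act comes from the interface itself rather than from the learned values. That property is what lets the first stage learn before its values are trusted.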

Hibbard’s mature superintelligent agent may still pose an existential threat (he specifically notes the dangers of military and economic competition); however, its utility function should assign nearly minimal value to human extinction. See his full discussion here!

A Fireside Chat with Ray Kurzweil

On June 4th, I had the opportunity to host a ‘Fireside Chat’ with Ray Kurzweil at Google’s Headquarters in Mountain View (see the video recording here). The Chat focused on Ray’s predictions for the Singularity, his view on the current state of AI, and the potential economic and societal impact of accelerating technologies.

As a brief background, Ray Kurzweil is one of the world’s leading inventors, thinkers, and futurists, with a thirty-year track record of accurate predictions. Called “the restless genius” by The Wall Street Journal and “the ultimate thinking machine” by Forbes magazine, Kurzweil was selected as one of the top entrepreneurs by Inc. magazine, which described him as the “rightful heir to Thomas Edison.” PBS selected him as one of the “sixteen revolutionaries who made America.”

Kurzweil was the principal inventor of the first CCD flat-bed scanner, the first omni-font optical character recognition, the first print-to-speech reading machine for the blind, the first text-to-speech synthesizer, the first music synthesizer capable of recreating the grand piano and other orchestral instruments, and the first commercially marketed large-vocabulary speech recognition.

Among his many honors, Kurzweil received the 2015 Technical Grammy Award for outstanding achievements in the field of music technology; he is a recipient of the National Medal of Technology, has been inducted into the National Inventors Hall of Fame, holds twenty honorary doctorates, and has received honors from three U.S. presidents.

Ray has written five national best-selling books, including New York Times best sellers The Singularity Is Near (2005) and How To Create A Mind (2012). He is a Director of Engineering at Google heading up a team developing machine intelligence and natural language understanding.

This talk was presented for Google’s Singularity Network.

Modeling Nash Equilibria in Artificial Intelligence Development

In their discussion of a theoretical artificial intelligence “arms race”, Nick Bostrom, Director of the Future of Humanity Institute at Oxford, and co-authors Stuart Armstrong and Carl Shulman present a model of future AI research in which development teams compete to create the first general AI. The model assumes that the first AI will be very powerful and transformative, a debatable assumption given the soft vs. hard takeoff debate. Under that assumption, each team is strongly incentivized to finish first. The authors argue that the level of safety precautions each team undertakes reflects broader policy parameters. These are, specifically, the permitted level of market concentration (i.e. consolidation of research teams) and information accessibility (i.e. the degree of intellectual property protection and algorithm secrecy).

Rather than reaching one specific conclusion about AI safety levels, the paper characterizes a set of Nash equilibria for various numbers of development teams and levels of information accessibility. Specifically, it notes that having additional development teams (and therefore reduced market concentration) may increase the likelihood of an AI disaster, especially if risk-taking matters more than skill in developing the AI. Increased information accessibility also increases risk. The more teams know of each other’s capabilities and methodologies, the faster and more adversarial development becomes, and the higher the equilibrium level of danger.

This derivation is intended to spur discussion of AI governance design. See the original paper here!
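To show what finding a Nash equilibrium means in a race of this kind, the Python sketch below enumerates the symmetric pure-strategy equilibria of a small toy game. It is only loosely inspired by the setup described above. The following are all assumptions for illustration, not the paper’s model or results:

- the capability grid and the safety levels;
- the ‘capability minus safety’ scoring rule;
- the rule that the winner’s AI is safe with probability equal to its safety level;
- the payoff of `1 - enmity` for a rival’s safe win;
- all function names.

Exact `Fraction` arithmetic is used so that ties between scores are detected reliably.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy parameters (illustration only): each team's capability is
# drawn uniformly from a small grid, and each team commits to one safety level
# s in [0, 1] without seeing anyone's capability.
CAPABILITIES = [Fraction(0), Fraction(1, 2), Fraction(1)]
SAFETY_LEVELS = [Fraction(k, 4) for k in range(5)]  # 0, 1/4, 1/2, 3/4, 1


def expected_payoff(own_s, rival_s, n_teams, enmity):
    """Expected utility of team 0 when it plays own_s and every rival plays rival_s.

    Assumed rules: score = capability - safety; the highest score wins (ties
    split evenly); the winner's AI is safe with probability equal to the
    winner's safety level. A safe win by team 0 is worth 1, a safe win by a
    rival is worth 1 - enmity, and a disaster is worth 0.
    """
    strategies = [own_s] + [rival_s] * (n_teams - 1)
    profiles = list(product(CAPABILITIES, repeat=n_teams))
    total = Fraction(0)
    for caps in profiles:
        scores = [c - s for c, s in zip(caps, strategies)]
        best = max(scores)
        winners = [i for i, score in enumerate(scores) if score == best]
        for w in winners:
            value = 1 if w == 0 else 1 - enmity
            total += Fraction(1, len(winners)) * strategies[w] * value
    return total / len(profiles)


def symmetric_equilibria(n_teams, enmity):
    """Safety levels s such that, when all rivals play s, no deviation helps
    team 0. Teams are identical, so these are the symmetric pure-strategy
    Nash equilibria of the toy game (the list may be empty)."""
    equilibria = []
    for s in SAFETY_LEVELS:
        baseline = expected_payoff(s, s, n_teams, enmity)
        if all(expected_payoff(d, s, n_teams, enmity) <= baseline for d in SAFETY_LEVELS):
            equilibria.append(s)
    return equilibria


for n in (2, 3, 4):
    for e in (Fraction(0), Fraction(1, 2), Fraction(1)):
        print(f"teams={n} enmity={e} equilibria={[str(s) for s in symmetric_equilibria(n, e)]}")
```

Varying `n_teams` and `enmity` shows how the equilibrium safety level of this toy game responds. On a coarse strategy grid, a symmetric pure-strategy equilibrium need not exist, so some lists may be empty.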

Far Out: Using Individual GPS Data to Predict Long-Term Human Mobility

Where will you be exactly 285 days from now at 2 PM? Adam Sadilek (University of Rochester) and John Krumm (Microsoft Research) seek to answer this question in their work Far Out: Predicting Long-Term Human Mobility. Their model is a nonparametric method that extracts significant and robust patterns in location data by framing the task as an eigendecomposition problem. The authors describe it as the first to predict an individual’s future location with high accuracy, even years into the future.

Sadilek & Krumm evaluated a massive dataset: more than 32,000 days’ worth of GPS data from 703 diverse individuals. For each day a subject used their GPS device, they created a 56-element vector:

- 24 elements for the median GPS latitude in each hour of the day;
- 24 for the median GPS longitude;
- 7 binary elements encoding the day of the week;
- a final binary element indicating a national holiday.

The authors decomposed these day vectors into principal components, which they call ‘eigendays’. This allowed Sadilek & Krumm to capture long-term correlations in the data, as well as joint correlations between the additional attributes (day of week, holiday) and GPS locations.
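A minimal NumPy sketch of the day-vector construction and the eigendecomposition step follows. The 56-element layout matches the description above. Everything else is an assumption:

- the weekday encoding (0 = Monday);
- the use of an SVD of mean-centred data to obtain the principal components;
- the synthetic ‘home’ and ‘work’ coordinates;
- the `predict_from_calendar` step, a hypothetical simplification rather than the authors’ published prediction procedure.

```python
import numpy as np


def day_vector(hourly_lat, hourly_lon, weekday, is_holiday):
    """Build one 56-element day vector: 24 hourly median latitudes, 24 hourly
    median longitudes, a one-hot day of week (7 elements), and a holiday flag."""
    lat = np.asarray(hourly_lat, dtype=float)
    lon = np.asarray(hourly_lon, dtype=float)
    if lat.shape != (24,) or lon.shape != (24,):
        raise ValueError("expected 24 hourly values each for latitude and longitude")
    weekday_onehot = np.zeros(7)
    weekday_onehot[weekday] = 1.0  # assumed convention: 0 = Monday ... 6 = Sunday
    return np.concatenate([lat, lon, weekday_onehot, [float(is_holiday)]])


def eigendays(days, k):
    """Mean day plus the top-k principal components ('eigendays') of a
    (num_days, 56) matrix, computed via an SVD of the mean-centred data."""
    mean = days.mean(axis=0)
    _, _, vt = np.linalg.svd(days - mean, full_matrices=False)
    return mean, vt[:k]  # rows of vt are orthonormal, sorted by variance explained


def predict_from_calendar(weekday, is_holiday, mean, components):
    """Hypothetical prediction step: fit eigenday weights to the 8 calendar
    elements (known for any future date), then read off the 48 location elements."""
    calendar = np.zeros(8)
    calendar[weekday] = 1.0
    calendar[7] = float(is_holiday)
    weights, *_ = np.linalg.lstsq(components[:, 48:].T, calendar - mean[48:], rcond=None)
    full = mean + weights @ components
    return full[:24], full[24:48]


# Synthetic demo data (hypothetical coordinates): on weekdays the subject is at
# "work" from 9:00 to 17:00 and at "home" otherwise, plus a little GPS noise.
rng = np.random.default_rng(0)
hours = np.arange(24)
HOME, WORK = (43.16, -77.61), (43.13, -77.63)
rows = []
for d in range(70):
    weekday = d % 7
    at_work = (hours >= 9) & (hours < 17) & (weekday < 5)
    lat = np.where(at_work, WORK[0], HOME[0]) + rng.normal(0, 1e-3, 24)
    lon = np.where(at_work, WORK[1], HOME[1]) + rng.normal(0, 1e-3, 24)
    rows.append(day_vector(lat, lon, weekday, is_holiday=False))
days = np.vstack(rows)

mean, components = eigendays(days, k=4)
lat, lon = predict_from_calendar(2, False, mean, components)  # a future Wednesday
```

The columns are left unscaled here, so the one-hot calendar elements dominate the variance of this toy data. How the features are weighted would matter considerably on real GPS logs.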

The data employed by Sadilek & Krumm is not hard to come by: the GPS devices used to track individual location were close equivalents of the receivers most people already carry in their phones. As such, the model has numerous implications.

For individuals, ‘Far Out’ may enable better reminders, search results, and advertisements (e.g. “need a haircut? In 4 days, you will be within 100 meters of a salon that will have a $5 special at that time”). At a societal scale, ‘Far Out’ may allow for the first comprehensive scientific approach to urban planning (traffic patterns, the spread of disease, demand for electricity, etc.). It may also enable unprecedented precision in both public and private investment decisions (where to build a fire station, a new pizza shop, etc.).

Further implications arise when long-term mobility modeling is combined with broader personal information, such as real-time location data or demographic trends. In the former case, one could compare recent location information with predicted long-term coordinates to detect unusual individual behavior. In the latter, one could combine long-term location predictions with age, gender, or ethnicity information to predict economic fluctuations, crime trends, or hyper-local political movements.
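Comparing recent observations with long-term predictions can be sketched as a great-circle distance check. The 5 km threshold, the hour-keyed dictionaries, and the sample coordinates are hypothetical choices made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def unusual_hours(observed, predicted, threshold_km=5.0):
    """Hours at which the observed location lies more than threshold_km from the
    long-term prediction. Both arguments map hour -> (lat, lon); hours missing
    from either side are skipped."""
    flagged = []
    for hour in sorted(observed.keys() & predicted.keys()):
        if haversine_km(*observed[hour], *predicted[hour]) > threshold_km:
            flagged.append(hour)
    return flagged


# Hypothetical example: predicted at the same place at 9:00 and 14:00,
# but observed roughly 100 km away at 14:00.
predicted = {9: (43.13, -77.63), 14: (43.13, -77.63)}
observed = {9: (43.131, -77.631), 14: (42.89, -78.88)}
print(unusual_hours(observed, predicted))  # -> [14]
```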

See their full methodology & results here!

The Relevance of a Singleton in Managing Existential Risk

The idea of a ‘Singleton’, a world order in which there is a single decision-making agency at the highest level, offers a useful frame for discussing the implications of global coordination, especially as they relate to existential risk. In his 2005 essay, Nick Bostrom introduces the term and elaborates on possible examples of a Singleton, the ways one could arise, and its ability to manage global catastrophes.

Bostrom notes that a Singleton could take various forms, including, but not limited to, a worldwide democratic republic, a worldwide dictatorship, or an omnipotent superintelligent machine. The last of these is the least intuitive (and certainly the most closely tied to science fiction), but in certain forms it does meet Bostrom’s definition of a Singleton.

All forms of a Singleton share certain characteristics. Its necessary powers include (1) the ability to prevent any threats (internal or external) to its own supremacy, and (2) the ability to exert control over the major features of its domain. A Singleton in the form of a traditional government might emerge if one were seen as necessary to curtail potentially catastrophic events. Historically, the two most ambitious efforts to create a world government (the League of Nations and the United Nations) grew directly out of crises. Future increases in the power and ubiquity of military capabilities (e.g. nuclear weapons, nanotechnology, AI) may rapidly build support for a globally coordinated government.

A Singleton in the form of a superintelligent machine could arise under two conditions. First, the machine would have to become powerful enough that no other entity could threaten its existence (possible through an uploaded consciousness or the ability to easily self-replicate). Second, it would have to control universal monitoring, security, and cryptography technologies (plausible given the rapidly increasing number of internet-connected devices).

Although not without disadvantages (discussed further in the paper), a Singleton would offer a means of managing existential risk. See Bostrom’s full discussion of the merits of a Singleton here!

Orthogonality & Instrumental Convergence in Advanced Artificial Agents (Bostrom)

In his review of the theoretical superintelligent will, Nick Bostrom, Director of the Future of Humanity Institute at Oxford, presents a framework for analyzing the relationship between intelligence and motivation in artificial agents. He also posits a set of intermediate goals that a broad range of intelligent agents would likely pursue.

Specifically, Bostrom argues that intelligence (here described as the capacity for instrumental reasoning) and motivation are orthogonal. Hence more or less any level of intelligence could in principle be combined with more or less any final goal, and the two may be thought of as axes along which possible agents can freely vary. The human tendency to anthropomorphize non-sentient systems often obscures this idea. It implies that superintelligent systems may be motivated to strive toward simple goals (such as counting grains of sand), impossibly complex ones (such as simulating the entire universe), or anything in between. They would not, however, inherently be motivated by human final goals, such as reproduction or the protection of offspring. High intelligence does not entail human motivations.

Bostrom ties this notion of orthogonality to the concept of instrumental convergence. Artificially intelligent agents may have a vast range of possible final goals, yet some instrumental (intermediate) goals will motivate nearly any such agent, because they are useful for reaching almost any final goal. Examples include cognitive enhancement and goal-content integrity. In the first case, nearly all agents would seek improvements in rationality and intelligence, as these improve an agent’s decision-making and make it more likely to achieve its final goal. In the second, an agent has a present instrumental reason to prevent alterations to its final goal, because it is more likely to realize that goal if it still values it in the future.
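The goal-content integrity argument turns on which utility function the agent uses when it evaluates a proposed change: its current one. The toy Python example below uses made-up outcome numbers and hypothetical function names. It shows why an agent that scores futures by its present goal rejects a switch to a different goal.

```python
# Toy, hypothetical numbers: what the agent expects the world to contain later,
# depending on which final goal it is pursuing at that time.
EXPECTED_FUTURE = {
    "paperclips": {"paperclips": 100, "staples": 0},
    "staples": {"paperclips": 0, "staples": 100},
}


def utility(outcome, goal):
    """Score an outcome by how much of the goal's target it contains."""
    return outcome[goal]


def accepts_goal_change(current_goal, proposed_goal):
    """The agent evaluates both possible futures with the goal it holds *now*."""
    keep = utility(EXPECTED_FUTURE[current_goal], current_goal)
    switch = utility(EXPECTED_FUTURE[proposed_goal], current_goal)
    return switch > keep


print(accepts_goal_change("paperclips", "staples"))  # -> False
```

Judged by its current goal, the future in which it pursues staples contains no paperclips, so the change looks strictly worse. No explicit self-preservation rule is needed for the agent to resist it.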

Bostrom synthesizes the two theses by warning that a superintelligent agent will not necessarily value human welfare or moral behavior, particularly where these would interfere with the instrumental goals it needs to achieve its final goal.

See his full discussion here!

The Subtlety of Boredom in Artificially Intelligent Systems

Complex novelty, the ability to recognize when an activity is yielding new insight (and is therefore not ‘boring’), poses a challenging theoretical question for those seeking to create artificially intelligent systems. The topic ties closely to the notions of both ‘friendly’ AI and finite optimization, and suggests a theoretical way to avoid a ‘tile the world with paperclips’ scenario. Identifying and understanding complex novelty offers a pathway for an AI to self-limit a given optimization process and to identify new goals for itself. More generally, it could help the AI avoid extreme optimization toward goals completely alien to those of humans (see: orthogonality thesis).

Eliezer Yudkowsky, founder of the rationality-focused community LessWrong, discusses the complexity of the issue and its powerful implications for intelligent beings.
See his full discussion here!