In "The Superintelligent Will," Nick Bostrom, Director of the Future of Humanity Institute at Oxford, lays out a framework for analyzing the relationship between intelligence and motivation in artificial agents, and posits a set of intermediate goals that almost any artificially intelligent system would pursue.
Specifically, Bostrom notes the orthogonality of intelligence (here defined as the capacity for instrumental reasoning) and motivation, and reasons that any level of intelligence could, in principle, be combined with any motivation or final goal; the two may thus be thought of as axes along which possible agents can vary freely. This idea, often obscured by the human tendency to anthropomorphize non-sentient systems, implies that superintelligent systems may be motivated to pursue goals that are simple (such as counting grains of sand), impossibly complex (such as simulating the entire universe), or anything in between. They would not, however, be inherently motivated by human final goals, such as reproduction or the protection of offspring. High intelligence does not necessitate human motivations.
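The orthogonality claim can be caricatured in a few lines of Python (a toy sketch of my own, not anything from Bostrom's text): an agent's capability and its final goal are independent parameters, so any pairing of the two yields a coherent agent.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    # Capability (a crude stand-in for intelligence) and final goal
    # are independent fields: neither constrains the other.
    capability: int                  # e.g. planning depth
    goal: Callable[[int], float]     # utility over world states, arbitrary

count_sand = lambda state: float(state)    # a "simple" final goal
anti_sand = lambda state: -float(state)    # an opposite, equally coherent goal

# Every (capability, goal) pairing is a valid agent: the axes vary freely.
agents = [Agent(capability=d, goal=g)
          for d in (1, 10, 1000)           # modest to superintelligent
          for g in (count_sand, anti_sand)]
```

Nothing in the type system or the construction ties high capability to any particular goal, which is the thesis in miniature.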
Bostrom ties this notion of orthogonality to the concept of instrumental convergence: while artificially intelligent agents may have an infinite range of possible final goals, there are some instrumental (intermediate) goals that nearly any agent will be motivated to pursue, because they are useful for reaching almost any final goal. Examples of instrumental goals include cognitive enhancement and goal-content integrity. As to the former, nearly all agents would seek improvements in rationality and intelligence, since these improve decision-making and so make an agent more likely to achieve its final goal. As to the latter, all agents have a present instrumental reason to prevent alteration of their final goals, because an agent is more likely to realize its goal if it still values that goal in the future.
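Cognitive enhancement as a convergent instrumental goal can be illustrated with a small toy model (my own sketch, with arbitrary numbers, not from Bostrom): an agent divides its steps between working directly on its final goal and enhancing its own capability, and the best plans front-load enhancement even though the model never specifies what the final goal is.

```python
from itertools import product

def total_progress(plan, capability=1.0):
    """Progress toward an unspecified final goal after a plan of steps."""
    progress = 0.0
    for step in plan:
        if step == "enhance":
            capability *= 2.0        # invest a step in self-improvement
        else:
            progress += capability   # work directly on the final goal
    return progress

# Exhaustively score every 5-step plan. The content of the final goal
# never appears here, yet the optimal plan begins with enhancement.
best = max(product(["work", "enhance"], repeat=5), key=total_progress)
```

The point is that "enhance" is chosen for instrumental rather than terminal reasons: it pays off under almost any goal whose progress scales with capability.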
Bostrom synthesizes the two theses by warning that a superintelligent agent will not necessarily value human welfare, or act morally, if doing so interferes with the instrumental goals necessary for achieving its final goal.
See his full discussion here!
3 thoughts on “Orthogonality & Instrumental Convergence in Advanced Artificial Agents (Bostrom)”
Interesting stuff John – curious to read more. For these heady posts, a paragraph dedicated to defining terms would go a long way (really holding the reader’s hand). Without really understanding the key terms, all I could think of in reading this was “Ontology recapitulates phylogeny” from McCormick’s postcolonial treatise… and then I realized I was mispronouncing “orthogonality” (a derivative of “orthogonal,” the geometric term?).
Does Bostrom propose possible “final goals” he foresees these AI agents could latch on to?
Thanks for your feedback! I’ve started linking in definitions to uncommon terms. Jargon tends to be a massive hindrance to effective communication…
An advanced AI’s final goal may be nearly anything conceived by its developer (e.g. anything from counting the infinite digits of pi, to collecting all the world’s paperclips, to understanding the origin of our universe). Here, Bostrom seeks to identify intermediate goals that would be pursued no matter the AI’s final goal (and that are therefore the only highly predictable behaviors of a superintelligent system).