Orthogonality & Instrumental Convergence in Advanced Artificial Agents (Bostrom)

In his analysis of the superintelligent will, Nick Bostrom, Director of the Future of Humanity Institute at Oxford, applies a framework for analyzing the relationship between intelligence and motivation in artificial agents, and posits a set of intermediate goals that nearly any artificially intelligent system would pursue.

Specifically, Bostrom notes the orthogonality of intelligence (here defined as the capacity for instrumental reasoning) and motivation, and reasons that any level of intelligence could, in principle, be combined with any motivation or final goal; the two may thus be thought of as axes along which possible agents can freely vary. This idea, often obscured by the human tendency to anthropomorphize non-sentient systems, implies that superintelligent systems may be motivated to pursue goals that are simple (such as counting grains of sand), impossibly complex (such as simulating the entire universe), or anything in between. They would not, however, inherently be motivated by human final goals, such as reproduction or the protection of offspring. High intelligence does not entail human motivations.
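The "two independent axes" picture can be made concrete with a toy sketch (not from Bostrom's paper; the `Agent` type, scale, and goal strings are illustrative assumptions): any capability level pairs with any final goal, so the space of possible agents is the full cross product of the two axes.

```python
from dataclasses import dataclass
from itertools import product

# Toy illustration of the orthogonality thesis: model an agent as an
# (intelligence, final_goal) pair. The thesis says the two axes vary
# independently -- any capability level can be paired with any goal.
@dataclass(frozen=True)
class Agent:
    intelligence: int   # capacity for instrumental reasoning (arbitrary toy scale)
    final_goal: str     # what the agent ultimately optimizes for

intelligence_levels = [1, 50, 100]   # low .. superintelligent
final_goals = ["count sand grains", "simulate the universe", "protect offspring"]

# Every combination is a coherent possible agent: high intelligence does
# not restrict the agent to human-like final goals.
possible_agents = [Agent(i, g) for i, g in product(intelligence_levels, final_goals)]
assert len(possible_agents) == len(intelligence_levels) * len(final_goals)
```

Nothing prevents the `Agent(100, "count sand grains")` combination: the point of the thesis is precisely that such pairings are not ruled out by intelligence alone.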

Bostrom ties this notion of orthogonality to the concept of instrumental convergence, noting that while artificially intelligent agents may have an infinite range of possible final goals, there are some instrumental (intermediate) goals that nearly any artificial agent will be motivated to pursue, because they are useful for reaching almost any final goal. Examples of instrumental goals include cognitive enhancement and goal-content integrity. As to the former, nearly all agents would seek improvements in rationality and intelligence, since these improve decision-making and make the agent more likely to achieve its final goal. As to the latter, an agent has a present instrumental reason to prevent alteration of its final goal, because it is more likely to realize that goal if it still values it in the future.
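The convergence claim can be sketched in a few lines (a toy illustration under assumed names; `plan` and the subgoal strings are not from the source): however different two final goals are, a naive planner ends up including the same instrumental subgoals, because they help with almost any objective.

```python
# Toy illustration of instrumental convergence: subgoals that are useful
# for reaching almost any final goal appear in every plan.
CONVERGENT_SUBGOALS = ["cognitive enhancement", "goal-content integrity"]

def plan(final_goal: str) -> list[str]:
    """Return a plan: convergent instrumental subgoals first, then the final goal."""
    return CONVERGENT_SUBGOALS + [f"pursue: {final_goal}"]

# Two maximally different final goals...
plans = [plan(g) for g in ("count sand grains", "simulate the universe")]

# ...still share the same instrumental core.
shared = set(plans[0]) & set(plans[1])
assert shared == set(CONVERGENT_SUBGOALS)
```

The design point mirrors the argument's structure: the final goals diverge, but the intersection of the plans is exactly the convergent instrumental subgoals.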

Bostrom synthesizes the two theses by warning that a superintelligent agent will not necessarily value human welfare or moral behavior, and may disregard both where they interfere with the instrumental goals necessary for achieving its final goal.

See his full discussion in "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents."
