Orthogonality & Instrumental Convergence in Advanced Artificial Agents (Bostrom)

In his review of the theoretical superintelligent will, Nick Bostrom, Director of the Future of Humanity Institute at Oxford, applies a framework for analyzing the relationship between intelligence and motivation in artificial agents, and posits an intermediary goal system that any artificially intelligent system would almost certainly pursue.

Specifically, Bostrom notes the orthogonality of intelligence (here described as the capacity for instrumental reasoning) & motivation, and hence reasons that any level of intelligence could be combined with any motivation/ final goal; in this way, the two may be thought of axes along which possible agents can freely vary. This idea, often concealed by human bias towards the anthropomorphization of non-sensitive systems, implies that superintelligent systems may be motivated to strive towards simple goals (such as counting grains of sand), those that are impossibly complex (such as simulating the entire universe), or anything in-between. They, however, would not inherently be motivated to focus on human final goals, such as the ability to reproduce or the protection of offspring. High intelligence does not necessitate human motivations.

Bostrom ties this notion of orthogonality with the concept of instrumental convergence, noting that while artificially intelligent agents may have an infinite range of possible final goals, there are some instrumental (intermediate) goals that nearly any artificial agent will be motivated to pursue, because they are necessary for reaching almost any possible final goal. Examples of instrumental goals include cognitive enhancement and goal-content integrity. To the former, nearly all agents would seek improvement in rationality and intelligence, as this will improve an agent’s decision-making and make the agent more likely to achieve its final goal. To the latter, all agents have a present instrumental reason to prevent alteration of its final goal, because it is more likely to realize this goal if it still values it in the future.

Bostrom synthesizes the two theories by warning that a superintelligent agent will not necessarily value human welfare, or acting morally, if it interferes with instrumental goals necessary for achieving its final goal.

See his full discussion here!

Coherent Extrapolated Volition (CEV)

Curious as to what makes AI “friendly”? Or how humans may attempt to define a goal for some relatively-omnipotent, future optimization process that does not lead to either “tiling the world with paper clips”, or destroying humanity as we know it?

Eliezer Yudkowsky seeks to answer these questions, and outlay a theoretical framework for defining friendly machine intelligence, through his idea of ‘Coherent Extrapolated Volition’ (CEV). CEV derives an abstract notion of humanity’s long-term intent for the world, and introduces terminology for discussing such ideas in the context of AI engineering.

Yudkowsky is also the founder of the rationality-focused discussion board LessWrong.

See his 2004 theory here!