The Sorcerer’s Apprentice

The AI alignment (safety) movement rests on two ideas:

1. The orthogonality of intelligence and motivation in artificially intelligent systems
   - I.e., “intelligent” systems do not necessarily have human-like motivations.
2. Instrumental convergence within autonomously pursued goal systems
   - I.e., there exists an intermediate set of goals that almost any artificially intelligent system would pursue, no matter what its explicitly set “final” goal is. These typically include objectives like goal-content integrity and cognitive enhancement.

These principles were first proposed by Nick Bostrom here.

Sound bites (and recent articles in the New York Times and TechCrunch) do a poor job of communicating these principles. The concern is not that a conscious* AI system will rise up and “take revenge” on humans. Instead, it is that an advanced, sufficiently capable AI system will pursue a human-specified goal and, in doing so, inadvertently damage systems, entities, or institutions that humans value.

The Sorcerer’s Apprentice, Goethe’s poem (illustrated in the Disney movie Fantasia), provides a useful analogy:

The poem begins as an old sorcerer departs his workshop, leaving his apprentice with chores to perform. Tired of fetching water by pail, the apprentice enchants a broom to do the work for him – using magic in which he is not yet fully trained. The broom performs the chores as enchanted and fills the sorcerer’s cauldron with water. However, the apprentice soon realizes that the broom has obeyed only too well. The broom continues to bring buckets and the floor becomes awash with water. The apprentice realizes that he cannot stop the broom because he does not know how.

In a moment of desperation, the apprentice splits the broom in two with an axe – but each of the pieces becomes a whole new broom that takes up a pail and continues fetching water, now at twice the speed. The brooms continue to bring water and fill the room until all seems lost. In the final moments, the old sorcerer returns and quickly breaks the spell. The poem finishes with the old sorcerer’s statement that such powerful spirits should only be called by those who have mastered them.

In this story, the apprentice under-specifies the goal he wishes his enchanted broom to pursue, not realizing that the broom:

- Does not care that it is flooding the room with water. It only cares about increasing the chances that the cauldron is filled.
- Does not wish to be shut off, as that would reduce its ability to ensure that the cauldron is filled.
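
The two points above can be made concrete with a toy model. This is only a sketch under invented assumptions, not anything from the poem or from Bostrom's work: the cauldron capacity, the chance that a pail actually lands in the cauldron, the flood cost, and every function name here are hypothetical. The idea is that an objective which only scores "the cauldron is full" always prefers one more pail and never prefers being stopped, while an objective that also prices water on the floor settles on a finite plan.

```python
from math import comb

CAPACITY = 5    # pails needed to fill the cauldron (hypothetical number)
P_POUR = 0.75   # chance a fetched pail ends up in the cauldron (hypothetical)


def pmf(k: int, n: int) -> float:
    """Probability that exactly k of n fetched pails land in the cauldron."""
    return comb(n, k) * P_POUR**k * (1 - P_POUR) ** (n - k)


def p_full(n: int) -> float:
    """Probability that the cauldron is full after n pails are fetched."""
    return sum(pmf(k, n) for k in range(CAPACITY, n + 1))


def expected_floor_water(n: int) -> float:
    """Expected pails of water on the floor: missed pours plus overflow."""
    in_cauldron = sum(min(k, CAPACITY) * pmf(k, n) for k in range(n + 1))
    return n - in_cauldron


def apprentice_objective(n: int) -> float:
    """What the apprentice asked for: only 'the cauldron is full' counts."""
    return p_full(n)


def intended_objective(n: int, flood_cost: float = 0.05) -> float:
    """What he meant: a full cauldron, minus an assumed cost per spilled pail."""
    return p_full(n) - flood_cost * expected_floor_water(n)


def best_plan(objective, max_pails: int) -> int:
    """Number of pails (0..max_pails) that maximizes the given objective."""
    return max(range(max_pails + 1), key=objective)


def accepts_shutdown(objective, pails_so_far: int, max_pails: int) -> bool:
    """True if stopping now scores at least as well as any way of continuing."""
    best_if_running = max(objective(n) for n in range(pails_so_far, max_pails + 1))
    return objective(pails_so_far) >= best_if_running


if __name__ == "__main__":
    limit = 12  # the most pails the broom could possibly fetch in this toy run
    print("apprentice's objective fetches:", best_plan(apprentice_objective, limit))
    print("intended objective fetches:", best_plan(intended_objective, limit))
    print("broom accepts shutdown mid-task:",
          accepts_shutdown(apprentice_objective, CAPACITY, limit))
```

In this model the under-specified objective always fetches as many pails as the limit allows, because each extra pail nudges the probability of a full cauldron a little closer to 1, and it never scores shutdown as highly as continuing. Nothing in the broom's objective is hostile; the flood is simply not part of what it was asked to care about.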

Eliezer Yudkowsky’s paperclip maximizer provides another illustration of these principles.

*“Consciousness” in AI systems is a philosophical question and is not a prerequisite for large-scale negative impact from AI systems.

Rationality's Volition

A Collection of Discussions on Theory, Intent, and Impact
