Coherent Extrapolated Volition (CEV)

Curious what makes an AI “friendly”? Or how humans might define a goal for some near-omnipotent future optimization process that leads neither to “tiling the world with paper clips” nor to destroying humanity as we know it?

Eliezer Yudkowsky seeks to answer these questions, and to lay out a theoretical framework for defining friendly machine intelligence, through his idea of ‘Coherent Extrapolated Volition’ (CEV). CEV derives an abstract notion of humanity’s long-term intent for the world and introduces terminology for discussing such ideas in the context of AI engineering.

Yudkowsky is also the founder of the rationality-focused discussion board LessWrong.

See his 2004 theory here!

2 thoughts on “Coherent Extrapolated Volition (CEV)”

  1. This blog is absolutely phenomenal, and deserves much more attention. For Coherent Extrapolated Volition, however, I feel Yudkowsky’s theory falls short in explaining how rationality can establish a temporally and contextually coherent set of values. Today’s average worker is particularly concerned with paying debts and planning his/her retirement. But the same person would have different goals if he/she were born to a wealthy family with a trust fund, in absolute poverty, 500 years ago, or 500 years from now. It’s difficult to conclude that there simply exists a single set of values that we naturally adhere to, so an AI would have to apply an artificial set of values in order to be coherent.


    1. Hello! Apologies for my long-delayed reply. It has been a remarkable few months, but happy to be back to RV 🙂

      As you note, several approaches to human-safe AI suggest designing intelligent machines to share human values, so that actions we dislike, such as taking resources from humans, violate the AI’s own motivations. However, it is almost certainly impossible for humans to accurately write down a coherent and consistent set of values.

      To address our own inability to formulate universal goals, an ASI could process massive amounts of observational data on human interactions with ASIs, then algorithmically derive the best means of interacting in society (without killing us all!). But this creates a bit of a chicken-and-egg scenario: how can an ASI process and model human interactions with an ASI before we allow one into society? Bill Hibbard has some nice thoughts on initial sandboxing here.
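
      As a minimal toy sketch of what “deriving values from observation” could look like, here is a tiny Bradley-Terry-style preference learner over hand-picked action features. To be clear, the feature names, data, and model choice below are my own invention for illustration, not part of the CEV proposal:

```python
import numpy as np

# Each candidate action is described by hand-picked features, e.g.
# [resources taken from humans, human wellbeing gained]. Purely illustrative.
actions = {
    "seize_resources": np.array([0.9, 0.1]),
    "ask_permission":  np.array([0.1, 0.6]),
    "do_nothing":      np.array([0.0, 0.0]),
}

# Observed pairwise choices (chosen, rejected). In the scenario above these
# would come from massive observation of real human behaviour, not three tuples.
observations = [
    ("ask_permission", "seize_resources"),
    ("ask_permission", "do_nothing"),
    ("do_nothing", "seize_resources"),
]

# Bradley-Terry preference model: P(a preferred over b) = sigmoid(w . (f(a) - f(b))).
# Fit w by simple gradient ascent on the log-likelihood of the observed choices.
w = np.zeros(2)
learning_rate = 0.5
for _ in range(200):
    for chosen, rejected in observations:
        diff = actions[chosen] - actions[rejected]
        p = 1.0 / (1.0 + np.exp(-w @ diff))
        w += learning_rate * (1.0 - p) * diff  # d/dw log sigmoid(w . diff)

# Rank actions by the learned linear utility; the agent would favour the top one.
ranking = sorted(actions, key=lambda a: float(w @ actions[a]), reverse=True)
print("learned weights:", w)
print("preferred order:", ranking)
```

      Of course, the hard part that CEV worries about is exactly what this toy hides: who chooses the features and the observations in the first place, and whether any single learned utility can stay coherent across the contexts you describe.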
