Responding to Superintelligence
Let’s start this post right where we left off in part 1, with a quick recap of Bostrom’s conclusions in Superintelligence: that Human Level Machine Intelligence will definitely be developed at some point, probably within the next 60 years, and will rapidly self-improve into a superintelligence (SI) that will be able to outsmart humans to achieve its goals. The best shot we have at programming the SI’s goals to avoid total existential catastrophe is to achieve international cooperation in SI development and then tell the SI to “achieve that which we would have wished the SI to achieve if we had thought long and hard.”
I find this conclusion (and the entire book, really) to be quite terrifying. And that was clearly Bostrom’s desired effect: to convince readers that the advance of artificial intelligence is an epically important topic that should be invested in and carefully studied. Bostrom does not shy away from fearmongering when he writes:
“The first SI may shape the future of earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition” and will potentially create “a future that is mostly void of whatever we have reason to value.”
As the first phase of my response to Superintelligence, I’d like to run through all the ways that Bostrom successfully freaked me out.
First, that “failure” means irreversible existential catastrophe; as Bostrom puts it, we only experience an existential catastrophe zero or one times. Second, that we can fail in so many different ways — we can fail if the SI developers are evil, self-interested, or plain old ignorant. And the people who first develop SI will likely be the only developers of SI because the exponential rate of improvement after the crossover point means that the first SI that is developed will achieve a “decisive strategic advantage” over other projects. Really, we can fail in so many ways — in part 1 of this post, I listed a tiny portion of the dystopic possible outcomes that Bostrom lays out.
The third thing that freaks me out is the general uncertainty surrounding this whole endeavor. Bostrom uses the qualifier “assuming we aren’t totally oblivious to ___” worryingly often. His vision of global cooperation seems highly uncertain when we consider how difficult it is to reach global agreements that limit an individual country’s military or economic strength. Also, the whole idea of telling the SI to do “that which we would have asked it to do if we had thought long and hard” is uncertain by design: it would mean creating an all-powerful entity with an unknown end goal. The AI theorist Eliezer Yudkowsky calls this strategy Coherent Extrapolated Volition (CEV) and describes it as:
“Our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated; interpreted as we wish that interpreted.”
CEV is certainly an interesting, rather poetic idea, and one I would very much like to read about in a Kurt Vonnegut short story, but it is not something that I would like to bet human existence on. What do you think you would decide humanity’s CEV is after a quick read-through of the entire internet? Remember in The Fifth Element when Leeloo (the supreme being sent to earth to save humanity) reads the internet to learn about humans and asks, “What’s the use in saving life when you see what you do with it?” And Bruce Willis won’t be around in 70 years to tell the SI he loves it in order to convince it that humans are worth saving… Also, the SI probably won’t be that into Bruce Willis.
In addition to being uncertain by design, I find CEV to be a profoundly strange concept. Bostrom spends much of Superintelligence arguing that humans are greedy and stupid, and therefore we should give as much control as possible directly to the SI rather than to human monitors (this is one of Bostrom’s arguments against making an “oracle” SI that would tell humans what to do rather than deciding and doing all on its own). But, with CEV, the SI will carefully examine all of the greedy and stupid humans and all of their greedy and stupid actions throughout history; and from this we expect that the SI will extrapolate a volition that is neither greedy nor stupid?
Aside from this contradiction, CEV is just so maddeningly technophilic and pessimistic about human capabilities — by choosing CEV we are saying, “as ignorant, irrational, fallible beings, we really shouldn’t be in charge of our own future… so let’s create a purely rational, all-powerful entity to make decisions for us!” While I often feel pessimistic after reading the morning news, I am not quite ready for our species to collectively throw in the towel and call in robot reinforcements. What if, after reading the internet, the SI were to decide that humans fear death most of all and would therefore like to live forever? The SI might then digitize all human minds, allowing humans to survive indefinitely in uploaded forms. While an SI might surpass human cognitive capabilities in almost all arenas, I worry that the SI will not be able to factor into its calculations the idea that humanness is inextricably linked with an attachment to our imperfect, inefficient, mortal bodies. Point being: with an SI overlord, we would lack the most basic form of common ground.
On a related note, that claim I just made about humanness being inextricably linked with an attachment to our imperfect, inefficient, mortal bodies is something that Bostrom would definitely contest. In Superintelligence, Bostrom categorizes the immortal, digitized human minds scenario as a positive outcome. He also directly addresses my claim in a response to an essay about the dangers of transhumanism written by Francis Fukuyama. In this essay, Fukuyama writes:
“For all our obvious faults, we humans are miraculously complex products of a long evolutionary process — products whose whole is much more than the sum of our parts. Our good characteristics are intimately connected to our bad ones… Even our mortality plays a critical function in allowing our species as a whole to survive and adapt.”
In his response, Bostrom calls Fukuyama a “reactionary bioconservative” and says that the concept of a human essence is “deeply problematic.” I do not intend to start a philosophical discussion about human essence in this post; rather, I’m trying to demonstrate that Bostrom does not share my personal beliefs on the subject. And I would guess that many artificial intelligence experts are more closely aligned with Bostrom than with me. So, not only would the SI overlord and I not have any common ground, but I might not share much common ground with the SI creators either. Even if we do achieve (miraculous and unprecedented) global cooperation in creating an SI, it is the experts who would be making the goal-setting decisions. This adds a whole new category to our “we can fail in so many ways” topic: one person’s vision of SI success might be seen by another person as SI failure.
So, I’m freaked out by Superintelligence because we can “fail” (that is, cause irreversible existential catastrophe) in many ways; because we face uncertainty at every turn; and because these hugely important decisions about humanity’s future are going to be made by a robot or by humans who do not share my values. Not only do these things freak me out; I find them to be wholly unacceptable. While Bostrom would say something like, “tough noogies, that’s how it’s going down” (or the equivalent in Swedish slang), I would respond, “No way, Nick! We have to come up with something better.” More eloquently put, I’d like to challenge Bostrom’s assertion that halting SI development is impossible. And I plan to do just that in part 3 of this series (stay tuned).