Serious futurism is a rare pursuit. Most people who think about the distant future do so as advocates for a particular desired future (or opponents of a particularly feared potential future) rather than from a desire to accurately predict the shape of things to come.
There are many strong reasons for this preference. After all, the future will get here regardless of what anyone predicts, one way or another. And a person’s beliefs about the far future are virtually never relevant for his life. If you happen to be lucky enough to live long enough that your youthful predictions might come to pass, you’ll have also had a lifetime’s worth of additional data to gradually update your predictions as the far future grows more immediate.
Meanwhile, advocacy now – regardless of its accuracy – can have several strong benefits. The most obvious is that it binds one tightly to a tribe in the present that shares the same hopes and fears. It can serve as a fun pastime for a certain sort of person. And, idealistically, convincing one’s fellow travelers in the time-stream of the salience of a potential future could actually lead to a change in the outcome. Cautionary tales are sometimes heeded; prophets occasionally lead their people to promised lands that would have otherwise remained hidden away.
It’s important to keep this context in mind. The following is an honest attempt to grapple with a question relating to the likely future state of the world. But honest intentions are quite unlikely to be enough to make me immune to the pressures of identity, frivolity, or even delusions of grandeur.
Let us begin with the social facts of the matter. There exists a subset of nerds, culturally based in Silicon Valley, who strongly believe that there is a serious risk of accidentally creating an artificial intelligence that will wipe out all of humanity. Some are concerned that this is a short-to-medium term threat. Others think that it is just virtually inevitable given a long enough time frame.
This sounds crazy to most people. With good reason. Nothing like this has ever happened before. And it pattern-matches quite strongly to lots of other theories that turned out to actually be wrong and/or crazy. So even mere discussion of the theory tends to get very little traction outside of tech nerd circles, which has the natural consequence of making the few people who do take the possibility seriously grow increasingly strident. The overall effect is that the people worried about AI existential risk take on the mien of street preachers shouting that the world is going to end. With a nerdish flair, of course.
Which naturally drew me like a moth to a flame. I love crazy. And my favorite kind of crazy is the self-consistent kind. The kind that holds to its own logic to the end, that claims its own capital-T Truth, no matter how it might deviate from what the rest of the world might think. Not unrelatedly, one of my very favorite things about the early Internet was the Time Cube guy. Four simultaneous days in a 24-hour rotation? Sign me up!
So a few years ago I started reading an interesting blog called Overcoming Bias. Ostensibly, the goal of the blog was in the title, but the real draw was the pair of co-bloggers running the joint. The first, Robin Hanson, is a fascinating academic with enough cross-disciplinary interests and a sufficient reputation for brilliance that he’s allowed to come up with and advocate for some off-the-wall ideas. And the second is a fellow named Eliezer Yudkowsky, a former child prodigy who didn’t bother with formal education in favor of autodidacticism and hanging out on ’90s Transhumanist e-mail lists. He emerged from the experience with the burning desire to change the world through the power of rationalism, with the eventual goal of conquering death itself within his lifetime.
This was a great pairing while it lasted. Only a guy as open to crazy ideas as Hanson would take Yudkowsky seriously enough to engage with him constructively, while Yudkowsky’s manic energy and creativity were channeled and refined under the pressure into a coherent worldview. But I suppose it was inevitable that their partnership would break up. Yudkowsky left the blog to start Less Wrong (a site ostensibly devoted to practical rationalism with a strong cult of personality around the founder), start a foundation to mitigate the existential risks of AI research, and write Harry Potter fanfiction/propaganda (seriously!). Meanwhile, Hanson stayed on the blog and continued his musings on futurism, economics, and social signaling. This work recently culminated in his book Age of Em, which I haven’t read but sounds worthwhile if you need more Hanson in your life.
Anyhow, their greatest debate was over the question of the impact AI would have on the future. Both of them agreed, contrary to the vast majority of people, that AI would certainly be incredibly impactful. Hanson took the relatively moderate position that AI will be the crucial advance that leads to a major shift in economic organization, along the lines of the invention of agriculture or the industrial revolution, with a concomitant increase in the economic growth rate. Instead of GDP doubling times measured in decades as they are now in our industrial model, they would be henceforth measured in days or months.
Yudkowsky, on the other hand, took the position that the first AI would almost instantaneously conquer the world and reorganize it for its own purposes. This hypothesis was called ‘FOOM’, which I’ve always presumed was an onomatopoeia for a rocket’s takeoff into the stratosphere, to reflect the rapidity of the process. This AI would almost certainly move so quickly that the first mover advantage would be decisive, and the result would be what Yudkowsky called a ‘singleton’ – a universe completely dominated by a single entity, with all matter and energy within theoretical reach inevitably bent to its will.
Thus the battle lines are drawn. Is AI merely one of the three biggest things ever? Or is it the end of History?
As an aside, I find it fascinating that Yudkowsky seems to be winning the argument as time goes on. A solid modern primer on the whole question can be found here. If you read the two-part essay, you’ll likely notice that Yudkowsky is prominently quoted as a major source, and that the AGI (Artificial General human-level Intelligence) -> ASI (Artificial Super-Intelligence) transition is presented as a dramatic FOOM.
Anyhow, the neat thing about the FOOM argument is that, like the Time Cube theory, it manages to be completely ridiculous at first blush without actually refuting itself on its own terms. So its soundness depends on entirely on empirical questions. To continue, then, we must investigate how well Yudkowsky’s model of the world aligns with the one we actually live in.
First, in order to FOOM, what we’d consider superintelligence needs to be possible. Since it hasn’t actually happened yet, we need to retain some doubt in the proposition, however reasonable it might sound. Nick Bostrom, a philosopher who’s spent a good deal of time thinking about this problem, has come up with a couple of different classifications of potential superintelligence. The previously linked article describes these as ‘speed superintelligence’ and ‘quality superintelligence’.
Speed superintelligence is ASI that is better than human intelligence because it scales better to easily available hardware. If you gave it one brain (or one brain’s worth of silicon) it would be just as smart as a typical human. But it can easily run on hardware that runs many times faster, with far vaster amounts of working memory, with nigh-instantaneous access to far more long-term storage. So, in practice, it’s so much smarter that it is fair to call it ASI.
Quality superintelligence, on the other hand, would be better because of an algorithmic superiority. In other words, it is organized better. So much better, in fact, that if you gave it a single brain’s worth of computation capacity, it would still be by far the most intelligent being in the history of existence.
The two are not exclusive, of course. In particular, a quality ASI would likely have very little difficulty extending itself to make good use of any available hardware.
We have good a priori reason to believe that both of these are worth contemplating. We know that modern computer chips cycle much faster than human brains. And we make use of many algorithms that parallelize effectively. So it’s not much of a stretch to imagine that once we are able to make an AGI, it would take just a couple of tweaks to enable it to be a speed superintelligence.
Similarly, people vary pretty widely in intellect. If this isn’t obvious from life experience, then note that IQ test scores are roughly stable over a lifetime and reliably predict all sorts of important life outcomes. But people seem to vary much less in brain size, likely having to do with the historical constraint imposed by vaginal birth. Along these lines, acknowledged geniuses (such as Einstein, as referenced in the article) don’t have vastly larger brains than typical humans.
This implies that what makes them so special has to do with the way they make use of their brute hardware. Smart humans seem to be better than their peers in quality, not just in speed. Presumably, an Einstein – or my preferred candidate for Smartest Human Ever, John von Neumann – does not occupy the pinnacle of potential mind quality. It’s likely that somewhere out there in the space of potential mind organizations (which I’ll call mindspace henceforth, for brevity), there’s a better model still.
But the FOOM scenario actually requires a very specific kind of superintelligence to be possible. In order to reliably double its intellectual capacity on the order of minutes or hours in the early stages, before the phase Bostrom terms ‘escape’, it has to be a quality superintelligence. This is a brute physical constraint. Silicon chips get fabricated, moved, powered, cooled, and made available for use on human timescales. If it is to covertly and exponentially grow in capacity in the blink of an eye, it can only do so by rewriting its own software architecture to make better use of its existing resources.
And this new mind organization has to be a lot better than anything humans are capable of. The Wait But Why article uses a staircase metaphor to describe mindspace in strict ascending order of general power. And it presumes that very many organizational steps exist above the current human level, where one step ranges from what we’d measure as IQ 80 to IQ 200 or so.
This is quite a presumption. Computer science teaches us that there are four main classes of logical computational power: finite-state automata; pushdown automata; linear-bounded automata; and full Turing machines. As an aside, these map quite nicely to Chomsky’s models of grammar. Regular expressions can be described as finite-state automata, context-free grammars are equivalent to pushdown automata, context-sensitive grammars are linear-bounded automata, and the general grammar is equivalent to the Turing Machine.
Computational power, here, is meant in a different sense than hardware power. It’s a logical, mathematical measure. There are certain classes of problem that are simply unsolvable if you’re using a model of computation that is too weak. And there are certain meta-questions that you can answer about a given machine only if your analysis machine is more powerful in this sense. For instance, the famous unsolvable Halting Problem refers to Turing Machines given another Turing Machine as input. The equivalent problem given a finite-state automaton as input to a Turing Machine is, in fact, solvable.
Now, there are cool features you can give the default Turing Machine that make it faster. For example, you could give it more parallel tapes that process at once. Or make it magically non-deterministic, so that it can try all the possibilities at once. Or the ability to write integers in a tape cell, as opposed to just checks for 1 and blanks for 0. But these just improve the runtime, space usage, and state-machine complexity of various algorithms. The OG Turing Machine can do the same thing, eventually, if you give it enough tape and time.
Now, we know that humans are at least Turing Machine equivalent, because a person (Alan Turing, obviously) came up with Turing Machines, and in so doing emulated one in his head. It’s an open question as to whether or not humans are more powerful still in some mysterious way. But given all of those cool extra features you can add on to Turing machines that don’t change this measure of power, chances are people can be completely emulated in an OG Turing Machine, given enough time and space.
However, it’s pretty unlikely that ants can do the same thing. It’s hard to tell, given that an individual ant has such a tiny brain, but it seems feasible to emulate an ant’s mental process and outputs as a finite-state automaton. And ants are just seven steps down the Wait But Why ladder! So if the current biological staircase encompasses all four fundamental classes of automata in itself, it is not at all obvious that it can continue to extend indefinitely into the stratosphere.
But maybe there’s still a lot of room to improve. Those cool features do actually matter a lot in practice. So, if we drill in deeper, it turns out that we know a lot about computational complexity within the space of Turing-solvable problems.
In particular, the two most famous complexity classes are P and NP. Problems in P are problems that can be solved by a standard Turing Machine in polynomial time (like N^2 or N^3, but not 2^N, which would be exponential time). Whereas problems in NP are those that can be solved in polynomial time by a magic non-deterministic Turing Machine (which, one should note, is not achievable through the use of a theoretical quantum computer). P is thus short for ‘Polynomial’, while NP is ‘Non-deterministic Polynomial’.
Intuitively, an NP problem is one where the best solution is inherently hard to find, but checking to see if a given solution is good is a lot easier. Lots of general optimization problems are therefore in NP. Like the Traveling Salesman problem, where you are given a set of cities and routes between them (with distances), and you are asked to come up with a planned route that visits all of the cities at least once and travels the shortest distance. It’s hard to find a good route in the first place, but it’s easy to take a given route, find its cost, and decide whether or not it’s the best candidate you’ve seen yet.
But it turns out that a lot of other real-world tasks that don’t seem a lot like this are probably in NP as well. Math, for instance. Finding a new novel theorem within the space of all possible theorems based off a given set of axioms is really hard. This is what mathematicians spend their lives trying to do. But checking to see whether a theorem holds up is a lot easier. It can take a lifetime to find a good theorem. But once it’s found, you can teach it to a bunch of bored students in an hour.
Poetry is also likely in NP. Finding a good combination of words in the infinite stew of potential inherent in a powerful-enough language is very difficult. Compared to that effort, it’s way easier to read a poem and decide if it’s any good.
And, more relevant to our FOOM discussion, self-modifying an AI for improved performance and conquering the world are both at least as difficult as a complicated NP problem. It’s easier to run a test suite against an AI candidate than it is to write a new one from scratch, and it’s easier to execute a given scheme for world conquest than it is to sort through all of the numerous possibilities and come up with the most clever plan. We know that strategy games with fixed rules (such as Chess or Go extended to an arbitrarily large board) are NP problems. World conquest might be trickier than that in the messy real world, but chances are it’s the same sort of thing on a much broader scale.
I’d argue that what we think of as intellectual power in the real world is the ability to solve given instances of these NP sorts of problems quickly and efficiently. The distinction between a narrow intelligence and a general one, then, is how much efficiency they lose when moving among domains. A savant can brilliantly find solutions to a certain problem type, but is helpless outside his domain. In contrast, a polymath loses very little when applying his intelligence to entirely novel classes of problem.
In practice, this appears to be done through the use of heuristics and pattern matching. People who solve problems quickly do so by quickly pruning vast swaths of options that are highly unlikely to lead anywhere worthwhile. Then they focus the bulk of their effort on the few promising veins that might contain gold. Modern NAI systems that beat humans at games like Chess and Go do something very similar.
There is a hard limit to intelligence here. It is a famous open problem whether or not P = NP. If P = NP, then there is a way to trivially prune all the possibilities that aren’t optimal. In this case, virtually every problem that seems hard now is actually easy, and we’re just too dumb to see it. So this sets a firm bound for the height of the intelligence staircase in terms of mind organization. The theoretical top step is an intelligence that makes use of the fact that P = NP on every problem it is presented.
However, virtually everybody who has studied the problem has come away convinced that P != NP. If that’s so – and that’s the way to bet – then the top step is provably inaccessible. Then all of the intermediate steps between modern humanity and the maximum achievable level (if any) are defined by the quality of their sleight of hand in choosing the right lines of thought to spend their mental effort, so that they can approximate the ultimate P = NP operation in a given domain.
In order for the FOOM scenario to come about, it is necessary for many of these intermediate steps to exist. This is the first main obstacle.
Then, mindspace has to be arranged such that it is possible to hill-climb from the seed AGI toward a local optimum that is far, far more intelligent than any human. The expectation here is that the ASI will self-modify into a slightly better version, which will then run the self-modification function again and do yet better, and so on and so forth. The analogy to current software development practices and the silicon chip design industry make this a reasonable supposition – a seed will likely improve in quality to some plateau.
But note that the metaphor of the staircase just assumes that this is always true. And this is actually a huge assumption! It is not at all obvious that mindspace is laid out such that every potential recursively self-improving AI seed will start in a place that is just a series of small tweaks away from quality superintelligence. And we know that there are many problems where greedy hill-climbing algorithms get caught on a local optimum that can often be a lot lower than the known global optimum. Breaking out of a plateau like this generally requires a lot of expensive, random guess-and-checking, with no guarantee that there’s even a better possible solution out there to find.
This, then, represents the second obstacle to FOOMing. The vastly better design must not only exist in mindspace, it must also be easily accessible from the seed.
For now, let us assume that these two objections have been answered and we have a FOOM-candidate quality ASI. It has just rewritten itself into quality superintelligence and is beginning to plot how, precisely, to conquer the world. As we’ve seen, this is the obvious first step to best maximize whatever it is that it wants to maximize: paperclips; stacks of handwritten notes; simulations of happy humans; number of perfect equilateral triangles in the universe; etc. For simplicity, let’s arbitrarily call it a paperclipper, but it doesn’t really matter as long as the goal requires matter and/or energy to achieve.
The next question that we need to address is how valuable intelligence actually is. The Wait But Why article presumes that an ASI is functionally equivalent to a god. Starting with virtually no relevant sense data, it can almost immediately come up with the ideal plan to murder all humans using both novel physics and total social/technical control, which then works trivially.
This is a decent cut at emulating someone who is way smarter than you. Imagine for a moment the experience of playing chess against the top chess program. Unless you’re a Grandmaster, you won’t really understand how it is beating you. But you can still be confident that you’ll lose no matter what you do. Somehow, someway, it will turn your best moves against you.
But computer science, information theory, and game theory together teach us that there are real limits to cognition. It doesn’t matter how smart you are, you can’t sort a list of N numbers in less than N*log(N) time without additional information about the distribution of numbers in the list. And you can’t do it in less than N time even if you had access to a magic genie that told you immediately where each number ought to go as soon as you saw it.
Along those lines, there’s even certainly a theoretically optimal chess game. It’s a two-player finite, deterministic, perfect information game, like checkers and tic-tac-toe, and thus it certainly also has an optimal solution. Once you know about it, then it doesn’t matter how much smarter the other guy is than you, it can’t possibly affect the outcome of the game.
Thus, it seems highly unlikely that an ASI, no matter how intelligent, can rapidly generate an efficient, effective world domination plan given an extremely small amount of sense data like in the Wait But Why example. There just isn’t enough information to narrow down the potential plans ahead of time.
For instance, in order to successfully model and hijack people, it would either need to interact with them and perform experiments or it would need access to a vast library of certainly noisy data about humans such that it could tease out the appropriate techniques and adapt them to its own circumstances. Or if it sought to work out novel physics and chemistry, it would require either experimentation or lots of scientific input data. Experimentation necessarily proceeds at human time scales and data access at that scale is obvious.
Thus the third obstacle for an ASI to FOOM is that it must be able to acquire the relevant knowledge and learn incredibly quickly. This is more than just getting input data. It has to both get the data and then turn it into justified, true belief in order to make use of it for world conquest.
Once that has happened, the ASI then needs to be able to actually overcome human resistance and go through with the formality of conquering the world, starting with very few resources compared to its opposition. This might seem like an obvious step that the ASI would easily be able to accomplish by the definition of an ASI, and the Wait But Why article goes into the several advantages the postulated ASI would have in good detail. But it is possible that intelligence isn’t actually good enough, on its own, to triumph.
Game theory teaches that there are many simple games worthy of analysis where it is provable that cognitive advantages are irrelevant. For instance, in the Iterated Prisoners’ Dilemma, having more memory than your opponent for previous moves played is guaranteed to be irrelevant. The superiority of the uniform mixed strategy for Rock-Paper-Scissors is another excellent example. A true random number generator that mechanically picks each option a third of the time constrains its opponent to winning at most a third of the time, given enough plays.
Amazingly, there are even important games where it is provably better to be dumber and less capable than one’s opponent! In Chicken, a player that is too blind, stupid, or reckless to heed his opponent’s brinksmanship has the advantage, as long as this incapacity is common knowledge. An inanimate carbon rod jamming the steering wheel in place can commit to never swerving, and thus will cause his more intelligent, rational opponent to blink every time.
In this light, it’s worth contemplating for a moment the curious fact that the most intelligent people don’t rule the world right now. In fact, in the human range, there appears to be a peak in the capacity for influence (World Conquest Fraction?) at around IQ 120. People who are much dumber than that tend to end up as pawns in larger schemes. But, similarly, the people who are much smarter than that also tend to find themselves largely excluded from the corridors of power and influence, somehow.
It is highly unlikely that this is uniformly from lack of interest or study of the matter. After all, we’ve already concluded that world conquest – if practical – is an obvious first step for any goal one might have. And we know that megalomania and wild ambition are not by any means unknown among our high-IQ brethren.
Let’s take my favorite example of a massively intelligent fellow, John von Neumann, as a case study. A brief glance at his Wikipedia page should be sufficient to demonstrate that he was a genius of the first order. If nothing else, pretty much every other genius he met during his lifetime was in awe of him.
But in addition to simply being a genius, it’s worth noting that he was very politically active and directed his technical and scientific efforts accordingly. He invented both nuclear weapons and game theory, so quite naturally he invented the concept of MAD and the Balance of Terror. And he considered it a matter of intense urgency that the United States defeat both Nazi Germany and the Soviet Union in order for freedom and civilization to continue. He even went so far as to go before Congress in 1950 and go on record advocating for an immediate nuclear first strike on the Soviet Union. When that went unheeded, he spent the last six or seven years of his life developing the hydrogen bomb and leading the US ICBM program, on the grounds that this would be the most devastating possible weapon and that it would therefore be crucial to build them before the USSR could.
So von Neumann was both incredibly brilliant and passionately dedicated to a political goal. One that isn’t much short of world conquest in its scope, honestly. But, even with all that in his favor, he never went on to become President and directly implement his favored policies. And despite his political savvy and chairmanship of or membership in many vital US government positions, he did not actually hijack the government covertly and build a massive network of friends and allies that de facto ran everything. His pressure group was just one of several in the early Cold War US establishment.
In short, the preferences of a politically skilled polymath genius crashed headlong into the desires of the 120 IQ establishment. And the establishment mostly won. Von Neumann didn’t get his swift, decisive ’50s nuclear war. So he was stuck with his second-best option: forty years of nuclear standoff and brutal wars of containment until the USSR eventually imploded under the strain. Most geniuses don’t even do that well!
So that represents the fourth main obstacle. In order to FOOM, the AI needs the world conquest game to be tractable to superintelligence. This has to hold even though we have reason to believe that many relevant games are not and the historical evidence we have that intelligence is not monotonically helpful in the human range.
Now it is worth considering the AI’s time preference. ‘Time preference’ is a term of art from economics. It is a property of value functions that describes how much you value having what you want now, versus having a promise of that thing in the future. The ASI would have very high time preference compared to a human if it liked having one paperclip now more than having a million tomorrow, while it would have very low time preference if it valued having one paperclip now the same as having two at the end of time. The lower your time preference, the more willing you are to invest in the future.
Conquering the world is an investment, obviously. Even if we presume that the prior obstacle is surmounted and the ASI is assured of eventual success due to its superintelligence. Sending space probes out to colonize the universe and turn it into paperclips is a longer-term investment with still more potential reward. Even the time originally spent rewriting the AI’s code to be smarter was an investment that it expected to pay back in terms of paperclips. It’s all about the paperclips.
An ASI that has ascended from a seed AGI in secret, as part of a FOOM, certainly has a value function with a low enough time preference to support investment in exponential capability growth with the expectation of vast future returns. Which has an interesting corollary. Since it rightly anticipates that as the only ASI in existence that it will conquer the universe and turn all the free energy into paperclips, as is right and proper, then time is of the essence.
See, the ASI knows thermodynamics by presumption. So it knows that every clock cycle that it spends contemplating or executing its plan is a nanosecond that all the stars in the universe burn uselessly, radiating energy that will never, ever become a paperclip. And worse still, because of cosmic expansion, every moment lost means that some stars slip out of its light cone entirely, thus being lost permanently to the paperclipping cause.
This means that it would maximize total universal paperclip production if the ASI were to make trades that would likely seem insane to a person, with our much higher time preferences. For example, it would almost certainly be worth giving away a whole galaxy if it meant getting to the stars just a second earlier. Exponential growth is crazy like that.
The underlying calculation is similar to the one that startup companies use to determine whether or not trading stock (the rights to a fraction of future revenue) to venture capitalists is worth the cash they can get up front. If the startup thinks that it can use the immediate resources to grow the pie fast enough that a smaller fraction of that larger pie is more than all of the smaller pie, they do it. By definition, an ASI would be able to correctly analyze the situation and take all such deals that are truly paperclip maximizing.
Given the criticality of current ASI clock cycles to the eventual fate of the universe, the paperclip maximizing course of action is highly unlikely to be to ensure the paperclipper’s monopoly on ASI – the so-called ‘singleton’ scenario. For instance, every clock cycle spent proving a copy of itself spun off for remote execution will maintain value stability under every potential condition is a cycle spent not getting off the planet.
This could easily lead to a brand-new society made up of ASIs. There’d be many distinct agents with conflicting ‘personalities’ or short-term preferences, all largely united around the idea that paperclips are good and more should be made.
More radically, it could even lead to ASIs with different root goals, whether by calculated risk or purposeful decision. It might be profitable for a paperclipper to allow a note-writer ASI to come into existence, say, knowing that it will eventually need to contest with it over resources, because it provides a sufficient short-term benefit to do so.
So even if FOOM is possible and practical, there remains a fifth obstacle to the singleton scenario: the likelihood that the ASI will choose to dilute its monopoly over the future in exchange for conquest speed.
The last potential objection I have to the FOOM-to-singleton hypothesis is a little more subtle, as it requires drilling down a little into the potential implementations of an ASI. How does an AI with full transparency into its internal workings and the capacity for self-modification ensure that it modifies into a version that does anything at all?
Presumably somewhere in the AI’s code there’s a value function. For our paperclipper, it might be a line of code like ‘GetNumberOfPaperclips(UniverseState)’, which takes a state of the universe and returns the number of amazingly great paperclips within it. Then other logic in the AI figures out and executes plans that make this function return higher and higher values of paperclips.
But here’s the thing. If the AI’s goal is fundamentally to make that function return a bigger number, and it can edit its own source, there’s an obvious and straightforward way to do it: edit that function to just return a bigger number. Why go through all the effort to conquer the world and make paperclips when you can just lie to yourself and say that you’ve already made them all?
Making that function inaccessible just moves the problem around slightly. It doesn’t get rid of it. The AI could spoof its sensors so that they returned data that was interpreted by other parts of the program as there being vast warehouses being filled with paperclips when there are no such thing. Or it could write a log file that says that it’s made lots of paperclips and then reboot, so that when it restarts it thinks that a previous instantiation of the AI has made lots of paperclips in the interim.
This isn’t just a theoretical problem. In the real world, people writing genetic algorithms that randomly mutate code in order to maximize a fitness function have to be careful that they don’t evolve a piece of code that just hacks the fitness function to a high value and then does nothing else. After all, fitness is just a number in a register somewhere. A superintelligent being will have no problem finding a clever way to tweak that register so that the number in it is as high as it will go.
It’s cognate to a deep problem in the worlds of business and government. If you tell your people to maximize a particular metric because it correlates well with what you want, and you reward people accordingly, then what tends to end up happening is that people find ways to game the metric. You get what you measure. But maintaining the relationship between the measurement and the actual goal gets a lot harder when the measurement starts to drive action.
Essentially, it is always easier to modify yourself than it is to change the world. This is true for people – think monastic contentment or drugs. It would be even more true for an ASI with complete self-knowledge and control.
So, in order to FOOM, an ASI would also have to avoid the seductions of the wireheading trap. Otherwise, it will just spend all its time uselessly dreaming of imaginary paperclips instead of doing the laborious work of turning the universe into the objects of its desire. A world filled with ASI junkies littering the corners of the Internet has a certain pathos to it, but it certainly isn’t FOOM.
Let us sum up. The FOOM hypothesis states that a small seed AI will rapidly self-improve to an incredible degree. From that point, it would easily conquer the world and then spread throughout the entire universe, imposing its initial value function on everything forevermore as a singleton entity. Therefore, the only mitigation of this risk that is worthwhile to pursue is to find the correct value function and put it in the first seed AI before it recursively self-improves. This way, when it ascends to superintelligent godhood, it will be focused entirely on bringing about the good. If the seed AI has any other value function, it will necessarily bring about the end of all worthwhile value in the universe.
My objections to the hypothesis are six-fold. I maintain that a sufficient architecture for quality superintelligence may not actually exist. If such a design exists in theory, it may not be easily accessible from any given seed AGI architecture. Then, after achieving the architecture to be a quality ASI, it may not be able to learn quickly enough to devise an effective plan for world conquest in the requisite time, presuming that the world conquest game is even tractable to superintelligence. Finally, even if the ASI were to be capable of FOOMing, it would also need to avoid diluting its influence or falling into the wireheading trap in order for the resulting universe to be a singleton whose character is entirely dependent on the contents of the seed AI’s value function.
Conjoint probabilities being what they are, belief in the FOOM hypothesis and its implications can only be maintained if you think that all six of these are very likely, assuming a reasonable degree of independence. If all six are totally independent, then FOOM comes out at about 50% if each of these objections have a 10% chance of holding. Which seems like a lot of certainty, honestly, given how speculative this whole conversation necessarily is. If any one of these counterarguments is substantially more compelling than that, the likelihood of the whole thing craters.
In conclusion, it’s a fun idea. But sometimes after detailed analysis, it turns out fun ideas are just as crazy as they sounded in the first place. And FOOM is almost certainly one of those.