This is my submission to the EA Criticism contest.

Enlightenment Values in a Vulnerable World

Introduction:

The Vulnerable World Hypothesis: If technological development continues then a set of capabilities will at some point be attained that make the devastation of civilization extremely likely, unless civilization sufficiently exits the semianarchic default condition.

The Vulnerable World Hypothesis (VWH) is an influential 2019 paper by philosopher Nick Bostrom. It begins with a metaphor for technological progress: An urn full of balls representing technologies of varying degrees of danger and reward. A white ball is a technology which powerfully increases human welfare, while a black ball is one which “by default destroys the civilization that invents it.” Bostrom stipulates that “the term ‘civilizational devastation’ in the VWH refers to any destructive event that is at least as bad as the death of 15 percent of the world population or a reduction of global GDP by > 50 percent lasting for more than a decade.” Given the dire consequences of such a technology, Bostrom argues for enlarged state capacity, especially in terms of global reach and surveillance, to prevent the devastating technology from being invented.

The VWH is a wet blanket thrown over Enlightenment values; values which are popular with many EAs and among thinkers associated with progress studies such as David Deutsch, Steven Pinker, and Tyler Cowen. These Enlightenment values can be summarized as: political liberty, technological progress, and political liberty ⇒ technological progress. Even if technology has a highly positive expected value on human welfare this can be easily outweighed by a small chance of catastrophic or existential risk. The value of political liberty is often tied to its promotion of technological progress. Large risks from technological progress would therefore confer large risks on political liberty. Bostrom highlights this connection but goes further. Not only is political liberty dangerous because of its facilitation of catastrophic technological risk, but strict political control is good (or at least better than you thought it was before) because it is necessary to prevent these risks. In response to a black ball technology Bostrom says that “It would be unacceptable if even a single state fails to put in place the machinery necessary for continuous surveillance and control of its citizens.” If Bostrom is right that even a small credence in the VWH requires continuously controlling and surveilling everyone on earth, then Enlightenment values should be rejected in the face of existential risk.

We do not know whether the VWH is true, and it is undecidable via statistical analysis until we draw a black ball or empty the urn. Thus, I consider the implications of the VWH for Enlightenment values both when it is false and when it is true. If it is false then traditional arguments for Enlightenment values become even stronger. If the VWH is true, I find that one can still reasonably believe that unconstrained technological progress and political liberty are important moral goods as both ends and means as long as some properties of the urn are satisfied. Even if these properties are not necessarily satisfied, I show that Bostrom’s proposed solution of empowering a global government likely increases existential risk overall.

Part 1: Outcomes Conditional on VWH Truth Value

VWH Is False

First, we can quickly consider what we should do if we knew that the VWH was false. In this case, the arguments made by progress studies in support of the set of Enlightenment values: political liberty, technological progress and political liberty ⇒ technological progress, are proved stronger. Since we know that there are no black balls in the urn, we can be confident that the (highly positive) sample mean of the effect of technology on human welfare is close to its true effect, and there is no future risk of ruin that will greatly upset this mean. There may still be other objections like effects on an inherently valuable environment, inequality, or doubts of the connection between political liberty and technological progress, but most people reading this are likely very positive about the effects of technological progress and political liberty except for their facilitation of catastrophic risks. If anthropogenic x-risk concerns are ameliorated then Enlightenment values look better than ever as tools for advancing human welfare.

VWH Is True

If the VWH is true, then its implications depend on how we interpret the urn model. Bostrom suggests several different interpretations throughout the paper. There is the standard urn model where one ball is drawn at a time and the colors of the balls are independent and random. But this model is obviously an unrealistic description of technological progress. Independence means there is no room for technology to ameliorate future existential risks since previous draws do not affect future ones, but this contradicts Bostrom’s escape hatch of ‘exiting the semianarchic condition.’ Complete randomness assumes that we have no knowledge about what the risk of a technology might be before it is actually invented.

An important clarification Bostrom makes is that “We can speak of vulnerabilities opening and closing. In the ‘easy nukes’ scenario, the period of vulnerability begins when the easy way of producing nuclear explosions is discovered. It ends when some level of technology is attained that makes it reasonably affordable to stop nuclear explosions from causing unacceptable damage.” This implies that the color of balls in the urn, i.e. the risk from technologies, is not constant or independent of the balls which come before it. If the technology which abates nuclear risks came before easy nukes, then that ball would have changed color and no vulnerability would have opened. Additionally, “the metaphor would also become more realistic if we imagine that there is not just one hand daintily exploring the urn: instead, picture a throng of scuffling prospectors reaching in their arms in hopes of gold and glory, and citations.”

Wide Progress: Technological Antidotes

For the next two subsections, we’ll model the color of pulls from the urn as random, but not independent. That is, some technologies can change the risks of others but we don’t know what the risk of a technology will be before we invent it. Another basic assumption is that ‘technological maturity,’ i.e. the inevitable topping out of our exponential growth into an S-curve, is desirable and stable, but the path there may be dangerous.

Another way to encode this assumption is: If we discovered all possible technologies at once (which in Bostrom’s wide definition of technology in the VWH paper includes ideas about coordination and insight), we would be in the safe region. It is only that certain orderings of tech progress are dangerous, not that some technologies are incompatible with civilization in all contexts.

Allowing some technologies to change the risks posed by others makes the model more realistic. Bostrom claims that his global surveillance solution to anthropogenic risks is a one-size-fits-all antidote, but in fact dangerous technologies admit a range of antidotes. For example, bio-terrorism may be solved with strict state surveillance over labs and inputs, but it would also be solved by sufficiently cheap and effective vaccines, improved PPE, or genetically engineered improvements to our immune system. Bostrom suggests avoiding collapse from ‘easy nukes’ by having the state requisition all batteries, but we could also use advanced materials to build explosion-resistant buildings or use easy nukes to power vehicles which allow us to live very spread out, lessening the impact of nuclear explosions. Even technologies which do not obviously disarm black balls can be antidotes by increasing our wealth enough to make safety investments affordable.

In general, we’d like there to be at least one injective function from black balls to white ones. That is, there exists some pairing of technologies such that every dangerous invention has an antidote, and there are at least as many antidotes as black balls. Given the immense power and general purpose of technology and a reasonable upper bound on the ratio of black to white balls at ~1 in 500 million, it seems almost certain that each black ball could find at least one antidote without any repeats. If you believe that technological maturity is stable, i.e. there is a safe region as in Bostrom’s graph above, then it must be that all risky technologies are disarmed by some future technology. If we have at least one injective pairing, then the implications of the VWH shift towards the unconstrained progress promoted by Enlightenment values.

The limiting case of this danger-antidote relationship is an urn with two balls: one black, one white, representing a choice between extinction or technological ascendency. The white one is an antidote to the black one. Drawing nothing means stagnation until the earth is destroyed by natural processes. Let’s normalize the human value per century in this scenario to 1. Then this world has a value equal to the number of centuries humanity manages to survive on earth without any extra technology. The world where both balls are drawn is either empty or full of an astronomical number of human lives. Even in the worst-case scenario where technology contributes nothing to human welfare except avoiding extinction (i.e. the value of the technologically mature world is also 1 per century), the expected value from drawing both balls accumulates at ½ per century (½ + ½ + …), since the antidote comes first half the time, and so it still exceeds the value of the finite stagnation world after a finite number of centuries.

In a world where every risk has at least one antidote, maximizing the number of “scuffling prospectors” pulling balls at once is desirable. This is intuitive in the 2-ball limiting case. If you can pull both balls at once then there’s no chance of an unalloyed existential risk. In general, pulling multiple balls at once decreases risk. To avoid black balls while pulling just one ball at a time, you need the antidote to show up before the black ball every time. But when you pull two balls at once, you have all the same chances for the antidote to show up before the black ball, plus the probability that the antidote and black ball are pulled at the same time. This additional probability is increasing in the number of balls per pull as long as each black ball has at least one antidote.
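The batching argument can be checked with a small exact calculation. This is only a sketch under assumptions that are mine rather than Bostrom’s: the urn holds a fixed number of balls, exactly one is black and exactly one is its antidote, the draw order is uniformly random, and balls come out in fixed-size batches. Catastrophe is counted only when the black ball lands in a strictly earlier batch than its antidote. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def catastrophe_probability(n_balls: int, balls_per_pull: int) -> Fraction:
    """Exact chance that the black ball is pulled in an earlier batch than its antidote.

    Assumes one black ball, one antidote, and a uniformly random draw order
    (illustrative assumptions, not taken from the VWH paper).
    """
    if n_balls < 2 or balls_per_pull < 1:
        raise ValueError("need at least two balls and a positive batch size")
    bad = 0
    total = 0
    # Enumerate every ordered pair of distinct positions for (black, antidote).
    for black_pos, antidote_pos in permutations(range(n_balls), 2):
        total += 1
        # Integer division maps a position in the draw order to its batch index.
        if black_pos // balls_per_pull < antidote_pos // balls_per_pull:
            bad += 1
    return Fraction(bad, total)
```

With 12 balls, the catastrophe probability is 1/2 at one ball per pull, 5/11 at two balls per pull, and 0 when the whole urn is drawn in a single pull. The gain comes entirely from the added chance that the black ball and its antidote arrive together.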

Fast Progress: Windows of Vulnerability

Bostrom models ‘windows of vulnerability’ from black ball technologies as opening when the ball is first pulled, and closing when some future ball makes us resistant to the black ball either by directly countering it with protective technology or by increasing wealth enough to make palliative safety investment affordable. We saw above that widening progress by increasing the number of people pulling balls from the urn decreases the probability that these windows ever open. If pulling a black ball before an antidote means certain destruction then this is the best we can do. But if we have some window after discovering a black ball to still get a technological solution in time, then decreasing the time between pulls from the urn can decrease risk.

If we keep increasing the pace of tech progress, then these windows of vulnerability will keep shrinking. Accelerating development is a form of differential development. Acceleration speeds up the arrival of late technologies more than close ones. This decreases risk because any antidotes that come before or simultaneously with black balls stay that way, and any antidotes that come after get accelerated further than the black balls which precede them, making it more likely that we could survive long enough to herald their arrival.
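A toy calculation makes the shrinking window concrete. It assumes, purely for illustration, that acceleration divides every technology’s baseline arrival time by the same speedup factor; the essay does not commit to this functional form, and the function names and the example years are hypothetical.

```python
def arrival_time(baseline_years: float, speedup: float) -> float:
    """Arrival time, in years from now, when all progress runs `speedup` times faster.

    Uniform scaling of arrival times is an illustrative assumption.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_years / speedup


def window_of_vulnerability(
    black_years: float, antidote_years: float, speedup: float = 1.0
) -> float:
    """Years between a black ball arriving and its antidote arriving.

    Returns 0.0 when the antidote arrives first or at the same time.
    """
    gap = arrival_time(antidote_years, speedup) - arrival_time(black_years, speedup)
    return max(gap, 0.0)
```

Suppose a black ball would arrive in 40 years at the baseline pace and its antidote in 100. The window is 60 years at baseline and 30 years at double the pace. The antidote is pulled forward by 50 years while the black ball is pulled forward by only 20, which is the sense in which later technologies are accelerated more. An antidote that already comes first stays first, so the window remains zero.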

If we want to have a chance at the long-lasting and space-faring future civilization which makes existential risk such an important consideration, we’ll need to greatly increase our technological ability. Doing this slowly, one ball at a time, just means less chance at pulling antidote technologies in time to disable black ball risks. For example, terraforming technology which allows small groups of humans to make changes to a planet’s atmosphere and geography may increase existential risk until space-settling technology puts people on many planets. If terraforming technology typically precedes space-settling then accelerating the pace of progress reduces risk. Enlightenment values enable wide and fast progress. Wide and fast progress can decrease risk from random draws from an urn with at least as many antidotes as risks. So Enlightenment values can decrease risk.

Differential Development

So far we’ve been assuming that the color of the ball we pull is completely random, but the best reason to slow technological progress is if decreasing the width or speed increases our ability to choose whether the next pull from the urn will be black or white. To accommodate differential technological development, we have to have some sense of what color a ball might be before we draw it. “We could stipulate, for example, that the balls have different textures and that there is a correlation between texture and color, so that we get clues about the color of a ball before we extract it.” This seems plausible at least for the most proximate impacts of technologies. As Bostrom puts it “don’t work on laser isotope separation, don’t work on bioweapons, and don’t develop forms of geoengineering that would empower random individuals to unilaterally make drastic alterations to the Earth’s climate.”

But the impacts of a technology further in the future quickly become radically uncertain. If nuclear war or AGI kill billions then quantum mechanics or Von Neumann architecture will have turned out to be a black ball, but no one would have predicted that at the time. It’s not clear how this should affect our treatment of current explorations in math or physics either. Additionally, even technologies that we are confident are dangerous can have net positive effects on existential risk by mitigating other risks. For example, climate control has potential dangers but also advantages in ameliorating damage from climate change, supervolcanoes, and nuclear winter. Another well-known phenomenon is when research towards a technological goal produces unexpected and impactful spinoff discoveries. For example, research on cheaper ways to manufacture vaccines may also be used to make pathogens easier to produce.

There is still room for differential development after a ball is pulled however. We’d be better off under random draws if we could set aside black balls or at least slow their roll until their antidotes were also discovered or developed. We could also try to speed up the development of the antidote technology. These strategies rely on the regulatory mechanism having consistently high accuracy and precision. If the mechanism is bad at picking black balls from white ones, then it will often end up slowing antidotes and speeding risks, washing out its overall effect. Even if the mechanism is an unbiased estimator of whether a technology will be disastrous, if the enforcement is imprecise, the effect would similarly be diluted. For example, if nuclear or bio weapons regulation also slows down nuclear power or biosafety research then we may be losing out on as much progress towards antidotes as we are gaining time until black balls. The idea here is like the wide progress model in reverse. Imagine differential regulation as choosing a group of balls to set aside rather than letting them develop. Imprecise regulation sets aside a large handful of balls rather than just one. Wider handfuls make it more likely that one or more antidote technologies are also set aside.
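The handful intuition can be made concrete with a short hypergeometric calculation. This is a sketch under assumptions of my own: the regulator correctly sets aside one black ball, and every additional ball swept up by imprecise enforcement is chosen uniformly at random from the rest of the urn. Real enforcement more plausibly hits neighboring fields, which would make matters worse if antidotes cluster near the risks they disarm. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from math import comb


def antidote_set_aside_probability(
    n_balls: int, n_antidotes: int, handful_size: int
) -> Fraction:
    """Chance that a handful containing the targeted black ball also holds an antidote.

    Assumes the other balls in the handful are a uniform random sample of the
    remaining urn (an illustrative assumption).
    """
    others = n_balls - 1      # every ball except the targeted black ball
    extra = handful_size - 1  # collateral balls set aside with it
    if not 0 <= extra <= others or not 0 <= n_antidotes <= others:
        raise ValueError("sizes are inconsistent with the urn")
    # Probability that none of the collateral balls is an antidote.
    # math.comb returns 0 when more balls are requested than are available.
    none_caught = Fraction(comb(others - n_antidotes, extra), comb(others, extra))
    return 1 - none_caught
```

In an urn of 100 balls with 5 antidotes, a perfectly precise handful of one ball never catches an antidote, a handful of two catches one with probability 5/99, and the probability keeps rising toward 1 as the handful widens.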

Further on this point, regulatory self-selection can lead enforcement which is centered on black balls to have even greater negative effects on surrounding white ball technologies than on the black ball itself. For example, laser isotope enrichment, a way to more cheaply enrich uranium, can be used for both nuclear power and nuclear weapons. If countries like the US or even international organizations like the UN ban this method on proliferation concerns, the people most likely to follow these rules are those using the method for peaceful nuclear power, like GE or Hitachi. The most dangerous uses of the technology will be much less affected because they take place in recalcitrant nations like North Korea or Iran. This can create a sort of Simpson’s paradox where differential development of nuclear technology relative to less dangerous fields is internally composed of differential rules that promote the most dangerous uses of a technology relative to their constructive ones. Even if you can devise a mechanism which accurately determines and precisely enforces differential development, protecting it from regulatory capture is a serious challenge.

The above reasoning applies to decisions about differential development on a societal level enforced by a government. The point of Enlightenment values, however, is that researchers, entrepreneurs, and philanthropic organizations get to make decisions about what to pursue themselves. Specialization ensures that you can’t contribute a little bit to all fields equally, so everyone has to choose something to differentially develop even with this uncertainty. The ethical intuitionism of not working on things which seem obviously dangerous combined with considerations for comparative advantage is unobjectionable. The research and advocacy done by many EA organizations which pushes people to work on highly impactful good technologies rather than potentially dangerous ones is consistent with Enlightenment values even though it is not the random technological progress we modeled above. That model only represents progress at the most aggregate and abstract level. Concrete individual decisions about which frontiers to push forwards are not random. Trying to pick good frontiers from bad is almost certainly good. But one need not feel paralyzed by uncertainty since even random choices decrease risk in the aggregate.

Differential development enforced by an error-prone and inflexible government is dangerous. Correctly predicting the impacts of R&D is difficult. Getting it wrong on the global level is much higher stakes than on the individual level. State enforced development mistakes are likely to be locked in for long periods of time and since the state is guiding technological progress, there aren’t other organizations who can mitigate their mistake by researching antidotes. Differential development on an individual level is unavoidable. Your views on the tractability of predicting the impact of your work determine the effort you should invest into picking the right field, but even random progress pushes humanity forward.

Part 2: Is Bostrom’s Plan Even An Antidote?

The above discussion of wide and fast progress maximizing the probability of discovering technological antidotes is moot if Bostrom’s plan is the one-size-fits-all antidote that he says it is. If his global surveillance state truly offers robust protection against any future black balls then even if every black ball has several possible antidotes, it’s probably not worth rolling the dice on discovering them in time with unconstrained technological progress. It would be worth sacrificing Enlightenment values for sure-thing protection against future catastrophic risk. But Bostrom’s plan is not likely to provide robust protection against future black balls, and it would plausibly cause a net increase in existential risk.

Bostrom does not want to rely on luck to develop technological antidotes in time to prevent catastrophic damage from black balls. He reasons that since it is possible that a technology will exist such that even a few anonymous actors are enough to cause massive destruction, the state’s capacity for surveillance and policing has to be near absolute. And since inter-state competition is the source of many existential risks and catastrophic events, the surveillance state must be global in reach. His most detailed proposal, “The High Tech Panopticon,” consists of everyone on earth being fitted with a “freedom tag” that constantly records video and audio of everything you do. This data is automatically scanned for any criminal or suspicious activity by an AI. “Other extreme measures that could be attempted in the absence of a fully universal monitoring system might include adopting a policy of preemptive incarceration, say whenever some set of unreliable indicators suggest a greater than 1% probability that some individual will attempt a city-destroying act or worse.” Any indication of dangerous research or nefarious plans would dispatch an armed police unit to imprison or kill the perpetrator.

Bostrom acknowledges that this proposal is extreme, and he does not claim that his plan would be desirable all-things-considered. Rather, he says that his model “provides a pro tanto reason to support strengthening surveillance capabilities and preventive policing systems and for favoring a global governance regime that is capable of decisive action.” I will not object to Bostrom’s plan in an all-things-considered sense because the drawbacks of a global surveillance state to human welfare outside of existential risk considerations are already clear. Instead, I will show that Bostrom’s plan for global governance would not decrease existential risk as long as we have a reasonable model for the incentive structures within such a state.

Global totalitarianism is its own existential risk

The only anthropogenic events that have come close to fulfilling Bostrom’s definition of ‘civilizational devastation’ have been perpetrated by states. The conquests of the Mongol Empire may have killed more than 10% of the world’s population. The Thirty Years War killed nearly half of the Holy Roman Empire’s population. The Khmer Rouge executed nearly 20% of their own population. Communist China and the Soviet Union probably killed upwards of 100 million people combined.

None of these quite satisfy Bostrom’s 15% of global population threshold, but they are the closest we’ve come to it so far. Some of them arose out of interstate conflict which would, at least nominally, be avoided with a global state. But many of the deadliest events in human history have been states murdering people, quashing economic development, or causing famines, within their own borders. Establishing a global state with the police powers that Bostrom recommends would facilitate these catastrophic state-led massacres and incentivize state-enforced stagnation.

Dealing with a powerful global surveillance state is not unlike dealing with a powerful AI. Problems of alignment and instrumental convergence come immediately to the fore.

Instrumental convergence

To fulfill Bostrom’s mandate of extremely effective preventative policing and global authority, a government must first maintain power. Bostrom bit the bullet in recommending that this surveillance state quash any potentially dangerous technological development, but to ensure that it will protect humanity from itself, the state must also seek out and destroy dissent. Given the draconian control that this state imposes on everyone, it seems likely that at least 15% of the population would strongly resent and resist this government. Executing dissenters could easily be a catastrophic risk in itself. Beyond directly killing or imprisoning anyone who might try to disobey the state, the government will likely find that scapegoating certain groups is a good way to justify its power, shift blame for stagnant or deteriorating economic conditions, and maintain stability. This is an established strategy not only of totalitarian states, but also of democracies, and human groups in general. Crushing dissent and providing enemies to rally around are principal occupations of any state which wishes to maintain the level of control that Bostrom recommends. These are violent and costly processes which represent a serious catastrophic risk to humanity. And unlike many of Bostrom’s other examples of possible catastrophic risks, states have demonstrated their capacity for murder, imprisonment, and oppression on massive scales multiple times throughout history.

In addition to the direct catastrophic risk of recurring genocides, a global surveillance state would over-enforce technological stagnation relative to its mandate of preventing technological x-risk because it is easier to stay in control of a society in stasis than one that is rapidly growing. Feudal dynasties in Europe, for example, stayed stable for centuries because their powerful subordinates, large landholders, were bound by cultural, religious, and family norms. When a new group of merchants and industrialists began to gain economic power, they demanded political influence. Influence is zero-sum so it had to come at the expense of the old guard. If it is to fulfill its mission of preventing anthropogenic risk long into the future, the global surveillance state cannot afford to risk usurpation. Any variance in the state’s control over technology is bad because it might mean that a future existential risk slips through and destroys billions of potential future lives. Therefore, it will use its mandate of regulating technology to not only shut down potentially dangerous projects, but also to prevent anything which might shift the balance of power within its social pyramid. Again, states have demonstrated their desire and ability to enforce technological stagnation or regress several times throughout history. These actions correspond to Bostrom’s ‘permanent stagnation’ class of existential risk. In this scenario, humanity may avoid total extinction at our own hands, but we remain at a low level of utility compared to a technologically mature world and we are eventually snuffed out by natural phenomena.

Alignment

The above risks arise from a global state which is loyally following its mandate of protecting humanity’s future from dangerous inventions. A state which is not so loyal to this mandate would still find these tools for staying in power instrumental, but would use them in pursuit of much less useful goals. Bostrom provides no mechanism for making sure that this global government stays aligned with the goal of reducing existential risk and conflates a government with the ability to enact risk reducing policies with one that will actually enact risk reducing policies. But the ruling class of this global government could easily preside over a catastrophic risk to their citizens and still enrich themselves. Even with strong-minded leaders and robust institutions, a global government with this much power is a single point of failure for human civilization. Power within this state will be sought after by every enterprising group whether they care about existential risk or not. All states today are to some extent captured by special interests which lead them to do net social harm for the good of some group. If the global state falls into the control of a group with less than global interests, the alignment of the state towards global catastrophic risks will not hold.

A state which is aligned with the interests of some specific religion, race, or an even smaller oligarchic group can preside over and perpetrate the killing of billions of people and still come out ahead with respect to its narrow interests. The history of government gives no evidence that alignment with decreasing global catastrophic risk is stable. By contrast, there is evidence that alignment with the interests of some powerful subset of constituents is essentially the default condition of government.

If Bostrom is right that minimizing existential risk requires a stable and powerful global government, then politicide, propaganda, genocide, scapegoating, and stagnation are all instrumental in pursuing the strategy of minimizing anthropogenic risk. A global state with this goal is therefore itself a catastrophic risk. If it disarmed other more dangerous risks, such a state could be an antidote, but whether it would do so isn’t obvious. In the next section we consider whether the panopticon government is likely to disarm many existential risks.

Global surveillance states have strong incentives to develop dangerous technologies

To guarantee authority over humanity’s dangerous technological development, the global surveillance state will try to keep their technology level as high as possible relative to their constituents. We saw above one part of their strategy: enforcing technological stagnation. This alone may not be sufficient, however. The state may benefit from using technology to increase its capacity for longevity and control. These incentives would lead a global state to develop and deploy dangerous technologies.

To conquer and retain authority over all existing nation-states, the global surveillance state will need exclusive access to powerful military technology. To carry out near-perfect surveillance and enforcement of technology standards around the world, they will need artificial intelligence. Bostrom describes both of these in his paper: “Encrypted video and audio is continuously uploaded from the device to the cloud and machine-interpreted in real time. AI algorithms classify the activities of the wearer, his hand movements, nearby objects, and other situational cues.” And “the global governance institution itself could retain an arsenal of nuclear weapons as a buffer against any breakout attempt.”

So the surveillance state at a minimum has good reason to develop and deploy the two most dangerous technologies of our time, nuclear weapons and artificial intelligence. The presumed lack of interstate conflicts might make nuclear weapons less dangerous, but some of the deadliest conflicts of all time (Thirty Years War, Sengoku Period Japan, American Civil War) were intrastate ones. If the global surveillance state has to use nuclear weapons to quell “breakout attempts” then it doesn’t really make a difference what the borders look like on a map. More obviously, an artificial intelligence algorithm which is constantly monitoring video, audio, and location data in real time from literally every human being on earth is such a massive AI x-risk that I don’t understand why Bostrom even mentions it in this paper, let alone recommends it as a strategy to reduce existential risk. One wonders if Bostrom in this context should be read in Straussian terms! AI safety researchers argue over the feasibility of ‘boxing’ AIs in virtual environments, or restricting them to act as oracles only, but they all agree that training an AI with access to 80+% of all human sense-data and connecting it with the infrastructure to call out armed soldiers to kill or imprison anyone perceived as dangerous would be a disaster.

Beyond these two examples, a global surveillance state would be searching the urn specifically for black balls. This state would have little use for technologies which would improve the lives of the median person, and they would actively suppress those which would change the most important and high status factors of production. What they want are technologies which enhance their ability to maintain control over the globe. Technologies which add to their destructive and therefore deterrent power. Bio-weapons, nuclear weapons, AI, killer drones, and geo-engineering all fit the bill.

A global state will always see maintaining power as essential. A nuclear arsenal and an AI powered panopticon are basic requirements for the global surveillance state that Bostrom imagines. It is likely that such a state will find it valuable to expand its technological lead over all other organizations by actively seeking out black ball technologies. So in addition to posing an existential risk in and of itself, a global surveillance state would increase the risk from black ball technologies by actively seeking destructive power and preventing anyone else from developing antidotes.

Even global states are bad at solving coordination problems

Coordination problems are the central challenge of human society. The price system is currently the most powerful global coordination mechanism we know of. The price system leads participants to make socially beneficial tradeoffs even when everyone acts in their self-interest and no one knows more than their preferences and immediate circumstances. However, the price system has well documented inefficiencies when dealing with things that can’t be priced: access to public goods, and the effects of a transaction on bystanders including future people. Thus, technological progress and differential development will be underproduced compared to the ideal because much of the benefit from these pursuits accrues to future people who have no way of incentivizing present day researchers and entrepreneurs who incur the costs. Nation-states can theoretically solve these externality problems within their borders, but they still face challenges from externalities with other nations, future peoples, and global public goods. Bostrom is therefore correct that some addition or change to our current system of global coordination is needed to optimally address global catastrophic risks. This does not mean that any change towards a global state is an improvement.

In his paper Bostrom acknowledges the failure of existing states to solve even trivial coordination problems: “the problem confronting us here presents special challenges; yet states have frequently failed to solve easier collective action problems.” Few states have national carbon taxes or congestion pricing. In fact, most states spend billions on automobile and fossil fuel subsidies despite the obvious negative externalities. Similar things hold for even larger and more common agricultural subsidies despite negative externalities from carbon emissions, aquifer depletion, and fertilizer runoff. States also commonly subsidize suburban and rural living with land use regulations, single family zoning, transportation subsidies, and height restrictions despite the positive economic externalities of city life and the negative environmental externalities of suburban and rural living. Federal bureaucracies prefer to protect themselves from blame rather than do what produces the highest expected value for society. So not only are states failing to solve internal collective action problems, they are often actively making them worse! Some of the benefits from solving these problems may be captured by other nations, but this is a reason why countries might not work optimally hard on producing/avoiding externalities, not a reason why they would actively subsidize the negative and restrict the positive. This behavior is explained by standard public choice critiques of (especially democratic) governments. All of these modern states have the capacity to improve on voluntary allocation of resources, but they have incentives to do the opposite.

Since these coordination failures do not primarily come from a lack of state capacity or even externalities between states, they will not be solved simply by creating a global state with the same internal incentives that current states face. Despite this, Bostrom continues to assume that the power to take a socially beneficial action is sufficient to guarantee that the state will actually do it. “States have frequently failed to solve easier collective action problems … With effective global governance, however, the solution becomes trivial: simply prohibit all states from wielding the black-ball technology destructively.” If the institutional design of this global state looks similar to any modern state, this global state will be just as susceptible to concentrated-benefit-diffuse-cost attacks, rational ignorance and irrationality among voters, and internal bureaucracies optimizing socially inefficient sub-games. These will push the global state not only to ignore possible optimizations, but actively promote negative externalities when they benefit powerful stakeholders.

We should not commit the Nirvana fallacy of comparing an imperfect market solution to a perfect but unattainable government solution. Similarly, we should not reject a global state because it is imperfect but instead compare it to realistic options. Even when compared to our highly imperfect form of decentralized authority and inter-state competition, however, a realistic world state does not look like a clear winner in terms of its likelihood of solving externality problems. Collective action problems tend to get bigger the bigger the collective. The world state can amortize costs over billions more people than any existing state, which means that it can get away with costlier subsidies to more concentrated interest groups than any existing state. Internal bureaucracies in a world state will have to have many more layers and less oversight, allowing more dysfunctional optimization and corruption to take place. Divisions between factions within the world state would be much larger than divisions within current nation-states. The cost of staying informed about a global state’s policies is likely higher than for smaller ones, and the chance that any one vote will change the outcome of a global election is certainly much lower, so voter ignorance and irrationality will abound.

The ability to solve coordination problems is not sufficient for solving coordination problems, as the track record of existing states shows. Coordination technologies will be important antidotes for many types of risks from externalities, but creating a global surveillance state is not one of these antidotes.

Steelman

The steelman for global governance and preventative policing is probably something like “on the margin, it would be good to increase government oversight of specifically dangerous technologies like nuclear weapons, AI, and bioweapons.” There are some specific objections to this policy change. One can reasonably doubt the ability of governments to predict which technologies are dangerous and which are beneficial, and to design appropriate and precise regulations. There may also be problems when governance institutions are captured and so work towards anti-social interests, which may include enforcing their exclusive ownership of powerful technologies, or even sponsoring their development, so that they can dominate their rivals. Depending on the enforcement and authority of a super-national government, there might be undesirable selection effects where the states most likely to use powerful technologies for good are also the ones which closely follow the restrictions on technological development, while more conflictual countries (e.g. North Korea, Iran, Pakistan) refuse to follow the rules, resulting in inadvertent differential development in favor of violence.

Some global governance, however, can come with big advantages. Super-national government can bring down trade and immigration barriers. Even without explicit treaties, increasing the economic and cultural interconnectedness of the world is likely a good way to avoid interstate conflict. Global government is not necessary or sufficient for these openness benefits, but it may help. International agreements such as the Paris Climate Accords can help with global externalities like climate change. A government organization which practices differential development by trying to accelerate certain beneficial technologies rather than banning dangerous ones could have a positive net impact even if its funding choices were random because of the positive externalities from most technologies.

A more moderate increase in the policing of risky technologies on the global scale could grab some low-hanging fruit. Nuclear weapons technology has been restrained by agreement and monitoring. International and non-governmental policing may be able to restrain the most outstanding tech risks without risking the dangerous overreach of a global state. This case is much more likely to improve the world’s risk profile than the global panopticon, but it is very different from what Bostrom proposes. Although Bostrom would likely see this plan as an improvement on the status quo he says that “while pursuing such limited objectives, one should bear in mind that the protection they would offer covers only special subsets of scenarios, and might be temporary. If one finds oneself in a position to influence the macroparameters of preventive policing capacity or global governance capacity, one should consider that fundamental changes in those domains may be the only way to achieve a general ability to stabilize our civilization against emerging technological vulnerabilities.”

Part 3: Synthesis and Conclusion

Bringing all of this together, what does it mean for how EA and progress studies should think about existential risk?

Humanity’s existential risk profile is dominated by risks coming from current and potential technologies. But existential risk is also reduced by technology. Technologies reduce non-anthropogenic existential risks like asteroid strikes, supervolcanoes and pandemics but technologies can also be antidotes to anthropogenic risks including technology itself. Given natural existential risk, and the current levels of unsolved anthropogenic risks, stagnation clearly has risks of its own. The question is how to proceed with technological progress without creating unacceptable risks along the way.

A hypothetical filter which bans the invention, development, or use of overall harmful technologies until future antidotes tip the scales, and which bans nothing else, would be ideal. Realistic filters, however, present challenges and risks of their own. Bostrom’s panopticon government does not look especially promising. An organization or agent that is empowered and motivated to pursue existential risk reduction on this level will find it necessary to sustain the filter’s authority over humanity for centuries to come. Totalitarian strategies of crushing dissent, genocide of scapegoats, enforced stagnation, and the development of world-destroying weapons will be useful for securing the power that is a necessary instrument for this filter.

The power which comes with the ability to construct and enforce this filter will be the ultimate prize sought after by anyone with interests narrower than the future of all humanity. Small groups could easily coordinate and use the filter mechanism to spread costly externalities or risks among the rest of the world while greatly enriching themselves. Even worse, some groups will seek to use the filter mechanism to outright destroy others. There is currently no known institutional design with this kind of power that has demonstrated even temporary immunity to this misalignment. All current states sacrifice the interests of present groups and especially future people to benefit concentrated interests within their borders. Whether the filter is aligned with the interests of humanity as a whole or not, it will hold onto its power ruthlessly, so it represents a significant existential risk. Past governments have committed several genocides which approach Bostrom’s threshold for civilizational devastation, so bigger and more powerful versions of governments seem like an unpromising strategy to reduce net risk. Given the difficulty of correctly setting up this filter and the dangers of getting it wrong, we ought to look for safer and easier ways to decrease our existential risk.

Upholding Enlightenment values is a good place to start the search for this optimal risk-reduction strategy. Rapid technological progress is at least as likely to produce antidotes to natural and anthropogenic technological risks as it is to create more of them. The open society and rapid progress that Enlightenment values foster also facilitate fast adaptation when we encounter new problems. Adherence to these values is only a heuristic, but our lack of information on the dangers and benefits of most future technologies and the poor quality of alternative coordination methods make a strong case for upholding Enlightenment values in the face of existential risk.

Action Relevance

Technological existential risk is an important consideration for human welfare, but what follows from that recognition isn’t obvious. This essay makes two arguments. First, wide and fast progress can decrease overall risk. Second, high risk from technological progress does not itself justify state intervention in technological progress because states do not automatically, or in fact usually, internalize the relevant externalities that would lead them to actually decrease technological risk. Neither of these arguments means that addressing technological existential risk is any less important! They just imply different strategies for addressing it.

What are the actionable steps that organizations and individuals would take given these arguments?

Differential development on the individual level is beneficial and essentially required by specialization, so EAs and philanthropists who are convinced that AI safety research (or any other cause) is the best thing to devote their time and money to should keep doing so, and they should continue trying to convince others of the same.
Much more consideration of state failure and state-led existential risk is needed before the heuristic of Enlightenment Values can be reasonably overridden. Governments are not likely to improve AI governance.
R&D on coordination mechanisms to improve governance has high leverage and more work should be done in this space.
Supporting policies and politicians to get differential development is probably not an effective use of your time. State enforced differential development is unlikely to be accurate, precise, and resistant to regulatory capture. I think this applies to political efforts for artificial intelligence risks.
Differential development via speeding up beneficial technologies is better than banning dangerous ones because getting it wrong has fewer downsides: even a mistaken subsidy still helps correct for the positive externalities that most technologies have anyway.
More moderate global governance which promotes interconnectedness between nations, funds beneficial technologies, and uses soft power to sanction violent nations could secure some low-hanging risk reductions without high costs.

Appendix

AGI risk specifics

Many view AGI as the largest source of existential risk for the next few centuries. Assuming that they are correct, let’s consider the argument for state-led differential development. The voluntary allocation of R&D effort plausibly overproduces AI capability due to the external costs of AI risk. So a group of benevolent planners could, in theory, make everyone better off by slowing down AI capability growth relative to AI safety knowledge.

Several things have to happen before this theoretical possibility is realized, however. First, these planners and their constituents have to actually care about regulating AI capability research. The temporal and global externalities of AI risk make this difficult for politicians who need immediate results and for rationally irrational voters. The increasing importance of AI in the economy and the work being done by AI risk researchers and advocates have already helped to overcome this apathy. Several governments around the world have AI strategies.

Once governments are interested in regulating the AI sector they have to be guided towards regulating the sector in the interests of all of humanity, present and future, rather than the interests of some smaller group. This will also be difficult since political coordination among all of humanity’s present and future is much more difficult than within some smaller group such as the military or an industry interested in favorable AI regulation. It seems more likely that government intervention in the AI sector will look more like advancing military uses of AI, perhaps the most dangerous use of AI, or protecting the interests of Big Tech by raising entry barriers, extending their intellectual property, and providing government contracts rather than a principled slow down of AI progress so that we all have time to consider the potential consequences of developing AI too soon.

Even if we manage to convince governments to prioritize humanity’s future over the rewards offered by special interest groups, well-intentioned attempts to accelerate AI safety relative to AI capabilities could easily make things worse. To improve our risk profile, differential development needs to be accurate and precise. Governments need to understand where the risk of AI comes from and they need to be able to target that source without much spillover. This is particularly difficult with AI because the precise source and form of AI risk is a subject of intense debate and there is considerable overlap between AI capability research and AI safety research. The optimal regulation strategy is very different if AI risk comes primarily from something like modern corporate language models or if it looks more like a rogue computer virus or a killer robot or something no one has thought of yet. Regulating the wrong type of AI could easily increase overall existential risk since the technology has so much potential to solve other risks. It may also make it more difficult to retarget the regulation once we have more information in the future.

Additionally, AI capability research and AI safety research are sometimes hard to tell apart, so highly precise targeting is necessary. FTX Future Fund’s ML Safety Scholars program has safety in the name, but it’s mostly about teaching young people how machine learning works. EU regulations on AI are so imprecise that most scientific research in general is covered by them. Imprecise regulation may slow AI safety research as much as or more than it slows the growth in AI capability. Even with high precision, AI safety and capabilities research are difficult to separate. AI safety researchers need good models of what AI capabilities will be. Their search for these models may inspire the creation of advanced AI, create info-hazards, or inadvertently create AI. AI capabilities researchers need good ways to understand and control their products. They may be the first to develop effective interpretability tools, kill switches, and simulation boxes. Promoting AI safety increases the risk of black balls from that field, and curtailing AI capabilities research decreases the chance of that field producing antidotes. There is still room for this to be a beneficial trade-off of course, but it decreases the expected value even assuming that the government has completely altruistic intentions.

For these reasons, AI’s importance as a large existential risk is insufficient to justify government intervention without new institutional design. The huge gains from decreasing AI risk are balanced out by huge losses from increasing it. If governments regulate AI like they regulate most other industries, they are far more likely to increase risk than decrease it as they pursue the interests of concentrated interest groups and their own short-term interests. Even with spotless intentions, correctly regulating AI is very difficult under the uncertainty over the sources of AI risk. The most consequential regulation may be to prevent military AI research, but even broaching this possibility raises the question of which countries would actually do it. The expected value from state regulation of AI is not sufficient to override the heuristic of upholding Enlightenment values.

What do we know about the truth of the VWH?

The short answer is not much. We’re pulling balls from an urn, but we don’t know the total number of balls and we’ve only drawn white balls so far. This is the exact formulation of Nassim Taleb’s Black Swan problem. No matter how many white swans we’ve observed, we will never learn anything about how many black swans there are unless we observe every swan or see a black one. Naively, one could use some inductive procedure; updating confidence that there are no black balls in the urn each time we observe a white ball. However, no matter how high our certainty gets, it will always take only one observation for it to collapse to zero, making a sliding scale of certainty meaningless. Taleb gives another illustrative example: Imagine you’re a Bayesian turkey on a farm. At first, you may be unsure whether the farmer has your best interests at heart. But “every single feeding will firm up the bird’s belief that it is the general rule of life to be fed every day by friendly members of the human race ‘looking out for its best interests,’ as a politician would say. On the afternoon of the Wednesday before Thanksgiving, something unexpected will happen to the turkey. It will incur a revision of belief.”

No amount of analysis of the previous 1000 days of your life as a Bayesian turkey could have informed you about the impending doom. In fact, the Bayesian turkey’s confidence in its safety reached its peak when it was in the most danger. An eerily similar graph could be worked up for human well-being over time. Just take a graph of basically any metric over time and end it swiftly at some point in the future to illustrate the doomsday argument.

So despite centuries of human existence and hundreds of millions of inventions passing by without extinction, it would be irresponsible to claim certainty in VWH’s falsehood. There are only three ways to resolve this uncertainty. One is to have a hard-to-vary explanatory theory which can give us knowledge outside the box of probability. For example: “The turkey is at risk because it is on a human farm and humans like to eat turkeys on Thanksgiving” or “human technological progress is not an existential risk because the law of conservation of energy implies that any technology powerful and cheap enough to destroy civilization is also powerful and cheap enough to build a much more resilient civilization.” An explanatory theory like this might exist for our relationship with existential risk and technology, but few have been posited and none confirmed. The second way we can resolve uncertainty is to discover all technologies, i.e. empty the urn without finding a black ball. The third way is to pull a black ball and destroy civilization.

Bounds on the ratio of black to white balls

Although we cannot resolve uncertainty around the VWH, we can use past data to put plausible upper bounds on the amount of risk that comes from the average invention. Since the invention which spells our end will certainly not be an average one, this is only useful for contextualizing the problem, not for predicting the future.

How many balls have we pulled from the urn of technology so far? Bostrom “uses the word ‘technology’ in its broadest sense … we count not only machines and physical devices but also other kinds of instrumentally efficacious templates and procedures – including scientific ideas, institutional designs, organizational techniques, ideologies, concepts, and memes.” This definition is so broad that it is difficult to quantify. To put a lower bound on it, here is some data on the number of patents and scientific papers published each year since around 1800.

There have probably been at least 100 million patents worldwide, which is itself a lower bound for the number of inventions, and there are 120 million papers in the Microsoft academic database. We can be confident that these numbers severely undercount the number of inventions and scientific ideas, and they do not even attempt to capture “institutional designs, organizational techniques, ideologies, concepts, and memes.” A reasonable estimate of all the acts of invention and scientific discovery not tracked by these data plus all the other more amorphous concepts also in the urn easily exceeds 500 million. Following Toby Ord’s estimations of natural existential risk, we can use this historical data to put plausible upper bounds on the per-draw risk of pulling a black ball. Let’s normalize to groups of 100 thousand balls to save space on digits. So we’ve probably pulled between 2,200 and 10,000 groups of 100k balls from the urn of knowledge. If we had a 99% chance of avoiding extinction or catastrophe with each group, there would be at most a 0.000000025% chance (about 1 in 4 billion) of surviving as long as we have. For our history to be more likely than a 1 in 1000 chance, we’d need at the very least a 99.7% chance of surviving each group of 100k draws without incident. If you think we’ve drawn more than 220 million balls so far, this minimum probability of safety increases further. For our history of no black ball incidents to be more likely than not, we’d need a probability of safety for each group of 100k draws between .9997 and .99993, depending on how many draws you think we’ve had so far.
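The arithmetic behind these bounds is short enough to check directly. The sketch below is only an illustration: it assumes every group of 100k draws is independent and equally risky, and the function names are my own, hypothetical labels. Survival probabilities come out as fractions, not percentages.

```python
GROUP_SIZE = 100_000  # balls per group, as in the text

def survival_probability(p_safe_per_group: float, n_groups: int) -> float:
    """Chance of drawing n_groups groups in a row with no black ball,
    assuming independent groups with identical per-group safety."""
    return p_safe_per_group ** n_groups

def required_safety(target_survival: float, n_groups: int) -> float:
    """Per-group safety at which surviving n_groups groups has exactly
    probability target_survival (the inverse of survival_probability)."""
    return target_survival ** (1.0 / n_groups)

# 220 million draws (patents plus papers) is the low count; the high
# count of 1 billion is implied by the 10,000-group figure in the text.
low_groups = 220_000_000 // GROUP_SIZE
high_groups = 1_000_000_000 // GROUP_SIZE

# Survival chance so far if each group were only 99% safe.
print(survival_probability(0.99, low_groups))

# Per-group safety needed for our history to be a 1-in-1000 event.
print(required_safety(0.001, low_groups))

# Per-group safety needed for our history to be more likely than not.
print(required_safety(0.5, low_groups), required_safety(0.5, high_groups))
```

Raising the assumed number of past draws pushes every required safety level closer to 1, which is why the bounds tighten as the count of inventions grows.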

These numbers are credible bounds on the chance of catastrophe from a given invention if the existential risk per invention is not increasing over time, which may be a suspect assumption. However, even accepting these bounds does not necessarily relieve worries about technological x-risk despite the microscopic probabilities they place on it. Even if the chance of catastrophe per draw from the urn is not increasing, the number of draws we are taking is increasing. If our invention rate keeps growing as it has been, in 200 years we might be inventing 3 billion things a year. That’s 30,000 groups of 100k inventions. Even if each group has a 99.993% chance to not kill us, getting 30,000 of these in a row is around a 12% chance. And that’s just one typical year in 2200! (However, if we make it to 2200, we’ll have observed many billions more safe inventions, so we’d have a tighter upper bound on the rate of black balls in the urn. I am not sure how to resolve this time inconsistency.)
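The same compounding can be checked for the hypothetical year in which 3 billion things are invented. This is a rough sketch using only the figures in the paragraph above, and it again assumes that groups of draws are independent and equally risky; the variable names are illustrative.

```python
GROUP_SIZE = 100_000                 # balls per group, as in the text
inventions_per_year = 3_000_000_000  # the essay's guess for one year around 2200
p_safe_per_group = 0.99993           # the optimistic per-group bound from the text

# Number of 100k-draw groups pulled in that single year.
groups_per_year = inventions_per_year // GROUP_SIZE

# Chance of getting through the whole year without a black ball.
p_survive_year = p_safe_per_group ** groups_per_year

print(groups_per_year)
print(round(p_survive_year, 3))
```

Under these assumptions a per-group risk that looks negligible still compounds into a large annual risk once the number of draws per year is high enough.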

Again, this analysis doesn’t get us any closer to answering the question central to the VWH: is there at least one black ball in the urn? It does inform us about the most likely ratio of black balls to white balls in the urn. If it were not millions of times more likely to draw a white ball than a black ball from the urn, then there would be almost no chance of making it this far.

Is Technological Maturity stable?

The specific characteristics of a technologically mature humanity certainly depend on technologies, cultures, and even biologies which are unimaginable to us today. This makes it impossible to say with confidence why it would or would not be stable. Perhaps the Fermi paradox gives us reason to doubt that it is, but perhaps not. The important thing is that if technological maturity is not stable, the argument for longtermism and for caring a lot about existential risk becomes much weaker. The possibility of a long, populous, and rich future is what makes existential risk important.

If existential risk is constant or decreasing in the number of technologies we discover but never low enough to be called ‘stable,’ then rapid technological progress and political liberty are still good for all of the benefits they bring along the way. If technological maturity is unstable because risk is increasing in the number of technologies we discover, then the implications depend sensitively on unknown parameters. Low value from technology and high risk might imply that a return to pre-industrial agrarian life maximizes human value. If it is the other way around, the extra risk might be worth the big gains in wealth and population that we get from technology along the way.