The capability caution principle and the principle of maximal awkwardness

The Future of Life Institute discusses the

Capability Caution Principle: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.

It is an important meta-principle in careful design to avoid assuming the most reassuring possibility and instead design based on the most awkward possibility.

When inventing a cryptosystem, do not assume that the adversary is stupid and has limited resources: try to make something that can withstand a computationally and intellectually superior adversary. When testing a new explosive, do not assume it will be weak – stand as far away as possible. When trying to improve AI safety, do not assume AI will be stupid or weak, or that whoever implements it will be sane.

Often we think that the conservative choice is the pessimistic choice where nothing works. This is because “not working” is usually the most awkward possibility when building something. If I plan a project I should ensure that I can handle unforeseen delays, and the possibility that my original plans and pathways have to be scrapped and replaced with something else. But from a safety or social impact perspective the most awkward situation is if something succeeds radically, in the near future, and we have to deal with the consequences.

Adopting the principle of maximal awkwardness is a form of steelmanning, closely related to reasoning from the least convenient possible world.

This is an approach based on potential loss rather than probability. Most AI history tells us that wild dreams rarely, if ever, come true. But were we to get very powerful AI tools tomorrow it is not too hard to foresee a lot of damage and disruption. Even if you do not think the risk is existential you can probably imagine that autonomous hedge funds smarter than human traders, automated engineering in the hands of anybody and scalable automated identity theft could mess up the world system rather strongly. The fact that it might be unlikely is not as important as that the damage would be unacceptable. It is often easy to think that in uncertain cases the burden of proof is on the other party, rather than on the side where a mistaken belief would be dangerous.

As FLI stated it the principle goes both ways: do not assume the limits are super-high either. Maybe there is a complexity scaling making problem-solving systems unable to handle more than 7 things in “working memory” at the same time, limiting how deep their insights could be. Maybe social manipulation is not a tractable task. But this mainly means we should not count on the super-smart AI as a solution to problems (e.g. using one smart system to monitor another smart system). It is not an argument to be complacent.

People often misunderstand uncertainty:

  • Some think that uncertainty implies that non-action is reasonable, or at least action should wait till we know more. This is actually where the precautionary principle is sane: if there is a risk of something bad happening but you are not certain it will happen, you should still try to prevent it from happening or at least monitor what is going on.
  • Obviously some uncertain risks are unlikely enough that they can be ignored by rational people, but you need to have good reasons to think that the risk is actually that unlikely – uncertainty alone does not help.
  • Gaining more information sometimes reduces uncertainty in valuable ways, but the price of information can sometimes be too high, especially when there are intrinsically unknowable factors and noise clouding the situation.
  • Looking at the mean or expected case can be a mistake if there is a long tail of relatively unlikely but terrible possibilities: on the average day your house does not have a fire, but having insurance, a fire alarm and a fire extinguisher is a rational response.
  • Combinations of uncertain factors do not become less uncertain as they are combined (even if you describe them carefully and with scenarios): typically you get broader and heavier-tailed distributions, and should act on the tail risk.
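
The last point can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the argument above: each uncertain factor is modelled as an independent lognormal multiplier with median 1, and the factors combine by multiplication. The function names and parameter values are made up for illustration.

```python
import math
import random
import statistics


def product_of_factors(n_factors, sigma, rng):
    """Draw one sample of a product of independent lognormal factors (median 1 each)."""
    return math.prod(rng.lognormvariate(0.0, sigma) for _ in range(n_factors))


def tail_summary(n_factors, sigma=1.0, n_samples=20_000, seed=0):
    """Summarise the simulated distribution of a product of n_factors uncertain factors."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = sorted(
        product_of_factors(n_factors, sigma, rng) for _ in range(n_samples)
    )
    median = samples[n_samples // 2]
    p99 = samples[(99 * n_samples) // 100]  # empirical 99th percentile
    return {
        "median": median,
        "mean": statistics.fmean(samples),
        "p99": p99,
        "p99_over_median": p99 / median,
    }


if __name__ == "__main__":
    for k in (1, 2, 4):
        s = tail_summary(k)
        print(
            f"{k} factor(s): median={s['median']:.2f} "
            f"mean={s['mean']:.2f} p99/median={s['p99_over_median']:.1f}"
        )
```

In this toy model the median stays near 1 however many factors are combined, while the mean and the 99th percentile move further and further away from it. Planning around the typical case then says less and less about what the tail can do.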

FLI asks the intriguing question of how smart AI can get. I really want to know that too. But it is relatively unimportant for designing AI safety unless the ceiling is shockingly low; it is safer to assume it can be as smart as it wants to be. Some AI safety schemes involve smart systems monitoring each other or performing very complex counterfactuals: these do hinge on an assumption of high intelligence (or whatever it takes to accurately model counterfactual worlds). But then the design criterion should be to assume that these things are hard to do well.

Under high uncertainty, assume Murphy’s law holds.

(But remember that good engineering and reasoning can bind Murphy – it is just that you cannot assume somebody else will do it for you.)

Cool risks outside the envelope of nature

How do we apply the precautionary principle to exotic, low-probability risks?

The CUORE collaboration at the INFN Gran Sasso National Laboratory recently set a world record by cooling a one cubic meter, 400 kg copper vessel down to 6 milliKelvins: it was the coldest cubic meter in the universe for over 15 days. Yay! Applause! (And the rest of this post should in no way be construed as a criticism of the experiment.)

Cold and weird risks

I have not been able to dig up the project documentation, but I would be astonished if there was any discussion of risk due to the experiment. After all, cooling things is rarely dangerous. We do not have any physical theories saying there could be anything risky here. No doubt there are risk assessments of the practical hazards of liquid nitrogen or helium somewhere, but no analysis of any basic physics risks.

Compare this to the debates around the LHC, where critics at least could point to papers suggesting that strangelets, small black holes and vacuum decay were theoretically possible. Yet the LHC could argue back that particle processes like those occurring in the accelerator were already naturally occurring almost everywhere: if the LHC was risky, we ought to see plenty of explosions in the sky. Leaving aside the complications of correcting for anthropic bias, this kind of argument seems reasonably solid: if you do something that is within the envelope of what happens in the universe normally and there are no observed super-dangerous processes linked to it, then this activity is likely fine. We might wish for careful risk assessment, but given that the activity is already happening it can be viewed as just as benign as the normal activity of the universe.

However, the CUORE experiment is actually going outside of the envelope of what we think is going on in the universe. In the past, the universe has been hotter, so there would not have been any large masses at 6 milliKelvins. And with a 3 Kelvin background temperature, there would not be any natural objects this cold. (Since 1995 there have been small Bose-Einstein condensates in the hundred nanoKelvin range on Earth, but the argument is the same.)

How risky is it to generate such an outside-the-envelope phenomenon? There is no evidence from the past. There is no cause for alarm given the known laws of physics. Yet this lack of evidence does not argue against risk either. Maybe there is an ice-9-like phase transition of matter below a certain temperature. Maybe it implodes into a black hole because of some macroscale quantum (gravity) effect. Maybe the alien spacegods get angry. There is an endless number of possible hypotheses that cannot be ruled out.

We might think that such “small theories” can safely be ignored. But we have some potential evidence that the universe may be riskier than it looks: the Fermi paradox, the apparent absence of alien intelligence. If we are alone, it is either because there are one or more steps in the evolution of life and intelligence that are very unlikely (the “great filter” is behind us), or there is a high likelihood that intelligence disappears without a trace (a future great filter). Now, we might freely assign our probabilities to (1) that there are aliens around, (2) that the filter is behind us, and (3) that it is ahead. However, given our ignorance we cannot rationally give zero probability to any of these possibilities, and probably not even give any of them less than 1% (since that is about the natural lowest error rate of humans on anything). Anybody saying one of them is less likely than one in a million is likely very overconfident. Yet a 1% risk of a future great filter implies a huge threat. It is a threat that not only reliably wipes out intelligent life, but also does it to civilizations aware of its potential existence!

We then have a slightly odd reason to be slightly concerned with experiments like CUORE. We know there is some probability that intelligence gets reliably wiped out. We know intelligence is likely to explore conditions not found in the natural universe. So a potential explanation could be that there is some threat in this exploration. The probability is not enormous – we might think the filter is behind us or the universe is teeming with aliens, and even if there is a future filter there are many possibilities for what it could be besides low-temperature physics – but nearly any non-infinitesimal probability multiplied by the value of our species (at least 7 billion lives) tends to give an expected loss that is too large to accept.
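
The arithmetic behind that last step can be sketched in a few lines. The 1% chance of a future filter and the 7 billion lives are the figures used above; the share of that filter probability assigned to any one mechanism (such as low-temperature physics) is purely hypothetical, and the values below are illustrative guesses, not estimates.

```python
def expected_lives_lost(p_future_filter, p_mechanism_given_filter, lives_at_stake):
    """Expected loss = P(future filter) * P(this mechanism | filter) * lives at stake."""
    for p in (p_future_filter, p_mechanism_given_filter):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_future_filter * p_mechanism_given_filter * lives_at_stake


LIVES_AT_STAKE = 7e9     # "at least 7 billion lives", as in the text
P_FUTURE_FILTER = 0.01   # the 1% floor argued for in the text

if __name__ == "__main__":
    # Hypothetical shares of the filter probability given to one specific mechanism.
    for share in (1e-3, 1e-6, 1e-9):
        loss = expected_lives_lost(P_FUTURE_FILTER, share, LIVES_AT_STAKE)
        print(f"share={share:.0e}: expected loss of about {loss:g} lives")
```

The point of the sketch is only that the product stays uncomfortably large until the assumed share becomes extremely small. It says nothing about what the share actually is, which is exactly the part that cannot be pinned down.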


At this point the precautionary principle rears its stupid head (the ugly head is asleep). The stupid head argues that we should hence never do anything that is outside the natural envelope.

The ugly head would argue we should investigate before doing anything risky, but since in this case the empirical study itself is causing the risk, the head would hence advise just working through theoretical risk scenarios – not very useful given that we are dealing with something where all potential risk comes from scenarios unconstrained by evidence!

We cannot obey the stupid head much, since most human activity is about pushing the envelope. We are trying to have more and happier people than have ever existed in the universe before. Maybe that is risky (compare to Stapledon’s Last and First Men where it turned out to be dangerous to have too much intelligence in one spot), but it is practically hard to prevent, and this kind of open-ended “let’s not do anything that has not happened in the past” seems unreasonable given that most events are new ones and generally do not lead to disasters. But the pushing of the envelope into radically new directions does carry undefinable risk. We cannot avoid that. What we can do is to discuss whether we are willing to take on such hard-to-pin-down risk.

However, this example also shows a way precaution can break down. Nobody has, to my knowledge, worried about cooling down matter besides me. There is no concerned group urging precaution since there is neither an empirical nor a normative reason to think there is anything wrong specifically with CUORE: we only have a general Fermi paradox-induced inchoate worry. Yet proper precaution requires considering weak possibilities. I suspect that most future big new disasters will turn out to have avoided precautionary considerations just because there was no obvious reason to invoke the principle.


Many people are scared more by uncertainty than actual risk. But we cannot escape it, especially if we want to reduce existential risk, which tends to be more uncertain than most. This little essay is about some of the really tricky limits to what we can know about new risks. We should expect them to be unexpected. And we should expect that the standard decision methods will not behave sensibly.

As for the CUORE team, I wish them the best of luck to find neutrinoless double beta decay. But they should keep an eye open for weird anomalies too – they have a chance to peek outside the envelope of the natural in a well controlled setting, and that is valuable.