# The capability caution principle and the principle of maximal awkwardness

Capability Caution Principle: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.

It is an important meta-principle in careful design to avoid assuming the most reassuring possibility and instead design based on the most awkward possibility.

When inventing a cryptosystem, do not assume that the adversary is stupid and has limited resources: try to make something that can withstand a computationally and intellectually superior adversary. When testing a new explosive, do not assume it will be weak – stand as far away as possible. When trying to improve AI safety, do not assume AI will be stupid or weak, or that whoever implements it will be sane.

Often we think that the conservative choice is the pessimistic choice where nothing works. This is because “not working” is usually the most awkward possibility when building something. If I plan a project I should ensure that I can handle unforeseen delays and that my original plans and pathways have to be scrapped and replaced with something else. But from a safety or social impact perspective the most awkward situation is if something succeeds radically, in the near future, and we have to deal with the consequences.

Assuming the principle of maximal awkwardness is a form of steelmanning and the least convenient possible world.

This is an approach based on potential loss rather than probability. Most AI history tells us that wild dreams rarely, if ever, come true. But were we to get very powerful AI tools tomorrow it is not too hard to foresee a lot of damage and disruption. Even if you do not think the risk is existential you can probably imagine that autonomous hedge funds smarter than human traders, automated engineering in the hands of anybody and scalable automated identity theft could mess up the world system rather strongly. The fact that it might be unlikely is not as important as that the damage would be unacceptable. It is often easy to think that in uncertain cases the burden of proof is on the other party, rather than on the side where a mistaken belief would be dangerous.

As FLI stated it the principle goes both ways: do not assume the limits are super-high either. Maybe there is a complexity scaling making problem-solving systems unable to handle more than 7 things in “working memory” at the same time, limiting how deep their insights could be. Maybe social manipulation is not a tractable task. But this mainly means we should not count on the super-smart AI as a solution to problems (e.g. using one smart system to monitor another smart system). It is not an argument to be complacent.

People often misunderstand uncertainty:

• Some think that uncertainty implies that non-action is reasonable, or at least action should wait till we know more. This is actually where the precautionary principle is sane: if there is a risk of something bad happening but you are not certain it will happen, you should still try to prevent it from happening or at least monitor what is going on.
• Obviously some uncertain risks are unlikely enough that they can be ignored by rational people, but you need to have good reasons to think that the risk is actually that unlikely – uncertainty alone does not help.
• Gaining more information sometimes reduces uncertainty in valuable ways, but the price of information can sometimes be too high, especially when there are intrinsically unknowable factors and noise clouding the situation.
• Looking at the mean or expected case can be a mistake if there is a long tail of relatively unlikely but terrible possibilities: on the average day your house does not have a fire, but having insurance, a fire alarm and a fire extinguisher is a rational response.
• Combinations of uncertain factors do not become less uncertain as they are combined (even if you describe them carefully and with scenarios): typically you get broader and heavier-tailed distributions, and should act on the tail risk.

FLI asks the intriguing question of how smart AI can get. I really want to know that too. But it is relatively unimportant for designing AI safety unless the ceiling is shockingly low; it is safer to assume it can be as smart as it wants to. Some AI safety schemes involve smart systems monitoring each other or performing very complex counterfactuals: these do hinge on an assumption of high intelligence (or whatever it takes to accurately model counterfactual worlds). But then the design criteria should be to assume that these things are hard to do well.

Under high uncertainty, assume Murphy’s law holds.

(But remember that good engineering and reasoning can bind Murphy – it is just that you cannot assume somebody else will do it for you.)

# My adventures in demonology

Wired has an article about the CSER Existential Risk Conference in December 2016, rather flatteringly comparing us to superheroes. Plus a list of more or less likely risks we discussed. Calling them the “10 biggest threats” is perhaps exaggerating a fair bit: nobody is seriously worried about simulation shutdowns. But some of the others are worth working a lot more on.

## High-energy demons

I am cited as talking about existential risk from demon summoning. Since this is bound to be misunderstood, here is the full story:

As noted in the Wired list, we wrote a paper looking at the risk from the LHC, finding that there is a problem with analysing very unlikely (but high impact) risks: the probability of a mistake in the analysis overshadows the risk itself, making the analysis bad at bounding the risk. This can be handled by doing multiple independent risk bounds, which is a hassle, but it is the only (?) way to reliably conclude that things are safe.

I blogged a bit about the LHC issue before we wrote the paper, bringing up the problem of estimating probabilities for unprecedented experiments through the case of Taleb’s demon (which properly should be Taylor’s demon, but Stigler’s law of eponymy strikes again). That probably got me to have a demon association to the wider physics risk issues.

The issue of how to think about unprecedented risks without succumbing to precautionary paralysis is important: we cannot avoid doing new things, yet we should not be stupid about it. This is extra tricky when considering experiments that create things or conditions that are not found in nature.

## Not so serious?

A closely related issue is when it is reasonable to regard a proposed risk as non-serious. Predictions of risk from strangelets, black holes, vacuum decay and other “theoretical noise” caused by theoretical physics theories at least is triggered by some serious physics thinking, even if it is far out. Physicists have generally tended to ignore such risks, but when forced by anxious acceleratorphobes the arguments had to be nontrivial: the initial dismissal was not really well founded. Yet it seems totally reasonable to dismiss some risks. If somebody worries that the alien spacegods will take exception to the accelerator we generally look for a psychiatrist rather than take them seriously. Some theories have so low prior probability that it seems rational to ignore them.

But what is the proper de minimis boundary here? One crude way of estimating it is to say that risks of destroying the world with lower probability than one in 10 billion can safely be ignored – they correspond to a risk of less than one person in expectation. But we would not accept that for an individual chemistry experiment: if the chance of being blown up if someone did it was “less than 100%” but still far above some tiny number, they would presumably want to avoid risking their neck. And in the physics risk case the same risk is borne by every living human. Worse, by Bostrom’s astronomical waste argument, existential risks risks more than 1046 possible future lives. So maybe we should put the boundary at less than 10-46: any risk more likely must be investigated in detail. That will be a lot of work. Still, there are risks far below this level: the probability that all humans were to die from natural causes within a year is around 10-7.2e11, which is OK.

One can argue that the boundary does not really exist: Martin Peterson argues that setting it at some fixed low probability, that realisations of the risk cannot be ascertained, or that it is below natural risks do not truly work: the boundary will be vague.

## Demons lurking in the priors

Be as it may with the boundary, the real problem is that estimating prior probabilities is not always easy. They can vault over the vague boundary.

Hence my demon summoning example (from a blog post near Halloween I cannot find right now): what about the risk of somebody summoning a demon army? It might cause the end of the world. The theory “Demons are real and threatening” is not a hugely likely theory: atheists and modern Christians may assign it zero probability. But that breaks Cromwell’s rule: once you assign 0% to a probability no amount of evidence – including a demon army parading in front of you – will make you change your mind (or you are not applying probability theory correctly). The proper response is to assume some tiny probability $\epsilon$, conveniently below the boundary.

…except that there are a lot of old-fashioned believers who do think the theory “Demons are real and threatening” is a totally fine theory. Sure, most academic readers of this blog will not belong to this group and instead to the $\epsilon$ probability group. But knowing that there are people out there that think something different from your view should make you want to update your view in their direction a bit – after all, you could be wrong and they might know something you don’t. (Yes, they ought to move a bit in your direction too.) But now suppose you move 1% in the direction of the believers from your $\epsilon$ belief. You will now believe in the theory to $\epsilon + 1\% \approx 1\%$. That is, now you have a fairly good reason not to disregard the demon theory automatically. At least you should spend effort on checking it out. And once you are done with that you better start with the next crazy New Age theory, and the next conspiracy theory…

## Reverend Bayes doesn’t help the unbeliever (or believer)

One way out is to argue that the probability of believers being right is so low that it can be disregarded. If they have probability $\epsilon$ of being right, then the actual demon risk is of size $\epsilon$ and we can ignore it – updates due to the others do not move us. But that is a pretty bold statement about human beliefs about anything: humans can surely be wrong about things, but being that certain that a common belief is wrong seems to require better evidence.

The believer will doubtlessly claim seeing a lot of evidence for the divine, giving some big update $\Pr[belief|evidence]=\Pr[evidence|belief]\Pr[belief]/\Pr[evidence]$, but the non-believer will notice that the evidence is also pretty compatible with non-belief: $\frac{\Pr[evidence|belief]}{\Pr[evidence|nonbelief]}\approx 1$ – most believers seem to have strong priors for their belief that they then strengthen by selective evidence or interpretation without taking into account the more relevant ratio $\Pr[belief|evidence] / \Pr[nonbelief|evidence]$. And the believers counter that the same is true for the non-believers…

Insofar we are just messing around with our own evidence-free priors we should just assume that others might know something we don’t know (maybe even in a way that we do not even recognise epistemically) and update in their direction. Which again forces us to spend time investigating demon risk.

## OK, let’s give in…

Another way of reasoning is to say that maybe we should investigate all risks somebody can make us guess a non-negligible prior for. It is just that we should allocate our efforts proportional to our current probability guesstimates. Start with the big risks, and work our way down towards the crazier ones. This is a bit like the post about the best problems to work on: setting priorities is important, and we want to go for the ones where we chew off most uninvestigated risk.

If we work our way down the list this way it seems that demon risk will be analysed relatively early, but also dismissed quickly: within the religious framework it is not a likely existential risk in most religions. In reality few if any religious people hold the view that demon summoning is an existential risk, since they tend to think that the end of the world is a religious drama and hence not intended to be triggered by humans – only divine powers or fate gets to start it, not curious demonologists.

## That wasn’t too painful?

Have we defeated the demon summoning problem? Not quite. There is no reason for all those priors to sum to 1 – they are suggested by people with very different and even crazy views – and even if we normalise them we get a very long and heavy tail of weird small risks. We can easily use up any amount of effort on this, effort we might want to spend on doing other useful things like actually reducing real risks or doing fun particle physics.

There might be solutions to this issue by reasoning backwards: instead of looking at how X could cause Y that could cause Z that destroys the world we ask “If the world would be destroyed by Z, what would need to have happened to cause it?” Working backwards to Y, Y’, Y” and other possibilities covers a larger space than our initial chain from X. If we are successful we can now state what conditions are needed to get to dangerous Y-like states and how likely they are. This is a way of removing entire chunks of the risk landscape in efficient ways.

This is how I think we can actually handle these small, awkward and likely non-existent risks. We develop mental tools to efficiently get rid of lots of them in one fell sweep, leaving the stuff that needs to be investigated further. But doing this right… well, the devil lurks in the details. Especially the thicket of totally idiosyncratic risks that cannot be handled in a general way. Which is no reason not to push forward, armed with epsilons and Bayes’ rule.

That the unbeliever may have to update a bit in the believer direction may look like a win for the believers. But they, if they are rational, should do a small update into the unbeliever direction. The most important consequence is that now they need to consider existential risks due to non-supernatural causes like nuclear war, AI or particle physics. They would assign them a lower credence than the unbeliever, but as per the usual arguments for the super-importance of existential risk this still means they may have to spend effort on thinking about and mitigating these risks that they otherwise would have dismissed as something God would have prevented. This may be far more annoying to them than unbelievers having to think a bit about demonology.

Emlyn O’Regan makes some great points over at Google+, which I think are worth analyzing:

1. “Should you somehow incorporate the fact that the world has avoided destruction until now into your probabilities?”
2. “Ideas without a tech angle might be shelved by saying there is no reason to expect them to happen soon.” (since they depend on world properties that have remained unchanged.)
3. ” Ideas like demon summoning might be limited also by being shown to be likely to be the product of cognitive biases, rather than being coherent free-standing ideas about the universe.”

In the case of (1), observer selection effects can come into play. If there are no observers on a post demon-world (demons maybe don’t count) then we cannot expect to see instances of demon apocalypses in the past. This is why the cosmic ray argument for the safety of the LHC need to point to the survival of the Moon or other remote objects rather than the Earth to argue that being hit by cosmic rays over long periods prove that it is safe. Also, as noted by Emlyn, the Doomsday argument might imply that we should expect a relatively near-term end, given the length of our past: whether this matters or not depends a lot on how one handles observer selection theory.

In the case of (2), there might be development in summoning methods. Maybe medieval methods could not work, but modern computer-aided chaos magick is up to it. Or there could be rare “the stars are right” situations that made past disasters impossible. Still, if you understand the risk domain you may be able to show that the risk is constant and hence must have been low (or that we are otherwise living in a very unlikely world). Traditions that do not believe in a growth of esoteric knowledge would presumably accept that past failures are evidence of future inability.

(3) is an error theory: believers in the risk are believers not because of proper evidence but from faulty reasoning of some kind, so they are not our epistemic peers and we do not need to update in their direction. If somebody is trying to blow up a building with a bomb we call the police, but if they try to do it by cursing we may just watch with amusement: past evidence of the efficacy of magic at causing big effects is nonexistent. So we have one set of evidence-supported theories (physics) and another set lacking evidence (magic), and we make the judgement that people believing in magic are just deluded and can be ignored.

(Real practitioners may argue that there sure is evidence for magic, it is just that magic is subtle and might act through convenient coincidences that look like they could have happened naturally but occur too often or too meaningfully to be just chance. However, the skeptic will want to actually see some statistics for this, and in any case demon apocalypses look like they are way out of the league for this kind of coincidental magic).

Emlyn suggests that maybe we could scoop all the non-physics like human ideas due to brain architecture into one bundle, and assign them one epsilon of probability as a group. But now we have the problem of assigning an idea to this group or not: if we are a bit uncertain about whether it should have $\epsilon$ probability or a big one, then it will get at least some fraction of the big probability and be outside the group. We can only do this if we are really certain that we can assign ideas accurately, and looking at how many people psychoanalyse, sociologise or historicise topics in engineering and physics to “debunk” them without looking at actual empirical content, we should be wary of our own ability to do it.

So, in short, (1) and (2) do not reduce our credence in the risk enough to make it irrelevant unless we get a lot of extra information. (3) is decent at making us sceptical, but our own fallibility at judging cognitive bias and mistakes (which follows from claiming others are making mistakes!) makes error theories weaker than they look. Still, the really consistent lack of evidence of anything resembling the risk being real and that claims internal to the systems of ideas that accept the possibility imply that there should be smaller, non-existential, instances that should be observable (e.g. individual Fausts getting caught on camera visibly succeeding in summoning demons), and hence we can discount these systems strongly in favor of more boring but safe physics or hard-to-disprove but safe coincidental magic.

# What makes a watchable watchlist?

Stefan Heck managed to troll a lot of people into googling “how to join ISIS”. Very amusing, and now a lot of people think they are on a NSA watchlist.

This kind of prank is of course by why naive keyword-based watch lists are total failures. One prank and it gets overloaded. I would be shocked if any serious intelligence agency actually used them for real. Given that people’s Facebook likes give pretty good predictions of who they are (indeed, better than many friends know them) there are better methods if you happen to be a big intelligence agency.

Still, while text and other online behavior signal a lot about a person, it might not be a great tool for making proper watchlists since there is a lot of noise. For example, this paper extracts personality dimensions from online texts and looks at civilian mass murderers. They state:

Using this ranking procedure, it was found that all of the murderers’ texts were located within the highest ranked 33 places. It means that using only two simple measures for screening these texts, we can reduce the size of the population under inquiry to 0.013% of its original size, in order to manually identify all of the murderers’ texts.

At first, this sounds great. But for the US, that means the watchlist for being a mass murderer would currently have 41,000 entries. Given that over the past 150 years there has been about 150 mass murders in the US, this suggests that the precision is not going to be that great – most of those people are just normal people. The base rate problem crops up again and again when trying to find rare, scary people.

The deep problem is that there is not enough positive data points (the above paper used seven people) to make a reliable algorithm. The same issue cropped up with NSA’s SKYNET program – they also had seven positive examples and hundreds of thousands of negatives, and hence had massive overfitting (suggesting the Islamabad Al Jazeera bureau chief was a prime Al Qaeda suspect).

## Rational watchlists

The rare positive data point problem strikes any method, no matter what it is based on. Yes, looking at the social network around people might give useful information, but if you only have a few examples of bad people the system will now pick up on networks like the ones they had. This is also true for human learning: if you look too much for people like the ones that in the past committed attacks, you will focus too much on people like them and not enemies that look different. I was told by an anti-terrorism expert about a particular sign for veterans of Afghan guerrilla warfare: great if and only if such veterans are the enemy, but rather useless if the enemy can recruit others. Even if such veterans are a sizable fraction of the enemy the base rate problem may make you spend your resources on innocent “noise” veterans if the enemy is a small group. Add confirmation bias, and trouble will follow.

Note that actually looking for a small set of people on the watchlist gets around the positive data point problem: the system can look for them and just them, and this can be made precise. The problem is not watching, but predicting who else should be watched.

The point of a watchlist is that it represents a subset of something (whether people or stocks) that merits closer scrutiny. It should essentially be an allocation of attention towards items that need higher level analysis or decision-making. The U.S. Government’s Consolidated Terrorist Watch List requires nomination from various agencies, who presumably decide based on reasonable criteria (modulo confirmation bias and mistakes). The key problem is that attention is a limited resource, so adding extra items has a cost: less attention can be spent on the rest.

This is why automatic watchlist generation is likely to be a bad idea, despite much research. Mining intelligence to help an analyst figure out if somebody might fit a profile or merit further scrutiny is likely more doable. As long as analyst time is expensive it can easily be overwhelmed if something fills the input folder: HUMINT is less likely to do it than SIGINT, even if the analyst is just doing the preliminary nomination for a watchlist.

## The optimal Bayesian watchlist

One can analyse this in a Bayesian framework: assume each item has a value $x_i$ distributed as $f(x_i)$. The goal of the watchlist is to spend expensive investigatory resources to figure out the true values; say the cost is 1 per item. Then a watchlist of randomly selected items will have a mean value $V=E[x]-1$. Suppose a cursory investigation costing much less gives some indication about $x_i$, so that it is now known with some error: $y_i = x_i+\epsilon$. One approach is to select all items above a threshold $\theta$, making $V=E[x_i|y_i<\theta]-1$.

If we imagine that everything is Gaussian $x_i \sim N(\mu_x,\sigma_x^2), \epsilon \sim N(0,\sigma_\epsilon^2)$, then  $V=\int_\theta^\infty t \phi(\frac{t-\mu_x}{\sigma_x}) \Phi\left(\frac{t-\mu_x}{\sqrt{\sigma_x^2+\sigma_\epsilon^2}}\right)dt$. While one can ram through this using Owen’s useful work, here is a Monte Carlo simulation of what happens when we use $\mu_x=0, \sigma_x^2=1, \sigma_\epsilon^2=1$ (the correlation between x and y is 0.707, so this is not too much noise):

Note that in this case the addition of noise forces a far higher threshold than without noise (1.22 instead of 0.31). This is just 19% of all items, while in the noise-less case 37% of items would be worth investigating. As noise becomes worse the selection for a watchlist should become stricter: a really cursory inspection should not lead to insertion unless it looks really relevant.

Here we used a mild Gaussian distribution. In term of danger, I think people or things are more likely to be lognormal distributed since it is a product of many relatively independent factors. Using lognormal x and y leads to a situation where there is a maximum utility for some threshold. This is likely a problematic model, but clearly the shape of the distributions matter a lot for where the threshold should be.

Note that having huge resources can be a bane: if you build your watchlist from the top priority down as long as you have budget or manpower, the lower priority (but still above threshold!) entries will be more likely to be a waste of time and effort. The average utility will decline.

## Predictive validity matters more?

In any case, a cursory and cheap decision process is going to give so many so-so evaluations that one shouldn’t build the watchlist on it. Instead one should aim for a series of filters of increasing sophistication (and cost) to wash out the relevant items from the dross.

But even there there are pitfalls, as this paper looking at the pharma R&D industry shows:

We find that when searching for rare positives (e.g., candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/or unknowable (i.e., an 0.1 absolute change in correlation coefficient between model output and clinical outcomes in man) can offset large (e.g., 10 fold, even 100 fold) changes in models’ brute-force efficiency.

Just like for drugs (an example where the watchlist is a set of candidate compounds), it might be more important for terrorist watchlists to aim for signs with predictive power of being a bad guy, rather than being correlated with being a bad guy. Otherwise anti-terrorism will suffer the same problem of declining productivity, despite ever more sophisticated algorithms.

# The hazard of concealing risk

Review of Man-made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility by Dmitry Chernov and Didier Sornette (Springer).

I have recently begun to work on the problem of information hazards: when spreading true information is causing danger. Since we normally regard information as a good thing this is a bit unusual and understudied, and in the case of existential risk it is important to get things right at the first try.

However, concealing information can also produce risk. This book is an excellent series of case studies of major disasters, showing how the practice of hiding information contributed to make them possible, worse, and hinder rescue/recovery.

Chernov and Sornette focus mainly on technological disasters such as the Vajont Dam, Three Mile Island, Bhopal, Chernobyl, the Ufa train disaster, Fukushima and so on, but they also cover financial disasters, military disasters, production industry failures and concealment of product risk. In all of these cases there was plentiful concealment going on at multiple levels, from workers blocking alarms to reports being classified or deliberately mislaid to active misinformation campaigns.

When summed up, many patterns of information concealment recur again and again. They sketch out a model of the causes of concealment, with about 20 causes grouped into five major clusters: the external environment enticing concealment, risk communication channels blocked, an internal ecology stimulating concealment or ignorance, faulty risk assessment and knowledge management, and people having personal incentives to conceal.

The problem is very much systemic: having just one or two of the causative problems can be counteracted by good risk management, but when several causes start to act together they become much harder to deal with – especially since many corrode the risk management ability of the entire organisation. Once risks are hidden, it becomes harder to manage them (management, after all, is done through information). Conversely, they list examples of successful risk information management: risk concealment may be something that naturally tends to emerge, but it can be counteracted.

Chernov and Sornette also apply their model to some technologies they think show signs of risk concealment: shale energy, GMOs, real debt and liabilities of the US and China, and the global cyber arms race. They are not arguing that a disaster is imminent, but the patterns of concealment are a reason for concern: if they persist, they have potential to make things worse the day something breaks.

Is information concealment the cause of all major disasters? Definitely not: some disasters are just due to exogenous shocks or surprise failures of technology. But as Fukushima shows, risk concealment can make preparation brittle and handling the aftermath inefficient. There is also likely plentiful risk concealment in situations that will never come to attention because there is no disaster necessitating and enabling a thorough investigation. There is little to suggest that the examined disasters were all uniquely bad from a concealment perspective.

From an information hazard perspective, this book is an important rejoinder: yes, some information is risky. But lack of information can be dangerous too. Many of the reasons for concealment like national security secrecy, fear of panic, prevention of whistle-blowing, and personnel being worried about personally being held accountable for a serious fault are maladaptive information hazard management strategies. The worker not reporting a mistake is handling a personal information hazard, at the expense of the safety of the entire organisation. Institutional secrecy is explicitly intended to contain information hazards, but tends to compartmentalize and block relevant information flows.

A proper information hazard management strategy needs to take the concealment risk into account too: there is a risk cost of not sharing information. How these two risks should be rationally traded against each other is an important question to investigate.

# All models are wrong, some are useful – but how can you tell?

Our whitepaper about the systemic risk of risk modelling is now out. The topic is how the risk modelling process can make things worse – and ways of improving things. Cognitive bias meets model risk and social epistemology.

The basic story is that in insurance (and many other domains) people use statistical models to estimate risk, and then use these estimates plus human insight to come up with prices and decisions. It is well known (at least in insurance) that there is a measure of model risk due to the models not being perfect images of reality; ideally the users will take this into account. However, in reality (1) people tend to be swayed by models, (2) they suffer from various individual and collective cognitive biases making their model usage imperfect and correlates their errors, (3) the markets for models, industrial competition and regulation leads to fewer models being used than there could be. Together this creates a systemic risk: everybody makes correlated mistakes and decisions, which means that when a bad surprise happens – a big exogenous shock like a natural disaster or a burst of hyperinflation, or some endogenous trouble like a reinsurance spiral or financial bubble – the joint risk of a large chunk of the industry failing is much higher than it would have been if everybody had had independent, uncorrelated models. Cue bailouts or skyscrapers for sale.

Note that this is a generic problem. Insurance is just unusually self-aware about its limitations (a side effect of convincing everybody else that Bad Things Happen, not to mention seeing the rest of the financial industry running into major trouble). When we use models the model itself (the statistics and software) is just one part: the data fed into the model, the processes of building and tuning the model, how people use it in their everyday work, how the output leads to decisions, and how the eventual outcomes become feedback to the people involved – all of these factors are important parts in making model use useful. If there is no or too slow feedback people will not learn what behaviours are correct or not. If there are weak incentives to check errors of one type, but strong incentives for other errors, expect the system to become biased towards one side. It applies to climate models and military war-games too.

The key thing is to recognize that model usefulness is not something that is directly apparent: it requires a fair bit of expertise to evaluate, and that expertise is also not trivial to recognize or gain. We often compare models to other models rather than reality, and a successful career in predicting risk may actually be nothing more than good luck in avoiding rare but disastrous events.

What can we do about it? We suggest a scorecard as a first step: comparing oneself to some ideal modelling process is a good way of noticing where one could find room for improvement. The score does not matter as much as digging into one’s processes and seeing whether they have cruft that needs to be fixed – whether it is following standards mindlessly, employees not speaking up, basing decisions on single models rather than more broad views of risk, or having regulators push one into the same direction as everybody else. Fixing it may of course be tricky: just telling people to be less biased or to do extra error checking will not work, it has to be integrated into the organisation. But recognizing that there may be a problem and getting people on board is a great start.

In the end, systemic risk is everybody’s problem.

# “A lump of cadmium”

Stuart Armstrong sent me this email:

I have a new expression: “a lump of cadmium”.

Background: in WW2, Heisenberg was working on the German atomic reactor project (was he bad? see the fascinating play “Copenhagen” to find out!). His team almost finished a nuclear reactor. He thought that a reaction with natural uranium would be self-limiting (spoiler: it wouldn’t), so had no cadmium control rods or other means of stopping a chain reaction.

But, no worries: his team has “a lump of cadmium” that they could toss into the reactor if things got out of hand. So, now, if someone has a level of precaution woefully inadequate to the risk at hand, I will call it a lump of cadmium.

(Based on German Nuclear Program Before and During World War II by Andrew Wendorff)

It reminds me of the story that SCRAM (emergency nuclear reactor shutdowns) stands for “Safety Control Rod Axe Man“, a guy standing next to the rope suspending the control rods with an axe, ready to cut it. It has been argued it was liquid cadmium solution instead. Still, in the US project they did not assume the reaction was self stabilizing.

Going back to the primary citation, we read:

To understand it we must say something about Heisenberg’s concept of reactor design. He persuaded himself that a reactor designed with natural uranium and, say, a heavy water moderator would be self-stabilizing and could not run away. He noted that U(238) has absorption resonances in the 1-eV region, which means that a neutron with this kind of energy has a good chance of being absorbed and thus removed from the chain reaction. This is one of the challenges in reactor design—slowing the neutrons with the moderator without losing them all to absorption. Conversely, if the reactor begins to run away (become supercritical) , these resonances would broaden and neutrons would be more readily absorbed. Moreover, the expanding material would lengthen the mean free paths by decreasing the density and this expansion would also stop the chain reaction. In short, we might experience a nasty chemical explosion but not a nuclear holocaust. Whether Heisenberg realized the consequences of such a chemical explosion is not clear. In any event, no safety elements like cadmium rods were built into Heisenberg’s reactors. At best, a lump of cadmium was kepton hand in case things threatened to get out of control. He also never considered delayed neutrons, which, as we know, play an essential role in reactor safety. Because none of Heisenberg’s reactors went critical, this dubious strategy was never put to the test.
(Jeremy Bernstein, Heisenberg and the critical mass. Am. J. Phys. 70, 911 (2002); http://dx.doi.org/10.1119/1.1495409)

This reminds me a lot of the modelling errors we discuss in the “Probing the improbable” paper, especially of course the (ahem) energetic error giving Castle Bravo 15 megatons of yield instead of the predicted 4-8 megatons. Leaving out Li(7) from the calculations turned out to leave out the major contributor of energy.

Note that Heisenberg did have an argument for his safety, in fact two independent ones! The problem might have been that he was thinking in terms of mostly U(238) and then getting any kind of chain reaction going would be hard, so he was biased against the model of explosive chain reactions (but as the Bernstein paper notes, somebody in the project had correct calculations for explosive critical masses). Both arguments were flawed when dealing with reactors enriched in U(235). Coming at nuclear power from the perspective of nuclear explosions on the other hand makes it natural to consider how to keep things from blowing up.

We may hence end up with lumps of cadmium because we approach a risk from the wrong perspective. The antidote should always be to consider the risks from multiple angles, ideally a few adversarial ones. The more energy, speed or transformative power we expect something to produce, the more we should scrutinize existing safeguards for them being lumps of cadmium. If we think our project does not have that kind of power, we should both question why we are even doing it, and whether it might actually have some hidden critical mass.

# Canine mechanics and banking

There are some texts that are worth reading, even if you are outside the group they are intended for. Here is one that I think everybody should read at least the first half of:

Andrew G Haldane and Vasileios Madouros: The dog and the frisbee

Haldane, the Executive Director for Financial Stability at Bank of England, brings up the topic is how to act in situations of uncertainty, and the role of our models of reality in making the right decision. How complex should they be in the face of a complex reality? The answer, based on the literature on heuristics, biases and modelling, and the practical world of financial disasters, is simple: they should be simple.

Using too complex models means that they tend to overfit scarce data, weight data randomly, require significant effort to set up – and tends to promote overconfidence. As Haldane then moves on to his own main topic, banking regulation. Complex regulations – which are in a sense models of how banks ought to act – have the same problem, and also act as incentives for playing the rules to gain advantage. The end result is an enormous waste of everybody’s time and effort that does not give the desired reduction of banking risk.

It is striking how many people have been seduced by the siren call of complex regulation or models, thinking their ability to include every conceivable special case is a sign of strength. Finance and finance regulation are full of smart people who make the same mistake, as is science. If there is one thing I learned in computational biology is that your model better produce more nontrivial results than the number of parameters it has.

But coming up with simple rules or models is not easy: knowing what to include and what not to include requires expertise and effort. In many ways this may be why people like complex models, since there are no tricky judgement calls.

# Threat reduction Thursday

Today seems to have been “doing something about risk”-day. Or at least, “let’s investigate risk so we know what we ought to do”-day.
First, the World Economic Forum launched their 2015 risk perception report. (Full disclosure: I am on the advisory committee)
Second, Elon Musk donated \$10M to AI safety research. Yes, this is quite related to the FLI open letter.
Today has been a good day. Of course, it will be an even better day if and when we get actual results in risk mitigation.

# Objectively evil technology

George Dvorsky has a post on io9: 10 Horrifying Technologies That Should Never Be Allowed To Exist. It is a nice clickbaity overview of some very bad technologies:

1. Weaponized nanotechnology (he mainly mentions ecophagy, but one can easily come up with other nasties like ‘smart poisons’ that creep up on you or gremlin devices that prevent technology – or organisms – from functioning)
2. Conscious machines (making devices that can suffer is not a good idea)
3. Artificial superintelligence (modulo friendliness)
4. Time travel
5. Mind reading devices (because of totalitarian uses)
6. Brain hacking devices
7. Autonomous robots programmed to kill humans
8. Weaponized pathogens
9. Virtual prisons and punishment
10. Hell engineering (that is, effective production of super-adverse experiences; consider Iain M. Banks’ Surface Detail, or the various strange/silly/terrifying issues linked to Roko’s basilisk)

Some of the these technologies exist, like weaponized pathogens. Others might be impossible, like time travel. Some are embryonic like mind reading (we can decode some brainstates, but it requires spending a while in a big scanner as the input-output mapping is learned).

A commenter on the post asked “Who will have the responsibility of classifying and preventing “objectively evil” technology?” The answer is of course People Who Have Ph.D.s in Philosophy.

Unfortunately I haven’t got one, but that will not stop me.

## Existential risk as evil?

I wonder what unifies this list. Let’s see: 1, 3, 7, and 8 are all about danger: either the risk of a lot of death, or the risk of extinction. 2, 9 and 10 are all about disvalue: the creation of very negative states of experience. 5 and 6 are threats to autonomy.

4, time travel, is the odd one out: George suggests that it is dangerous, but this is based on fictional examples, and that contact between different civilizations has never ended well (which is arguable: Japan). I can imagine a consistent universe with time travel might be bad for people’s sense of free will, and if you have time loops you can do super-powerful computation (getting superintelligence risk), but I do not think of any kind of plausible physics where time travel itself is dangerous. Fiction just makes up dangers to make the plot move on.

In the existential risk framework, it is worth noting that extinction is not the only kind of existential risk. We could mess things up so that humanity’s full potential never gets realized (for example by being locked into a perennial totalitarian system that is actually resistant to any change), or that we make the world hellish. These are axiological existential risks. So the unifying aspect of these technologies is that they could cause existential risk, or at least bad enough approximations.

Ethically, existential threats count a lot. They seem to have priority over mere disasters and other moral problems in a wide range of moral systems (not just consequentialism). So technologies that strongly increase existential risk without giving a commensurate benefit (for example by reducing other existential risks more – consider a global surveillance state, which might be a decent defence against people developing bio-, nano- and info-risks at the price of totalitarian risk) are indeed impermissible. In reality technologies have dual uses and the eventual risk impact can be hard to estimate, but the principle is reasonable even if implementation will be a nightmare.

## Messy values

However, extinction risk is an easy category – even if some of the possible causes like superintelligence are weird and controversial, at least extinct means extinct. The value and autonomy risks are far trickier. First, we might be wrong about value: maybe suffering doesn’t actually count morally, we just think it does. So a technology that looks like it harms value badly like hell engineering actually doesn’t. This might seem crazy, but we should recognize that some things might be important but we do not recognize them. Francis Fukuyama thought transhumanist enhancement might harm some mysterious ‘Factor X’ (i.e. a “soul) giving us dignity that is not widely recognized. Nick Bostrom (while rejecting the Factor X argument) has suggested that there might be many “quiet values” important for diginity, taking second seat to the “loud” values like alleviation of suffering but still being important – a world where all quiet values disappear could be a very bad world even if there was no suffering (think Aldous Huxley’s Brave New World, for example). This is one reason why many superintelligence scenarios end badly: transmitting the full nuanced world of human values – many so quiet that we do not even recognize them ourselves before we lose them – is very hard. I suspect that most people find it unlikely that loud values like happiness or autonomy actually are parochial and worthless, but we could be wrong. This means that there will always be a fair bit of moral uncertainty about axiological existential risks, and hence about technologies that may threaten value. Just consider the argument between Fukuyama and us transhumanists.

Second, autonomy threats are also tricky because autonomy might not be all that it is cracked up to be in western philosophy. The atomic free-willed individual is rather divorced from the actual neural and social matrix creature. But even if one doesn’t buy autonomy as having intrinsic value, there are likely good cybernetic arguments for why maintaining individuals as individuals with their own minds is a good thing. I often point to David Brin’s excellent defence of the open society where he points out that societies where criticism and error correction are not possible will tend to become corrupt, inefficient and increasingly run by the preferences of the dominant cadre. In the end they will work badly for nearly everybody and have a fair risk of crashing. Tools like surveillance, thought reading or mind control would potentially break this beneficial feedback by silencing criticism. They might also instil identical preferences, which seems to be a recipe for common mode errors causing big breakdowns: monocultures are more vulnerable than richer ecosystems. Still, it is not obvious that these benefits could not exist in (say) a group-mind where individuality is also part of a bigger collective mind.

## Criteria and weasel-words

These caveats aside, I think the criteria for “objectively evil technology” could be

(1) It predictably increases existential risk substantially without commensurate benefits,

or,

(2) it predictably increases the amount of death, suffering or other forms of disvalue significantly without commensurate benefits.

There are unpredictable bad technologies, but they are not immoral to develop. However, developers do have a responsibility to think carefully about the possible implications or uses of their technology. And if your baby-tickling machine involves black holes you have a good reason to be cautious.

Of course, “commensurate” is going to be the tricky word here. Is a halving of nuclear weapons and biowarfare risk good enough to accept a doubling of superintelligence risk? Is a tiny probability existential risk (say from a physics experiment) worth interesting scientific findings that will be known by humanity through the entire future? The MaxiPOK principle would argue that the benefits do not matter or weigh rather lightly. The current gain-of-function debate show that we can have profound disagreements – but also that we can try to construct institutions and methods that regulate the balance, or inventions that reduce the risk. This also shows the benefit of looking at larger systems than the technology itself: a potentially dangerous technology wielded responsibly can be OK if the responsibility is reliable enough, and if we can bring a safeguard technology into place before the risky technology it might no longer be unacceptable.

The second weasel word is “significantly”. Do landmines count? I think one can make the case. According to the UN they kill 15,000 to 20,000 people per year. The number of traffic fatalities per year worldwide is about 1.2 million deaths – but we might think cars are actually so beneficial that it outweighs the many deaths.

## Intention?

The landmines are intended to harm (yes, the ideal use is to make people rationally stay the heck away from mined areas, but the harming is inherent in the purpose) while cars are not. This might lead to an amendment of the second criterion:

(2′) The technology  intentionally increases the amount of death, suffering or other forms of disvalue significantly without commensurate benefits.

This gets closer to how many would view things: technologies intended to cause harm are inherently evil. But being a consequentialist I think it let’s designers off the hook. Dr Guillotine believed his invention would reduce suffering (and it might have) but it also led to a lot more death. Dr Gatling invented his gun to “reduce the size of armies and so reduce the number of deaths by combat and disease, and to show how futile war is.” So the intention part is problematic.

Some people are concerned with autonomous weapons because they are non-moral agents making life-and-death decisions over people; they would use deontological principles to argue that making such amoral devices are wrong. But a landmine that has been designed to try to identify civilians and not blow up around them seems to be a better device than an indiscriminate device: the amorality of the decisionmaking is less of problematic than the general harmfulness of the device.

I suspect trying to bake in intentionality or other deontological concepts will be problematic. Just as human dignity (another obvious concept – “Devices intended to degrade human dignity are impermissible”) is likely a non-starter. They are still useful heuristics, though. We do not want too much brainpower spent on inventing better ways of harming or degrading people.

## Policy and governance: the final frontier

In the end, this exercise can be continued indefinitely. And no doubt it will.

Given the general impotence of ethical arguments to change policy (it usually picks up the pieces and explains what went wrong once it does go wrong) a more relevant question might be how a civilization can avoid developing things it has a good reason to suspect are a bad idea. I suspect the answer to that is going to be not just improvements in coordination and the ability to predict consequences, but some real innovations in governance under empirical and normative uncertainty.

But that is for another day.

# Truth and laughter

Slate Star Codex has another great post: If the media reported on other dangers like it does AI risk.

The new airborne superplague is said to be 100% fatal, totally untreatable, and able to spread across an entire continent in a matter of days. It is certainly fascinating to think about if your interests tend toward microbiology, and we look forward to continuing academic (and perhaps popular) discussion and debate on the subject.

Of course, one can argue that AI risk is less recognized as a serious issue than superplagues, meteors or economic depressions (although, given what news media have been writing recently about Ebola and 1950 DA, their level of understanding can be debated). There is disagreement on AI risk among people involved in the subject, with some rather bold claims of certainty among some, rational reasons to be distrustful of predictions, and plenty of vested interests and motivated thinking. But this internal debate is not the reason media makes a hash of things: it is not like there is an AI safety denialist movement pushing the message that worrying about AI risk is silly, or planting stupid arguments to discredit safety concerns. Rather, the whole issue is so out there that not only the presumed reader but the journalist too will not know what to make of it. It is hard to judge credibility, how good arguments are and the size of risks. So logic does not apply very strongly – anyway, it does not sell.

This is true for climate change and pandemics too. But here there is more of an infrastructure of concern, there are some standards (despite vehement disagreements) and the risks are not entirely unprecedented. There are more ways of dealing with the issue than referring to fiction or abstract arguments that tend to fly over the heads of most. The discussion has moved further from the frontiers of the thinkable not just among experts but also among journalists and the public.

How do discussions move from silly to mainstream? Part of it is mere exposure: if the issue comes up again and again, and other factors do not reinforce it as being beyond the pale, it will become more thinkable. This is how other issues creep up on the agenda too: small stakeholder groups drive their arguments, and if they are compelling they will eventually leak into the mainstream. High status groups have an advantage (uncorrelated to the correctness of arguments, except for the very rare groups that gain status from being documented as being right about a lot of things).

Another is demonstrations. They do not have to be real instances of the issue, but close enough to create an association: a small disease outbreak, an impressive AI demo, claims that the Elbonian education policy really works. They make things concrete, acting as a seed crystal for a conversation. Unfortunately these demonstrations do not have to be truthful either: they focus attention and update people’s probabilities, but they might be deeply flawed. Software passing a Turing test does not tell us much about AI. The safety of existing AI software or biohacking does not tell us much about their future safety. 43% of all impressive-sounding statistics quoted anywhere is wrong.

Truth likely makes argumentation easier (reality is biased in your favour, opponents may have more freedom to make up stuff but it is more vulnerable to disproof) and can produce demonstrations. Truth-seeking people are more likely to want to listen to correct argumentation and evidence, and even if they are a minority they might be more stable in their beliefs than people who just view beliefs as clothing to wear (of course, zealots are also very stable in their beliefs since they isolate themselves from inconvenient ideas and facts).

Truth alone can not efficiently win the battle of bringing an idea in from the silliness of the thinkability frontier to the practical mainstream. But I think humour can act as a lubricant: by showing the actual silliness of mainstream argumentation, we move them outwards towards the frontier, making a bit more space for other things to move inward. When demonstrations are wrong, joke about their flaws. When ideas are pushed merely because of status, poke fun at the hot air cushions holding them up.