The capability caution principle and the principle of maximal awkwardness

The Future of Life Institute discusses the

Capability Caution Principle: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.

It is an important meta-principle in careful design to avoid assuming the most reassuring possibility and instead design based on the most awkward possibility.

When inventing a cryptosystem, do not assume that the adversary is stupid and has limited resources: try to make something that can withstand a computationally and intellectually superior adversary. When testing a new explosive, do not assume it will be weak – stand as far away as possible. When trying to improve AI safety, do not assume AI will be stupid or weak, or that whoever implements it will be sane.

Often we think that the conservative choice is the pessimistic choice where nothing works. This is because “not working” is usually the most awkward possibility when building something. If I plan a project I should ensure that I can handle unforeseen delays, and the possibility that my original plans and pathways have to be scrapped and replaced with something else. But from a safety or social impact perspective the most awkward situation is if something succeeds radically, in the near future, and we have to deal with the consequences.

Adopting the principle of maximal awkwardness is a form of steelmanning: design for the least convenient possible world.

This is an approach based on potential loss rather than probability. Most AI history tells us that wild dreams rarely, if ever, come true. But were we to get very powerful AI tools tomorrow it is not too hard to foresee a lot of damage and disruption. Even if you do not think the risk is existential you can probably imagine that autonomous hedge funds smarter than human traders, automated engineering in the hands of anybody and scalable automated identity theft could mess up the world system rather strongly. The fact that it might be unlikely is not as important as that the damage would be unacceptable. It is often easy to think that in uncertain cases the burden of proof is on the other party, rather than on the side where a mistaken belief would be dangerous.
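
To make the loss-versus-probability point concrete, here is a toy calculation of my own (the numbers are made-up assumptions, not estimates from the post): even a 1% risk can dominate the decision if the associated damage is large enough, and if the damage is genuinely unacceptable rather than merely large, the case for acting is stronger still.

```python
# Toy illustration with made-up numbers: a rare but severe outcome
# can dominate the decision even though it is unlikely.
p_bad = 0.01           # assumed probability of the bad outcome
loss_bad = 1000.0      # assumed damage if it happens
cost_precaution = 5.0  # assumed certain cost of prevention or monitoring

expected_loss_if_ignored = p_bad * loss_bad
print(expected_loss_if_ignored)                    # 10.0
print(expected_loss_if_ignored > cost_precaution)  # True: the precaution is the better bet
```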

As FLI stated it, the principle goes both ways: do not assume the limits are super-high either. Maybe there is a complexity scaling limit making problem-solving systems unable to handle more than 7 things in “working memory” at the same time, limiting how deep their insights could be. Maybe social manipulation is not a tractable task. But this mainly means we should not count on super-smart AI as a solution to problems (e.g. using one smart system to monitor another smart system). It is not an argument for complacency.

People often misunderstand uncertainty:

  • Some think that uncertainty implies that non-action is reasonable, or at least that action should wait until we know more. This is actually where the precautionary principle is sane: if there is a risk of something bad happening but you are not certain it will happen, you should still try to prevent it, or at least monitor what is going on.
  • Obviously some uncertain risks are unlikely enough that they can be ignored by rational people, but you need to have good reasons to think that the risk is actually that unlikely – uncertainty alone does not help.
  • Gaining more information sometimes reduces uncertainty in valuable ways, but the price of information can sometimes be too high, especially when there are intrinsically unknowable factors and noise clouding the situation.
  • Looking at the mean or expected case can be a mistake if there is a long tail of relatively unlikely but terrible possibilities: on the average day your house does not have a fire, but having insurance, a fire alarm and a fire extinguisher is a rational response.
  • Uncertain factors do not become less uncertain when combined (even if you describe them carefully and with scenarios): typically you get broader and heavier-tailed distributions, and should act on the tail risk, as the sketch after this list illustrates.
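
To illustrate the last two points, here is a quick Monte Carlo sketch of my own (the lognormal factors and their spreads are arbitrary assumptions, not a model of any particular risk): multiplying a few independent uncertain factors gives a distribution that is broader and heavier-tailed than any single factor, and the mean tells you little about the 99th percentile you should actually be planning for.

```python
# Quick Monte Carlo sketch with arbitrary assumed parameters: combining
# uncertain factors multiplicatively widens the distribution and fattens the tail.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Three independent uncertain factors, each moderately spread out (lognormal).
factors = rng.lognormal(mean=0.0, sigma=0.5, size=(3, n))
combined = factors.prod(axis=0)  # product of the three factors per sample

for name, x in [("single factor", factors[0]), ("combined", combined)]:
    print(f"{name:13s} mean={x.mean():5.2f} "
          f"median={np.median(x):5.2f} 99th pct={np.percentile(x, 99):6.2f}")
```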

FLI asks the intriguing question of how smart AI can get. I really want to know that too. But it is relatively unimportant for designing AI safety unless the ceiling is shockingly low; it is safer to assume it can be as smart as it wants to. Some AI safety schemes involve smart systems monitoring each other or performing very complex counterfactuals: these do hinge on an assumption of high intelligence (or whatever it takes to accurately model counterfactual worlds). But then the design criteria should be to assume that these things are hard to do well.

Under high uncertainty, assume Murphy’s law holds.

(But remember that good engineering and reasoning can bind Murphy – it is just that you cannot assume somebody else will do it for you.)

Ethics of brain emulations, New Scientist edition

I have an opinion piece in New Scientist about the ethics of brain emulation. The content is similar to what I was talking about at IJCNN and in my academic paper (and the comic about it). Here are a few things that did not fit the text:

Ethics that got left out

Due to length constraints I had to cut the discussion about why animals might be moral patients. That made the essay look positively Benthamite in its focus on pain. In fact, I am agnostic on whether experience is necessary for being a moral patient. Here is the cut section:

Why should we care about how real animals are treated? Different philosophers have given different answers. Immanuel Kant did not think animals matter in themselves, but our behaviour towards them matters morally: a human who kicks a dog is cruel and should not do it. Jeremy Bentham famously argued that thinking does not matter, but the capacity to suffer: “…the question is not, Can they reason? nor, Can they talk? but, Can they suffer?” Other philosophers have argued that it matters that animals experience being subjects of their own life, with desires and goals that make sense to them. While there is a fair bit of disagreement about what this means for our responsibilities to animals and what we may use them for, there is widespread agreement that they are moral patients, something we ought to treat with some kind of care.

This is of course a super-quick condensation of a debate that fills bookshelves. It also leaves out Christine Korsgaard’s interesting Kantian work on animal rights, which as far as I can tell does not need to rely on particular accounts of consciousness and pain, but rather on interests. Most people would say that without consciousness or experience there is nobody that is harmed, but I am not entirely certain unconscious systems cannot be regarded as moral patients. There are, for example, people working in environmental ethics who ascribe moral patient-hood and partial rights to species or natural environments.

Big simulations: what are they good for?

Another interesting thing that had to be left out is a comparison of different large-scale neural simulations.

(I am a bit uncertain about where the largest model in the Human Brain Project is right now; they are running more realistic models, so they will be smaller in terms of neurons. But they clearly have the ambition to best the others in the long run.)

Of course, one can argue about which approach matters. Spaun is a model of cognition using low-resolution neurons, while the slightly larger (in neurons) simulation from the Lansner lab was just a generic piece of cortex, showing some non-trivial alpha and gamma rhythms, and the even larger ones showed some interesting emergent behavior despite the lack of biological complexity in the neurons. Conversely, Cotterill’s CyberChild that I worry about in the opinion piece had just 21 neurons in each region, but they formed a fairly complex network with many brain regions that in a sense is more meaningful as an organism than the near-disembodied problem-solver Spaun. Meanwhile SpiNNaker runs rings around the others in terms of speed, essentially running in real time while the others have slowdowns by a factor of a thousand or worse.

The core of the matter is defining what one wants to achieve. Lots of neurons, biological realism, non-trivial emergent behavior, modelling a real neural system, purposeful (even conscious) behavior, useful technology, or scientific understanding? Brain emulation aims at getting purposeful, whole-organism behavior from running a very large, very complete, biologically realistic simulation. Many robotics and AI people are happy without the biological realism and would prefer as small a simulation as possible. Neuroscientists and cognitive scientists care about what they can learn and understand based on the simulations, rather than their completeness. They are each pursuing something useful, but what counts as useful is very different between the fields. As long as they remember that others are not pursuing the same aim they can get along.

What I hope: more honest uncertainty

What I hope happens is that computational neuroscientists think a bit about the issue of suffering (or moral patient-hood) in their simulations rather than slip into the comfortable “It is just a simulation, it cannot feel anything” mode of thinking by default.

It is easy to tell oneself that simulations do not matter, partly because we know how they work when we make them (giving us the illusion that we actually know everything there is to know about the system – obviously not true, since we at least need to run them to see what happens), and partly because institutionally it is easier to regard them as non-problems in terms of workload, conflicts and complexity (let’s not rock the boat at the planning meeting, right?). And once something is in the “does not matter morally” category it becomes painful to move it out of it – many will now be motivated to keep it there.

I would rather have people keep an open mind about these systems. We do not understand experience. We do not understand consciousness. We do not understand brains and organisms as wholes, and there is much we do not understand about the parts either. We do not have agreement on moral patient-hood. Hence the rational thing to do, even when one is pretty committed to a particular view, is to be open to the possibility that it might be wrong. The rational response to this uncertainty is to get more information if possible, to hedge our bets, and to try to avoid actions we might regret in the future.