# The capability caution principle and the principle of maximal awkwardness

Capability Caution Principle: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.

It is an important meta-principle in careful design to avoid assuming the most reassuring possibility and instead design based on the most awkward possibility.

When inventing a cryptosystem, do not assume that the adversary is stupid and has limited resources: try to make something that can withstand a computationally and intellectually superior adversary. When testing a new explosive, do not assume it will be weak – stand as far away as possible. When trying to improve AI safety, do not assume AI will be stupid or weak, or that whoever implements it will be sane.

Often we think that the conservative choice is the pessimistic choice where nothing works. This is because “not working” is usually the most awkward possibility when building something. If I plan a project I should ensure that I can handle unforeseen delays and that my original plans and pathways have to be scrapped and replaced with something else. But from a safety or social impact perspective the most awkward situation is if something succeeds radically, in the near future, and we have to deal with the consequences.

Assuming the principle of maximal awkwardness is a form of steelmanning and the least convenient possible world.

This is an approach based on potential loss rather than probability. Most AI history tells us that wild dreams rarely, if ever, come true. But were we to get very powerful AI tools tomorrow it is not too hard to foresee a lot of damage and disruption. Even if you do not think the risk is existential you can probably imagine that autonomous hedge funds smarter than human traders, automated engineering in the hands of anybody and scalable automated identity theft could mess up the world system rather strongly. The fact that it might be unlikely is not as important as that the damage would be unacceptable. It is often easy to think that in uncertain cases the burden of proof is on the other party, rather than on the side where a mistaken belief would be dangerous.

As FLI stated it the principle goes both ways: do not assume the limits are super-high either. Maybe there is a complexity scaling making problem-solving systems unable to handle more than 7 things in “working memory” at the same time, limiting how deep their insights could be. Maybe social manipulation is not a tractable task. But this mainly means we should not count on the super-smart AI as a solution to problems (e.g. using one smart system to monitor another smart system). It is not an argument to be complacent.

People often misunderstand uncertainty:

• Some think that uncertainty implies that non-action is reasonable, or at least action should wait till we know more. This is actually where the precautionary principle is sane: if there is a risk of something bad happening but you are not certain it will happen, you should still try to prevent it from happening or at least monitor what is going on.
• Obviously some uncertain risks are unlikely enough that they can be ignored by rational people, but you need to have good reasons to think that the risk is actually that unlikely – uncertainty alone does not help.
• Gaining more information sometimes reduces uncertainty in valuable ways, but the price of information can sometimes be too high, especially when there are intrinsically unknowable factors and noise clouding the situation.
• Looking at the mean or expected case can be a mistake if there is a long tail of relatively unlikely but terrible possibilities: on the average day your house does not have a fire, but having insurance, a fire alarm and a fire extinguisher is a rational response.
• Combinations of uncertain factors do not become less uncertain as they are combined (even if you describe them carefully and with scenarios): typically you get broader and heavier-tailed distributions, and should act on the tail risk.

FLI asks the intriguing question of how smart AI can get. I really want to know that too. But it is relatively unimportant for designing AI safety unless the ceiling is shockingly low; it is safer to assume it can be as smart as it wants to. Some AI safety schemes involve smart systems monitoring each other or performing very complex counterfactuals: these do hinge on an assumption of high intelligence (or whatever it takes to accurately model counterfactual worlds). But then the design criteria should be to assume that these things are hard to do well.

Under high uncertainty, assume Murphy’s law holds.

(But remember that good engineering and reasoning can bind Murphy – it is just that you cannot assume somebody else will do it for you.)

# AI, morality, ethics and metaethics

Next Sunday I will be debating AI ethics at Battle of Ideas. Here is a podcast where I talk AI, morality and ethics: https://soundcloud.com/institute-of-ideas/battle-cry-anders-sandberg-on-ethical-ai

# What distinguishes morals from ethics?

There is actually a shocking confusion about what the distinction between morals and ethics is. Differen.com says ethics is about rules of conduct produced by an external source while morals are an individual’s own principles of right and wrong. Grammarist.com says morals are principles on which one’s own judgement of right and wrong are based (abstract, subjective and personal), ethics are the principles of right conduct (practical, social and objective). Ian Welsh gives a soundbite: “morals are how you treat people you know.  Ethics are how you treat people you don’t know.” Paul Walker and Terry Lovat say ethics leans towards decisions based on individual character and subjective understanding of right and wrong, while morals is about widely shared communal or societal norms – here ethics is individual assessment of something being good or bad, while morality is inter-subjective community assessment.

Wikipedia distinguishes between ethics as a research field and the common human ability to think critically about moral values and direct actions appropriately, or a particular persons principles of values. Morality is the differentiation between things that are proper and improper, as well as a body of standards and principles in derived from a code of conduct in some philosophy, religion or culture… or derived from a standard a person believes to be universal.

Dictionary.com regards ethics as a system of moral principles, the rules of conduct recognized in some human environment, an individual’s moral principles (and the branch of philosophy). Morality is about conforming to the rules of right conduct, having moral quality or character, a doctrine or system of morals and a few other meanings. The Cambridge dictionary thinks ethics is the study of what is right or wrong, or the set of beliefs about it, while morality is a set of personal or social standards for good/bad behavior and character.

And so on.

I think most people try to include the distinction between shared systems of conduct and individual codes, and the distinction between things that are subjective, socially agreed on, and maybe objective. Plus that we all agree on that ethics is a philosophical research field.

# My take on it

I like to think of it as a AI issue. We have a policy function $\pi(s,a)$ that maps states and action pairs to a probability of acting that way; this is set using a value function $Q(s)$ where various states are assigned values. Morality in my sense is just the policy function and maybe the value function: they have been learned through interacting with the world in various ways.

Ethics in my sense is ways of selecting policies and values. We are able to not only change how we act but also how we evaluate things, and the information that does this change is not just reward signals that update value function directly, but also knowledge about the world, discoveries about ourselves, and interactions with others – in particular ideas that directly change the policy and value functions.

When I realize that lying rarely produces good outcomes (too much work) and hence reduce my lying, then I am doing ethics (similarly, I might be convinced about this by hearing others explain that lying is morally worse than I thought or convincing me about Kantian ethics). I might even learn that short-term pleasure is less valuable than other forms of pleasure, changing how I view sensory rewards.

Academic ethics is all about the kinds of reasons and patterns we should use to update our policies and values, trying to systematize them. It shades over into metaethics, which is trying to understand what ethics is really about (and what metaethics is about: it is its own meta-discipline, unlike metaphysics that has metametaphysics, which I think is its own meta-discipline).

I do not think I will resolve any confusion, but at least this is how I tend to use the terminology. Morals is how I act and evaluate, ethics is how I update how I act and evaluate, metaethics is how I try to think about my ethics.

# Energetics of the brain and AI

Lawrence Krauss is not worried about AI risk (ht to Luke Muelhauser); while much of his complacency is based on a particular view of the trustworthiness and level of common sense exhibited by possible future AI that is pretty impossible to criticise, he makes a particular claim:

First, let’s make one thing clear. Even with the exponential growth in computer storage and processing power over the past 40 years, thinking computers will require a digital architecture that bears little resemblance to current computers, nor are they likely to become competitive with consciousness in the near term. A simple physics thought experiment supports this claim:

Given current power consumption by electronic computers, a computer with the storage and processing capability of the human mind would require in excess of 10 Terawatts of power, within a factor of two of the current power consumption of all of humanity. However, the human brain uses about 10 watts of power. This means a mismatch of a factor of 1012, or a million million. Over the past decade the doubling time for Megaflops/watt has been about 3 years. Even assuming Moore’s Law continues unabated, this means it will take about 40 doubling times, or about 120 years, to reach a comparable power dissipation. Moreover, each doubling in efficiency requires a relatively radical change in technology, and it is extremely unlikely that 40 such doublings could be achieved without essentially changing the way computers compute.

This claim has several problems. First, there are few, if any, AI developers who think that we must stay with current architectures. Second, more importantly, the community concerned with superintelligence risk is generally agnostic about how soon smart AI could be developed: it doesn’t have to happen soon for us to have a tough problem in need of a solution, given how hard AI value alignment seems to be. And third, consciousness is likely irrelevant for instrumental intelligence; maybe the word is just used as a stand-in for some equally messy term like “mind”, “common sense” or “human intelligence”.

The interesting issue is however what energy requirements and computational power tells us about human and machine intelligence, and vice versa.

# Computer and brain emulation energy use

I have earlier on this blog looked at the energy requirements of the Singularity. To sum up, current computers are energy hogs requiring 2.5 TW of power globally, with an average cost around 25 nJ per operation. More efficient processors are certainly possible (a lot of the current ones are old and suboptimal). For example, current GPUs consume about a hundred Watts and have $10^{10}$ transistors, reaching performance in the 100 Gflops range, one nJ per flop. Koomey’s law states that the energy cost per operation halves every 1.57 years (not 3 years as Krauss says). So far the growth of computing capacity has grown at about the same pace as energy efficiency, making the two trends cancel each other. In the end, Landauer’s principle gives a lower bound of $kT\ln(2)$ J per irreversible operation; one can circumvent this by using reversible or quantum computation, but there are costs to error correction – unless we use extremely slow and cold systems in the current era computation will be energy-intensive.

I am not sure what brain model Krauss bases his estimate on, but 10 TW/25 nJ = $4\cdot 10^{20}$ operations per second (using slightly more efficient GPUs ups it to $10^{22}$ flops). Looking at the estimates of brain computational capacity in appendix A of my old roadmap, this is higher than most. The only estimate that seem to be in the same ballpark is (Thagard 2002), which argues that the number of computational elements in the brain are far greater than the number of neurons (possibly even individual protein molecules). This is a fairly strong claim, to say the least. Especially since current GPUs can do a somewhat credible job of end-to-end speech recognition and transcription: while that corresponds to a small part of a brain, it is hardly $10^{-11}$ of a brain.

Generally, assuming a certain number of operations per second in a brain and then calculating an energy cost will give you any answer you want. There are people who argue that what really matters is the tiny conscious bandwidth (maybe 40 bits/s or less) and that over a lifetime we may only learn a gigabit. I used $10^{22}$ to $10^{25}$ flops just to be on the safe side in one post. AIimpacts.org has collected several estimates, getting the median estimate $10^{18}$. They have also argued in favor of using TEPS (traversed edges per second) rather than flops, suggesting around $10^{14}$ TEPS for a human brain – a level that is soon within reach of some systems.

(Lots of apples-to-oranges comparisions here, of course. A single processor operation may or may not correspond to a floating point operation, let alone to what a GPU does or a TEPS. But we are in the land of order-of-magnitude estimates.)

# Brain energy use

We can turn things around: what does the energy use of human brains tell us about their computational capacity?

Ralph Merkle calculated back in 1989 that given 10 Watts of usable energy per human brain, and that the cost of each jump past a node of Ranvier cost $5\cdot 10^{-15}$ J, producing $2\cdot 10^{15}$ such operations. He estimated this was about equal to the number of synaptic operations, ending up with $10^{13}$$10^{16}$ operations per second.

A calculation I overheard at a seminar by Karlheinz Meier argued the brain uses 20 W power, has 100 billion neurons firing per second, uses $10^{-10}$ J per action potential, plus it has $10^{15}$ synapses receiving signals at about 1 Hz, and uses $10^{-14}$ J per synaptic transmission. One can also do it from the bottom to the top: there are $10^9$ ATP molecules per action potential, $10^5$ are needed for synaptic transmission. $10^{-19}$ J per ATP gives $10^{-10}$ J per action potential and $10^{-14}$ J per synaptic transmission. Both these converge on the same rough numbers, used to argue that we need much better hardware scaling if we ever want to get to this level of detail.

Digging deeper into neural energetics, maintaining resting potentials in neurons and glia account for 28% and 10% of the total brain metabolic cost, respectively, while the actual spiking activity is about 13% and transmitter release/recycling plus calcium movement is about 1%. Note how this is not too far from the equipartition in Meier’s estimate. Looking at total brain metabolism this constrains the neural firing rate: more than 3.1 spikes per second per neuron would consume more energy than the brain normally consumes (and this is likely an optimistic estimate). The brain simply cannot afford firing more than 1% of neurons at the same time, so it likely relies on rather sparse representations.

Unmyelinated axons require about 5 nJ/cm to transmit action potentials. In general, the brain gets around it through some current optimization, myelinisation (which also speeds up transmission at the price of increased error rate), and likely many clever coding strategies. Biology is clearly strongly energy constrained. In addition, cooling 20 W through a bloodflow of 750-1000 ml/min is relatively tight given that the arterial blood is already at body temperature.

20 W divided by $1.3\cdot 10^{-21}$ J (the Landauer limit at body temperature) suggests a limit of no more than $1.6\cdot 10^{22}$ irreversible operations per second. While a huge number, it is just a few orders higher than many of the estimates we have been juggling so far. If we say these operations are distributed across 100 billion neurons (which is at least within an order of magnitude of the real number) we get 160 billion operations per second per neuron; if we instead treat synapses (about 8000 per neuron) as the loci we get 20 million operations per second per synapse.

Running the full Hodgkin-Huxley neural model at 1 ms resolution requires about 1200 flops, or 1.2 million flops per second of simulation. If we treat a synapse as a compartment (very reasonable IMHO) that is just 16.6 times the Landauer limit: if the neural simulation had multiple digit precision and erased a few of them per operation we would bump into the Landauer limit straight away. Synapses are actually fairly computationally efficient! At least at body temperature: cryogenically cooled computers could of course do way better. And as Izikievich, the originator of the 1200 flops estimate, loves to point out, his model requires just 13 flops: maybe we do not need to model the ion currents like HH to get the right behavior, and can suddenly shave off two orders of magnitude.

# Information dissipation in neural networks

Just how much information is lost in neural processing?

A brain is a dynamical system changing internal state in a complicated way (let us ignore sensory inputs for the time being). If we start in a state somewhere within some predefined volume of state-space, over time the state will move to other states – and the initial uncertainty will grow. Eventually the possible volume we can find the state in will have doubled, and we will have lost one bit of information.

Things are a bit more complicated, since the dynamics can contract along some dimensions and diverge along others: this is described by the Lyapunov exponents. If the trajectory has exponent $\lambda$ in some direction nearby trajectories diverge like $|x_a(t)-x_b(t)| \propto |x_a(0)-x_b(0)| e^{\lambda t}$ in that direction. In a dissipative dynamical system the sum of the exponents is negative: in total, trajectories move towards some attractor set. However, if at least one of the exponents is positive, then this can be a strange attractor that the trajectories endlessly approach, yet they locally diverge from each other and gradually mix. So if you can only measure with a fixed precision at some point in time, you can not certainly tell where the trajectory was before (because of the contraction due to negative exponents has thrown away starting location information), nor exactly where it will be on the attractor in the future (because the positive exponents are amplifying your current uncertainty).

A measure of the information loss is the Kolmogorov-Sinai entropy, which is bounded by $K \leq \sum_{\lambda_i>0} \lambda_i$, the positive Lyapunov exponents (equality holds for Axiom A attractors). So if we calculate the KS-entropy of a neural system, we can estimate how much information is being thrown away per unit of time.

Monteforte and Wolf looked at one simple neural model, the theta-neuron (presentation). They found a KS-entropy of roughly 1 bit per neuron and spike over a fairly large range of parameters. Given the above estimates of about one spike per second per neuron, this gives us an overall information loss of $10^{11}$ bits/s in the brain, which is $1.3\cdot 10^{-10}$ W at the Landauer limit – by this account, we are some 11 orders of magnitude away from thermodynamic perfection. In this picture we should regard each action potential corresponding to roughly one irreversible yes/no decision: a not too unreasonable claim.

I begun to try to estimate the entropy and Lyapunov exponents of the Izikievich network to check for myself, but decided to leave this for another post. The reason is that calculating the Lyapunov exponents from time series is a pretty delicate thing, especially when there is noise. And the KS-dimension is even more noise-sensitive. In research on EEG data (where people have looked at the dimension of chaotic attractors and their entropies to distinguish different mental states and epilepsy) an approximate entropy measure is used instead.

It is worth noticing that one can look at cognition as a system with a large-scale dynamics that has one entropy (corresponding to shifting between different high-level mental states) and microscale dynamics with different entropy (corresponding to the neural information processing). It is a safe bet that the biggest entropy costs are on the microscale (fast, numerous simple states) than the macroscale (slow, few but complex states).

# Energy of AI

Where does this leave us in regards to the energy requirements of artificial intelligence?

Assuming the same amount of energy is needed for a human and machine to do a cognitive task is a mistake.

First, as the Izikievich neuron demonstrates, it might be that judicious abstraction easily saves two orders of magnitude of computation/energy.

Special purpose hardware can also save one or two orders of magnitude; using general purpose processors for fixed computations is very inefficient. This is of course why GPUs are so useful for many things: in many cases you just want to perform the same action on many pieces of data rather than different actions on the same piece.

But more importantly, on what level the task is implemented matters. Sorting or summing a list of a thousand elements is a fast computer operation that can be done in memory, but a hour-long task for a human: because of our mental architecture we need to represent the information in a far more redundant and slow way, not to mention perform individual actions on the seconds time-scale. A computer sort uses a tight representation more like our low-level neural circuitry. I have no doubt one could string together biological neurons to perform a sort or sum operation quickly, but cognition happens on a higher, more general level of the system (intriguing speculations about idiot savants aside).

While we have reason to admire brains, they are also unable to perform certain very useful computations. In artificial neural networks we often employ non-local matrix operations like inversion to calculate optimal weights: these computations are not possible to perform locally in a distributed manner. Gradient descent algorithms such as backpropagation are unrealistic in a biological sense, but clearly very successful in deep learning. There is no shortage of papers describing various clever approximations that would allow a more biologically realistic system to perform similar operations – in fact, the brains may well be doing it – but artificial systems can perform them directly, and by using low-level hardware intended for it, very efficiently.

When a deep learning system learns object recognition in an afternoon it beats a human baby by many months. When it learns to do analogies from 1.6 billion text snippets it beats human children by years. Yes, these are small domains, yet they are domains that are very important for humans and would presumably develop as quickly as possible in us.

Biology has many advantages in robustness and versatility, not to mention energy efficiency. But it is also fundamentally limited by what can be built out of cells with a particular kind of metabolism, that organisms need to build themselves from the inside, and the need of solving problems that exist in a particular biospheric environment.

# Conclusion

Unless one thinks the human way of thinking is the most optimal or most easily implementable way, we should expect de novo AI to make use of different, potentially very compressed and fast, processes. (Brain emulation makes sense if one either cannot figure out how else to do AI, or one wants to copy extant brains for their properties.) Hence, the costs of brain computation is merely a proof of existence that there are systems that effective – the same mental tasks could well be done by far less or far more efficient systems.

In the end, we may try to estimate fundamental energy costs of cognition to bound AI energy use. If human-like cognition takes a certain number of bit erasures per second, we would get some bound using Landauer (ignoring reversible computing, of course). But as the above discussion has showed, it may be that the actual computational cost needed is just some of the higher level representations rather than billions of neural firings: until we actually understand intelligence we cannot say. And by that point the question is moot anyway.

Many people have the intuition that the cautious approach is always to state “thing’s won’t work”. But this mixes up cautious with conservative (or even reactionary). A better cautious approach is to recognize that “things may work”, and then start checking the possible consequences. If we want a reassuring constraint on why certain things cannot happen it need to be tighter than energy estimates.

# Annoyed by annoyed AI: can we fight AI hype?

Recently the Wall Street Journal reported that an AI got testy with its programmer when he asked about ethics. This is based on a neat paper by Vinyals and Le on training recurrent networks to do conversations based on past example conversations, essentially making neural network chatbots. I posted about that particular dialogue myself to the office list, joking about AI not digging ethics.

However, others got annoyed by the annoyed AI. Mbeckman on Slashdot posted WSJ Overstates the Case Of the Testy A.I. and George Dvorsky on Io9 followed up with No, an AI Did Not Just “Lash Out” at Its Human Programmer.

Mbeckman wrote: “AI hype to the public has gotten progressively more strident in recent years, misleading lay people into believing researchers are much further along than they really are — by orders of magnitude. I’d love to see legitimate A.I. researchers condemn this kind of hucksterism.

Of course, the question is what the hucksterism actually is, who is doing it, and what to do about it.

## Who are the hucksters?

A huckster is “One who uses aggressive, showy, and sometimes devious methods to promote or sell a product”.

### The paper?

In this case the paper is not doing anything obviously hucksterish – it has several dialogue examples where the software responds to VPN or browser issues, general knowledge Q&A, chatting and opinions. That the system thinks Cleopatra is very regal or that it manages to instruct the user into doing an apt-update for a browser is neat, but not terribly impressive. Many of the dialogues are clearly Eliza-level associations with little actual thinking, but amusing random combinations:

Human:what is the purpose of life ?
Machine:to serve the greater good .
Human:what is the purpose of living ?
Machine:to live forever .
Human:what is the purpose of existence ?
Machine: to find out what happens when we get to the planet earth .
Human:where are you now ?
Machine:i ’m in the middle of nowhere .
Human:what is the purpose of dying ?
Machine:to have a life .
Human:what is the purpose of being intelligent?
Machine:to find out what it is .
Human:what is the purpose of emotions ?
Machine:i don ’t know

If this is hucksterism, then nearly any computer science paper is hucksterism. After all, they tend to show the successful runs of software and generally overestimate the utility of the algorithm or method.

### Wall Street Journal?

Mbeckman probably felt that the WSJ was more guilty. After all, the title and opening suggest there is some kind of attitude going on. But there is actually rather little editorializing: rather, a somewhat bland overview of machine learning with an amusing dialogue example thrown in. It could have been Eliza instead, and the article would have made sense too (“AI understands programmer’s family problems”). There is an element of calculation here: AI is hot, and the dialogue can be used as a hook to make a story that both mentions real stuff and provides a bit of entertainment. But again, this is not so much aggressive promotion of a product/idea as opportunitistic promotion.

### Media in general?

I suspect that the real target of Mbeckman’s wrath is the unnamed sources of AI hype. There is no question that AI is getting hyped these days. Big investments by major corporations, sponsored content demystifying it, Business Insider talking about how to invest into it, corporate claims of breakthroughs that turn out to be mistakes/cheating, invitations to governments to join the bandwagon, the whole discussion about AI safety where people quote and argue about Hawking’s and Musk’s warnings (rather than going to the sources reviewing the main thinking), and of course a bundle of films. The nature of hype is that it is promotion, especially based on exaggerated claims. This is of course where the hucksterism accusation actually bites.

## Hype: it is everybody’s fault

But while many of the agents involved do exaggerate their own products, hype is also a social phenomenon. In many ways it is similar to an investment bubble. Some triggers occur (real technology breakthroughs, bold claims, a good story) and media attention flows to the field. People start investing in the field, not just with money, but with attention, opinion and other contributions. This leads to more attention, and the cycle feeds itself. Like an investment bubble overconfidence is rewarded (you get more attention and investment) while sceptics do not gain anything (of course, you can participate as a sharp-tounged sceptic: everybody loves to claim they listen to critical voices! But now you are just as much part of the hype as the promoters). Finally the bubble bursts, fashion shifts, or attention just wanes and goes somewhere else. Years later, whatever it was may reach the plateau of productivity.

The problem with this image is that it is everybody’s fault. Sure, tech gurus are promoting their things, but nobody is forced to naively believe them. Many of the detractors are feeding the hype by feeding it attention. There is ample historical evidence: I assume the Dutch tulip bubble is covered in Economics 101 everywhere, and AI has a history of terribly destructive hype bubbles… yet few if any learn from it (because this time it is different, because of reasons!)

### Fundamentals

In the case of AI, I do think there have been real changes that give good reason to expect big things. Since the 90s when I was learning the field computing power and sizes of training data have expanded enormously, making methods that looked like dead ends back them actually blossom. There has also been conceptual improvements in machine learning, among other things killing off neural networks as a separate field (we bio-oriented researchers reinvented ourselves as systems biologists, while the others just went with statistical machine learning). Plus surprise innovations that have led to a cascade of interest – the kind of internal innovation hype that actually does produce loads of useful ideas. The fact that papers and methods that surprise experts in the field are arriving at a brisk pace is evidence of progress. So in a sense, the AI hype has been triggered by something real.

I also think that the concerns about AI that float around have been triggered by some real insights. There was minuscule AI safety work done before the late 1990s inside AI; most was about robots not squishing people. The investigations of amateurs and academics did bring up some worrying concepts and problems, at first at the distal “what if we succeed?” end and later also when investigating the more proximal impact of cognitive computing on society through drones, autonomous devices, smart infrastructures, automated jobs and so on. So again, I think the “anti-AI hype” has also been triggered by real things.

### Copy rather than check

But once the hype cycle starts, just like in finance, fundamentals matter less and less. This of course means that views and decisions become based on copying others rather than truth-seeking. And idea-copying is subject to all sorts of biases: we notice things that fit with earlier ideas we have held, we give weight to easily available images (such as frequently mentioned scenarios) and emotionally salient things, detail and nuance are easily lost when a message is copied, and so on.

### Science fact

This feeds into the science fact problem: to a non-expert, it is hard to tell what the actual state of art is. The sheer amount of information, together with multiple contradictory opinions, makes it tough to know what is actually true. Just try figuring out what kind of fat is good for your heart (if any). There is so much reporting on the issue, that you can easily find support for any side, and evaluating the quality of the support requires expert knowledge. But even figuring out who is an expert in a contested big field can be hard.

In the case of AI, it is also very hard to tell what will be possible or not. Expert predictions are not that great, nor different from amateur predictions. Experts certainly know what can be done today, but given the number of surprises we are seeing this might not tell us much. Many issues are also interdisciplinary, making even confident and reasoned predictions by a domain expert problematic since factors they know little about also matters (consider the the environmental debates between ecologists and economists – both have half of the puzzle, but often do not understand that the other half is needed).

### Bubble inflation forces

Different factors can make hype more or less intense. During summer “silly season” newspapers copy entertaining stories from each other (some stories become perennial, like the “BT soul-catcher chip” story that emerged in 1996 and is still making its rounds). Here easy copying and lax fact checking boost the effect. During a period with easy credit financial and technological bubbles become more intense. I suspect that what is feeding the current AI hype bubble is a combination of the usual technofinancial drivers (we may be having dotcom 2.0, as some think), but also cultural concerns with employment in a society that is automating, outsourcing, globalizing and disintermediating rapidly, plus very active concerns with surveillance, power and inequality. AI is in a sense a natural lightening rod for these concerns, and they help motivate interest and hence hype.

## So here we are.

AI professionals are annoyed because the public fears stuff that is entirely imaginary, and might invoke the dreaded powers of legislators or at least threaten reputation, research grants and investment money. At the same time, if they do not play up the coolness of their ideas they will not be noticed. AI safety people are annoyed because the rather subtle arguments they are trying to explain to the AI professionals get wildly distorted into “Genius Scientists Say We are Going to be Killed by the TERMINATOR!!!” and the AI professionals get annoyed and refuse to listen. Yet the journalists are eagerly asking for comments, and sometimes they get things right, so it is tempting to respond. The public are annoyed because they don’t get the toys they are promised, and it simultaneously looks like Bad Things are being invented for no good reason. But of course they will forward that robot wedding story. The journalists are annoyed because they actually do not want to feed hype. And so on.

What should we do? “Don’t feed the trolls” only works when the trolls are identifiable and avoidable. Being a bit more cautious, critical and quiet is not bad: the world is full of overconfident hucksters, and learning to recognize and ignore them is a good personal habit we should appreciate. But it only helps society if most people avoid feeding the hype cycle: a bit like the unilateralist’s curse, nearly everybody need to be rational and quiet to starve the bubble. And since there are prime incentives for hucksterism in industry, academia and punditry that will go to those willing to do it, we can expect hucksters to show up anyway.

The marketplace of ideas could do with some consumer reporting. We can try to build institutions to counter problems: good ratings agencies can tell us whether something is overvalued, maybe a federal robotics commission can give good overviews of the actual state of the art. Reputation systems, science blogging marking what is peer reviewed, various forms of fact-checking institutions can help improve epistemic standards a bit.

AI safety people could of course pipe down and just tell AI professionals about their concerns, keeping the public out of it by doing it all in a formal academic/technical way. But a pure technocratic approach will likely bite us in the end, since (1) incentives to ignore long term safety issues with no public/institutional support exist, and (2) the public gets rather angry when it finds that “the experts” have been talking about important things behind their back. It is better to try to be honest and try to say the highest-priority true things as clearly as possible to the people who need to hear it, or ask.

AI professionals should recognize that they are sitting on a hype-generating field, and past disasters give much reason for caution. Insofar they regard themselves as professionals, belonging to a skilled social community that actually has obligations towards society, they should try to manage expectations. It is tough, especially since the field is by no means as unified professionally as (say) lawyers and doctors. They should also recognize that their domain knowledge both obliges them to speak up against stupid claims (just like Mbeckman urged), but that there are limits to what they know: talking about the future or complex socioecotechnological problems requires help from other kinds of expertise.

And people who do not regard themselves as either? I think training our critical thinking and intellectual connoisseurship might be the best we can do. Some of that is individual work, some of it comes from actual education, some of it from supporting better epistemic institutions – have you edited Wikipedia this week? What about pointing friends towards good media sources?

In the end, I think the AI system got it right: “What is the purpose of being intelligent? To find out what it is”. We need to become better at finding out what is, and only then can we become good at finding out what intelligence is.

# Don’t be evil and make things better

I am one of the signatories of an open letter calling for a stronger aim at socially beneficial artificial intelligence.

It might seem odd to call for something like that: who in their right mind would not want AI to be beneficial? But when we look at the field (and indeed, many other research fields) the focus has traditionally been on making AI more capable. Besides some pure research interest and no doubt some “let’s create life”-ambition, the bulk of motivation has been to make systems that do something useful (or push in the direction of something useful).

“Useful” is normally defined in term of performing some task – translation, driving, giving medical advice – rather than having a good impact on the world. Better done tasks are typically good locally – people get translations more cheaply, things get transported, advice may be better – but have more complex knock-on effects: fewer translators, drivers or doctors needed, or that their jobs get transformed, plus potential risks from easy (but possibly faulty) translation, accidents and misuse of autonomous vehicles, or changes in liability. Way messier. Even if the overall impact is great, socially disruptive technologies that appear surprisingly fast can cause trouble, emergent misbehaviour and bad design choices can lead to devices that amplify risk (consider high frequency trading, badly used risk models, or anything that empowers crazy people). Some technologies may also lend themselves to centralizing power (surveillance, autonomous weapons) but reduce accountability (learning algorithms internalizing discriminatory assumptions in an opaque way). These considerations should of course be part of any responsible engineering and deployment, even if handling them is by no means solely the job of the scientist or programmer. Doing it right will require far more help from other disciplines.

The most serious risks come from the very smart systems that may emerge further into the future: they either amplify human ability in profound ways, or they are autonomous  themselves. In both cases they make achieving goals easier, but do not have any constraints on what goals are sane, moral or beneficial. Solving the problem of how to keep such systems safe is a hard problem we ought to start on early. One of the main reasons for the letter is that so little effort has gone into better ways of controlling complex, adaptive and possibly self-improving technological systems. It makes sense even if one doesn’t worry about superintelligence or existential risk.

This is why we have to change some research priorities. In many cases it is just putting problems on the agenda as useful to work on: they are important but understudied, and a bit of investigation will likely go a long way. In some cases it is more a matter of signalling that different communities need to talk more to each other. And in some instances we really need to have our act together before big shifts occur – if unemployment soars to 50%, engineering design-ahead enables big jumps in tech capability, brains get emulated, or systems start self-improving we will not have time to carefully develop smart policies.

My experience with talking to the community is that there is not a big split between AI and AI safety practitioners: they roughly want the same thing. There might be a bigger gap between the people working on the theoretical, far out issues and the people working on the applied here-and-now stuff. I suspect they can both learn from each other. More research is, of course, needed.

# More robots, and how to take over the world with guaranteed minimum income

I was just watching “Humans Need Not apply” by CGPGrey,

when I noticed a tweet from Wendy Grossman, who I participated with in a radio panel about robotics (earlier notes on the discussion). She has some good points inspired by our conversation in her post, robots without software.

I think she has a key observation: much of the problem lies in the interaction between the automation and humans. On the human side, that means getting the right information and feedback into the machine side. From the machine side, it means figuring out what humans – those opaque and messy entities who change behaviour for internal reasons – want. At the point where the second demand is somehow resolved we will not only have really useful automation, but also essentially a way of resolving AI safety/ethics. But before that, we will have a situation of only partial understanding , and plenty of areas where either side will not be able to mesh well. Which either forces humans to adapt to machines, or machines to get humans to think that what they really wanted was what they got served. That is risky.

## Global GMI stability issues

Incidentally, I have noted that many people hearing the current version of the machines will take our jobs story bring up the idea of a guaranteed minimum income as a remedy. If nobody has a job but there is a GMI we can still live a good life (especially since automation would make most things rather cheap). This idea has a long history, and Hans Moravec suggested it in his book Robot (1998) in regard to a future where AI-run corporations would be running the economy. It can be appealing even from a libertarian standpoint since it does away with a lot of welfare and tax bureaucracy (even Hayek might have been a fan).

I’m not enough of an economist to analyse it properly, but I suspect the real problem is stability when countries compete on tax: if Foobonia has a lower corporate tax rate than Baristan and the Democratic Republic of Baaz, then companies will move there – still making money by selling stuff to people in Baristan and Baaz. The more companies there are in Foobonia, the less taxes are needed to keep the citizens wealthy. In fact, as I mentioned in my earlier post, having fewer citizens might make the remaining more well off (things like this have happened on a smaller scale). The ideal situation would be to have the lowest taxes in the world and just one citizen. Or none, so the AI parliament can use the entire budget to improve the future prosperity and safety of Foobonia.

In our current world tax competition is only one factor determining where companies go. Not every company moves to Bahamas, Chile, Estonia or the UAE. One factor is other legal issues and logistics, but a big part is that you need to have people actually working in your company. Human capital is distributed very unevenly, and it is rarely where you want it (and the humans often do not want to move, for social reasons). But in an automated world machine capital will exist wherever you buy it so it can be placed where the taxes are cheaper. There will be a need to perform some services and transport goods in other areas, but unless they are taxed (hence driving up the price for your citizens) this is going to be a weaker constraint than now. How much weaker, I do not know – it would be interesting to see it investigated properly.

The core problem remains that if humans are largely living off the rents from a burgeoning economy there better exist stabilizing safeguards so these rents remain, and stabilizers that keep the safeguards stable. This is a non-trivial legal/economical problem, especially since one failure mode might be that some countries become zero citizen countries with huge economic growth and gradually accumulating investments everywhere (a kind of robotic Piketty situation, where everything in the end ends up owned by the AI consortium/sovereign wealth fund with the strongest growth). In short, it seems to require something just as tricky to develop as the friendly superintelligence program.

In any case, I suspect much of the reason people suggest GMI is that it is an already existing idea and not too strange. Hence it is thinkable and proposable. But there might be far better ideas out there for how to handle a world with powerful automation. One should not just stick with a local optimum idea when there might be way more stable and useful ideas further out.