# The capability caution principle and the principle of maximal awkwardness

Capability Caution Principle: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.

It is an important meta-principle in careful design to avoid assuming the most reassuring possibility and instead design based on the most awkward possibility.

When inventing a cryptosystem, do not assume that the adversary is stupid and has limited resources: try to make something that can withstand a computationally and intellectually superior adversary. When testing a new explosive, do not assume it will be weak – stand as far away as possible. When trying to improve AI safety, do not assume AI will be stupid or weak, or that whoever implements it will be sane.

Often we think that the conservative choice is the pessimistic choice where nothing works. This is because “not working” is usually the most awkward possibility when building something. If I plan a project I should ensure that I can handle unforeseen delays, and the possibility that my original plans and pathways have to be scrapped and replaced with something else. But from a safety or social impact perspective the most awkward situation is if something succeeds radically, in the near future, and we have to deal with the consequences.

Applying the principle of maximal awkwardness is a form of steelmanning, and of assuming the least convenient possible world.

This is an approach based on potential loss rather than probability. Most AI history tells us that wild dreams rarely, if ever, come true. But were we to get very powerful AI tools tomorrow it is not too hard to foresee a lot of damage and disruption. Even if you do not think the risk is existential you can probably imagine that autonomous hedge funds smarter than human traders, automated engineering in the hands of anybody and scalable automated identity theft could mess up the world system rather strongly. The fact that it might be unlikely is not as important as that the damage would be unacceptable. It is often easy to think that in uncertain cases the burden of proof is on the other party, rather than on the side where a mistaken belief would be dangerous.

As FLI stated it the principle goes both ways: do not assume the limits are super-high either. Maybe there is a complexity scaling making problem-solving systems unable to handle more than 7 things in “working memory” at the same time, limiting how deep their insights could be. Maybe social manipulation is not a tractable task. But this mainly means we should not count on the super-smart AI as a solution to problems (e.g. using one smart system to monitor another smart system). It is not an argument to be complacent.

People often misunderstand uncertainty:

• Some think that uncertainty implies that non-action is reasonable, or at least action should wait till we know more. This is actually where the precautionary principle is sane: if there is a risk of something bad happening but you are not certain it will happen, you should still try to prevent it from happening or at least monitor what is going on.
• Obviously some uncertain risks are unlikely enough that they can be ignored by rational people, but you need to have good reasons to think that the risk is actually that unlikely – uncertainty alone does not help.
• Gaining more information sometimes reduces uncertainty in valuable ways, but the price of information can sometimes be too high, especially when there are intrinsically unknowable factors and noise clouding the situation.
• Looking at the mean or expected case can be a mistake if there is a long tail of relatively unlikely but terrible possibilities: on the average day your house does not have a fire, but having insurance, a fire alarm and a fire extinguisher is a rational response.
• Combinations of uncertain factors do not become less uncertain as they are combined (even if you describe them carefully and with scenarios): typically you get broader and heavier-tailed distributions, and should act on the tail risk.

FLI asks the intriguing question of how smart AI can get. I really want to know that too. But it is relatively unimportant for designing AI safety unless the ceiling is shockingly low; it is safer to assume it can be as smart as it wants to. Some AI safety schemes involve smart systems monitoring each other or performing very complex counterfactuals: these do hinge on an assumption of high intelligence (or whatever it takes to accurately model counterfactual worlds). But then the design criteria should be to assume that these things are hard to do well.

Under high uncertainty, assume Murphy’s law holds.

(But remember that good engineering and reasoning can bind Murphy – it is just that you cannot assume somebody else will do it for you.)

# AI, morality, ethics and metaethics

Next Sunday I will be debating AI ethics at Battle of Ideas. Here is a podcast where I talk AI, morality and ethics: https://soundcloud.com/institute-of-ideas/battle-cry-anders-sandberg-on-ethical-ai

# What distinguishes morals from ethics?

There is actually a shocking confusion about what the distinction between morals and ethics is. Differen.com says ethics is about rules of conduct produced by an external source while morals are an individual’s own principles of right and wrong. Grammarist.com says morals are principles on which one’s own judgement of right and wrong are based (abstract, subjective and personal), ethics are the principles of right conduct (practical, social and objective). Ian Welsh gives a soundbite: “morals are how you treat people you know.  Ethics are how you treat people you don’t know.” Paul Walker and Terry Lovat say ethics leans towards decisions based on individual character and subjective understanding of right and wrong, while morals is about widely shared communal or societal norms – here ethics is individual assessment of something being good or bad, while morality is inter-subjective community assessment.

Wikipedia distinguishes between ethics as a research field and the common human ability to think critically about moral values and direct actions appropriately, or a particular person’s principles of values. Morality is the differentiation between things that are proper and improper, as well as a body of standards and principles derived from a code of conduct in some philosophy, religion or culture… or derived from a standard a person believes to be universal.

Dictionary.com regards ethics as a system of moral principles, the rules of conduct recognized in some human environment, an individual’s moral principles (and the branch of philosophy). Morality is about conforming to the rules of right conduct, having moral quality or character, a doctrine or system of morals and a few other meanings. The Cambridge dictionary thinks ethics is the study of what is right or wrong, or the set of beliefs about it, while morality is a set of personal or social standards for good/bad behavior and character.

And so on.

I think most people try to include the distinction between shared systems of conduct and individual codes, and the distinction between things that are subjective, socially agreed on, and maybe objective. Plus, we all agree that ethics is a philosophical research field.

# My take on it

I like to think of it as an AI issue. We have a policy function $\pi(s,a)$ that maps states and action pairs to a probability of acting that way; this is set using a value function $Q(s)$ where various states are assigned values. Morality in my sense is just the policy function and maybe the value function: they have been learned through interacting with the world in various ways.

Ethics in my sense is ways of selecting policies and values. We are able to not only change how we act but also how we evaluate things, and the information that does this change is not just reward signals that update value function directly, but also knowledge about the world, discoveries about ourselves, and interactions with others – in particular ideas that directly change the policy and value functions.

When I realize that lying rarely produces good outcomes (too much work) and hence reduce my lying, then I am doing ethics (similarly, I might be convinced about this by hearing others explain that lying is morally worse than I thought or convincing me about Kantian ethics). I might even learn that short-term pleasure is less valuable than other forms of pleasure, changing how I view sensory rewards.

Academic ethics is all about the kinds of reasons and patterns we should use to update our policies and values, trying to systematize them. It shades over into metaethics, which is trying to understand what ethics is really about (and what metaethics is about: it is its own meta-discipline, unlike metaphysics that has metametaphysics, which I think is its own meta-discipline).

I do not think I will resolve any confusion, but at least this is how I tend to use the terminology. Morals is how I act and evaluate, ethics is how I update how I act and evaluate, metaethics is how I try to think about my ethics.

# What makes a watchable watchlist?

Stefan Heck managed to troll a lot of people into googling “how to join ISIS”. Very amusing, and now a lot of people think they are on an NSA watchlist.

This kind of prank is of course why naive keyword-based watchlists are total failures. One prank and the list gets overloaded. I would be shocked if any serious intelligence agency actually used them for real. Given that people’s Facebook likes give pretty good predictions of who they are (indeed, better than many friends know them) there are better methods if you happen to be a big intelligence agency.

Still, while text and other online behavior signal a lot about a person, it might not be a great tool for making proper watchlists since there is a lot of noise. For example, this paper extracts personality dimensions from online texts and looks at civilian mass murderers. They state:

Using this ranking procedure, it was found that all of the murderers’ texts were located within the highest ranked 33 places. It means that using only two simple measures for screening these texts, we can reduce the size of the population under inquiry to 0.013% of its original size, in order to manually identify all of the murderers’ texts.

At first, this sounds great. But for the US, that means the watchlist for being a mass murderer would currently have 41,000 entries. Given that over the past 150 years there have been about 150 mass murders in the US, this suggests that the precision is not going to be that great – most of those people are just normal people. The base rate problem crops up again and again when trying to find rare, scary people.

The deep problem is that there are not enough positive data points (the above paper used seven people) to make a reliable algorithm. The same issue cropped up with NSA’s SKYNET program – they also had seven positive examples and hundreds of thousands of negatives, and hence had massive overfitting (suggesting the Islamabad Al Jazeera bureau chief was a prime Al Qaeda suspect).
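The arithmetic behind this base rate problem can be sketched in a few lines. The population figure is an assumed round number (it is not given in the paper), and the precision estimate is deliberately generous: it pretends every mass murderer of the past 150 years is alive and on today's list.

```python
# Base rate sketch for the mass murderer watchlist.
us_population = 320_000_000        # assumption: rough US population
screen_fraction = 0.013 / 100      # 0.013% of the population, from the quoted paper
mass_murders = 150                 # about 150 cases over 150 years, as stated above

watchlist_size = us_population * screen_fraction

# Upper bound on precision: assume ALL historical cases sit on the current list.
best_case_precision = mass_murders / watchlist_size

print(f"watchlist entries: {watchlist_size:,.0f}")
print(f"best-case precision: {best_case_precision:.2%}")
```

Even under that generous assumption, well under one percent of the entries would be true positives.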

## Rational watchlists

The rare positive data point problem strikes any method, no matter what it is based on. Yes, looking at the social network around people might give useful information, but if you only have a few examples of bad people the system will now pick up on networks like the ones they had. This is also true for human learning: if you look too much for people like the ones that in the past committed attacks, you will focus too much on people like them and not enemies that look different. I was told by an anti-terrorism expert about a particular sign for veterans of Afghan guerrilla warfare: great if and only if such veterans are the enemy, but rather useless if the enemy can recruit others. Even if such veterans are a sizable fraction of the enemy the base rate problem may make you spend your resources on innocent “noise” veterans if the enemy is a small group. Add confirmation bias, and trouble will follow.

Note that actually looking for a small set of people on the watchlist gets around the positive data point problem: the system can look for them and just them, and this can be made precise. The problem is not watching, but predicting who else should be watched.

The point of a watchlist is that it represents a subset of something (whether people or stocks) that merits closer scrutiny. It should essentially be an allocation of attention towards items that need higher level analysis or decision-making. The U.S. Government’s Consolidated Terrorist Watch List requires nomination from various agencies, who presumably decide based on reasonable criteria (modulo confirmation bias and mistakes). The key problem is that attention is a limited resource, so adding extra items has a cost: less attention can be spent on the rest.

This is why automatic watchlist generation is likely to be a bad idea, despite much research. Mining intelligence to help an analyst figure out if somebody might fit a profile or merit further scrutiny is likely more doable. As long as analyst time is expensive it can easily be overwhelmed if something fills the input folder: HUMINT is less likely to do it than SIGINT, even if the analyst is just doing the preliminary nomination for a watchlist.

## The optimal Bayesian watchlist

One can analyse this in a Bayesian framework: assume each item has a value $x_i$ distributed as $f(x_i)$. The goal of the watchlist is to spend expensive investigatory resources to figure out the true values; say the cost is 1 per item. Then a watchlist of randomly selected items will have a mean value $V=E[x]-1$. Suppose a cursory investigation costing much less gives some indication about $x_i$, so that it is now known with some error: $y_i = x_i+\epsilon$. One approach is to select all items above a threshold $\theta$, making $V=E[x_i|y_i>\theta]-1$.

If we imagine that everything is Gaussian $x_i \sim N(\mu_x,\sigma_x^2), \epsilon \sim N(0,\sigma_\epsilon^2)$, then  $V=\int_\theta^\infty t \phi(\frac{t-\mu_x}{\sigma_x}) \Phi\left(\frac{t-\mu_x}{\sqrt{\sigma_x^2+\sigma_\epsilon^2}}\right)dt$. While one can ram through this using Owen’s useful work, here is a Monte Carlo simulation of what happens when we use $\mu_x=0, \sigma_x^2=1, \sigma_\epsilon^2=1$ (the correlation between x and y is 0.707, so this is not too much noise):

Note that in this case the addition of noise forces a far higher threshold than without noise (1.22 instead of 0.31). This is just 19% of all items, while in the noise-less case 37% of items would be worth investigating. As noise becomes worse the selection for a watchlist should become stricter: a really cursory inspection should not lead to insertion unless it looks really relevant.
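The quoted thresholds can be cross-checked without simulation. The sketch below assumes that the threshold is the break-even point where the mean value of the selected items, $E[x_i|y_i>\theta]-1$, is exactly zero; that reading is an assumption of this sketch rather than something spelled out above, but it lands close to the quoted numbers. For jointly Gaussian $x$ and $y$ the conditional mean has a closed form in terms of the normal density and tail probability, so a simple bisection suffices. All function names are illustrative.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def mean_value_above(theta, sigma_eps, sigma_x=1.0):
    """E[x | y > theta] - 1 for x ~ N(0, sigma_x^2), y = x + noise."""
    sigma_y = math.hypot(sigma_x, sigma_eps)
    z = theta / sigma_y
    # E[x | y] = (sigma_x^2 / sigma_y^2) * y, and
    # E[y | y > theta] = sigma_y * pdf(z) / tail(z).
    return (sigma_x ** 2 / sigma_y) * normal_pdf(z) / normal_tail(z) - 1.0

def break_even_threshold(sigma_eps, lo=-5.0, hi=5.0):
    """Bisection for the theta where the mean value crosses zero."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean_value_above(mid, sigma_eps) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta_clean = break_even_threshold(sigma_eps=0.0)
theta_noisy = break_even_threshold(sigma_eps=1.0)
frac_clean = normal_tail(theta_clean)                    # sigma_y = 1
frac_noisy = normal_tail(theta_noisy / math.sqrt(2.0))   # sigma_y = sqrt(2)

print(f"no noise: theta = {theta_clean:.2f}, fraction selected = {frac_clean:.0%}")
print(f"noise:    theta = {theta_noisy:.2f}, fraction selected = {frac_noisy:.0%}")
```

Under these assumptions the thresholds come out near 0.30 and 1.22, with roughly 38% and 19% of items selected, which matches the Monte Carlo figures to within sampling noise.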

Here we used a mild Gaussian distribution. In terms of danger, I think people or things are more likely to be lognormally distributed, since danger is a product of many relatively independent factors. Using lognormal x and y leads to a situation where there is a maximum utility for some threshold. This is likely a problematic model, but clearly the shape of the distributions matters a lot for where the threshold should be.

Note that having huge resources can be a bane: if you build your watchlist from the top priority down as long as you have budget or manpower, the lower priority (but still above threshold!) entries will be more likely to be a waste of time and effort. The average utility will decline.

## Predictive validity matters more?

In any case, a cursory and cheap decision process is going to give so many so-so evaluations that one shouldn’t build the watchlist on it. Instead one should aim for a series of filters of increasing sophistication (and cost) to wash out the relevant items from the dross.

But even there there are pitfalls, as this paper looking at the pharma R&D industry shows:

We find that when searching for rare positives (e.g., candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/or unknowable (i.e., an 0.1 absolute change in correlation coefficient between model output and clinical outcomes in man) can offset large (e.g., 10 fold, even 100 fold) changes in models’ brute-force efficiency.

Just like for drugs (an example where the watchlist is a set of candidate compounds), it might be more important for terrorist watchlists to aim for signs with predictive power of being a bad guy, rather than being correlated with being a bad guy. Otherwise anti-terrorism will suffer the same problem of declining productivity, despite ever more sophisticated algorithms.

# How small is the wiki?

Recently I encountered a specialist Wiki. I pressed “random page” a few times, and got a repeat page after 5 tries. How many pages should I expect this small wiki to have?

We can compare this to the German tank problem. Note that it is different; in the tank problem we have a maximum sample (maybe like the web pages on the site were numbered), while here we have number of samples before repetition.

We can of course use Bayes theorem for this. If I get a repeat after $k$ random samples, the posterior distribution of $N$, the number of pages, is $P(N|k) = P(k|N)P(N)/P(k)$.

If I randomly sample from $N$ pages, the probability of getting a repeat on my second try is $1/N$, on my third try $2/N$, and so on: $P(k|N)=(k-1)/N$. Of course, there have to be at least $k-1$ pages, otherwise a repeat must have happened before step $k$, so this is valid for $k \leq N+1$, and $P(k|N)=0$ for $k>N+1$.

The prior $P(N)$ needs to be decided. One approach is to assume that websites have a power-law distributed number of pages. The majority are tiny, and then there are huge ones like Wikipedia; the exponent is close to 1. This gives us $P(N) = N^{-\alpha}/\zeta(\alpha)$. Note the appearance of the Riemann zeta function as a normalisation factor.

We can calculate $P(k)$ by summing over the different possible $N$: $P(k)=\sum_{N=1}^\infty P(k|N)P(N) = \frac{k-1}{\zeta(\alpha)}\sum_{N=k-1}^\infty N^{-(\alpha+1)}$ $=\frac{k-1}{\zeta(\alpha)}(\zeta(\alpha+1)-\sum_{i=1}^{k-2}i^{-(\alpha+1)})$.

Putting it all together we get $P(N|k)=N^{-(\alpha+1)}/(\zeta(\alpha+1) -\sum_{i=1}^{k-2}i^{-(\alpha+1)})$ for $N\geq k-1$. The posterior distribution of number of pages is another power-law. Note that the dependency on $k$ is rather subtle: it is in the support of the distribution, and the upper limit of the partial sum.

What about the expected number of pages in the wiki? $E(N|k)=\sum_{N=1}^\infty N P(N|k) = \sum_{N=k-1}^\infty N^{-\alpha}/(\zeta(\alpha+1) -\sum_{i=1}^{k-2}i^{-(\alpha+1)})$ $=\frac{\zeta(\alpha)-\sum_{i=1}^{k-2} i^{-\alpha}}{\zeta(\alpha+1)-\sum_{i=1}^{k-2}i^{-(\alpha+1)}}$. The expectation is the ratio of the zeta functions of $\alpha$ and $\alpha+1$, minus the first $k-2$ terms of their series.

So, what does this tell us about the wiki I started with? Assuming $\alpha=1.1$ (close to the behavior of big websites), it predicts $E(N|k)\approx 21.28$. If one assumes a higher $\alpha=2$ the number of pages would be 7 (which was close to the size of the wiki when I looked at it last night – it has grown enough today for k to equal 13 when I tried it today).
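The closed form for $E(N|k)$ is easy to evaluate numerically. Here is a minimal sketch using only the standard library: the tail sums $\sum_{N\geq k-1} N^{-s}$ are computed by summing terms directly up to a cutoff and adding an Euler-Maclaurin correction for the remainder. The function names and the cutoff are illustrative choices. The correction matters for $\alpha$ near 1, where the series converges very slowly; evaluated this way the $\alpha=2$ case agrees with the value above, while $\alpha=1.1$ comes out noticeably higher than the quoted figure, which may reflect a truncated sum.

```python
def tail_sum(start, s, cutoff=1000):
    """Approximate sum of n**(-s) for n = start, start+1, ... (requires s > 1)."""
    head = sum(n ** -s for n in range(start, cutoff))
    # Euler-Maclaurin estimate of the remainder from n = cutoff onwards.
    tail = (cutoff ** (1 - s) / (s - 1)
            + 0.5 * cutoff ** -s
            + s * cutoff ** (-s - 1) / 12)
    return head + tail

def expected_pages(k, alpha):
    """E(N|k): ratio of the two zeta tails starting at N = k-1."""
    start = max(k - 1, 1)
    return tail_sum(start, alpha) / tail_sum(start, alpha + 1)

for alpha in (1.1, 2.0):
    print(f"alpha = {alpha}: E(N|k=5) = {expected_pages(5, alpha):.2f}")
```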

So, can we derive a useful rule of thumb for the expected number of pages? Dividing by $k$ shows that $E(N|k)$ approaches proportionality, especially for larger $\alpha$:

So a good rule of thumb is that if you get $k$ pages before a repeat, expect between $2k$ and $4k$ pages on the site. However, remember that we are dealing with power-laws, so the variance can be surprisingly high.

Yesterday I gave a talk at the joint Bloomberg-London Futurist meeting “The state of the future” about the future of decisionmaking. Parts were updates on my policymaking 2.0 talk (turned into this chapter), but I added a bit more about individual decisionmaking, rationality and forecasting.

The big idea of the talk: ensemble methods really work in a lot of cases. Not always, not perfectly, but they should be among the first tools to consider when trying to make a robust forecast or decision. They are Bayes’ broadsword:

## Forecasting

One of my favourite experts on forecasting is J Scott Armstrong. He has stressed the importance of evidence based forecasting, including checking how well different methods work. The general answer is: not very well, yet people keep on using them. He has been pointing this out since the 70s. It also turns out that expertise only gets you so far: expert forecasts are not very reliable either, and the accuracy levels out quickly with increasing level of expertise. One implication is that one should at least get cheap experts since they are about as good as the pricey ones. It is also known that simple models for forecasting tend to be more accurate than complex ones, especially in complex and uncertain situations (see also Haldane’s “The Dog and the Frisbee”). Another important insight is that it is often better to combine different methods than try to select the one best method.

Another classic look at prediction accuracy is Philip Tetlock’s Expert Political Judgment (2005) where he looked at policy expert predictions. They were only slightly more accurate than chance, worse than basic extrapolation algorithms, and there was a negative link to fame: high profile experts have an incentive to be interesting and dramatic, but not right. However, he noticed some difference between “hedgehogs” (people with One Big Theory) and “foxes” (people using multiple theories), with the foxes outperforming hedgehogs.

OK, so in forecasting it looks like using multiple methods, theories and data sources (including experts) is a way to get better results.

## Statistical machine learning

A standard problem in machine learning is to classify something into the right category from data, given a set of training examples. For example, given medical data such as age, sex, and blood test results, diagnose which disease a patient might suffer from. The key problem is that it is non-trivial to construct a classifier that works well on data different from the training data. It can work badly on new data, even if it works perfectly on the training examples. Two classifiers that perform equally well during training may perform very differently in real life, or even for different data.

The obvious solution is to combine several classifiers and average (or vote about) their decisions: ensemble based systems. This reduces the risk of making a poor choice, and can in fact improve overall performance if they can specialize for different parts of the data. This also has other advantages: very large datasets can be split into manageable chunks that are used to train different components of the ensemble, tiny datasets can be “stretched” by random resampling to make an ensemble trained on subsets, outliers can be managed by “specialists”, in data fusion different types of data can be combined, and so on. Multiple weak classifiers can be combined into a strong classifier this way.

The method benefits from having diverse classifiers that are combined: if they are too similar in their judgements, there is no advantage. Estimating the right weights to give to them is also important, otherwise a truly bad classifier may influence the output.
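The role of diversity can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, not a real ensemble learner: each binary "classifier" is just a coin that is right with a fixed probability, and the hypothetical `shared` parameter is the chance that a classifier copies one common judgement instead of forming its own. Individual accuracy is the same in both settings; only the independence changes.

```python
import random

def majority_vote_accuracy(n_classifiers, accuracy, shared, n_cases=5000, seed=0):
    """Estimate how often a majority vote of binary classifiers is right.

    shared = 0.0 means fully independent classifiers,
    shared = 1.0 means they all repeat the same judgement.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_cases):
        common_is_right = rng.random() < accuracy   # the judgement everyone may copy
        right_votes = 0
        for _ in range(n_classifiers):
            if rng.random() < shared:
                right_votes += common_is_right
            else:
                right_votes += rng.random() < accuracy
        correct += right_votes > n_classifiers / 2
    return correct / n_cases

diverse = majority_vote_accuracy(25, accuracy=0.6, shared=0.0)
clones = majority_vote_accuracy(25, accuracy=0.6, shared=1.0)
print(f"independent classifiers: {diverse:.3f}")
print(f"identical classifiers:   {clones:.3f}")
```

With independent classifiers the vote is clearly more accurate than any single member, while identical classifiers gain nothing from voting.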

The iconic demonstration of the power of this approach was the Netflix Prize, where different teams competed to make algorithms that predicted user ratings of films from previous ratings. As part of the rules the algorithms were made public, spurring innovation. When the competition concluded in 2009, the leading teams all consisted of ensemble methods where component algorithms were from past teams. The two big lessons were (1) that a combination of not just the best algorithms, but also less accurate algorithms, were the key to winning, and (2) that organic organization allows the emergence of far better performance than having strictly isolated teams.

## Group cognition

Condorcet’s jury theorem is perhaps the classic result in group problem solving: if a group of people hold a majority vote, and each has a probability p>1/2 of voting for the correct choice, then the probability the group will vote correctly is higher than p and will tend to approach 1 as the size of the group increases. This presupposes that votes are independent, although stronger forms of the theorem have been proven. (In reality people may have different preferences so there is no clear “right answer”)
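For independent voters the theorem can be checked directly from the binomial distribution. The sketch below (the function name is illustrative) computes the probability that a strict majority of $n$ voters, each correct with probability $p$, picks the right answer; an odd $n$ avoids ties.

```python
from math import comb

def majority_correct(p, n):
    """Probability that more than half of n independent voters are right."""
    needed = n // 2 + 1
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(needed, n + 1))

for n in (1, 3, 11, 101):
    print(n, round(majority_correct(0.6, n), 4))
```

The same function also shows the flip side: for $p<1/2$ the group becomes more reliably wrong as it grows.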

By now the pattern is likely pretty obvious. Weak decision-makers (the voters) are combined through a simple procedure (the vote) into better decision-makers.

Group problem solving is known to be pretty good at smoothing out individual biases and errors. In The Wisdom of Crowds Surowiecki suggests that the ideal crowd for answering a question in a distributed fashion has diversity of opinion, independence (each member has an opinion not determined by the others’), decentralization (members can draw conclusions based on local knowledge), and the existence of a good aggregation process turning private judgements into a collective decision or answer.

Perhaps the grandest example of group problem solving is the scientific process, where peer review, replication, cumulative arguments, and other tools make error-prone and biased scientists produce a body of findings that over time robustly (if sometimes slowly) tends towards truth. This is anything but independent: sometimes a clever structure can improve performance. However, it can also induce all sorts of nontrivial pathologies – just consider the detrimental effects status games have on accuracy or focus on the important topics in science.

Small group problem solving on the other hand is known to be great for verifiable solutions (everybody can see that a proposal solves the problem), but unfortunately suffers when dealing with “wicked problems” lacking good problem or solution formulation. Groups also have scaling issues: a team of N people need to transmit information between all N(N-1)/2 pairs, which quickly becomes cumbersome.

One way of fixing these problems is using software and formal methods.

The Good Judgement Project (partially run by Tetlock and with Armstrong on the board of advisers) participated in the IARPA ACE program to try to improve intelligence forecasts. They used volunteers and checked their forecast accuracy (not just if they got things right, but if claims that something was 75% likely actually came true 75% of the time). This led to a plethora of fascinating results. First, accuracy scores based on the first 25 questions in the tournament predicted subsequent accuracy well: some people were consistently better than others, and it tended to remain constant. Training (such as debiasing techniques) and forming teams also improved performance. Most impressively, using the top 2% “superforecasters” in teams really outperformed the other variants. The superforecasters were a diverse group, smart but by no means geniuses, updating their beliefs frequently but in small steps.
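Checking whether "75% likely" claims come true 75% of the time is a calibration check. A minimal sketch follows: it groups forecasts by stated probability, compares them with the observed frequency, and also computes the Brier score, a standard accuracy measure for probabilistic forecasts (shown here as an illustration, not as a description of the project's exact scoring setup). The forecast record is made up for the example.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Map each stated probability to the observed frequency of the event."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for stated, happened in forecasts:
        totals[stated] += 1
        hits[stated] += 1 if happened else 0
    return {p: hits[p] / totals[p] for p in sorted(totals)}

def brier_score(forecasts):
    """Mean squared difference between stated probability and outcome (0 or 1)."""
    return sum((p - (1.0 if happened else 0.0)) ** 2
               for p, happened in forecasts) / len(forecasts)

# Hypothetical record: eight 75% forecasts (six came true),
# four 25% forecasts (one came true).
record = ([(0.75, True)] * 6 + [(0.75, False)] * 2
          + [(0.25, True)] * 1 + [(0.25, False)] * 3)

table = calibration_table(record)
score = brier_score(record)
print(table)
print(score)
```

A well-calibrated forecaster has observed frequencies close to the stated probabilities; lower Brier scores are better.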

The key to this success was that a computer- and statistics-aided process found the good forecasters and harnessed them properly (plus, the forecasts were on a shorter time horizon than the policy ones Tetlock analysed in his previous book: this both enables better forecasting, plus the all-important feedback on whether they worked).

Another good example is the Galaxy Zoo, an early crowd-sourcing project in galaxy classification (which in turn led to the Zooniverse citizen science project). It is not just that participants can act as weak classifiers and be combined through a majority vote to become reliable classifiers of galaxy type. Since the type of some galaxies is agreed on by domain experts, those galaxies can be used to test the reliability of participants, producing better weightings. But it is possible to go further, and classify the biases of participants to create combinations that maximize the benefit, for example by using overly “trigger happy” participants to find possible rare things of interest, and then check them using both conservative and neutral participants to become certain. Even better, this can be done dynamically as people slowly gain skill or change preferences.

The right kind of software and on-line “institutions” can shape people’s behavior so that they form more effective joint cognition than they ever could individually.

## Conclusions

The big idea here is that it does not matter that individual experts, forecasting methods, classifiers or team members are fallible or biased, if their contributions can be combined in such a way that the overall output is robust and less biased. Ensemble methods are examples of this.

While just voting or weighting everybody equally is a decent start, performance can be significantly improved by linking the weights to how well the participants perform. Humans can easily be motivated by scoring (but look out for misalignment of incentives: the score must accurately reflect real performance and must not be gameable).

In any case, actual performance must be measured. If we cannot tell if some method is more accurate than something else, then either accuracy does not matter (because it cannot be distinguished or we do not really care), or we will not get the necessary feedback to improve it. It is known from the expertise literature that one of the key factors for it to be possible to become an expert on a task is feedback.

Having a flexible structure that can change is a good approach to handling a changing world. If people have disincentives to change their mind or change teams, they will not update beliefs accurately.

I got a good question after the talk: if we are supposed to keep our models simple, how can we use these complicated ensembles? The answer is of course that there is a difference between using a complex and a complicated approach. The methods that tend to be fragile are the ones with too many free parameters, too much theoretical burden: they are the complex “hedgehogs”. But stringing together a lot of methods and weighting them appropriately merely produces a complicated model, a “fox”. Component hedgehogs are fine as long as they are weighted according to how well they actually perform.

(In fact, adding together many complex things can make the whole simpler. My favourite example is the fact that the Kolmogorov complexity of integers grows boundlessly on average, yet the complexity of the set of all integers is small – and actually smaller than some integers we can easily name. The whole can be simpler than its parts.)

In the end, we are trading Occam’s razor for a more robust tool: Bayes’ Broadsword. It might require far more strength (computing power/human interaction) to wield, but it has longer reach. And it hits hard.

## Appendix: individual classifiers

I used Matlab to make the illustration of the ensemble classification. Here are some of the component classifiers. They are all based on the examples in the Matlab documentation. My ensemble classifier is merely a maximum vote between the component classifiers that assign a class to each point.

# The Biosphere Code

Yesterday I contributed to a piece of manifesto writing, producing the Biosphere Code Manifesto. The Guardian has a version on its blog. Not quite as dramatic as Marinetti’s Futurist Manifesto but perhaps more constructive:

Principle 1. With great algorithmic powers come great responsibilities

Those implementing and using algorithms should consider the impacts of their algorithms.

Principle 2. Algorithms should serve humanity and the biosphere at large.

Algorithms should be considerate of human needs and the biosphere, and facilitate transformations towards sustainability by supporting ecologically responsible innovation.

Principle 3. The benefits and risks of algorithms should be distributed fairly

Algorithm developers should consider issues relating to the distribution of risks and opportunities more seriously. Developing algorithms that provide benefits to the few and present risks to the many is both unjust and unfair.

Principle 4. Algorithms should be flexible, adaptive and context-aware

Algorithms should be open, malleable and easy to reprogram if serious repercussions or unexpected results emerge. Algorithms should be aware of their external effects and be able to adapt to unforeseen changes.

Principle 5. Algorithms should help us expect the unexpected

Algorithms should be used in such a way that they enhance our shared capacity to deal with shocks and surprises – including problems caused by errors or misbehaviors in other algorithms.

Principle 6. Algorithmic data collection should be open and meaningful

Data collection should be transparent and respectful of public privacy. In order to avoid hidden biases, the datasets which feed into algorithms should be validated.

Principle 7. Algorithms should be inspiring, playful and beautiful

Algorithms should be used to enhance human creativity and playfulness, and to create new kinds of art. We should encourage algorithms that facilitate human collaboration, interaction and engagement – with each other, with society, and with nature.

# The algorithmic world

The basic insight is that the geosphere, ecosphere, anthroposphere and technosphere are getting deeply entwined, and algorithms are becoming a key force in regulating this global system.

Some algorithms enable new activities (multimedia is impossible without FFT and CRC), change how activities are done (data centres happen because virtualization and MapReduce make them scale well), or enable faster algorithmic development (compilers and libraries). Algorithms used for decision support are particularly important. Logistics algorithms (routing, linear programming, scheduling, and optimization) affect the scope and efficiency of the material economy. Financial algorithms affect the scope and efficiency of the economy itself. Intelligence algorithms (data collection, warehousing, mining, network analysis but also human expert judgement combination methods), statistics gathering and risk models affect government policy. Recommender systems (“You May Also Enjoy…”) and advertising influence consumer demand.

Since these algorithms are shared, their properties will affect a multitude of decisions and individuals in the same way, even if the individuals think they are acting independently. The actions algorithms cause also spill over from the groups that use them to other stakeholders. And algorithms have a multitude of non-trivial failure modes: machine learning can create opaque bias or sudden emergent misbehaviour, human over-reliance on algorithms can cause accidents or large-scale misallocation of resources, some algorithms produce systemic risks, and others embody malicious behaviours. In short, code – whether in computers or as a formal praxis in an organisation – matters morally.

# What is the point?

Could a code like the Biosphere Code actually do anything useful? Isn’t this yet another splashy “wouldn’t it be nice if everybody were moral and rational in engineering/politics/international relations?”

I think it is a first step towards something useful.

There are engineering ethics codes, even for software engineers. But algorithms are created in many domains, including by non-engineers. We cannot and should not prevent people from thinking, proposing, and trying new algorithms: that would be like attempts to regulate science, art, and thought. But we can as societies create incentives to do constructive things and avoid known destructive things. In order to do so, we should recognize that we need to work on the incentives and start gathering information.

Algorithms and their large-scale results must be studied and measured: we cannot rely on theory, despite its seductive power, since there are profound theoretical limits to our predictive abilities in the world of algorithms, as well as obvious practical limitations. Algorithms also do not exist in a vacuum: the human or biosphere context is an active part of what is going on. An algorithm can be totally correct and yet be misused in a harmful way because of its framing.

But even in the small, if we can make one programmer think a bit more about what they are doing and choose a better algorithm than they otherwise would have done, the world is better off. In fact, a single programmer can have a surprisingly large impact.

I am more optimistic than that. Recognizing algorithms as the key building blocks that they are for our civilization, what peculiarities they have, and learning better ways of designing and using them has transformative power. There are disciplines dealing with parts of this, but the whole requires considering interdisciplinary interactions that are currently rarely explored.

Let’s get started!

# The moral responsibility of office software

On Practical Ethics, Ben and I blog about user design ethics: when you make software that a lot of people use, even tiny flaws such as delays mean significant losses when summed over all users, and affordances can entice many people to do the wrong thing. So be careful and perfectionist!
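To get a feel for how quickly such small flaws add up, here is a back-of-the-envelope calculation. The user count, delay and usage frequency are made-up illustrative figures, not measurements of any real product:

```python
def total_hours_lost(users, delay_seconds, uses_per_day, days):
    """Total time lost across all users, in hours."""
    return users * delay_seconds * uses_per_day * days / 3600

# Hypothetical figures: 100 million users, a 1-second delay, 10 uses a day, one year.
hours = total_hours_lost(users=100_000_000, delay_seconds=1, uses_per_day=10, days=365)
person_years = hours / (24 * 365)
print(f"{hours:,.0f} hours, roughly {person_years:,.0f} person-years")
```

Under these assumptions a single second of delay costs on the order of ten thousand person-years annually, which is the sense in which a tiny flaw becomes a significant loss.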

This is in many ways the fundamental problem of the modern era. Since successful things get copied into millions or billions, the impact of a single choice can become tremendous. One YouTube clip or one tweet, and suddenly the attention of millions of people will descend on someone. One bug, and millions of computers are vulnerable. A clever hack, and suddenly millions can do it too.

We ought to be far more careful, yet that is hard to square with a free life. Most of the time, it also does not matter since we get lost in the noise with our papers, tweets or companies – the logic of the power law means the vast majority will never matter even a fraction as much as the biggest.

# Ethics for neural networks

I am currently attending IJCNN 2015 in Killarney. Yesterday I gave an invited talk “Ethics and large-scale neural networks: when do we need to start caring for neural networks, rather than about them?” The bulk of the talk was based on my previous WBE ethics paper, looking at the reasons we cannot be certain whether neural networks have experience, leading to my view that we hence ought to handle them with the same care as the biological originals they mimic. Yup, it is the one T&F made a lovely comic about – which incidentally gave me an awesome poster at the conference.

When I started, I looked a bit at ethics in neural network science/engineering. As I see it, there are three categories of ethical issues specific to the topic rather than being general professional ethics issues:

• First, the issues surrounding applications such as privacy, big data, surveillance, killer robots etc.
• Second, the issue that machine learning allows machines to learn the wrong things.
• Third, machines as moral agents or patients.

The first category is important, but I leave that for others to discuss. It is not necessarily linked to neural networks per se, anyway. It is about responsibility for technology and what one works on.

## Learning wrong

The second category is fun. Learning systems are not fully specified by their creators – which is the whole point! This means that their actual performance is open-ended (within the domain of possible responses). And from that follows that they can learn things we do not want.

One example is inadvertent discrimination, where the network learns something that would be called racism, sexism or something similar if it happened in a human. One can consider a credit rating neural network trained on customer data to estimate the probability of a customer defaulting. It may develop an internal representation that gets activated by a customer’s race and is linked to a negative evaluation of the rating. There is no deliberate programming of racism, just something that emerges from the data – where the race:economy link may well be due to factors in society that are structurally racist.

A similar, real case is advertising algorithms selecting ads online for users in ways that show some ads to some groups but not others – which, in the case of education, may serve to perpetuate disadvantages or prejudices.

A recent example was the Google Photos captioning system, which captioned a black couple as gorillas. Obvious outrage ensued, and a Google representative tweeted that this was “high on my list of bugs you *never* want to see happen ::shudder::”. The misbehaviour was quickly fixed.

Mislabelling somebody or something else might merely have been amusing: calling some people gorillas will often be met by laughter. But it becomes charged and ethically relevant in a culture like the current American one. This is nothing the recognition algorithm knows about: from its perspective mislabelling chairs is as bad as mislabelling humans. Adding a culturally sensitive loss function to the training is nontrivial. Ad hoc corrections against particular cases – like this one – will only help after a scandalous mislabelling has already occurred: we will not know what is misbehaviour until we see it.

[ Incidentally, this suggests a way for automatic insult generation: use computer vision to find matching categories, and select the one that is closest but has the lowest social status (perhaps detected using sentiment analysis). It will be hilarious for the five seconds until somebody takes serious offence. ]

It has been suggested that the behavior was due to training data being biased towards white people, making the model subtly biased. If there are few examples of a category it might be suppressed or overused as a response. This can be very hard to fix, since many systems and data sources have a patchy spread in social space. But maybe we need to pay more attention to the issue of whether data is socially diverse enough. It is worth recognizing that since a machine learning system may be used by very many users once it has been trained, it has the power to project its biased view of the world to many: getting things right in a universal system, rather than something used by a few, may be far more important than it looks. We may also have to have enough online learning over time so such systems update their worldview based on how culture evolves.
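One crude first check of whether data is diverse enough is simply to count how each group is represented. The sketch below assumes every training example carries some group label; the labels and the 10% threshold are hypothetical choices for illustration, not a recommended standard:

```python
from collections import Counter

def underrepresented(labels, min_share=0.10):
    """Return the groups whose share of the examples falls below min_share."""
    counts = Counter(labels)
    total = len(labels)
    return {group: n / total for group, n in counts.items() if n / total < min_share}

# Hypothetical dataset: 90 examples of group "a", 8 of "b", 2 of "c".
labels = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
print(underrepresented(labels))  # flags "b" and "c"
```

A balanced count is of course no guarantee of unbiased behaviour, but a badly skewed one is a cheap early warning.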

## Moral actors, proxies and patients

Making machines that act in a moral context is even iffier.

My standard example is of course the autonomous car, which may find itself in situations that would count as moral choices for a human. Here the issue is who sets the decision scheme: presumably they would be held accountable insofar as they could predict the consequences of their code or be identified. I have argued that it is good to have the car try to behave as its “driver” would, but it will still be limited by the sensory and cognitive abilities of the vehicle. Moral proxies are doable, even if they are not moral agents.

The manufacture and behavior of killer robots is of course even more contentious. Even if we think they can be acceptable in principle and have a moral system that we think would be the right one to implement, actually implementing it for certain may prove exceedingly hard. Verification of robotics is hard; verification of morally important actions based on real-world data is even worse. And one cannot shirk the responsibility to do so if one deploys the system.

Note that none of this presupposes real intelligence or truly open-ended action abilities. They just make an already hard problem tougher. Machines that can only act within a well-defined set of constraints can be further constrained to not go into parts of state- or action-space we know are bad (but as discussed above, even captioning images is a sufficiently big space that we will find surprise bad actions).

As I mentioned above, the bulk of the talk was my argument that whole brain emulation attempts can produce systems we have good reasons to be careful with: we do not know if they are moral agents, but they are intentionally architecturally and behaviourally close to moral agents.

A new aspect I got the chance to discuss is the problem about non-emulation neural networks. When do we need to consider them? Brian Tomasik has written a paper about whether we should regard reinforcement learning agents as moral patients (see also this supplement). His conclusion is that these programs mimic core motivation/emotion cognitive systems that almost certainly matter for real moral patients’ patient-hood (an organism without a reward system or learning would presumably lose much or all of its patient-hood), and there is a nonzero chance that they are fully or partially sentient.

But things get harder for other architectures. A deep learning network with just a feedforward architecture is presumably unable to be conscious, since many theories of consciousness presuppose some forms of feedback – and that is not possible in that architecture. But at the conference there have been plenty of recurrent networks that have all sorts of feedback. Whether they can have experiential states appears tricky to answer. In some cases we may argue they are too small to matter, but again we do not know if level of consciousness (or moral considerability) necessarily has to follow brain size.

They also inhabit a potentially alien world where their representations could be utterly unrelated to what we humans understand or can express. One might say, paraphrasing Wittgenstein, that if a neural network could speak we would not understand it. However, there might be ways of making their internal representations less opaque. Methods such as inceptionism, deep visualization, or t-SNE can actually help discern some of what is going on inside. If we were to discover a set of concepts that were similar to human or animal concepts, we may have reason to tread a bit more carefully – especially if there were concepts linked to some of them in the same way “suffering concepts” may be linked to other concepts. This looks like a very relevant research area, both for debugging our learning systems and for mapping out the structures of animal, human and machine minds.

In the end, if we want safe and beneficial smart systems, we had better start figuring out how to understand them better.

# Don’t be evil and make things better

I am one of the signatories of an open letter calling for a stronger aim at socially beneficial artificial intelligence.

It might seem odd to call for something like that: who in their right mind would not want AI to be beneficial? But when we look at the field (and indeed, many other research fields) the focus has traditionally been on making AI more capable. Besides some pure research interest and no doubt some “let’s create life”-ambition, the bulk of motivation has been to make systems that do something useful (or push in the direction of something useful).

“Useful” is normally defined in terms of performing some task – translation, driving, giving medical advice – rather than having a good impact on the world. Better done tasks are typically good locally – people get translations more cheaply, things get transported, advice may be better – but have more complex knock-on effects: fewer translators, drivers or doctors are needed, or their jobs get transformed, plus there are potential risks from easy (but possibly faulty) translation, accidents and misuse of autonomous vehicles, or changes in liability. Way messier. Even if the overall impact is great, socially disruptive technologies that appear surprisingly fast can cause trouble, and emergent misbehaviour and bad design choices can lead to devices that amplify risk (consider high frequency trading, badly used risk models, or anything that empowers crazy people). Some technologies may also lend themselves to centralizing power (surveillance, autonomous weapons) but reduce accountability (learning algorithms internalizing discriminatory assumptions in an opaque way). These considerations should of course be part of any responsible engineering and deployment, even if handling them is by no means solely the job of the scientist or programmer. Doing it right will require far more help from other disciplines.

The most serious risks come from the very smart systems that may emerge further into the future: they either amplify human ability in profound ways, or they are autonomous themselves. In both cases they make achieving goals easier, but do not have any constraints on what goals are sane, moral or beneficial. Keeping such systems safe is a hard problem we ought to start on early. One of the main reasons for the letter is that so little effort has gone into better ways of controlling complex, adaptive and possibly self-improving technological systems. It makes sense even if one doesn’t worry about superintelligence or existential risk.

This is why we have to change some research priorities. In many cases it is just putting problems on the agenda as useful to work on: they are important but understudied, and a bit of investigation will likely go a long way. In some cases it is more a matter of signalling that different communities need to talk more to each other. And in some instances we really need to have our act together before big shifts occur – if unemployment soars to 50%, engineering design-ahead enables big jumps in tech capability, brains get emulated, or systems start self-improving, we will not have time to carefully develop smart policies.

My experience with talking to the community is that there is not a big split between AI and AI safety practitioners: they roughly want the same thing. There might be a bigger gap between the people working on the theoretical, far out issues and the people working on the applied here-and-now stuff. I suspect they can both learn from each other. More research is, of course, needed.