The moral responsibility of office software

Si elegansOn practical ethics Ben and me blog about user design ethics: when you make software that a lot of people use, even tiny flaws such as delays mean significant losses when summed over all users, and affordances can entice many people to do the wrong thing. So be careful and perfectionist!

This is in many ways the fundamental problem of the modern era. Since successful things get copied into millions or billions, the impact of a single choice can become tremendous. One YouTube clip or one tweet, and suddenly the attention of millions of people will descend on someone. One bug, and millions of computers are vulnerable. A clever hack, and suddenly millions can do it too.

We ought to be far more careful, yet that is hard to square with a free life. Most of the time, it also does not matter since we get lost in the noise with our papers, tweets or companies – the logic of the power law means the vast majority will never matter even a fraction as much as the biggest.

Ethics for neural networks

PosterI am currently attending IJCNN 2015 in Killarney. Yesterday I gave an invited talk “Ethics and large-scale neural networks: when do we need to start caring for neural networks, rather than about them?” The bulk of the talk was based on my previous WBE ethics paper, looking at the reasons we cannot be certain neural networks have experience or not, leading to my view that we hence ought to handle them with the same care as the biological originals they mimic. Yup, it is the one T&F made a lovely comic about – which incidentally gave me an awesome poster at the conference.

When I started, I looked a bit at ethics in neural network science/engineering. As I see it, there are three categories of ethical issues specific to the topic rather than being general professional ethics issues:

  • First, the issues surrounding applications such as privacy, big data, surveillance, killer robots etc.
  • Second, the issue that machine learning allows machines to learn the wrong things.
  • Third, machines as moral agents or patients.

The first category is important, but I leave that for others to discuss. It is not necessarily linked to neural networks per se, anyway. It is about responsibility for technology and what one works on.

Learning wrong

The second category is fun. Learning systems are not fully specified by their creators – which is the whole point! This means that their actual performance is open-ended (within the domain of possible responses). And from that follows that they can learn things we do not want.

One example is inadvertent discrimination, where the network learns something that would be called racism, sexism or something similar if it happened in a human. One can consider a credit rating neural network trained on customer data to estimate the probability of a customer defaulting. It may develop an internal representation that gets activated by customer’s race and is linked to a negative evaluation of the rating. There is no deliberate programming of racism, just something that emerges from the data – where the race:economy link may well be due to factors in society that are structurally racist.

A similar, real case is advertising algorithms selecting ads online for users in ways that shows some ads for some groups but not others – which, in the case of education, may serve to perpetuate disadvantages or prejudices.

A recent example was the Google Photo captioning system, which captioned a black couple as gorillas. Obvious outrage ensued, and a Google representative tweeted that this was “high on my list of bugs you *never* want to see happen ::shudder::”. The misbehaviour was quickly fixed.

Mislabelling somebody or something else might merely have been amusing: calling some people gorillas will often be met by laughter. But it becomes charged and ethically relevant in a culture like the current American one. This is nothing the recognition algorithm knows about: from its perspective mislabelling chairs is as bad as mislabelling humans. Adding a culturally sensitive loss function to the training is nontrivial. Ad hoc corrections against particular cases – like this one – will only help when a scandalous mislabelling already occurs: we will not know what is misbehaviour until we see it.

[ Incidentally, this suggests a way for automatic insult generation: use computer vision to find matching categories, and select the one that is closest but has the lowest social status (perhaps detected using sentiment analysis). It will be hilarious for the five seconds until somebody takes serious offence. ]

It has been suggested that the behavior was due to training data being biased towards white people, making the model subtly biased. If there are few examples of a category it might be suppressed or overused as a response. This can be very hard to fix, since many systems and data sources have a patchy spread in social space. But maybe we need to pay more attention to the issue of whether data is socially diverse enough. It is worth recognizing that since a machine learning system may be used by very many users once it has been trained, it has the power to project its biased view of the world to many: getting things right in a universal system, rather than something used by a few, may be far more important than it looks. We may also have to have enough online learning over time so such systems update their worldview based on how culture evolves.

Moral actors, proxies and patients

Si elegansMaking machines that act in a moral context is even iffier.

My standard example is of course the autonomous car, which may find itself in situations that would count as moral choices for a human. Here the issue is who sets the decision scheme: presumably they would be held accountable insofar they could predict the consequences of their code or be identified. I have argued that it is good to have the car try to behave as its “driver” would, but it will still be limited by the sensory and cognitive abilities of the vehicle. Moral proxies are doable, even if they are not moral agents.

The manufacture and behavior of killer robots is of course even more contentious. Even if we think they can be acceptable in principle and have a moral system that we think would be the right one to implement, actually implementing it for certain may prove exceedingly hard. Verification of robotics is hard; verification of morally important actions based on real-world data is even worse. And one cannot shirk the responsibility to do so if one deploys the system.

Note that none of this presupposes real intelligence or truly open-ended action abilities. They just make an already hard problem tougher. Machines that can only act within a well-defined set of constraints can be further constrained to not go into parts of state- or action-space we know are bad (but as discussed above, even captioning images is a sufficiently big space that we will find surprise bad actions).

As I mentioned above, the bulk of the talk was my argument that whole brain emulation attempts can produce systems we have good reasons to be careful with: we do not know if they are moral agents, but they are intentionally architecturally and behaviourally close to moral agents.

A new aspect I got the chance to discuss is the problem about non-emulation neural networks. When do we need to consider them? Brian Tomasik has written a paper about whether we should regard reinforcement learning agents as moral patients (see also this supplement). His conclusion is that these programs mimic core motivation/emotion cognitive systems that almost certainly matter for real moral patients’ patient-hood (an organism without a reward system or learning would presumably lose much or all of its patient-hood), and there is a nonzero chance that they are fully or partially sentient.

But things get harder for other architectures. A deep learning network with just a feedforward architecture is presumably unable to be conscious, since many theories of consciousness presuppose some forms of feedback – and that is not possible in that architecture. But at the conference there have been plenty of recurrent networks that have all sorts of feedback. Whether they can have experiential states appears tricky to answer. In some cases we may argue they are too small to matter, but again we do not know if level of consciousness (or moral considerability) necessarily has to follow brain size.

They also inhabit a potentially alien world where their representations could be utterly unrelated to what we humans understand or can express. One might say, paraphrasing Wittgenstein, that if a neural network could speak we would not understand it. However, there might be ways of making their internal representations less opaque. Methods such as inceptionism, deep visualization, or t-SNE can actually help discern some of what is going on on the inside. If we were to discover a set of concepts that were similar to human or animal concepts, we may have reason to thread a bit more carefully – especially if there were concepts linked to some of them in the same way “suffering concepts” may be linked to other concepts. This looks like a very relevant research area, both for debugging our learning systems, but also for mapping out the structures of animal, human and machine minds.

In the end, if we want safe and beneficial smart systems, we better start figuring out how to understand them better.