Checking my predictions for 2016

Last year I made a number of predictions for 2016 to see how well calibrated I am. Here are the results:

Prediction Correct?
No nuclear war: 99% 1
No terrorist attack in the USA will kill > 100 people: 95% 1 (Orlando: 50)
I will be involved in at least one published/accepted-to-publish research paper by the end of 2015: 95% 1
Vesuvius will not have a major eruption: 95% 1
I will remain at my same job through the end of 2015: 90% 1
MAX IV in Lund delivers X-rays: 90% 1
Andart II will remain active: 90% 1
Israel will not get in a large-scale war (ie >100 Israeli deaths) with any Arab state: 90% 1
US will not get involved in any new major war with death toll of > 100 US soldiers: 90% 1
New Zealand has not decided to change current flag at end of year: 85% 1
No multi-country Ebola outbreak: 80% 1
Assad will remain President of Syria: 80% 1
ISIS will control less territory than it does right now: 80% 1
North Korea’s government will survive the year without large civil war/revolt: 80% 1
The US NSABB will allow gain of function funding: 80% 1 [Their report suggests review before funding; currently it is up to the White House to respond.]

 

US presidential election: democratic win: 75% 0
A general election will be held in Spain: 75% 1
Syria’s civil war will not end this year: 75% 1
There will be no NEO with Torino Scale >0 on 31 Dec 2016: 75% 0 (2016 XP23 showed up on the scale according to JPL, but NEODyS Risk List gives it a zero.)
The Atlantic basin ACE will be below 96.2: 70% 0 (ACE estimate on Jan 1 is 132)
Sweden does not get a seat on the UN Security Council: 70% 0
Bitcoin will end the year higher than $200: 70% 1
Another major eurozone crisis: 70% 0
Brent crude oil will end the year lower than $60 a barrel: 70% 1
I will actually apply for a UK citizenship: 65% 0
UK referendum votes to stay in EU: 65% 0
China will have a GDP growth above 5%: 65% 1
Evidence for supersymmetry: 60% 0
UK larger GDP than France: 60% 1 (although it is a close call; estimates put France at 2421.68 and UK at 2848.76 – quite possibly this might change)
France GDP growth rate less than 2%: 60% 1
I will have made significant progress (4+ chapters) on my book: 55% 0
Iran nuclear deal holding: 50% 1
Apple buys Tesla: 50% 0
The Nikkei index ends up above 20,000: 50% 0 (nearly; the Dec 20 max was 19,494)

Overall, my Brier score is 0.1521. Which doesn’t feel too bad.
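For reference, the Brier score is the mean squared difference between the stated confidence and the 0/1 outcome: 0 is perfect, and always answering 50% scores 0.25. Below is a minimal sketch; the function name `brier_score` is illustrative, and only a handful of entries from the list are used, so it is not meant to reproduce the figure above (which also depends on how borderline outcomes are scored).

```python
def brier_score(forecasts):
    """Mean squared error between confidence and the 0/1 outcome."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


# A few (confidence, outcome) pairs from the list above, for illustration only.
sample = [
    (0.99, 1),  # No nuclear war
    (0.75, 0),  # US presidential election: democratic win
    (0.70, 1),  # Bitcoin ends the year above $200
    (0.50, 0),  # Apple buys Tesla
]
print(round(brier_score(sample), 4))
```

Confident misses dominate the score: the single 75% miss contributes more than the other three entries combined.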

Plotting the results (where I bin together things in [0.5,0.55], [0.6,0.65], [0.7,0.75], [0.8,0.85], [0.9,0.99] bins) gives this calibration plot:

Plot of average correctness of my predictions for 2016 as a function of confidence (blue). Red line is perfect calibration.
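The binning behind such a plot can be sketched as follows. This is an illustration rather than the script used for the figure: the bin edges follow the text (assuming the second bin is meant to be [0.6, 0.65]), and the names `BINS` and `calibration_curve` are made up for the example.

```python
from collections import defaultdict

# Inclusive confidence ranges; the second bin is an assumed reading of the text.
BINS = [(0.50, 0.55), (0.60, 0.65), (0.70, 0.75), (0.80, 0.85), (0.90, 0.99)]


def calibration_curve(forecasts, bins=BINS):
    """Return (mean confidence, fraction correct, count) per non-empty bin."""
    grouped = defaultdict(list)
    for p, outcome in forecasts:
        for low, high in bins:
            if low <= p <= high:
                grouped[(low, high)].append((p, outcome))
                break
    curve = []
    for key in bins:
        items = grouped.get(key)
        if items:
            mean_conf = sum(p for p, _ in items) / len(items)
            hit_rate = sum(o for _, o in items) / len(items)
            curve.append((mean_conf, hit_rate, len(items)))
    return curve
```

A well-calibrated forecaster has a hit rate close to the mean confidence in every bin; points below the diagonal indicate overconfidence.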

Overall, I did great on my “sure bets” and fairly weakly on my less certain bets. I did not have enough questions to make this very statistically solid (coming up with good prediction questions is hard!), but the overall shape suggests that I am a bit overconfident, which is not surprising.

Time to come up with good 2017 prediction questions.

A crazy futurist writes about crazy futurists

Arjen the doomsayer

Warren Ellis’ Normal is a little story about the problem of being serious about the future.

As I often point out, most people in the futures game are basically in the entertainment industry: telling wonderful or frightening stories that allow us to feel part of a bigger sweep of history, reflect a bit, and then return to the present with the reassurance that we have some foresight. Relatively little future studies is about finding decision-relevant insights and then acting on them. It exists, but it is not the bulk of future-oriented people. Taking the future seriously might require colliding with your society as you try to tell it it is going the wrong way. Worse, the conclusions might tell you that your own values and goals are wrong.

Normal takes place at a sanatorium for mad futurists in the wilds of Oregon. The idea is that if you spend too much time thinking too seriously about the big and horrifying things in the future mental illness sets in. So when futurists have nervous breakdowns they get sent by their sponsors to Normal to recover. They are useful, smart, and dedicated people but since the problems they deal with are so strange their conditions are equally unusual. The protagonist arrives just in time to encounter a bizarre locked room mystery – exactly the worst kind of thing for a place like Normal with many smart and fragile minds – driving him to investigate what is going on.

As somebody working with the future, I think the caricatures of these futurists (or rather their ideas) are spot on. There are the urbanists, the singularitarians, the neoreactionaries, the drone spooks, and the invented professional divisions. Of course, here they are mad in a way that doesn’t allow them to function in society which softballs the views: singletons and Molochs are serious real ideas that should make your stomach lurch.

The real people I know who take the future seriously are overall pretty sane. I remember a documentary filmmaker at a recent existential risk conference mildly complaining that people were so cheerful and well-adapted: doubtless some darkness and despair would have made far more compelling imagery than chummy academics trying to salvage the bioweapons convention. Even the people involved in developing the Mutually Assured Destruction doctrine seem to have been pretty healthy. People who go off on the deep end tend to do it not because of The Future but because of more normal psychological fault lines. Maybe we are not taking the future seriously enough, but I suspect it is more a case of an illusion of control: we know we are at least doing something.

This book convinced me that I need to seriously start working on my own book project, the “glass is half full” book. Much of our research at FHI seems to be relentlessly gloomy: existential risk, AI risk, all sorts of unsettling changes to the human condition that might slurp us down into a valueless attractor asymptoting towards the end of time. But that is only part of it: there are potential futures so bright that we do not just need sunshades, but we have problems with even managing the positive magnitude in an intellectually useful way. The reason we work on existential risk is that we (1) think there is enormous positive potential value at stake, and (2) we think actions can meaningfully improve chances. That is no pessimism, quite the opposite. I can imagine Ellis or one of his characters skeptically looking at me across the table at Normal and accusing me of solutionism and/or a manic episode. Fine. I should lay out my case in due time, with enough logos, ethos and pathos to convince them (Muhahaha!).

I think the fundamental horror at the core of Normal – and yes, I regard this more as a horror story than a techno-thriller or satire – is the belief that The Future is (1) pretty horrifying and (2) unstoppable. I think this is a great conceit for a story and a sometimes necessary intellectual tonic to consider. But it is bad advice for how to live a functioning life or actually make a saner future.

 

Settling Titan, Schneier’s Law, and scenario thinking

Charles Wohlforth and Amanda R. Hendrix want us to colonize Titan. The essay irritated me in an interesting manner.

Full disclosure: they interviewed me while they were writing their book Beyond Earth: Our Path to a New Home in the Planets, which I have not read yet, and I will only be basing the following on the SciAm essay. It is not really about settling Titan either, but something that bothers me with a lot of scenario-making.

A weak case for Titan and against Luna and Mars

Basically the essay outlines reasons why other locations in the solar system are not good: Mercury too hot, Venus way too hot, Mars and Luna have too much radiation. Only Titan remains, with a cold environment but not too much radiation.

A lot of course hinges on the assumptions:

We expect human nature to stay the same. Human beings of the future will have the same drives and needs we have now. Practically speaking, their home must have abundant energy, livable temperatures and protection from the rigors of space, including cosmic radiation, which new research suggests is unavoidably dangerous for biological beings like us.

I am not that confident that we will remain biological or vulnerable to radiation. But even if we decide to accept the assumptions, the case against the Moon and Mars is odd:

Practically, a Moon or Mars settlement would have to be built underground to be safe from this radiation. Underground shelter is hard to build and not flexible or easy to expand. Settlers would need enormous excavations for room to supply all their needs for food, manufacturing and daily life.

So making underground shelters is supposedly much harder than settling Titan, where buildings need to be insulated against a -179 C atmosphere and ice ground full of complex and quite likely toxic hydrocarbons. They suggest that there is no point in going to the moon to live in an underground shelter when you can do it on Earth, which is not too unreasonable – but is there a point in going to live inside an insulated environment on Titan either? The actual motivations would likely be less of a desire for outdoor activities and more scientific exploration, reducing existential risk, and maybe industrialization.

Also, while making underground shelters in space may be hard, it does not look like an insurmountable problem. The whole concern is a bit like saying submarines are not practical because the cold of the depths of the ocean will give the crew hypothermia – true, unless you add heating.

I think this is similar to Schneier’s law:

Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can’t break.

It is not hard to find a major problem with a possible plan that you cannot see a reasonable way around. That doesn’t mean there isn’t one.

Settling for scenarios

9 of Matter: The Planet Garden

Maybe Wohlforth and Hendrix spent a lot of time thinking about lunar excavation issues and consistent motivations for settlements to reach a really solid conclusion, but I suspect that they came to the conclusion relatively lightly. It produces an interesting scenario: Titan is not the standard target when we discuss where humanity ought to go, and it is an awesome environment.

Similarly the “humans will be humans” scenario assumptions were presumably chosen not after a careful analysis of relative likelihood of biological and postbiological futures, but just because it is similar to the past and makes an interesting scenario. Plus human readers like reading about humans rather than robots. All together it makes for a good book.

Clearly I have different priors compared to them on the ease and rationality of Lunar/Martian excavation and postbiology. Or even giving us D. radiodurans genes.

In The Age of Em Robin Hanson argues that if we get the brain emulation scenario, space settlement will be delayed until things get really weird: while postbiological astronauts are very adaptable, much of the mainstream of civilization will be turning inward towards a few dense centers (for economics and communications reasons). Eventually resource demand, curiosity or just whatever comes after the Age of Ems may lead to settling the solar system. But that process will be pretty different even if it is done by mentally human-like beings that do need energy and protection. Their ideal environments would be energy-gradient rich, with short communications lags: Mercury, slowly getting disassembled into a hot Dyson shell, might be ideal. So here the story will be no settlement, and then wildly exotic settlement that doesn’t care much about the scenery.

But even with biological humans we can imagine radically different space settlement scenarios, such as the Gerard O’Neill scenario where planetary surfaces are largely sidestepped for asteroids and space habitats. This is Jeff Bezos’s vision rather than Elon Musk’s and Wohlforth/Hendrix’s. It also doesn’t tell the same kind of story: here our new home is not in the planets but between them.

My gripe is not against settling Titan, or even thinking it is the best target because of some reasons. It is against settling too easily for nice scenarios.

Beyond the good story

Sometimes we settle for scenarios because they tell a good story. Sometimes because they are amenable to study among other, much less analyzable possibilities. But ideally we should aim at scenarios that inform us in a useful way about options and pathways we have.

That includes making assumptions wide enough to cover relevant options, even the less glamorous or tractable ones.

That requires assuming future people will be just as capable (or more) at solving problems: just because I can’t see a solution to X doesn’t mean it is not trivially solved in the future.

(Maybe we could call it the “Manure Principle” after the canonical example of horse manure being seen as an insoluble urban planning problem at the previous turn of the century and then neatly getting resolved by unpredicted trams and cars – and just like Schneier’s law and Stigler’s law the reality is of course more complex than the story.)

In standard scenario literature there are often admonitions not just to select a “best case scenario”, “worst case scenario” and “business as usual scenario” – scenario planning comes into its own when you see nontrivial, mixed value possibilities. In particular, we want decision-relevant scenarios that make us change what we will do when we hear about them (rather than good stories, which entertain but do not change our actions). But scenarios on their own do not tell us how to make these decisions: they need to be built from our rationality and decision theory applied to their contents. Easy scenarios make it trivial to choose (cake or death?), but those choices would have been obvious even without the scenarios: no forethought needed except to bring up the question. Complex scenarios force us to think in new ways about relevant trade-offs.

The likelihood of complex scenarios is of course lower than simple scenarios (the conjunction fallacy makes us believe much more in rich stories). But if they are seen as tools for developing decisions rather than information about the future, then their individual probability is less of an issue.

In the end, good stories are lovely and worth having, but for thinking and deciding carefully we should not settle for just good stories or the scenarios that feel neat.

 

 

How much should we spread out across future scenarios?

Robin Hanson mentions that some people take him to task for working on one scenario (WBE) that might not be the most likely future scenario (“standard AI”); he responds by noting that there are perhaps 100 times more people working on standard AI than WBE scenarios, yet the probability of AI is likely not a hundred times higher than WBE. He also notes that there is a tendency for thinkers to clump onto a few popular scenarios or issues. However:

In addition, due to diminishing returns, intellectual attention to future scenarios should probably be spread out more evenly than are probabilities. The first efforts to study each scenario can pick the low hanging fruit to make faster progress. In contrast, after many have worked on a scenario for a while there is less value to be gained from the next marginal effort on that scenario.

This is very similar to my own thinking about research effort. Should we focus on things that are likely to pan out, or explore a lot of possibilities just in case one of the less obvious cases happens? Given that early progress is quick and easy, we can often get a noticeable fraction of whatever utility the topic has by just a quick dip. The effective altruist heuristic of looking at neglected fields also is based on this intuition.

A model

But under what conditions does this actually work? Here is a simple model:

There are N possible scenarios, one of which (j) will come about. They have probability P_i. We allocate a unit budget of effort to the scenarios: \sum a_i = 1. For the scenario that comes about, we get utility \sqrt{a_j} (diminishing returns).

Here is what happens if we allocate effort proportional to a power of the scenario probabilities, a_i \propto P_i^\alpha. \alpha=0 corresponds to even allocation, \alpha=1 to allocation proportional to the likelihood, and \alpha>1 to favoring the most likely scenarios. In the following I will run Monte Carlo simulations where the probabilities are randomly generated each instantiation. The outer bluish envelope represents 95% of the outcomes, the inner ranges from the lower to the upper quartile of the utility gained, and the red line is the expected utility.
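A simulation of this kind can be sketched in a few lines of standard-library Python. This is a reconstruction from the description above, not the code behind the plots; the helper names, the trial count and the seed are all arbitrary choices.

```python
import math
import random


def random_simplex(n, rng):
    """Draw probabilities uniformly from the simplex (flat Dirichlet)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def allocate(probs, alpha):
    """Effort a_i proportional to P_i**alpha, normalised to a unit budget."""
    weights = [p ** alpha for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(n, alpha, trials=20000, seed=0):
    """Return the sorted utilities sqrt(a_j) over many random worlds."""
    rng = random.Random(seed)
    utilities = []
    for _ in range(trials):
        probs = random_simplex(n, rng)
        effort = allocate(probs, alpha)
        j = rng.choices(range(n), weights=probs)[0]  # scenario that comes about
        utilities.append(math.sqrt(effort[j]))
    return sorted(utilities)


def summarise(utilities):
    """Mean, quartiles and central 95% range of the simulated utilities."""
    m = len(utilities)

    def pick(q):
        return utilities[min(m - 1, int(q * m))]

    return {
        "mean": sum(utilities) / m,
        "quartiles": (pick(0.25), pick(0.75)),
        "range95": (pick(0.025), pick(0.975)),
    }


for alpha in (0, 1, 2, 4):
    print(alpha, summarise(simulate(2, alpha)))
```

Normalising independent exponential draws gives a flat Dirichlet sample, which for n=2 reduces to p uniform on [0,1].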

Utility of allocating effort as a power of the probability of scenarios. Red line is expected utility, deeper blue envelope is lower and upper quartiles, lighter blue 95% interval.

This is the N=2 case: we have two possible scenarios with probability p and 1-p (where p is uniformly distributed in [0,1]). Just allocating evenly gives us 1/\sqrt{2} utility on average, but if we put in more effort on the more likely case we will get up to 0.8 utility. As we focus more and more on the likely case there is a corresponding increase in variance, since we may guess wrong and lose out. But 75% of the time we will do better than if we just allocated evenly. Still, allocating nearly everything to the most likely case means that one does lose out on a bit of hedging, so the expected utility declines slowly for large \alpha.
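The location of the maximum can also be checked analytically. With known probabilities, maximising expected utility under the unit budget is a small Lagrange-multiplier exercise (a sketch under the model's own assumptions; \lambda is the multiplier introduced here for the derivation):

```latex
\max_{a}\ \sum_i P_i \sqrt{a_i} \quad \text{subject to} \quad \sum_i a_i = 1

\frac{P_i}{2\sqrt{a_i}} = \lambda
\;\Rightarrow\; a_i = \frac{P_i^2}{\sum_k P_k^2},
\qquad
E[U] = \sum_i P_i \sqrt{a_i} = \sqrt{\sum_i P_i^2}
```

So when the probabilities are known exactly the best exponent is \alpha=2, and for N=2 with p uniform the average is \int_0^1 \sqrt{p^2+(1-p)^2}\,dp \approx 0.81, in line with the simulated peak.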

Utility of allocating effort as a power of the probability of scenarios. Red line is expected utility, deeper blue envelope is lower and upper quartiles, lighter blue 95% interval. 100 possible scenarios, with uniform probability on the simplex.

The N=100 case (where the probabilities are allocated based on a flat Dirichlet distribution) behaves similarly, but the expected utility is smaller since it is less likely that we will hit the right scenario.

What is going on?

This doesn’t seem to fit Robin’s or my intuitions at all! The best we can say about uniform allocation is that it doesn’t produce much regret: whatever happens, we will have made some allocation to the possibility. For large N this actually works out better than the directed allocation for a sizable fraction of realizations, but on average we get less utility than betting on the likely choices.

The problem with the model is of course that it assumes we actually know the probabilities before making the allocation. In reality, we do not know the likelihood of AI, WBE or alien invasions. We have some information, and we do have priors (like Robin’s view that P_{AI} < 100 P_{WBE}), but we are not able to allocate perfectly. A more plausible model would give us probability estimates instead of the actual probabilities.

We know nothing

Let us start by looking at the worst possible case: we do not know what the true probabilities are at all. We can draw estimates from the same distribution – it is just that they are uncorrelated with the true situation, so they are just noise.

Utility of allocating effort as a power of the probability of scenarios, but the probabilities are just random guesses. Red line is expected utility, deeper blue envelope is lower and upper quartiles, lighter blue 95% interval. 100 possible scenarios, with uniform probability on the simplex.

In this case uniform distribution of effort is optimal. Not only does it avoid regret, it has a higher expected utility than trying to focus on a few scenarios (\alpha>0). The larger N is, the less likely it is that we focus on the right scenario since we know nothing. The rationality of ignoring irrelevant information is pretty obvious.
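That uniform allocation wins here follows from Jensen's inequality. If the estimates are independent of the truth and treat all scenarios symmetrically, the expected effort landing on the winning scenario j is 1/N whatever \alpha is, and since the square root is concave:

```latex
E\left[\sqrt{a_j}\right] \;\le\; \sqrt{E[a_j]} \;=\; \frac{1}{\sqrt{N}}
```

Equality holds only when a_j is always exactly 1/N, which is the \alpha=0 case.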

Note that if we have to allocate a minimum effort to each investigated scenario we will be forced to effectively increase our \alpha above 0. The above result gives the somewhat optimistic conclusion that the loss of utility compared to an even spread is rather mild: in the uniform case we have a pretty low amount of effort allocated to the winning scenario, so the low chance of being right in the nonuniform case is being balanced by having a slightly higher effort allocation on the selected scenarios. For high \alpha there is a tail of rare big “wins” when we hit the right scenario that drags the expected utility upwards, even though in most realizations we bet on the wrong case. This is very much the hedgehog predictor story: occasionally they have analysed the scenario that comes about in great detail and get intensely lauded, despite looking at the wrong things most of the time.

We know a bit

We can imagine that knowing more should allow us to gradually interpolate between the different results: the more you know, the more you should focus on the likely scenarios.

Optimal alpha as a function of how much information we have about the true probabilities (noise due to Monte Carlo and discrete steps of alpha). N=2 (N=100 looks similar).

If we take the mean of the true probabilities with some randomly drawn probabilities (the “half random” case) the curve looks quite similar to the case where we actually know the probabilities: we get a maximum for \alpha\approx 2. In fact, we can mix in just a bit (\beta) of the true probability and get a fairly good guess where to allocate effort (i.e. we allocate effort as a_i \propto (\beta P_i + (1-\beta)Q_i)^\alpha where Q_i is uncorrelated noise probabilities). The optimal alpha grows roughly linearly with \beta, \alpha_{opt} \approx 4\beta in this case.
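The blended-estimate experiment can be sketched like this. It is a reconstruction from the description rather than the original code: the helper names, the grid of \alpha values, the trial count and the seed are assumptions made for the example.

```python
import math
import random


def random_simplex(n, rng):
    """Flat Dirichlet draw via normalised exponentials."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def expected_utility(n, alpha, beta, trials=5000, seed=1):
    """Average sqrt(a_j) when effort follows a blend of truth P and noise Q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_p = random_simplex(n, rng)
        noise_q = random_simplex(n, rng)  # uncorrelated with true_p
        estimate = [beta * p + (1 - beta) * q for p, q in zip(true_p, noise_q)]
        weights = [e ** alpha for e in estimate]
        norm = sum(weights)
        j = rng.choices(range(n), weights=true_p)[0]  # outcome follows the truth
        total += math.sqrt(weights[j] / norm)
    return total / trials


def best_alpha(n, beta, alphas=(0, 0.5, 1, 1.5, 2, 2.5, 3, 4)):
    """Grid search for the alpha with the highest simulated expected utility."""
    return max(alphas, key=lambda a: expected_utility(n, a, beta))


for beta in (0.0, 0.25, 0.5, 1.0):
    print(beta, best_alpha(2, beta))
```

Reusing the same seed for every \alpha means each candidate is scored on the same random worlds, which reduces the Monte Carlo noise in the comparison.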

We learn

Adding a bit of realism, we can consider a learning process: after allocating some effort \gamma to the different scenarios we get better information about the probabilities, and can now reallocate. A simple model may be that the standard deviation of noise behaves as 1/\sqrt{\tilde{a}_i} where \tilde{a}_i is the effort placed in exploring the probability of scenario i. So if we begin by allocating uniformly we will have noise at reallocation of the order of 1/\sqrt{\gamma/N}. We can set \beta(\gamma)=\sqrt{\gamma/N}/C, where C is some constant denoting how tough it is to get information. Putting this together with the above result \alpha_{opt} \approx 4\beta we get \alpha_{opt}(\gamma) \approx 4\sqrt{\gamma/N}/C. After this exploration, we use the remaining 1-\gamma effort to work on the actual scenarios.
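A rough sketch of this two-stage model follows. It takes the approximate fit \alpha_{opt} \approx 4\beta from the previous section at face value and caps \beta at 1; the cap, the function names and the trial count are assumptions added for the example, not part of the model as stated.

```python
import math
import random


def random_simplex(n, rng):
    """Flat Dirichlet draw via normalised exponentials."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def utility_after_learning(gamma, c, n=100, trials=2000, seed=2):
    """Expected utility when a fraction gamma of effort buys better estimates."""
    beta = min(1.0, math.sqrt(gamma / n) / c)  # information gained (cap is an assumption)
    alpha = 4 * beta                           # rough linear fit quoted in the text
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_p = random_simplex(n, rng)
        noise_q = random_simplex(n, rng)
        estimate = [beta * p + (1 - beta) * q for p, q in zip(true_p, noise_q)]
        weights = [e ** alpha for e in estimate]
        norm = sum(weights)
        j = rng.choices(range(n), weights=true_p)[0]
        # Only the remaining 1 - gamma of the budget goes to scenario work.
        total += math.sqrt((1 - gamma) * weights[j] / norm)
    return total / trials


for c in (1.0, 0.1, 0.01):
    print(c, [round(utility_after_learning(g, c), 4) for g in (0.0, 0.1, 0.3)])
```

With \gamma=0 this reduces to the even spread and returns exactly 1/\sqrt{N}, which gives a baseline for judging whether any exploration pays off.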

Expected utility as a function of amount of probability-estimating effort (gamma) for C=1 (hard to update probabilities), C=0.1 and C=0.01 (easy to update). N=100.

This is surprisingly inefficient. The reason is that the expected utility declines as \sqrt{1-\gamma} and the gain is just the utility difference between the uniform case \alpha=0 and optimal \alpha_{opt}, which we know is pretty small. If C is small (i.e. a small amount of effort is enough to figure out the scenario probabilities) there is an optimal nonzero \gamma. This optimum \gamma decreases as C becomes smaller. If C is large, then the best approach is just to spread efforts evenly.

Conclusions

So, how should we focus? These results suggest that the key issue is knowing how little we know compared to what can be known, and how much effort it would take to know significantly more.

If there is little more that can be discovered about what scenarios are likely, because our state of knowledge is pretty good, the world is very random, or improving knowledge about what will happen will be costly, then we should roll with it and distribute effort either among likely scenarios (when we know them) or spread efforts widely (when we are in ignorance).

If we can acquire significant information about the probabilities of scenarios, then we should do it – but not overdo it. If it is very easy to get information we need to just expend some modest effort and then use the rest to flesh out our scenarios. If it is doable but costly, then we may spend a fair bit of our budget on it. But if it is hard, it is better to go directly to the object-level scenario analysis as above. We should not expect the improvement to be enormous.

Here I have used a square root diminishing return model. That drives some of the flatness of the optima: had I used a logarithm function things would have been even flatter, while if the returns diminish more mildly the gains of optimal effort allocation would have been more noticeable. Clearly, understanding the diminishing returns, number of alternatives, and cost of learning probabilities better matters for setting your strategy.

In the case of future studies we know the number of scenarios is very large. We know that the returns to forecasting efforts are strongly diminishing for most kinds of forecasts. We know that extra efforts in reducing uncertainty about scenario probabilities in e.g. climate models also have strongly diminishing returns. Together this suggests that Robin is right, and it is rational to stop clustering too hard on favorite scenarios. Insofar as we learn something useful from considering scenarios we should explore as many as feasible.

Predictions for 2016

The ever readable Slate Star Codex has a post about checking how accurate the predictions for 2015 were; overall Scott Alexander seems pretty well calibrated. Being a born follower I decided to make a bunch of predictions to check my calibration in a year’s time.

Here is my list of predictions, with my confidence (some predictions obviously stolen):

  • No nuclear war: 99%
  • No terrorist attack in the USA will kill > 100 people: 95%
  • I will be involved in at least one published/accepted-to-publish research paper by the end of 2015: 95%
  • Vesuvius will not have a major eruption: 95%
  • I will remain at my same job through the end of 2015: 90%
  • MAX IV in Lund delivers X-rays: 90%
  • Andart II will remain active: 90%
  • Israel will not get in a large-scale war (ie >100 Israeli deaths) with any Arab state: 90%
  • US will not get involved in any new major war with death toll of > 100 US soldiers: 90%
  • New Zealand has not decided to change current flag at end of year: 85%
  • No multi-country Ebola outbreak: 80%
  • Assad will remain President of Syria: 80%
  • ISIS will control less territory than it does right now: 80%
  • North Korea’s government will survive the year without large civil war/revolt: 80%
  • The US NSABB will allow gain of function funding: 80%
  • US presidential election: democratic win: 75%
  • A general election will be held in Spain: 75%
  • Syria’s civil war will not end this year: 75%
  • There will be no NEO with Torino Scale >0 on 31 Dec 2016: 75%
  • The Atlantic basin ACE will be below 96.2: 70%
  • Sweden does not get a seat on the UN Security Council: 70%
  • Bitcoin will end the year higher than $200: 70%
  • Another major eurozone crisis: 70%
  • Brent crude oil will end the year lower than $60 a barrel: 70%
  • I will actually apply for a UK citizenship: 65%
  • UK referendum votes to stay in EU: 65%
  • China will have a GDP growth above 5%: 65%
  • Evidence for supersymmetry: 60%
  • UK larger GDP than France: 60%
  • France GDP growth rate less than 2%: 60%
  • I will have made significant progress (4+ chapters) on my book: 55%
  • Iran nuclear deal holding: 50%
  • Apple buys Tesla: 50%
  • The Nikkei index ends up above 20,000: 50%

The point is to have enough that we can see how my calibration works.

Looking for topics leads to amusing finds like the predictions of Nostradamus for 2015. Given that language barriers remain, the dead remain dead, lifespans are less than 200, there has not been a Big One in western US nor has Vesuvius erupted, and taxes still remain, I think we can conclude he was wrong or the ability to interpret him accurately is near zero. Which of course makes his quatrains equally useless.

Bayes’ Broadsword

Yesterday I gave a talk at the joint Bloomberg-London Futurist meeting “The state of the future” about the future of decisionmaking. Parts were updates on my policymaking 2.0 talk (turned into this chapter), but I added a bit more about individual decisionmaking, rationality and forecasting.

The big idea of the talk: ensemble methods really work in a lot of cases. Not always, not perfectly, but they should be among the first tools to consider when trying to make a robust forecast or decision. They are Bayes’ broadsword:

Bayesbroadsword

Forecasting

One of my favourite experts on forecasting is J. Scott Armstrong. He has stressed the importance of evidence-based forecasting, including checking how well different methods work. The general answer is: not very well, yet people keep on using them. He has been pointing this out since the 70s. It also turns out that expertise only gets you so far: expert forecasts are not very reliable either, and the accuracy levels out quickly with increasing level of expertise. One implication is that one should at least get cheap experts since they are about as good as the pricey ones. It is also known that simple models for forecasting tend to be more accurate than complex ones, especially in complex and uncertain situations (see also Haldane’s “The Dog and the Frisbee”). Another important insight is that it is often better to combine different methods than try to select the one best method.

Another classic look at prediction accuracy is Philip Tetlock’s Expert Political Judgment (2005) where he looked at policy expert predictions. They were only slightly more accurate than chance, worse than basic extrapolation algorithms, and there was a negative link to fame: high profile experts have an incentive to be interesting and dramatic, but not right. However, he noticed some difference between “hedgehogs” (people with One Big Theory) and “foxes” (people using multiple theories), with the foxes outperforming hedgehogs.

OK, so in forecasting it looks like using multiple methods, theories and data sources (including experts) is a way to get better results.

Statistical machine learning

A standard problem in machine learning is to classify something into the right category from data, given a set of training examples. For example, given medical data such as age, sex, and blood test results, diagnose which disease a patient might suffer from. The key problem is that it is non-trivial to construct a classifier that works well on data different from the training data. It can work badly on new data, even if it works perfectly on the training examples. Two classifiers that perform equally well during training may perform very differently in real life, or even for different data.

The obvious solution is to combine several classifiers and average (or vote about) their decisions: ensemble based systems. This reduces the risk of making a poor choice, and can in fact improve overall performance if they can specialize for different parts of the data. This also has other advantages: very large datasets can be split into manageable chunks that are used to train different components of the ensemble, tiny datasets can be “stretched” by random resampling to make an ensemble trained on subsets, outliers can be managed by “specialists”, in data fusion different types of data can be combined, and so on. Multiple weak classifiers can be combined into a strong classifier this way.

The method benefits from having diverse classifiers that are combined: if they are too similar in their judgements, there is no advantage. Estimating the right weights to give to them is also important, otherwise a truly bad classifier may influence the output.
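
As a minimal sketch of the weighting point, assume each component classifier just outputs a class label and that we have some estimate of its accuracy (for instance from held-out data). The function name `weighted_vote` and the log-odds weighting are illustrative choices for this sketch, not the only reasonable scheme:

```python
import math
from collections import defaultdict

def weighted_vote(labels, accuracies):
    """Combine class labels from several classifiers by a weighted vote.

    labels: one predicted label per classifier.
    accuracies: estimated accuracy of each classifier, strictly between 0 and 1.
    Each vote is weighted by the log-odds of that classifier being right,
    so a coin-flip classifier (accuracy 0.5) gets weight zero.
    """
    scores = defaultdict(float)
    for label, acc in zip(labels, accuracies):
        scores[label] += math.log(acc / (1.0 - acc))
    # Return the label with the largest total weight.
    return max(scores, key=scores.get)

# Hypothetical example: one reliable classifier against two mediocre ones.
print(weighted_vote(["setosa", "virginica", "virginica"], [0.95, 0.6, 0.6]))
```

With equal accuracies this reduces to a plain majority vote; with unequal ones a single reliable classifier can outvote several mediocre ones.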

Iris data classified using an ensemble of classification methods (LDA, NBC, various kernels, decision tree). Note how the combination of classifiers also roughly indicates the overall reliability of classifications in a region.

The iconic demonstration of the power of this approach was the Netflix Prize, where different teams competed to make algorithms that predicted user ratings of films from previous ratings. As part of the rules the algorithms were made public, spurring innovation. When the competition concluded in 2009, the leading teams all used ensemble methods whose component algorithms came from past teams. The two big lessons were (1) that a combination of not just the best algorithms, but also less accurate algorithms, was the key to winning, and (2) that organic organization allows the emergence of far better performance than having strictly isolated teams.

Group cognition

Condorcet’s jury theorem is perhaps the classic result in group problem solving: if a group of people hold a majority vote, and each has a probability p>1/2 of voting for the correct choice, then the probability the group will vote correctly is higher than p and will tend to approach 1 as the size of the group increases. This presupposes that votes are independent, although stronger forms of the theorem have been proven. (In reality people may have different preferences so there is no clear “right answer”.)

Probability that groups of different sizes will reach the correct decision as a function of the individual probability of voting right.
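
The curve is easy to compute directly. Here is a small sketch, assuming independent voters who each vote correctly with the same probability p and an odd group size so that ties cannot occur (the function name is mine):

```python
from math import comb

def majority_correct(n, p):
    """Probability that a strict majority of n independent voters is right.

    n must be odd so that ties cannot occur; p is each voter's probability
    of voting for the correct choice.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

for n in (1, 3, 11, 101):
    print(n, majority_correct(n, 0.6))
```

For p above 1/2 the printed probabilities rise with n, and for p below 1/2 they fall, which is the less cheerful half of the theorem.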

By now the pattern is likely pretty obvious. Weak decision-makers (the voters) are combined through a simple procedure (the vote) into better decision-makers.

Group problem solving is known to be pretty good at smoothing out individual biases and errors. In The Wisdom of Crowds Surowiecki suggests that the ideal crowd for answering a question in a distributed fashion has diversity of opinion, independence (each member has an opinion not determined by the others’), decentralization (members can draw conclusions based on local knowledge), and the existence of a good aggregation process turning private judgements into a collective decision or answer.

Perhaps the grandest example of group problem solving is the scientific process, where peer review, replication, cumulative arguments, and other tools make error-prone and biased scientists produce a body of findings that over time robustly (if sometimes slowly) tends towards truth. This is anything but independent: sometimes a clever structure can improve performance. However, it can also induce all sorts of nontrivial pathologies – just consider the detrimental effects status games have on accuracy or focus on the important topics in science.

Small group problem solving on the other hand is known to be great for verifiable solutions (everybody can see that a proposal solves the problem), but unfortunately suffers when dealing with “wicked problems” lacking good problem or solution formulation. Groups also have scaling issues: a team of N people needs to transmit information between all N(N-1)/2 pairs, which quickly becomes cumbersome.

One way of fixing these problems is using software and formal methods.

The Good Judgement Project (partially run by Tetlock and with Armstrong on the board of advisers) participated in the IARPA ACE program to try to improve intelligence forecasts. They used volunteers and checked their forecast accuracy (not just if they got things right, but if claims that something was 75% likely actually came true 75% of the time). This led to a plethora of fascinating results. First, accuracy scores based on the first 25 questions in the tournament predicted subsequent accuracy well: some people were consistently better than others, and it tended to remain constant. Training (such as debiasing techniques) and forming teams also improved performance. Most impressively, using the top 2% “superforecasters” in teams really outperformed the other variants. The superforecasters were a diverse group, smart but by no means geniuses, updating their beliefs frequently but in small steps.
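
To make the "75% likely came true 75% of the time" check concrete, here is a rough sketch of a calibration table together with the Brier score, one standard way of scoring probabilistic forecasts. The forecast list is made up for illustration and the function names are mine:

```python
from collections import defaultdict

def brier_score(forecasts):
    """Mean squared difference between stated probability and outcome (0 or 1)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

def calibration_table(forecasts):
    """Map each stated probability to the observed frequency of success."""
    buckets = defaultdict(list)
    for p, outcome in forecasts:
        buckets[p].append(outcome)
    return {p: sum(v) / len(v) for p, v in sorted(buckets.items())}

# Hypothetical forecasts: (stated probability, 1 if it came true else 0).
example = [(0.75, 1), (0.75, 1), (0.75, 1), (0.75, 0), (0.9, 1), (0.9, 1)]
print(calibration_table(example))
print(brier_score(example))
```

A well-calibrated forecaster has observed frequencies close to the stated probabilities in every bucket; a lower Brier score is better, and always answering 50% scores 0.25.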

The key to this success was that a computer- and statistics-aided process found the good forecasters and harnessed them properly (plus, the forecasts were on a shorter time horizon than the policy ones Tetlock analysed in his previous book: this both enables better forecasting, plus the all-important feedback on whether they worked).

Another good example is the Galaxy Zoo, an early crowd-sourcing project in galaxy classification (which in turn led to the Zooniverse citizen science project). It is not just that participants can act as weak classifiers and be combined through a majority vote to become reliable classifiers of galaxy type. Since the type of some galaxies is agreed on by domain experts they can be used to test the reliability of participants, producing better weightings. But it is possible to go further, and classify the biases of participants to create combinations that maximize the benefit, for example by using overly “trigger happy” participants to find possible rare things of interest, and then check them using both conservative and neutral participants to become certain. Even better, this can be done dynamically as people slowly gain skill or change preferences.

The right kind of software and on-line “institutions” can shape people’s behavior so that they form more effective joint cognition than they ever could individually.

Conclusions

The big idea here is that it does not matter that individual experts, forecasting methods, classifiers or team members are fallible or biased, if their contributions can be combined in such a way that the overall output is robust and less biased. Ensemble methods are examples of this.

While just voting or weighting everybody equally is a decent start, performance can be significantly improved by linking it to how well the participants perform. Humans can easily be motivated by scoring (but look out for misalignment of incentives: the score must accurately reflect real performance and must not be gameable).

In any case, actual performance must be measured. If we cannot tell if some method is more accurate than something else, then either accuracy does not matter (because it cannot be distinguished or we do not really care), or we will not get the necessary feedback to improve it. It is known from the expertise literature that one of the key factors for it to be possible to become an expert on a task is feedback.

Having a flexible structure that can change is a good approach to handling a changing world. If people have disincentives to change their mind or change teams, they will not update beliefs accurately.

I got a good question after the talk: if we are supposed to keep our models simple, how can we use these complicated ensembles? The answer is of course that there is a difference between using a complex and a complicated approach. The methods that tend to be fragile are the ones with too many free parameters, too much theoretical burden: they are the complex “hedgehogs”. But stringing together a lot of methods and weighting them appropriately merely produces a complicated model, a “fox”. Component hedgehogs are fine as long as they are weighted according to how well they actually perform.

(In fact, adding together many complex things can make the whole simpler. My favourite example is the fact that the Kolmogorov complexity of integers grows boundlessly on average, yet the complexity of the set of all integers is small – and actually smaller than some integers we can easily name. The whole can be simpler than its parts.)

In the end, we are trading Occam’s razor for a more robust tool: Bayes’ Broadsword. It might require far more strength (computing power/human interaction) to wield, but it has longer reach. And it hits hard.

Appendix: individual classifiers

I used Matlab to make the illustration of the ensemble classification. Here are some of the component classifiers. They are all based on the examples in the Matlab documentation. My ensemble classifier is merely a maximum vote between the component classifiers that assign a class to each point.

Iris data classified using a naive Bayesian classifier assuming Gaussian distributions.
Iris data classified using a decision tree.
Iris data classified using Gaussian kernels.
Iris data classified using linear discriminant analysis.

 

Energy requirements of the singularity

Infinity of Forces: The Beanstalk

After a recent lecture about the singularity I got asked about its energy requirements. It is a good question. As my inquirer pointed out, humanity uses more and more energy and it generally has an environmental cost. If it keeps on growing exponentially, something has to give. And if there is a real singularity, how do you handle infinite energy demands?

First I will look at current trends, then different models of the singularity.

I will not deal directly with environmental costs here. They are relative to some idea of a value of an environment, and there are many ways to approach that question.

Current trends

Current computers are energy hogs. Currently general purpose computing consumes about one petawatt-hour per year, with the entire world production somewhere above 22 PWh. While large data centres may be obvious, the vast number of low-power devices may be an even more significant factor; up to 10% of our electricity use may be due to ICT.

Together they perform on the order of 10^{20} operations per second, or somewhere in the zettaFLOPS range.

Koomey’s law states that the number of computations per joule of energy dissipated has been doubling approximately every 1.57 years. This might speed up as the pressure to make efficient computing for wearable devices and large data centres makes itself felt. Indeed, these days performance per watt is often more important than performance per dollar.

Meanwhile, general-purpose computing capacity has a growth rate of 58% per annum, doubling every 18 months. Since these trends cancel rather neatly, the overall energy need is not changing significantly.
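
A quick back-of-the-envelope check of how neatly the two trends cancel, using only the figures quoted above (the variable names are mine, and treating both trends as exact exponentials is of course an idealisation):

```python
import math

koomey_doubling_years = 1.57     # computations per joule double this often
capacity_growth_per_year = 0.58  # growth in general-purpose computing capacity

efficiency_factor = 2 ** (1 / koomey_doubling_years)  # yearly gain in ops per joule
capacity_factor = 1 + capacity_growth_per_year        # yearly gain in ops per second
energy_factor = capacity_factor / efficiency_factor   # yearly change in power drawn

capacity_doubling_years = math.log(2) / math.log(capacity_factor)
print(f"capacity doubling time: {capacity_doubling_years:.2f} years")
print(f"net energy growth: {100 * (energy_factor - 1):.1f}% per year")
```

Under these assumptions the net energy demand of computing grows by only a percent or two per year, far slower than either trend on its own.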

The push for low-power computing may make computing greener, and it might also make other domains more efficient by moving tasks to the virtual world, making them efficient and allowing better resource allocation. On the other hand, as things become cheaper and more efficient usage tends to go up, sometimes outweighing the gain. Which trend wins out in the long run is hard to predict.

Semilog plot of global energy (all types) consumption over time.

Looking at overall energy use trends it looks like overall energy use increases exponentially (but has stayed at roughly the same per capita level since the 1970s). In fact, plotting it on a semilog graph suggests that it is increasing faster than exponential (otherwise it would be a straight line). This is presumably due to a combination of population increase and increased energy use. The best fit exponential has a doubling time of 44.8 years.
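
For reference, a doubling time like this comes from a straight-line fit to the logarithm of the data. The sketch below uses a synthetic series constructed to double every 44.8 years, not the actual energy data behind the plot, and the function name is an assumption of mine:

```python
import math

def doubling_time(years, values):
    """Least-squares fit of log(value) = a + b*year; returns ln(2)/b in years."""
    n = len(years)
    logs = [math.log(v) for v in values]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return math.log(2) / (sxy / sxx)

# Synthetic series constructed to double every 44.8 years (not real data).
years = list(range(1900, 2011, 10))
values = [100.0 * 2 ** ((y - 1900) / 44.8) for y in years]
print(round(doubling_time(years, values), 1))
```

On real data that curves upward on a semilog plot, the fitted doubling time will depend on which period is included in the fit.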

Electricity use is also roughly exponential, with a doubling time of 19.3 years. So we might be shifting more and more to electricity, and computing might be taking over more and more of that.

Extrapolating wildly, we would need the total solar input on Earth in about 300 years and the total solar luminosity in 911 years. In about 1,613 years we would have used up the solar system’s mass energy. So, clearly, long before then these trends will break one way or another.
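
For the curious, the first two figures can be roughly reproduced if one assumes it is the electricity trend (about 22 PWh per year, doubling every 19.3 years) that is being extrapolated. That is my reading of the numbers, and the solar constants below are standard reference values rather than anything stated in the text:

```python
import math

HOURS_PER_YEAR = 8766.0
electricity_w = 22e15 / HOURS_PER_YEAR  # ~22 PWh per year as average power (W)
doubling_years = 19.3                   # electricity doubling time quoted above

SOLAR_CONSTANT = 1361.0                 # W/m^2 at Earth's distance (assumed)
EARTH_RADIUS = 6.371e6                  # m (assumed)
solar_input_w = SOLAR_CONSTANT * math.pi * EARTH_RADIUS ** 2
solar_luminosity_w = 3.828e26           # W (assumed)

def years_until(target_w):
    """Years of steady doubling until power use reaches target_w."""
    return doubling_years * math.log2(target_w / electricity_w)

print(round(years_until(solar_input_w)))
print(round(years_until(solar_luminosity_w)))
```

The results land within a few percent of the 300 and 911 year figures; the exact values depend on the starting year and constants chosen.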

Physics places a firm boundary due to the Landauer principle: in order to erase one bit of information k T \ln(2) joules of energy have to be dissipated. Given current efficiency trends we will reach this limit around 2048.
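
To get a feeling for the size of the Landauer bound, here is a small calculation. Room temperature (300 K) is my choice of example, not something the argument depends on:

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def landauer_joules_per_bit(temperature_k):
    """Minimum energy dissipated when erasing one bit at the given temperature."""
    return BOLTZMANN * temperature_k * math.log(2)

energy = landauer_joules_per_bit(300.0)
print(f"{energy:.3e} J per erased bit")
print(f"{1 / energy:.3e} bit erasures per joule at best")
```

At room temperature this is a few zeptojoules per bit, and the bound scales linearly with temperature, so colder computers can erase bits more cheaply.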

The principle can be circumvented using reversible computation, either classical or quantum. But as I often like to point out, it still bites in the form of the need for error correction (erasing accidentally flipped bits) and formatting new computational resources (besides the work in turning raw materials into bits). We should hence expect a radical change in computation within a few decades, even if the cost per computation and second continues to fall exponentially.

What kind of singularity?

But how many joules of energy does a technological singularity actually need? It depends on what kind of singularity. In my own list of singularity meanings we have the following kinds:

A. Accelerating change
B. Self improving technology
C. Intelligence explosion
D. Emergence of superintelligence
E. Prediction horizon
F. Phase transition
G. Complexity disaster
H. Inflexion point
I. Infinite progress

Case A, acceleration, at first seems to imply increasing energy demands, but if efficiency grows faster they could of course go down.

Eric Chaisson has argued that energy rate density, how fast and densely energy gets used (watts per kilogram), might be an indicator of complexity and growing according to a universal tendency. By this account, we should expect the singularity to have an extreme energy rate density – but it does not have to be using enormous amounts of energy if it is very small and light.

He suggests energy rate density may increase as Moore’s law, at least in our current technological setting. If we assume this to be true, then we would have \Phi(t) = \exp(kt) = P(t)/M(t), where P(t) is the power of the system and M(t) is the mass of the system at time t. One can maintain exponential growth by reducing the mass as well as increasing the power.

However, waste heat will need to be dissipated. If we use the simplest model where a radius R system with density \rho radiates it away into space, then the temperature will be T=[\rho \Phi R/3 \sigma]^{1/4}, or, if we have a maximal acceptable temperature, R < 3\sigma T^4 / \rho \Phi. So the system needs to become smaller as \Phi increases. If we use active heat transport instead (as outlined in my previous post), covering the surface with heat pipes that can remove X watts/square meter, then R < 3 X / \Phi \rho. Again, the radius will be inversely proportional to \Phi. This is similar to our current computers, where the CPU is a tiny part surrounded by cooling and energy supply.

If we assume the waste heat is just due to erasing bits, the rate of computation will be I = P/kT \ln(2) = \Phi M / kT\ln(2) = [4 \pi \rho /3 k \ln(2)] \Phi R^3 / T bits per second. Using the first cooling model gives us I \propto T^{11}/ \Phi^2 – a massive advantage for running extremely hot and dense computation. In the second cooling model I \propto \Phi^{-2}: in both cases higher energy rate densities make it harder to compute when close to the thermodynamic limit. Hence there might be an upper limit to how much we may want to push \Phi.
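
The scaling for the radiative cooling model can be checked numerically. This is just a sketch that plugs the maximal radius from the first cooling model into the bit rate formula; the density and temperature values are arbitrary placeholders:

```python
import math

SIGMA = 5.670374419e-8    # Stefan-Boltzmann constant, W m^-2 K^-4
BOLTZMANN = 1.380649e-23  # J/K

def bit_rate_radiative(phi, temperature, rho):
    """Bit erasures per second for a sphere cooled only by black-body radiation.

    phi: energy rate density (W/kg); temperature: K; rho: density (kg/m^3).
    The radius is set to the largest value the cooling allows.
    """
    radius = 3 * SIGMA * temperature ** 4 / (rho * phi)
    mass = 4 / 3 * math.pi * rho * radius ** 3
    power = phi * mass
    return power / (BOLTZMANN * temperature * math.log(2))

base = bit_rate_radiative(phi=1.0, temperature=300.0, rho=1000.0)
print(bit_rate_radiative(2.0, 300.0, 1000.0) / base)  # ratio when phi doubles
print(bit_rate_radiative(1.0, 600.0, 1000.0) / base)  # ratio when T doubles
```

Doubling \Phi cuts the bit rate to a quarter, while doubling T multiplies it by 2^{11}, as the proportionality I \propto T^{11}/ \Phi^2 says.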

Also, a system with mass M will use up its own mass-energy in time Mc^2/P = c^2/\Phi: the higher the rate, the faster it will run out (and it is independent of size!). If the system is expanding at speed v it will gain and use up mass at a rate M'= 4\pi\rho v^3 t^2 - M\Phi(t)/c^2; if \Phi grows faster than quadratic with time it will eventually run out of mass to use. Hence the exponential growth must eventually reduce simply because of the finite lightspeed.

The Chaisson scenario does not suggest a “sustainable” singularity. Rather, it suggests a local intense transformation involving small, dense nuclei using up local resources. However, such local “detonations” may then spread, depending on the long-term goals of involved entities.

Cases B, C, D (intelligence explosions, superintelligence) have an unclear energy profile. We do not know how complex code would become or what kind of computational search is needed to get to superintelligence. It could be that it is more a matter of smart insights, in which case the needs are modest, or a huge deep learning-like project involving massive amounts of data sloshing around, requiring a lot of energy.

Case E, a prediction horizon, is separate from energy use. As this essay shows, there are some things we can say about superintelligent computational systems based on known physics that likely remain valid no matter what.

Case F, phase transition, involves a change in organisation rather than computation, for example the formation of a global brain out of previously uncoordinated people. However, this might very well have energy implications. Physical phase transitions involve discontinuities of the derivatives of the free energy. If the phases have different entropies (first order transitions) there has to be some addition or release of energy. So it might actually be possible that a societal phase transition requires a fixed (and possibly large) amount of energy to reorganize everything into the new order.

There are also second order transitions. These are continuous and do not have a latent heat, but show divergent susceptibilities (how much the system responds to an external forcing). These might be more like how we normally imagine an ordering process, with local fluctuations near the critical point leading to large and eventually dominant changes in how things are ordered. It is not clear to me that this kind of singularity would have any particular energy requirement.

Case G, complexity disaster, is related to superexponential growth, such as the city growth model of Bettencourt, West et al. or the work on bubbles and finite time singularities by Didier Sornette. Here the rapid growth rate leads to a crisis, or more accurately a series of crises increasingly rapidly succeeding each other until a final singularity. Beyond that the system must behave in some different manner. These models typically predict rapidly increasing resource use (indeed, this is the cause of the crisis sequence as one kind of growth runs into resource scaling problems and is replaced with another one), although as Sornette points out the post-singularity state might well be a stable non-rivalrous knowledge economy.

Case H, an inflexion point, is very vanilla. It would represent the point where our civilization is halfway from where we started to where we are going. It might correspond to “peak energy” where we shift from increasing usage to decreasing usage (for whatever reason), but it does not have to. It could just be that we figure out most physics and AI in the next decades, become a spacefaring posthuman civilization, and expand for the next few billion years, using ever more energy but not having the same intense rate of knowledge growth as during the brief early era when we went from hunter gatherers to posthumans.

Case I, infinite growth, is not normally possible in the physical universe. Information can as far as we know not be stored beyond densities set by the Bekenstein bound (I \leq k_I MR where k_I\approx 2.577\cdot 10^{43} bits per kg per meter), and we only have access to a volume 4 \pi c^3 t^3/3 with mass density \rho, so the total information growth must be bounded by I \leq 4 \pi k_I c^4 \rho t^4/3. It grows quickly, but still just polynomially.
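
The constant k_I follows from the usual form of the Bekenstein bound, I \leq 2\pi R E/(\hbar c \ln 2) with E = Mc^2. A short check, where `max_bits` is simply my name for the t^4 bound above:

```python
import math

C = 299_792_458.0       # speed of light, m/s
HBAR = 1.054571817e-34  # reduced Planck constant, J s

# Bekenstein constant in bits per kg per metre: 2*pi*c / (hbar * ln 2).
K_I = 2 * math.pi * C / (HBAR * math.log(2))

def max_bits(time_s, rho):
    """Upper bound on information inside a light-sphere of age time_s (s)
    filled with matter of mean density rho (kg/m^3)."""
    return 4 * math.pi * K_I * C ** 4 * rho * time_s ** 4 / 3

print(f"k_I = {K_I:.4e} bits per kg per metre")
print(max_bits(2.0, 1.0) / max_bits(1.0, 1.0))  # doubling t gives a factor of 16
```

The quartic growth is fast, but any exponential with a fixed doubling time eventually overtakes it.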

The exception to the finitude of growth is if we approach the boundaries of spacetime. Frank J. Tipler’s omega point theory shows how information processing could go infinite in a finite (proper) time in the right kind of collapsing universe with the right kind of physics. It doesn’t look like we live in one, but the possibility is tantalizing: could we arrange the right kind of extreme spacetime collapse to get the right kind of boundary for a mini-omega? It would be way beyond black hole computing and never be able to send back information, but still allow infinite experience. Most likely we are stuck in finitude, but it won’t hurt poking at the limits.

Conclusions

Indefinite exponential growth is never possible for physical properties that have some resource limitation, whether energy, space or heat dissipation. Sooner or later they will have to shift to a slower rate of growth – polynomial for expanding organisational processes (forced to this by the dimensionality of space, finite lightspeed and heat dissipation), and declining growth rate for processes dependent on a non-renewable resource.

That does not tell us much about the energy demands of a technological singularity. We can conclude that it cannot be infinite. It might be high enough that we bump into the resource, thermal and computational limits, which may be what actually defines the singularity energy and time scale. Technological singularities may also be small, intense and localized detonations that merely use up local resources, possibly spreading and repeating. But it could also turn out that advanced thinking is very low-energy (reversible or quantum) or requires merely manipulation of high level symbols, leading to a quiet singularity.

My own guess is that life and intelligence will always expand to fill whatever niche is available, and use the available resources as intensively as possible. That leads to instabilities and depletion, but also expansion. I think we are – if we are lucky and wise – set for a global conversion of the non-living universe into life, intelligence and complexity, a vast phase transition of matter and energy where we are part of the nucleating agent. It might not be sustainable over cosmological timescales, but neither is our universe itself. I’d rather see the stars and planets filled with new and experiencing things than continue a slow dance into the twilight of entropy.

…contemplate the marvel that is existence and rejoice that you are able to do so. I feel I have the right to tell you this because, as I am inscribing these words, I am doing the same.
– Ted Chiang, Exhalation

 

Risky and rewarding robots

Robot playpen

Yesterday I participated in recording a radio program about robotics, and I noted that the participants were approaching the issue from several very different angles:

  • Robots as symbols: what we project onto them, what this says about humanity, how we change ourselves in respect to them, the role of hype and humanity in our thinking about them.
  • Robots as practical problem: how do you make a safe and trustworthy autonomous device that hangs around people? How do we handle responsibility for complex distributed systems that can generate ‘new’ behaviour?
  • Automation and jobs: what kinds of jobs are threatened or changed by automation? How does it change society, and how do we steer it in desirable directions – and what are they?
  • Long-term risks: how do we handle the potential risks from artificial general intelligence, especially given that many people think there is absolutely no problem and others are convinced that this could be existential if we do not figure out enough before it emerges?

In many cases the discussion got absurd because we talked past each other due to our different perspectives, but there were also some nice synergies. Trying to design automation without taking the anthropological and cultural aspects into account will lead to something that either does not work well with people or forces people to behave more machinelike. Not taking past hype cycles into account when trying to estimate future impact leads to overconfidence. Assuming that just because there has been hype in the past nothing will change is equally overconfident. The problems of trustworthiness and responsibility distribution become truly important when automating many jobs: when the automation is an essential part of the organisation, there need to be mechanisms to trust it and to avoid dissolution of responsibility. Currently robot ethics is more about how humans are impacted by robots rather than ethics for robots, but the latter will become quite essential if we get closer to AGI.

Jobs

Robot on break

I focused on jobs, starting from the Future of Employment paper. Maarten Goos and Alan Manning pointed out that automation seems to lead to a polarisation into “lovely and lousy jobs”: more non-routine manual jobs (lousy), more non-routine cognitive jobs (lovely). The paper strongly supports this, showing that a large chunk of occupations that rely on routine tasks might be possible to automate but things requiring hand-eye coordination, human dexterity, social ability, creativity and intelligence – especially applied flexibly – are pretty safe.

Overall, the economist’s view is relatively clear: automation that embodies skills and ability to do labour can only affect the distribution of jobs and how much certain skills are valued and paid compared with others. There is no rule that if task X can be done by a machine it will be done by a machine: handmade can still command a premium, and the law of comparative advantage might mean it is not worth using the machine to do X when it can do the even more profitable task Y. Still, being entirely dependent on doing X for your living is likely a bad situation.

Also, we often underestimate the impact of “small” parts of tasks that in formal analysis don’t seem to matter. Underwriters are on paper eminently replaceable… except that the ability to notice “Hey! Those numbers don’t make sense” or judge the reliability of risk models is quite hard to implement, and actually may constitute most of their value. We care about hard to automate things like social interaction and style. And priests, politicians, prosecutors and prostitutes are all fairly secure because their jobs might inherently require being a human or representing a human.

However, the development of AI ability is not a continuous predictable curve. We get sudden surprises like the autonomous cars (just a few years ago most people believed autonomous cars were a very hard, nearly impossible problem) or statistical translation. Confluences of technology conspire to change things radically (consider the digital revolution of printing, both big and small, in the 80s that upended the world for human printers). And since we know we are simultaneously overhyping and missing trends, this should not give us a sense of complacency at all. Just because we have always failed to automate X in the past doesn’t mean X might not suddenly turn out to be automatable tomorrow: relying on X being stably in the human domain is a risky assumption, especially when thinking about career choices.

Scaling

Robin, supply, demand and robots

Robots also have another important property: we can make a lot of them if we have a reason. If there is a huge demand for humans doing X we need to retrain or have children who grow up to be Xers. That makes the price go up a lot. Robots can be manufactured relatively easily, and scaling up the manufacturing is cheaper: even if X-robots are fairly expensive, making a lot more X-robots might be cheaper than trying to get humans if X suddenly matters.

This scaling is a bit worrisome, since robots implement somebody’s action plan (maybe badly, maybe dangerously creatively): they are essentially an extension of somebody or something’s preferences. So if we could make robot soldiers, the group or side that could make the most would have a potentially huge strategic advantage. Making innovations in fast manufacture becomes important, in turn leading to a situation where there is an incentive for an arms race in being able to get an army by a press of a button. This is where I think atomically precise manufacturing is potentially risky: it might enable very quick builds, and that is potentially destabilizing. But even just automatic production might be enough (remember, this is a scenario where some robotics is good enough to implement useful military action, so manufacturing robotics will be advanced too). Also, in countries running mostly on export of raw materials, automating the production might leave little need for most of the population… An economist would say the population might be used for other profitable activities, but many nasty resource-driven governments do not invest in their human capital very much. In fact, they tend to see it as a security problem.

Of course, if we ever get to the level where intellectual tasks and services close to the human scale can be done, the same might apply to more developed economies too. But at that point we are so close to automating the task of making robots and AI better that I expect an intelligence explosion to occur before any social explosions. A society where nobody needs to work might sound nice and might be very worth striving for, but in order to get there we need at the very least to get close to general AI and solve its safety problems.

See also this essay: commercializing the robot ecosystem in the anthropocene.