Popper vs. Macrohistory: what can we really say about the long-term future?

Talk I gave at the Oxford Karl Popper Society:

The quick summary: Physical eschatology, futures studies and macrohistory try to talk about the long-term future in different ways. Karl Popper launched a broadside against historicism, the approach to the social sciences which assumes that historical prediction is their principal aim. While the main target was the historicism supporting socialism and fascism, the critique has also scared many away from looking at the future – a serious problem for making the social sciences useful. In the talk I look at various aspects of Popper’s critique and how damaging they are. Some parts are fairly unproblematic because they demand too much precision or determinism, and can be circumvented by using a more Bayesian approach. His main point about knowledge growth making the future impossible to determine still stands and is a major restriction on what we can say – yet there are some ways to reason about the future even with this restriction. The lack of ergodicity of history may be a new problem to recognize: we should not think history would repeat if we re-ran it. That does not rule out local patterns, but the overall endpoint appears random… or perhaps selectable. Except that doing it may turn out to be very, very hard.

My main conclusions are that longtermist views like most Effective Altruism are not affected much by the indeterminacy of Popper’s critique (or the non-ergodicity issue); here the big important issue is how much we can affect the future. That seems to be an open question, well worth pursuing. Macrohistory may be set for a comeback, especially if new methodologies in experimental history, big data history, or even Popper’s own “technological social science” are developed. That one cannot reach certitude does not prevent relevant and reliable (enough) input to decisions in some domains. Knowing which domains those are is another key research issue. In futures studies the critique is largely internalized by now, but it might be worth telling other disciplines about it. To me the most intriguing conclusion is that physical eschatology needs to take the action of intelligent life into account – and that means accepting some pretty far-reaching indeterminacy and non-ergodicity on vast scales.

Thinking long-term, vast and slow

John Fowler "Long Way Down" https://www.flickr.com/photos/snowpeak/10935459325
John Fowler “Long Way Down” https://www.flickr.com/photos/snowpeak/10935459325

This spring Richard Fisher at BBC Future commissioned a series of essays about long-termism: Deep Civilisation. I really like this effort (and not just because I get the last word):

“Deep history” is fascinating because it gives us a feeling of the vastness of our roots – not just the last few millennia, but a connection to our forgotten stone-age ancestors, their hominin ancestors, the biosphere evolving over hundreds of millions and billions of years, the planet, and the universe. We are standing on top of a massive sedimentary cliff of past, stretching down to an origin unimaginably deep below.

Yet the sky above, the future, is even more vast and deep. Looking down the 1,857 m into Grand Canyon is vertiginous. Yet above us the troposphere stretches more than five times further up, followed by an even vaster stratosphere and mesosphere, in turn dwarfed by the thermosphere… and beyond the exosphere fades into the endlessness of deep space. The deep future is in many ways far more disturbing since it is moving and indefinite.

That also means there is a fair bit of freedom in shaping it. It is not very easy to shape. But if we want to be more than just some fossils buried inside the rocks we better do it.

What kinds of grand futures are there?

I have been working for about a year on a book on “Grand Futures” – the future of humanity, starting to sketch a picture of what we could eventually achieve were we to survive, get our act together, and reach our full potential. Part of this is an attempt to outline what we know is and isn’t physically possible to achieve, part of it is an exploration of what makes a future good.

Here are some things that appear to be physically possible (not necessarily easy, but doable):

  • Societies of very high standards of sustainable material wealth. At least as rich as (and likely far richer than) current rich nations in terms of the objects, services, entertainment and other lifestyle options ordinary people can access.
  • Human enhancement allowing far greater health, longevity, well-being and mental capacity, again at least up to current optimal levels and likely far, far beyond evolved limits.
  • Sustainable existence on Earth with a relatively unchanged biosphere indefinitely.
  • Expansion into space:
    • Settling habitats in the solar system, enabling populations of at least 10 trillion (and likely many orders of magnitude more)
    • Settling other stars in the Milky Way, enabling populations of at least 10^29 people.
    • Settling over intergalactic distances, enabling populations of at least 10^38 people.
  • Survival of human civilisation and the species for a long time.
    • As long as other mammalian species – on the order of a million years.
    • As long as Earth’s biosphere remains – on the order of a billion years.
    • Settling the solar system – on the order of 5 billion years
    • Settling the Milky Way or elsewhere – on the order of trillions of years if dependent on sunlight
    • Using artificial energy sources – up to proton decay, somewhere beyond 10^32 years.
  • Constructing Dyson spheres around stars, gaining energy resources corresponding to the entire stellar output, habitable space millions of times Earth’s surface, telescope, signalling and energy projection abilities that can reach over intergalactic distances.
  • Moving matter and objects up to galactic size, using their material resources for meaningful projects.
  • Performing more than a googol (10^100) computations, likely far more thanks to reversible and quantum computing.

While this might read as a fairly overwhelming list, it is worth noticing that it does not include gaining access to an infinite amount of matter, energy, or computation. Nor does it include indefinite survival. I also think faster-than-light travel is unlikely to become possible. If we do not try to settle remote galaxies within 100 billion years, accelerating expansion will move them beyond our reach. This is a finite but very large possible future.

What kinds of really good futures may be possible? Here are some (not mutually exclusive):

  • Survival: humanity survives as long as it can, in some form.
  • “Modest futures”: humanity survives for as long as is appropriate without doing anything really weird. People have idyllic lives with meaningful social relations. This may include achieving close to perfect justice, sustainability, or other social goals.
  • Gardening: humanity maintains the biosphere of Earth (and possibly other planets), preventing them from crashing or going extinct. This might include artificially protecting them from a brightening sun and astrophysical disasters, as well as spreading life across the universe.
  • Happiness: humanity finds ways of achieving extreme states of bliss or other positive emotions. This might include local enjoyment, or actively spreading minds enjoying happiness far and wide.
  • Abolishing suffering: humanity finds ways of curing negative emotions and suffering without precluding good states. This might include merely saving humanity, or actively helping all suffering beings in the universe.
  • Posthumanity: humanity deliberately evolves or upgrades itself into forms that are better, more diverse or otherwise useful, gaining access to modes of existence currently not possible to humans but equally or more valuable.
  • Deep thought: humanity develops cognitive abilities or artificial intelligence able to pursue intellectual endeavours far beyond what we can conceive of in science, philosophy, culture, spirituality and similar but as yet uninvented domains.
  • Creativity: humanity plays creatively with the universe, making new things and changing the world for its own sake.

I have no doubt I have missed many plausible good futures.

Note that there might be moral trades, where stay-at-homes agree with expansionists to keep Earth an idyllic world for modest futures and gardening while the others go off to do other things, or where long-term oriented groups agree to give short-term oriented groups the universe during the stelliferous era in exchange for getting it during the cold degenerate era trillions of years in the future. Real civilisations may also have mixtures of motivations and sub-groups.

Note that the goals and the physical possibilities play out very differently: modest futures do not reach very far, while gardener civilisations may seek to engage in megascale engineering to support the biosphere but not settle space. Meanwhile the happiness-maximizers may want to race to convert as much matter as possible to hedonium, while the deep thought-maximizers may want to move galaxies together to create permanent hyperclusters filled with computation to pursue their cultural goals.

I don’t know what goals are right, but we can examine what they entail. If we see a remote civilization doing certain things, we can make some inferences about what is compatible with that behaviour. And we can examine what we need to do today to have the best chance of getting onto a trajectory towards some of these goals: avoiding extinction, improving our coordination ability, and figuring out whether there is some long-run global coordination we need to agree on before spreading to the stars.

Checking my predictions for 2016

Last year I made a number of predictions for 2016 to see how well calibrated I am. Here are the results:

Prediction – Correct? (1 = yes, 0 = no)

  • No nuclear war: 99% – 1
  • No terrorist attack in the USA will kill > 100 people: 95% – 1 (Orlando: 50)
  • I will be involved in at least one published/accepted-to-publish research paper by the end of 2015: 95% – 1
  • Vesuvius will not have a major eruption: 95% – 1
  • I will remain at my same job through the end of 2015: 90% – 1
  • MAX IV in Lund delivers X-rays: 90% – 1
  • Andart II will remain active: 90% – 1
  • Israel will not get in a large-scale war (ie >100 Israeli deaths) with any Arab state: 90% – 1
  • US will not get involved in any new major war with death toll of > 100 US soldiers: 90% – 1
  • New Zealand has not decided to change current flag at end of year: 85% – 1
  • No multi-country Ebola outbreak: 80% – 1
  • Assad will remain President of Syria: 80% – 1
  • ISIS will control less territory than it does right now: 80% – 1
  • North Korea’s government will survive the year without large civil war/revolt: 80% – 1
  • The US NSABB will allow gain of function funding: 80% – 1 [Their report suggests review before funding; currently it is up to the White House to respond.]
  • US presidential election: democratic win: 75% – 0
  • A general election will be held in Spain: 75% – 1
  • Syria’s civil war will not end this year: 75% – 1
  • There will be no NEO with Torino Scale >0 on 31 Dec 2016: 75% – 0 (2016 XP23 showed up on the scale according to JPL, but the NEODyS Risk List gives it a zero.)
  • The Atlantic basin ACE will be below 96.2: 70% – 0 (ACE estimate on Jan 1 is 132)
  • Sweden does not get a seat on the UN Security Council: 70% – 0
  • Bitcoin will end the year higher than $200: 70% – 1
  • Another major eurozone crisis: 70% – 0
  • Brent crude oil will end the year lower than $60 a barrel: 70% – 1
  • I will actually apply for a UK citizenship: 65% – 0
  • UK referendum votes to stay in EU: 65% – 0
  • China will have a GDP growth above 5%: 65% – 1
  • Evidence for supersymmetry: 60% – 0
  • UK larger GDP than France: 60% – 1 (although it is a close call; estimates put France at 2421.68 and the UK at 2848.76 – quite possibly this might change)
  • France GDP growth rate less than 2%: 60% – 1
  • I will have made significant progress (4+ chapters) on my book: 55% – 0
  • Iran nuclear deal holding: 50% – 1
  • Apple buys Tesla: 50% – 0
  • The Nikkei index ends up above 20,000: 50% – 0 (nearly; the Dec 20 max was 19,494)

Overall, my Brier score is 0.1521, which doesn’t feel too bad.

Plotting the results (where I bin together predictions in the [0.5,0.55], [0.6,0.65], [0.7,0.75], [0.8,0.85], and [0.9,0.99] bins) gives this calibration plot:

Plot of average correctness of my predictions for 2016 as a function of confidence (blue). Red line is perfect calibration.

Overall, I did great on my “sure bets” and fairly poorly on my less certain bets. I did not have enough questions to make this very statistically solid (coming up with good prediction questions is hard!), but the overall shape suggests that I am a bit overconfident, which is not surprising.
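For concreteness, here is a minimal Python sketch (not the script I actually used) of how the Brier score and the binned calibration can be computed from a list of (stated confidence, outcome) pairs; the entries in it are placeholders rather than the full table.

```python
# Minimal sketch (not the original analysis code): Brier score and binned
# calibration from (stated confidence, outcome) pairs. The entries below are
# placeholders; the real input is the full table above.
import numpy as np

predictions = [(0.99, 1), (0.95, 1), (0.95, 1), (0.75, 0), (0.70, 0), (0.50, 1)]
conf = np.array([p for p, _ in predictions])
outcome = np.array([o for _, o in predictions])

# Brier score: mean squared difference between stated probability and outcome.
print("Brier score:", np.mean((conf - outcome) ** 2))

# Calibration: in each confidence bin, compare mean stated confidence with
# the fraction of predictions that actually came true.
for lo, hi in [(0.5, 0.55), (0.6, 0.65), (0.7, 0.75), (0.8, 0.85), (0.9, 0.99)]:
    mask = (conf >= lo) & (conf <= hi)
    if mask.any():
        print(f"{lo:.2f}-{hi:.2f}: stated {conf[mask].mean():.2f}, "
              f"actual {outcome[mask].mean():.2f} (n={mask.sum()})")
```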

Time to come up with good 2017 prediction questions.

A crazy futurist writes about crazy futurists

Warren Ellis’ Normal is a little story about the problem of being serious about the future.

As I often point out, most people in the futures game are basically in the entertainment industry: telling wonderful or frightening stories that allow us to feel part of a bigger sweep of history, reflect a bit, and then return to the present with the reassurance that we have some foresight. Relatively little futures work is about finding decision-relevant insights and then acting on them. Such work exists, but it is not what the bulk of future-oriented people do. Taking the future seriously might require colliding with your society as you try to tell it it is going the wrong way. Worse, the conclusions might tell you that your own values and goals are wrong.

Normal takes place at a sanatorium for mad futurists in the wilds of Oregon. The idea is that if you spend too much time thinking too seriously about the big and horrifying things in the future, mental illness sets in. So when futurists have nervous breakdowns they get sent by their sponsors to Normal to recover. They are useful, smart, and dedicated people, but since the problems they deal with are so strange their conditions are equally unusual. The protagonist arrives just in time to encounter a bizarre locked room mystery – exactly the worst kind of thing for a place like Normal with many smart and fragile minds – driving him to investigate what is going on.

As somebody working with the future, I think the caricatures of these futurists (or rather their ideas) are spot on. There are the urbanists, the singularitarians, the neoreactionaries, the drone spooks, and the invented professional divisions. Of course, here they are mad in a way that doesn’t allow them to function in society, which softballs the views: singletons and Molochs are serious, real ideas that should make your stomach lurch.

The real people I know who take the future seriously are overall pretty sane. I remember a documentary filmmaker at a recent existential risk conference mildly complaining that people were so cheerful and well-adapted: doubtless some darkness and despair would have made for far more compelling imagery than chummy academics trying to salvage the bioweapons convention. Even the people involved in developing the Mutually Assured Destruction doctrine seem to have been pretty healthy. People who go off the deep end tend to do it not because of The Future but because of more normal psychological fault lines. Maybe we are not taking the future seriously enough, but I suspect it is more a case of an illusion of control: we know we are at least doing something.

This book convinced me that I need to seriously start working on my own book project, the “glass is half full” book. Much of our research at FHI seems to be relentlessly gloomy: existential risk, AI risk, all sorts of unsettling changes to the human condition that might slurp us down into a valueless attractor asymptoting towards the end of time. But that is only part of it: there are potential futures so bright that we do not just need sunshades, but have problems even managing the positive magnitude in an intellectually useful way. The reason we work on existential risk is that we (1) think there is enormous positive potential value at stake, and (2) think actions can meaningfully improve our chances. That is not pessimism, quite the opposite. I can imagine Ellis or one of his characters skeptically looking at me across the table at Normal and accusing me of solutionism and/or a manic episode. Fine. I should lay out my case in due time, with enough logos, ethos and pathos to convince them (Muhahaha!).

I think the fundamental horror at the core of Normal – and yes, I regard this more as a horror story than a techno-thriller or satire – is the belief that The Future is (1) pretty horrifying and (2) unstoppable. I think this is a great conceit for a story and a sometimes necessary intellectual tonic to consider. But it is bad advice for how to live a functioning life or actually make a saner future.

 

Settling Titan, Schneier’s Law, and scenario thinking

Charles Wohlforth and Amanda R. Hendrix want us to colonize Titan. Their essay irritated me in an interesting way.

Full disclosure: they interviewed me while they were writing their book Beyond Earth: Our Path to a New Home in the Planets, which I have not read yet, and I will only be basing the following on the SciAm essay. This post is not really about settling Titan either, but about something that bothers me with a lot of scenario-making.

A weak case for Titan and against Luna and Mars

Basically the essay outlines reasons why other locations in the solar system are not good: Mercury too hot, Venus way too hot, Mars and Luna have too much radiation. Only Titan remains, with a cold environment but not too much radiation.

A lot of course hinges on the assumptions:

We expect human nature to stay the same. Human beings of the future will have the same drives and needs we have now. Practically speaking, their home must have abundant energy, livable temperatures and protection from the rigors of space, including cosmic radiation, which new research suggests is unavoidably dangerous for biological beings like us.

I am not that confident that we will remain biological or vulnerable to radiation. But even if we decide to accept the assumptions, the case against the Moon and Mars is odd:

Practically, a Moon or Mars settlement would have to be built underground to be safe from this radiation. Underground shelter is hard to build and not flexible or easy to expand. Settlers would need enormous excavations for room to supply all their needs for food, manufacturing and daily life.

So making underground shelters is much harder than settling Titan, where buildings need to be insulated against a -179 °C atmosphere and an ice ground full of complex and quite likely toxic hydrocarbons. They suggest that there is no point in going to the Moon to live in an underground shelter when you can do it on Earth, which is not too unreasonable – but is there a point in going to live inside an insulated environment on Titan either? The actual motivations would likely be less a desire for outdoor activities and more scientific exploration, reducing existential risk, and maybe industrialization.

Also, while making underground shelters in space may be hard, it does not look like an insurmountable problem. The whole concern is a bit like saying submarines are not practical because the cold of the depths of the ocean will give the crew hypothermia – true, unless you add heating.

I think this is similar to Schneier’s law:

Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can’t break.

It is not hard to find a major problem with a possible plan that you cannot see a reasonable way around. That doesn’t mean there isn’t one.

Settling for scenarios

Maybe Wohlforth and Hendrix spent a lot of time thinking about lunar excavation issues and consistent motivations for settlements to reach a really solid conclusion, but I suspect that they came to the conclusion relatively lightly. It produces an interesting scenario: Titan is not the standard target when we discuss where humanity ought to go, and it is an awesome environment.

Similarly the “humans will be humans” scenario assumptions were presumably chosen not after a careful analysis of the relative likelihood of biological and postbiological futures, but just because they are similar to the past and make for an interesting scenario. Plus, human readers like reading about humans rather than robots. Altogether it makes for a good book.

Clearly I have different priors compared to them on the ease and rationality of Lunar/Martian excavation and postbiology. Or even giving us D. radiodurans genes.

In The Age of Em Robin Hanson argues that if we get the brain emulation scenario, space settlement will be delayed until things get really weird: while postbiological astronauts are very adaptable, much of the mainstream of civilization will be turning inward towards a few dense centers (for economics and communications reasons). Eventually resource demand, curiosity or just whatever comes after the Age of Ems may lead to settling the solar system. But that process will be pretty different even if it is done by mentally human-like beings that do need energy and protection. Their ideal environments would be energy-gradient rich, with short communications lags: Mercury, slowly getting disassembled into a hot Dyson shell, might be ideal. So here the story will be no settlement, and then wildly exotic settlement that doesn’t care much about the scenery.

But even with biological humans we can imagine radically different space settlement scenarios, such as the Gerard O’Neill scenario where planetary surfaces are largely sidestepped for asteroids and space habitats. This is Jeff Bezos’ vision rather than Elon Musk’s and Wohlforth/Hendrix’s. It also doesn’t tell the same kind of story: here our new home is not in the planets but between them.

My gripe is not against settling Titan, or even thinking it is the best target because of some reasons. It is against settling too easily for nice scenarios.

Beyond the good story

Sometimes we settle for scenarios because they tell a good story. Sometimes because they are amenable to study among other, much less analyzable possibilities. But ideally we should aim at scenarios that inform us in a useful way about options and pathways we have.

That includes making assumptions wide enough to cover relevant options, even the less glamorous or tractable ones.

That requires assuming future people will be just as capable (or more so) at solving problems: just because I can’t see a solution to X doesn’t mean it is not trivially solved in the future.

(Maybe we could call it the “Manure Principle” after the canonical example of horse manure being seen as an insoluble urban planning problem at the previous turn of the century and then neatly getting resolved by unpredicted trams and cars – and just like Schneier’s law and Stigler’s law, the reality is of course more complex than the story.)

In the standard scenario literature there are often admonitions not to just select a “best case scenario”, a “worst case scenario” and a “business as usual scenario” – scenario planning comes into its own when you see nontrivial, mixed-value possibilities. In particular, we want decision-relevant scenarios that make us change what we will do when we hear about them (rather than good stories, which entertain but do not change our actions). But scenarios on their own do not tell us how to make these decisions: the decisions need to be built from our rationality and decision theory applied to the scenarios’ contents. Easy scenarios make it trivial to choose (cake or death?), but those choices would have been obvious even without the scenarios: no forethought needed except to bring up the question. Complex scenarios force us to think in new ways about relevant trade-offs.

The likelihood of complex scenarios is of course lower than that of simple scenarios (the conjunction fallacy makes us believe much more in rich stories). But if they are seen as tools for developing decisions rather than information about the future, then their individual probability is less of an issue.

In the end, good stories are lovely and worth having, but for thinking and deciding carefully we should not settle for just good stories or the scenarios that feel neat.

 

 

How much should we spread out across future scenarios?

Robin Hanson mentions that some people take him to task for working on one scenario (WBE) that might not be the most likely future scenario (“standard AI”); he responds by noting that there are perhaps 100 times more people working on standard AI than WBE scenarios, yet the probability of AI is likely not a hundred times higher than WBE. He also notes that there is a tendency for thinkers to clump onto a few popular scenarios or issues. However:

In addition, due to diminishing returns, intellectual attention to future scenarios should probably be spread out more evenly than are probabilities. The first efforts to study each scenario can pick the low hanging fruit to make faster progress. In contrast, after many have worked on a scenario for a while there is less value to be gained from the next marginal effort on that scenario.

This is very similar to my own thinking about research effort. Should we focus on things that are likely to pan out, or explore a lot of possibilities just in case one of the less obvious cases happens? Given that early progress is quick and easy, we can often get a noticeable fraction of whatever utility the topic has by just a quick dip. The effective altruist heuristic of looking at neglected fields also is based on this intuition.

A model

But under what conditions does this actually work? Here is a simple model:

There are N possible scenarios, one of which (j) will come about. They have probability P_i. We allocate a unit budget of effort to the scenarios: \sum a_i = 1. For the scenario that comes about, we get utility \sqrt{a_j} (diminishing returns).

Here is what happens if we allocate effort proportional to a power of the scenario probabilities, a_i \propto P_i^\alpha. \alpha=0 corresponds to even allocation, \alpha=1 to allocation proportional to the likelihood, and \alpha>1 to favoring the most likely scenarios. In the following I will run Monte Carlo simulations where the probabilities are randomly generated for each instantiation. The outer bluish envelope represents 95% of the outcomes, the inner one ranges from the lower to the upper quartile of the utility gained, and the red line is the expected utility.

Utility of allocating effort as a power of the probability of scenarios. Red line is expected utility, deeper blue envelope is lower and upper quartiles, lighter blue 95% interval.

This is the N=2 case: we have two possible scenarios with probability p and 1-p (where p is uniformly distributed in [0,1]). Just allocating evenly gives us 1/\sqrt{2} utility on average, but if we put in more effort on the more likely case we will get up to 0.8 utility. As we focus more and more on the likely case there is a corresponding increase in variance, since we may guess wrong and lose out. But 75% of the time we will do better than if we just allocated evenly. Still, allocating nearly everything to the most likely case means that one does lose out on a bit of hedging, so the expected utility declines slowly for large \alpha.

Utility of allocating effort as a power of the probability of scenarios. Red line is expected utility, deeper blue envelope is lower and upper quartiles, lighter blue 95% interval. 100 possible scenarios, with uniform probability on the simplex.

The N=100 case (where the probabilities are drawn from a flat Dirichlet distribution) behaves similarly, but the expected utility is smaller since it is less likely that we will hit the right scenario.
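Here is a minimal Python sketch of that Monte Carlo setup (a reconstruction, not the code behind the figures); the random seed and trial counts are arbitrary.

```python
# Monte Carlo sketch of the allocation model (my reconstruction, not the code
# behind the figures): N scenarios with probabilities drawn from a flat
# Dirichlet distribution, effort a_i proportional to P_i^alpha, and realized
# utility sqrt(a_j) for the scenario j that actually comes about.
import numpy as np

rng = np.random.default_rng(0)

def expected_utility(alpha, N=2, trials=20000):
    total = 0.0
    for _ in range(trials):
        P = rng.dirichlet(np.ones(N))     # for N=2 this is p uniform in [0,1]
        a = P ** alpha
        a /= a.sum()                      # unit effort budget
        j = rng.choice(N, p=P)            # the scenario that comes about
        total += np.sqrt(a[j])            # diminishing returns
    return total / trials

for N in (2, 100):
    utils = [expected_utility(alpha, N) for alpha in (0.0, 1.0, 2.0, 4.0)]
    print(f"N={N}: E[utility] at alpha=0,1,2,4:", [round(u, 3) for u in utils])
```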

What is going on?

This doesn’t seem to fit Robin’s or my intuitions at all! The best we can say about uniform allocation is that it doesn’t produce much regret: whatever happens, we will have made some allocation to the possibility. For large N this actually works out better than the directed allocation for a sizable fraction of realizations, but on average we get less utility than betting on the likely choices.

The problem with the model is of course that in it we actually know the probabilities before making the allocation. In reality, we do not know the likelihood of AI, WBE or alien invasions. We have some information, and we do have priors (like Robin’s view that P_{AI} < 100 P_{WBE}), but we are not able to allocate perfectly. A more plausible model would give us probability estimates instead of the actual probabilities.

We know nothing

Let us start by looking at the worst possible case: we do not know what the true probabilities are at all. We can draw estimates from the same distribution – it is just that they are uncorrelated with the true situation, so they are just noise.

Utility of allocating effort as a power of the probability of scenarios, but the probabilities are just random guesses. Red line is expected utility, deeper blue envelope is lower and upper quartiles, lighter blue 95% interval. 100 possible scenarios, with uniform probability on the simplex.

In this case uniform distribution of effort is optimal. Not only does it avoid regret, it has a higher expected utility than trying to focus on a few scenarios (\alpha>0). The larger N is, the less likely it is that we focus on the right scenario since we know nothing. The rationality of ignoring irrelevant information is pretty obvious.

Note that if we have to allocate a minimum effort to each investigated scenario we will be forced to effectively increase our \alpha above 0. The above result gives the somewhat optimistic conclusion that the loss of utility compared to an even spread is rather mild: in the uniform case we have a pretty low amount of effort allocated to the winning scenario, so the low chance of being right in the nonuniform case is balanced by having a slightly higher effort allocation on the selected scenarios. For high \alpha there is a tail of rare big “wins” when we hit the right scenario that drags the expected utility upwards, even though in most realizations we bet on the wrong case. This is very much the hedgehog predictor story: occasionally they have analysed the scenario that comes about in great detail and get intensely lauded, despite looking at the wrong things most of the time.

We know a bit

We can imagine that knowing more should allow us to gradually interpolate between the different results: the more you know, the more you should focus on the likely scenarios.

Optimal alpha as a function of how much information we have about the true probabilities (noise due to Monte Carlo and discrete steps of alpha). N=2 (N=100 looks similar).

If we take the mean of the true probabilities and some randomly drawn probabilities (the “half random” case) the curve looks quite similar to the case where we actually know the probabilities: we get a maximum for \alpha\approx 2. In fact, we can mix in just a bit (\beta) of the true probability and get a fairly good guess about where to allocate effort (i.e. we allocate effort as a_i \propto (\beta P_i + (1-\beta)Q_i)^\alpha where the Q_i are uncorrelated noise probabilities). The optimal alpha grows roughly linearly with \beta, \alpha_{opt} \approx 4\beta in this case.
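A sketch of this mixed case, again a reconstruction with arbitrary seeds and trial counts: effort is allocated proportional to (\beta P_i + (1-\beta)Q_i)^\alpha, so \beta=0 recovers the “we know nothing” case and \beta=1 the case where the true probabilities are known.

```python
# Sketch of allocating on a noisy estimate (my reconstruction): effort
# a_i proportional to (beta*P_i + (1-beta)*Q_i)^alpha, where Q is an
# uncorrelated random probability vector.
import numpy as np

rng = np.random.default_rng(1)

def expected_utility(alpha, beta, N=100, trials=5000):
    total = 0.0
    for _ in range(trials):
        P = rng.dirichlet(np.ones(N))          # true probabilities
        Q = rng.dirichlet(np.ones(N))          # uninformative noise
        a = (beta * P + (1 - beta) * Q) ** alpha
        a /= a.sum()
        total += np.sqrt(a[rng.choice(N, p=P)])
    return total / trials

for beta in (0.0, 0.5, 1.0):
    utils = [expected_utility(alpha, beta) for alpha in (0.0, 1.0, 2.0, 4.0)]
    print(f"beta={beta:.1f}: E[utility] at alpha=0,1,2,4:",
          [round(u, 3) for u in utils])
```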

We learn

Adding a bit of realism, we can consider a learning process: after allocating some effort \gamma to the different scenarios we get better information about the probabilities, and can now reallocate. A simple model may be that the standard deviation of the noise behaves as 1/\sqrt{\tilde{a}_i} where \tilde{a}_i is the effort placed in exploring the probability of scenario i. So if we begin by allocating uniformly we will have noise at reallocation of the order of 1/\sqrt{\gamma/N}. We can set \beta(\gamma)=\sqrt{\gamma/N}/C, where C is some constant denoting how tough it is to get information. Putting this together with the above result we get \alpha_{opt}(\gamma)=\sqrt{2\gamma/NC^2}. After this exploration, we use the remaining 1-\gamma effort to work on the actual scenarios.

Expected utility as a function of amount of probability-estimating effort (gamma) for C=1 (hard to update probabilities), C=0.1 and C=0.01 (easy to update). N=100.

This is surprisingly inefficient. The reason is that the expected utility declines as \sqrt{1-\gamma} and the gain is just the utility difference between the uniform case \alpha=0 and optimal \alpha_{opt}, which we know is pretty small. If C is small (i.e. a small amount of effort is enough to figure out the scenario probabilities) there is an optimal nonzero  \gamma. This optimum \gamma decreases as C becomes smaller. If C is large, then the best approach is just to spread efforts evenly.
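For completeness, here is a rough sketch of the two-stage explore-then-allocate model, wiring together the \beta(\gamma) and \alpha_{opt}(\gamma) expressions above; the values of C, \gamma and the trial count are arbitrary choices.

```python
# Rough sketch of the explore-then-allocate model (my reconstruction): a
# fraction gamma of the budget buys an estimate of quality beta = sqrt(gamma/N)/C,
# effort is then allocated with the alpha_opt(gamma) expression above, and
# only 1-gamma of the budget remains for actual work on the scenarios.
import numpy as np

rng = np.random.default_rng(2)

def expected_utility(gamma, C, N=100, trials=4000):
    beta = min(1.0, np.sqrt(gamma / N) / C)      # information bought
    alpha = np.sqrt(2 * gamma / (N * C**2))      # alpha_opt(gamma) from the text
    total = 0.0
    for _ in range(trials):
        P = rng.dirichlet(np.ones(N))            # true probabilities
        Q = rng.dirichlet(np.ones(N))            # noise
        a = (beta * P + (1 - beta) * Q) ** alpha
        a /= a.sum()
        total += np.sqrt((1 - gamma) * a[rng.choice(N, p=P)])
    return total / trials

for C in (1.0, 0.1, 0.01):
    utils = {g: round(expected_utility(g, C), 3) for g in (0.0, 0.1, 0.3, 0.6)}
    print(f"C={C}: E[utility] by gamma:", utils)
```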

Conclusions

So, how should we focus? These results suggest that the key issue is knowing how little we know compared to what can be known, and how much effort it would take to know significantly more.

If there is little more that can be discovered about what scenarios are likely, because our state of knowledge is pretty good, the world is very random,  or improving knowledge about what will happen will be costly, then we should roll with it and distribute effort either among likely scenarios (when we know them) or spread efforts widely (when we are in ignorance).

If we can acquire significant information about the probabilities of scenarios, then we should do it – but not overdo it. If it is very easy to get information we need to just expend some modest effort and then use the rest to flesh out our scenarios. If it is doable but costly, then we may spend a fair bit of our budget on it. But if it is hard, it is better to go directly on the object level scenario analysis as above. We should not expect the improvement to be enormous.

Here I have used a square root diminishing return model. That drives some of the flatness of the optima: had I used a logarithm function things would have been even flatter, while if the returns diminish more mildly the gains of optimal effort allocation would have been more noticeable. Clearly, understanding the diminishing returns, number of alternatives, and cost of learning probabilities better matters for setting your strategy.

In the case of futures studies we know the number of scenarios is very large. We know that the returns to forecasting efforts are strongly diminishing for most kinds of forecasts. We know that extra efforts in reducing uncertainty about scenario probabilities in e.g. climate models also have strongly diminishing returns. Together this suggests that Robin is right, and it is rational to stop clustering too hard on favorite scenarios. Insofar as we learn something useful from considering scenarios, we should explore as many as feasible.

Predictions for 2016

The ever-readable Slate Star Codex has a post checking how accurate the predictions for 2015 were; overall Scott Alexander seems pretty well calibrated. Being a born follower, I decided to make a bunch of predictions to check my calibration in a year’s time.

Here is my list of predictions, with my confidence (some predictions obviously stolen):

  • No nuclear war: 99%
  • No terrorist attack in the USA will kill > 100 people: 95%
  • I will be involved in at least one published/accepted-to-publish research paper by the end of 2015: 95%
  • Vesuvius will not have a major eruption: 95%
  • I will remain at my same job through the end of 2015: 90%
  • MAX IV in Lund delivers X-rays: 90%
  • Andart II will remain active: 90%
  • Israel will not get in a large-scale war (ie >100 Israeli deaths) with any Arab state: 90%
  • US will not get involved in any new major war with death toll of > 100 US soldiers: 90%
  • New Zealand has not decided to change current flag at end of year: 85%
  • No multi-country Ebola outbreak: 80%
  • Assad will remain President of Syria: 80%
  • ISIS will control less territory than it does right now: 80%
  • North Korea’s government will survive the year without large civil war/revolt: 80%
  • The US NSABB will allow gain of function funding: 80%
  • US presidential election: democratic win: 75%
  • A general election will be held in Spain: 75%
  • Syria’s civil war will not end this year: 75%
  • There will be no NEO with Torino Scale >0 on 31 Dec 2016: 75%
  • The Atlantic basin ACE will be below 96.2: 70%
  • Sweden does not get a seat on the UN Security Council: 70%
  • Bitcoin will end the year higher than $200: 70%
  • Another major eurozone crisis: 70%
  • Brent crude oil will end the year lower than $60 a barrel: 70%
  • I will actually apply for a UK citizenship: 65%
  • UK referendum votes to stay in EU: 65%
  • China will have a GDP growth above 5%: 65%
  • Evidence for supersymmetry: 60%
  • UK larger GDP than France: 60%
  • France GDP growth rate less than 2%: 60%
  • I will have made significant progress (4+ chapters) on my book: 55%
  • Iran nuclear deal holding: 50%
  • Apple buys Tesla: 50%
  • The Nikkei index ends up above 20,000: 50%

The point is to have enough that we can see how my calibration works.

Looking for topics leads to amusing finds like the predictions of Nostradamus for 2015. Given that language barriers remain, the dead remain dead, lifespans are less than 200 years, there has not been a Big One in the western US nor has Vesuvius erupted, and taxes still remain, I think we can conclude he was wrong, or that the ability to interpret him accurately is near zero – which of course makes his quatrains equally useless.

Bayes’ Broadsword

Yesterday I gave a talk at the joint Bloomberg-London Futurist meeting “The state of the future” about the future of decisionmaking. Parts were updates on my policymaking 2.0 talk (turned into this chapter), but I added a bit more about individual decisionmaking, rationality and forecasting.

The big idea of the talk: ensemble methods really work in a lot of cases. Not always, not perfectly, but they should be among the first tools to consider when trying to make a robust forecast or decision. They are Bayes’ broadsword.


Forecasting

One of my favourite experts on forecasting is J. Scott Armstrong. He has stressed the importance of evidence-based forecasting, including checking how well different methods work. The general answer is: not very well, yet people keep on using them. He has been pointing this out since the 70s. It also turns out that expertise only gets you so far: expert forecasts are not very reliable either, and accuracy levels out quickly with increasing expertise. One implication is that one should at least get cheap experts, since they are about as good as the pricey ones. It is also known that simple forecasting models tend to be more accurate than complex ones, especially in complex and uncertain situations (see also Haldane’s “The Dog and the Frisbee”). Another important insight is that it is often better to combine different methods than to try to select the one best method.

Another classic look at prediction accuracy is Philip Tetlock’s Expert Political Judgment (2005) where he looked at policy expert predictions. They were only slightly more accurate than chance, worse than basic extrapolation algorithms, and there was a negative link to fame: high profile experts have an incentive to be interesting and dramatic, but not right. However, he noticed some difference between “hedgehogs” (people with One Big Theory) and “foxes” (people using multiple theories), with the foxes outperforming hedgehogs.

OK, so in forecasting it looks like using multiple methods, theories and data sources (including experts) is a way to get better results.

Statistical machine learning

A standard problem in machine learning is to classify something into the right category from data, given a set of training examples. For example, given medical data such as age, sex, and blood test results, diagnose which disease a patient might suffer from. The key problem is that it is non-trivial to construct a classifier that works well on data different from the training data. It can work badly on new data, even if it works perfectly on the training examples. Two classifiers that perform equally well during training may perform very differently in real life, or even on different data.

The obvious solution is to combine several classifiers and average (or vote about) their decisions: ensemble based systems. This reduces the risk of making a poor choice, and can in fact improve overall performance if they can specialize for different parts of the data. This also has other advantages: very large datasets can be split into manageable chunks that are used to train different components of the ensemble, tiny datasets can be “stretched” by random resampling to make an ensemble trained on subsets, outliers can be managed by “specialists”, in data fusion different types of data can be combined, and so on. Multiple weak classifiers can be combined into a strong classifier this way.

The method benefits from having diverse classifiers to combine: if they are too similar in their judgements, there is no advantage. Estimating the right weights to give them is also important; otherwise a truly bad classifier may unduly influence the output.

Iris data classified using an ensemble of classification methods (LDA, NBC, various kernels, decision tree). Note how the combination of classifiers also roughly indicates the overall reliability of classifications in a region.

The iconic demonstration of the power of this approach was the Netflix Prize, where different teams competed to make algorithms that predicted user ratings of films from previous ratings. As part of the rules the algorithms were made public, spurring innovation. When the competition concluded in 2009, the leading entries all consisted of ensemble methods whose component algorithms came from earlier teams. The two big lessons were (1) that a combination of not just the best algorithms but also less accurate ones was the key to winning, and (2) that organic organization allows the emergence of far better performance than having strictly isolated teams.

Group cognition

Condorcet’s jury theorem is perhaps the classic result in group problem solving: if a group of people hold a majority vote, and each has a probability p>1/2 of voting for the correct choice, then the probability that the group will vote correctly is higher than p and will tend to approach 1 as the size of the group increases. This presupposes that the votes are independent, although stronger forms of the theorem have been proven. (In reality people may have different preferences, so there is no clear “right answer”.)

Probability that groups of different sizes will reach the correct decision as a function of the individual probability of voting right.
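The theorem is easy to check numerically; here is a small sketch (odd group sizes, so ties cannot occur):

```python
# Quick numerical check of Condorcet's jury theorem: the probability that a
# majority of n independent voters, each right with probability p, reaches
# the correct decision.
from math import comb

def majority_correct(n: int, p: float) -> float:
    # Sum the binomial probabilities of all outcomes with more than n/2 correct votes.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for p in (0.55, 0.6, 0.75):
    print(p, [round(majority_correct(n, p), 3) for n in (1, 11, 51, 101)])
```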

By now the pattern is likely pretty obvious. Weak decision-makers (the voters) are combined through a simple procedure (the vote) into better decision-makers.

Group problem solving is known to be pretty good at smoothing out individual biases and errors. In The Wisdom of Crowds Surowiecki suggests that the ideal crowd for answering a question in a distributed fashion has diversity of opinion, independence (each member has an opinion not determined by the others’), decentralization (members can draw conclusions based on local knowledge), and the existence of a good aggregation process turning private judgements into a collective decision or answer.

Perhaps the grandest example of group problem solving is the scientific process, where peer review, replication, cumulative arguments, and other tools make error-prone and biased scientists produce a body of findings that over time robustly (if sometimes slowly) tends towards truth. This is anything but independent: sometimes a clever structure can improve performance. However, it can also induce all sorts of nontrivial pathologies – just consider the detrimental effects status games have on accuracy, or on the focus on important topics in science.

Small group problem solving, on the other hand, is known to be great for verifiable solutions (everybody can see that a proposal solves the problem), but unfortunately suffers when dealing with “wicked problems” lacking good problem or solution formulation. Groups also have scaling issues: a team of N people needs to transmit information between all N(N-1)/2 pairs, which quickly becomes cumbersome.

One way of fixing these problems is using software and formal methods.

The Good Judgement Project (partially run by Tetlock and with Armstrong on the board of advisers) participated in the IARPA ACE program to try to improve intelligence forecasts. They used volunteers and checked their forecast accuracy (not just whether they got things right, but whether claims that something was 75% likely actually came true 75% of the time). This led to a plethora of fascinating results. First, accuracy scores based on the first 25 questions in the tournament predicted subsequent accuracy well: some people were consistently better than others, and it tended to remain constant. Training (such as debiasing techniques) and forming teams also improved performance. Most impressively, using the top 2% “superforecasters” in teams really outperformed the other variants. The superforecasters were a diverse group, smart but by no means geniuses, updating their beliefs frequently but in small steps.

The key to this success was that a computer- and statistics-aided process found the good forecasters and harnessed them properly (plus, the forecasts were on a shorter time horizon than the policy ones Tetlock analysed in his previous book: this both enables better forecasting and provides the all-important feedback on whether the forecasts worked).

Another good example is the Galaxy Zoo, an early crowd-sourcing project in galaxy classification (which in turn led to the Zooniverse citizen science project). It is not just that participants can act as weak classifiers and be combined through a majority vote into reliable classifiers of galaxy type. Since the type of some galaxies is agreed on by domain experts, those galaxies can be used to test the reliability of participants, producing better weightings. But it is possible to go further and classify the biases of participants to create combinations that maximize the benefit, for example by using overly “trigger happy” participants to find possible rare things of interest, and then checking them using both conservative and neutral participants to become certain. Even better, this can be done dynamically as people slowly gain skill or change preferences.

The right kind of software and on-line “institutions” can shape people’s behavior so that they form more effective joint cognition than they ever could individually.

Conclusions

The big idea here is that it does not matter that individual experts, forecasting methods, classifiers or team members are fallible or biased, if their contributions can be combined in such a way that the overall output is robust and less biased. Ensemble methods are examples of this.

While just voting or weighing everybody equally is a decent start, performance can be significantly improved by linking it to how well the participants perform. Humans can easily be motivated by scoring (but look out for misalignment of incentives: the score must accurately reflect real performance and must not be gameable).

In any case, actual performance must be measured. If we cannot tell whether some method is more accurate than another, then either accuracy does not matter (because it cannot be distinguished or we do not really care), or we will not get the necessary feedback to improve it. It is known from the expertise literature that one of the key factors making it possible to become an expert at a task is feedback.

Having a flexible structure that can change is a good approach to handling a changing world. If people have disincentives to change their mind or change teams, they will not update beliefs accurately.

I got a good question after the talk: if we are supposed to keep our models simple, how can we use these complicated ensembles? The answer is of course that there is a difference between using a complex and a complicated approach. The methods that tend to be fragile are the ones with too many free parameters, too much theoretical burden: they are the complex “hedgehogs”. But stringing together a lot of methods and weighting them appropriately merely produces a complicated model, a “fox”. Component hedgehogs are fine as long as they are weighed according to how well they actually perform.

(In fact, adding together many complex things can make the whole simpler. My favourite example is the fact that the Kolmogorov complexity of integers grows boundlessly on average, yet the complexity of the set of all integers is small – and actually smaller than that of some integers we can easily name. The whole can be simpler than its parts.)

In the end, we are trading Occam’s razor for a more robust tool: Bayes’ Broadsword. It might require far more strength (computing power/human interaction) to wield, but it has longer reach. And it hits hard.

Appendix: individual classifiers

I used Matlab to make the illustration of the ensemble classification. Here are some of the component classifiers. They are all based on the examples in the Matlab documentation. My ensemble classifier is merely a maximum vote between the component classifiers that assign a class to each point.
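For readers without Matlab, here is a rough scikit-learn analogue of the setup (a sketch, not the code behind the figures above), with a k-nearest-neighbour classifier standing in for the Gaussian-kernel one. On a dataset as easy as iris the ensemble will not necessarily beat every component; the point is just the mechanics of the majority vote.

```python
# Rough Python/scikit-learn analogue of the Matlab experiment (not the
# original code): simple classifiers on the iris data combined by hard
# majority vote.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

components = [
    ("lda", LinearDiscriminantAnalysis()),
    ("nbc", GaussianNB()),
    ("tree", DecisionTreeClassifier(max_depth=3)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]
ensemble = VotingClassifier(estimators=components, voting="hard")  # majority vote

for name, clf in components + [("ensemble", ensemble)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name:8s} mean accuracy: {scores.mean():.3f}")
```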

Iris data classified using a naive Bayesian classifier assuming Gaussian distributions.
Iris data classified using a decision tree.
Iris data classified using Gaussian kernels.
Iris data classified using linear discriminant analysis.

 

Energy requirements of the singularity

After a recent lecture about the singularity I got asked about its energy requirements. It is a good question. As my inquirer pointed out, humanity uses more and more energy and it generally has an environmental cost. If it keeps on growing exponentially, something has to give. And if there is a real singularity, how do you handle infinite energy demands?

First I will look at current trends, then different models of the singularity.

I will not deal directly with environmental costs here. They are relative to some idea of a value of an environment, and there are many ways to approach that question.

Current trends

Computers are energy hogs. General-purpose computing currently consumes about one petawatt-hour per year, out of a total world electricity production somewhere above 22 PWh. While large data centres may be the obvious culprits, the vast number of low-power devices may be an even more significant factor; up to 10% of our electricity use may be due to ICT.

Together they perform on the order of 10^{20} operations per second, or somewhere in the zettaFLOPS range.

Koomey’s law states that the number of computations per joule of energy dissipated has been doubling approximately every 1.57 years. This might speed up as the pressure to make efficient computing for wearable devices and large data centres makes itself felt. Indeed, these days performance per watt is often more important than performance per dollar.

Meanwhile, general-purpose computing capacity has a growth rate of 58% per annum, doubling every 18 months. Since these trends cancel rather neatly, the overall energy need is not changing significantly.

The push for low-power computing may make computing greener, and it might also make other domains more efficient by moving tasks to the virtual world, making them efficient and allowing better resource allocation. On the other hand, as things become cheaper and more efficient usage tends to go up, sometimes outweighing the gain. Which trend wins out in the long run is hard to predict.

Semilog plot of global energy (all types) consumption over time.

Looking at overall energy use trends, it looks like energy use increases exponentially (but has stayed at roughly the same per capita level since the 1970s). In fact, plotting it on a semilog graph suggests that it is increasing faster than exponentially (otherwise it would be a straight line). This is presumably due to a combination of population increase and increased energy use. The best-fit exponential has a doubling time of 44.8 years.

Electricity use is also roughly exponential, with a doubling time of 19.3 years. So we might be shifting more and more to electricity, and computing might be taking over more and more of that.

Extrapolating wildly, we would need the total solar input on Earth in about 300 years and the total solar luminosity in 911 years. In about 1,613 years we would have used up the solar system’s mass energy. So, clearly, long before then these trends will break one way or another.

Physics places a firm boundary due to the Landauer principle: in order to erase one bit of information, k T \ln(2) joules of energy have to be dissipated. Given current efficiency trends we will reach this limit around 2048.

The principle can be circumvented using reversible computation, either classical or quantum. But as I often like to point out, it still bites in the form of the need for error correction (erasing accidentally flipped bits) and for formatting new computational resources (besides the work in turning raw materials into bits). We should hence expect a radical change in computation within a few decades, even if the cost per computation continues to fall exponentially.
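A back-of-the-envelope sketch of the Landauer bound and the Koomey-style extrapolation; the assumed current efficiency is an illustrative guess rather than a measured figure, so the resulting year is only indicative.

```python
# Back-of-the-envelope: the Landauer bound at room temperature and a
# Koomey-style extrapolation. The assumed current efficiency is an
# illustrative guess, so the resulting horizon is only indicative.
import math

k_B = 1.380649e-23                     # Boltzmann constant, J/K
T = 300.0                              # room temperature, K
landauer_joules_per_bit = k_B * T * math.log(2)
landauer_bits_per_joule = 1.0 / landauer_joules_per_bit
print(f"Landauer limit at 300 K: {landauer_joules_per_bit:.2e} J per erased bit")
print(f"  i.e. at most ~{landauer_bits_per_joule:.1e} irreversible bit erasures per joule")

doubling_time_years = 1.57             # Koomey's law
current_bits_per_joule = 1e14          # assumed illustrative value for today
doublings_left = math.log2(landauer_bits_per_joule / current_bits_per_joule)
print(f"~{doublings_left:.0f} doublings left, "
      f"roughly {doublings_left * doubling_time_years:.0f} years at Koomey's rate")
```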

What kind of singularity?

But how many joules of energy does a technological singularity actually need? It depends on what kind of singularity. In my own list of singularity meanings we have the following kinds:

A. Accelerating change
B. Self improving technology
C. Intelligence explosion
D. Emergence of superintelligence
E. Prediction horizon
F. Phase transition
G. Complexity disaster
H. Inflexion point
I. Infinite progress

Case A, acceleration, at first seems to imply increasing energy demands, but if efficiency grows faster they could of course go down.

Eric Chaisson has argued that energy rate density, how fast and densely energy gets used (watts per kilogram), might be an indicator of complexity and grows according to a universal tendency. By this account, we should expect the singularity to have an extreme energy rate density – but it does not have to be using enormous amounts of energy if it is very small and light.

He suggests that energy rate density may increase like Moore’s law, at least in our current technological setting. If we assume this to be true, then we would have \Phi(t) = \exp(kt) = P(t)/M(t), where P(t) is the power of the system and M(t) is the mass of the system at time t. One can maintain exponential growth by reducing the mass as well as increasing the power.

However, waste heat will need to be dissipated. If we use the simplest model, where a system of radius R and density \rho radiates its waste heat into space, then the temperature will be T=[\rho \Phi R/3 \sigma]^{1/4}, or, if we have a maximal acceptable temperature, R < 3\sigma T^4 / \rho \Phi. So the system needs to become smaller as \Phi increases. If we use active heat transport instead (as outlined in my previous post), covering the surface with heat pipes that can remove X watts per square meter, then R < 3 X / \Phi \rho. Again, the radius will be inversely proportional to \Phi. This is similar to our current computers, where the CPU is a tiny part surrounded by cooling and energy supply.

If we assume the waste heat is just due to erasing bits, the rate of computation will be I = P/(kT \ln 2) = \Phi M / (kT\ln 2) = [4 \pi \rho /(3 k \ln 2)] \Phi R^3 / T bits per second. Using the first cooling model gives us I \propto T^{11}/ \Phi^2 – a massive advantage for running extremely hot computation. In the second cooling model I \propto \Phi^{-2}: in both cases higher energy rate densities make it harder to compute when close to the thermodynamic limit. Hence there might be an upper limit to how much we may want to push \Phi.
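Spelling out the substitution behind these scalings (dropping constant prefactors such as \sigma and X, and treating the density \rho as fixed):

```latex
% Substitute each cooling cap on R into I \propto \rho \Phi R^3 / T:
\begin{align*}
\text{Radiative: } R = \frac{3\sigma T^{4}}{\rho\Phi}
  &\;\Rightarrow\; I \propto \rho\Phi \cdot \frac{T^{12}}{\rho^{3}\Phi^{3}\,T}
  \propto \frac{T^{11}}{\Phi^{2}}, \\
\text{Active: } R = \frac{3X}{\rho\Phi}
  &\;\Rightarrow\; I \propto \rho\Phi \cdot \frac{1}{\rho^{3}\Phi^{3}\,T}
  \propto \frac{1}{\Phi^{2}\,T}.
\end{align*}
```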

Also, a system with mass M will use up its own mass-energy in time Mc^2/P = c^2/\Phi: the higher the rate, the faster it will run out (and this time is independent of size!). If the system is expanding at speed v it will gain and use up mass at a net rate M'= 4\pi\rho v^3 t^2 - M\Phi(t)/c^2; if \Phi grows faster than quadratically with time it will eventually run out of mass to use. Hence the exponential growth must eventually slow simply because of the finite lightspeed.
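A quick sketch of the burn-through time c^2/\Phi for a few illustrative (assumed) energy rate densities:

```python
c = 2.998e8                     # speed of light, m/s
year = 3.156e7                  # seconds per year

# Illustrative energy rate densities, W/kg (roughly: a human body,
# a hot computing device, something far more extreme).
for phi in (2.0, 1e4, 1e7):
    t = c**2 / phi              # time to burn through own mass-energy, s
    print(f"Phi = {phi:g} W/kg -> {t / year:.2e} years")
```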

The Chaisson scenario does not suggest a “sustainable” singularity. Rather, it suggests a local intense transformation involving small, dense nuclei using up local resources. However, such local “detonations” may then spread, depending on the long-term goals of involved entities.

Cases B, C, and D (self-improving technology, intelligence explosions, superintelligence) have an unclear energy profile. We do not know how complex the code would become or what kind of computational search is needed to get to superintelligence. It could be that it is mostly a matter of smart insights, in which case the energy needs are modest, or a huge deep learning-like project with massive amounts of data sloshing around, requiring a lot of energy.

Case E, a prediction horizon, is separate from energy use. As this essay shows, there are some things we can say about superintelligent computational systems based on known physics that likely remain valid no matter what.

Case F, phase transition, involves a change in organisation rather than computation, for example the formation of a global brain out of previously uncoordinated people. However, this might very well have energy implications. Physical phase transitions involve discontinuities of the derivatives of the free energy. If the phases have different entropies (first order transitions) there has to be some addition or release of energy. So it might actually be possible that a societal phase transition requires a fixed (and possibly large) amount of energy to reorganize everything into the new order.

There are also second order transitions. These are continuous and have no latent heat, but show divergent susceptibilities (how strongly the system responds to an external forcing). These might be more like how we normally imagine an ordering process, with local fluctuations near the critical point leading to large and eventually dominant changes in how things are ordered. It is not clear to me that this kind of singularity would have any particular energy requirement.

Case G, complexity disaster, is related to superexponential growth, such as the city growth model of Bettencourt, West et al. or the work on bubbles and finite-time singularities by Didier Sornette. Here the rapid growth leads to a crisis – or, more accurately, a series of crises succeeding each other ever more rapidly until a final singularity. Beyond that the system must behave in some different manner. These models typically predict rapidly increasing resource use (indeed, this is the cause of the crisis sequence, as one kind of growth runs into resource scaling problems and is replaced by another), although as Sornette points out the post-singularity state might well be a stable non-rivalrous knowledge economy.
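For intuition about how such growth hits a wall, here is a toy finite-time-singularity model (a minimal sketch, not Bettencourt’s or Sornette’s actual equations): growth dx/dt = x^m with m > 1 diverges at a finite time t_c.

```python
def blowup_time(x0: float, m: float) -> float:
    """Finite blow-up time of dx/dt = x**m (m > 1), starting from x(0) = x0.

    Solving the ODE gives x(t) = [x0**(1-m) - (m-1)*t]**(1/(1-m)),
    which diverges at t_c = x0**(1-m) / (m - 1).
    """
    assert m > 1, "need superexponential growth"
    return x0 ** (1 - m) / (m - 1)

print(blowup_time(1.0, 1.1))   # 10.0: mildly superexponential, later singularity
print(blowup_time(1.0, 2.0))   # 1.0: faster growth, earlier singularity
```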

Case H, an inflexion point, is very vanilla. It would represent the point where our civilization is halfway from where we started to where we are going. It might correspond to “peak energy”, where we shift from increasing usage to decreasing usage (for whatever reason), but it does not have to. It could just be that we figure out most physics and AI in the next decades, become a spacefaring posthuman civilization, and expand for the next few billion years, using ever more energy but not having the same intense rate of knowledge growth as during the brief early era when we went from hunter-gatherers to posthumans.

Case I, infinite growth, is not normally possible in the physical universe. As far as we know, information cannot be stored beyond densities set by the Bekenstein bound (I \leq k_I MR, where k_I\approx 2.577\cdot 10^{43} bits per kg per meter), and we only have access to a volume 4 \pi c^3 t^3/3 with mass density \rho, so the total information growth must be bounded by I \leq 4 \pi k_I c^4 \rho t^4/3. This grows quickly, but still only polynomially.
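Plugging in numbers (a minimal sketch; the average matter density is an assumed value of roughly the cosmological critical density):

```python
import math

k_I  = 2.577e43        # Bekenstein coefficient, bits per kg per meter
c    = 2.998e8         # m/s
rho  = 9.9e-27         # kg/m^3, assumed average cosmic matter density
year = 3.156e7         # seconds per year

for t_years in (1e2, 1e6, 1e9):
    t = t_years * year
    I_max = 4 * math.pi * k_I * c**4 * rho * t**4 / 3
    print(f"t = {t_years:.0e} years -> I <= {I_max:.2e} bits")
```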

The exception to the finitude of growth is if we approach the boundaries of spacetime. Frank J. Tipler’s omega point theory shows how information processing could go infinite in a finite (proper) time in the right kind of collapsing universe with the right kind of physics. It doesn’t look like we live in one, but the possibility is tantalizing: could we arrange the right kind of extreme spacetime collapse to get the right kind of boundary for a mini-omega? It would be way beyond black hole computing and never be able to send back information, but still allow infinite experience. Most likely we are stuck in finitude, but it won’t hurt poking at the limits.

Conclusions

Indefinite exponential growth is never possible for physical properties that have some resource limitation, whether energy, space or heat dissipation. Sooner or later they will have to shift to a slower rate of growth – polynomial for expanding organisational processes (forced to this by the dimensionality of space, finite lightspeed and heat dissipation), and declining growth rate for processes dependent on a non-renewable resource.

That does not tell us much about the energy demands of a technological singularity. We can conclude that it cannot be infinite. It might be high enough that we bump into the resource, thermal and computational limits, which may be what actually defines the singularity energy and time scale. Technological singularities may also be small, intense and localized detonations that merely use up local resources, possibly spreading and repeating. But it could also turn out that advanced thinking is very low-energy (reversible or quantum) or requires merely manipulation of high level symbols, leading to a quiet singularity.

My own guess is that life and intelligence will always expand to fill whatever niche is available, and use the available resources as intensively as possible. That leads to instabilities and depletion, but also expansion. I think we are – if we are lucky and wise – set for a global conversion of the non-living universe into life, intelligence and complexity, a vast phase transition of matter and energy where we are part of the nucleating agent. It might not be sustainable over cosmological timescales, but neither is our universe itself. I’d rather see the stars and planets filled with new and experiencing things than continue a slow dance into the twilight of entropy.

…contemplate the marvel that is existence and rejoice that you are able to do so. I feel I have the right to tell you this because, as I am inscribing these words, I am doing the same.
– Ted Chiang, Exhalation