Universe Today wrote an article about a paper by me, Toby and Eric about the Fermi Paradox. The preprint can be found on arXiv (see also our supplements: 1, 2, 3 and 4). Here is a quick popular overview/FAQ.

# TL;DR

• The Fermi question is not a paradox: it just looks like one if one is overconfident in how well we know the Drake equation parameters.
• Our distribution model shows that there is a large probability of little-to-no alien life, even if we use the optimistic estimates of the existing literature (and even more if we use more defensible estimates).
• The Fermi observation makes the most uncertain priors move strongly, reinforcing the rare life guess and an early great filter.
• Getting even a little bit more information can update our belief state a lot!

# So, do you claim we are alone in the universe?

No. We claim we could be alone, and the probability is non-negligible given what we know… even if we are very optimistic about alien intelligence.

# What is the paper about?

The Fermi Paradox – or rather the Fermi Question – is “where are the aliens?” The universe is immense and old and intelligent life ought to be able to spread or signal over vast distances, so if it has some modest probability we ought to see some signs of intelligence. Yet we do not. What is going on? The reason it is called a paradox is that there is a tension between one plausible theory ([lots of sites]x[some probability]=[aliens]) and an observation ([no aliens]).

## Dissolving the Fermi paradox: there is not much tension

We argue that people have been accidentally misled to feel there is a problem by being overconfident about the probability.

$N=R_*\cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot L$

The problem lies in how we estimate probabilities from a product of uncertain parameters (as in the Drake equation above). The typical way people informally do this with the equation is to admit that some guesses are very uncertain, give a “representative value” and end up with some estimated number of alien civilisations in the galaxy – which is admitted to be uncertain, yet there is a single number.

Obviously, some authors have argued for very low probabilities, typically concluding that there is just one civilisation per galaxy (“the $N\approx 1$ school”). This may actually still be too much, since that means we should expect signs of activity from nearly any galaxy. Others give slightly higher guesstimates and end up with many civilisations, typically as many as one expects civilisations to last (“the $N\approx L$ school”). But the proper thing to do is to give a range of estimates, based on how uncertain we actually are, and get an output that shows the implied probability distribution of the number of alien civilisations.

If one combines either published estimates or ranges compatible with current scientific uncertainty we get a distribution that makes observing an empty sky unsurprising – yet is also compatible with us not being alone.

The reason is that even if one takes a pretty optimistic view (the published estimates are after all biased towards SETI optimism since the sceptics do not write as many papers on the topic) it is impossible to rule out a very sparsely inhabited universe, yet the mean value may be a pretty full galaxy. And current scientific uncertainties of the rates of life and intelligence emergence are more than enough to create a long tail of uncertainty that puts a fair credence on extremely low probability – probabilities much smaller than what one normally likes to state in papers. We get a model where there is 30% chance we are alone in the visible universe, 53% chance in the Milky Way… and yet the mean number is 27 million and the median about 1! (see figure below)

This is a statement about knowledge and priors, not a measurement: armchair astrobiology.

## The Great Filter: lack of obvious aliens is not strong evidence for our doom

After this result, we look at the Great Filter. We have reason to think at least one term in the Drake equation is small – either one of the early ones indicating how much life or intelligence emerges, or one of the last ones that indicate how long technological civilisations survive. The small term is “the Filter”. If the Filter is early, that means we are rare or unique but have a potentially unbounded future. If it is a late term, in our future, we are doomed – just like all the other civilisations whose remains would litter the universe. This is worrying. Nick Bostrom argued that we should hope we do not find any alien life.

Our paper gets a somewhat surprising result: when updating our uncertainties in the light of no visible aliens, it reduces our estimate of the rate of life and intelligence emergence (the early filters) much more than the longevity factor (the future filter).

The reason is that if we exclude the cases where our galaxy is crammed with alien civilisations – something like the Star Wars galaxy where every planet has its own aliens – then that leads to an update of the parameters of the Drake equation. All of them become smaller, since we will have a more empty universe. But the early filter ones – life and intelligence emergence – change much more downwards than the expected lifespan of civilisations since they are much more uncertain (at least 100 orders of magnitude!) than the merely uncertain future lifespan (just 7 orders of magnitude!).
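The asymmetry can be seen in a toy simulation. This is only a sketch under stated assumptions, not the model from the paper: one early-filter term is log-uniform over 100 orders of magnitude, the lifespan term is log-uniform over 7 orders, the constant `+2.0` is a hypothetical stand-in for all the better-known terms, and the "Fermi observation" is crudely modelled as keeping only the draws with fewer than one civilisation. The function name `fermi_update` is invented for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def fermi_update(n=200_000, seed=1):
    """Return the shift in mean log10 of each term after conditioning on N < 1."""
    rng = random.Random(seed)
    prior_early, prior_late, post_early, post_late = [], [], [], []
    for _ in range(n):
        early = rng.uniform(-100.0, 0.0)  # log10 of an early-filter term, 100 orders wide
        late = rng.uniform(2.0, 9.0)      # log10 of civilisation lifespan, 7 orders wide
        log_n = early + late + 2.0        # +2.0 is a hypothetical stand-in for the other terms
        prior_early.append(early)
        prior_late.append(late)
        if log_n < 0.0:                   # toy observation: fewer than one civilisation
            post_early.append(early)
            post_late.append(late)
    return (mean(post_early) - mean(prior_early),
            mean(post_late) - mean(prior_late))

shift_early, shift_late = fermi_update()
print(f"shift in early-filter term: {shift_early:+.2f} orders of magnitude")
print(f"shift in lifespan term:     {shift_late:+.2f} orders of magnitude")
```

Both terms move downwards, but nearly all of the movement lands on the wide early term, because it is the one with room to absorb the update.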

So this is good news: the stars are not foretelling our doom!

Note that a past great filter does not imply our safety.

The conclusion can be changed if we reduce the uncertainty of the past terms to less than 7 orders of magnitude, or the involved probability distributions have weird shapes. (The mathematical proof is in supplement IV, which applies to uniform and normal distributions. It is possible to add tails and other features that break this effect – yet believing such distributions of uncertainty requires believing rather strange things.)

# Isn’t this armchair astrobiology?

Yes. We are after all from the philosophy department.

The point of the paper is how to handle uncertainties, especially when you multiply them together or combine them in different ways. It is also about how to take lack of knowledge into account. Our point is that we need to make knowledge claims explicit – if you claim you know a parameter to have the value 0.1 you better show a confidence interval or an argument about why it must have exactly that value (and in the latter case, better take your own fallibility into account). Combining overconfident knowledge claims can produce biased results since they do not include the full uncertainty range: multiplying point estimates together produces a very different result than when looking at the full distribution.
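A small worked example of that last point, using made-up ranges rather than anything from the paper. For independent log-uniform factors the mean of the product is the product of the means, and (because each log is symmetric around its midpoint) the median of the product is the product of the geometric midpoints. The three ranges below are hypothetical and chosen only to show how far apart the two summaries can be.

```python
import math

def loguniform_mean(lo, hi):
    """Mean of a variable whose logarithm is uniform on [log lo, log hi]."""
    return (hi - lo) / math.log(hi / lo)

def loguniform_median(lo, hi):
    """Median of a log-uniform variable: the geometric midpoint of the range."""
    return math.sqrt(lo * hi)

# Hypothetical ranges for three independent factors (illustration only).
ranges = [(1e-6, 1.0), (1e-3, 1.0), (1.0, 1e4)]

mean_product = math.prod(loguniform_mean(lo, hi) for lo, hi in ranges)
median_product = math.prod(loguniform_median(lo, hi) for lo, hi in ranges)

print(f"mean of the product:   {mean_product:.3g}")
print(f"median of the product: {median_product:.3g}")
```

A single "representative value" per factor lands somewhere between these two numbers and hides the fact that they disagree by orders of magnitude.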

All of this is epistemology and statistics rather than astrobiology or SETI proper. But SETI makes a great example since it is a field where people have been learning more and more about (some of) the factors.

The same approach as we used in this paper can be used in other fields. For example, when estimating risk chains in systems (like the risk of a pathogen escaping a biosafety lab) taking uncertainties in knowledge into account will sometimes produce important heavy tails that are irreducible even when you think the likely risk is acceptable. This is one reason risk estimates tend to be overconfident.

# Probability?

What kind of distributions are we talking about here? Surely we cannot speak of the probability of alien intelligence given the lack of data?

There is a classic debate in probability between frequentists, claiming probability is the frequency of events that we converge to when an experiment is repeated indefinitely often, and Bayesians, claiming probability represents states of knowledge that get updated when we get evidence. We are pretty Bayesian.

The distributions we are talking about are distributions of “credences”: how much you believe certain things. We start out with a prior credence based on current uncertainty, and then discuss how this gets updated if new evidence arrives. While the original prior beliefs may come from shaky guesses they have to be updated rigorously according to evidence, and typically this washes out the guesswork pretty quickly when there is actual data. However, even before getting data we can analyse how conclusions must look if different kinds of information arrive and update our uncertainty; see supplement II for a bunch of scenarios like “what if we find alien ruins?”, “what if we find a dark biosphere on Earth?” or “what if we actually see aliens at some distance?”

# Correlations?

Our use of the Drake equation assumes the terms are independent of each other. This of course is a result of how Drake sliced things into naturally independent factors. But there could be correlations between them. Häggström and Verendel showed that in worlds where the priors are strongly correlated, updates about the Great Filter can get non-intuitive.

We deal with this in supplement II, and see also this blog post. Basically, it doesn’t look like correlations are likely showstoppers.

# You can’t resample guesses from the literature!

Sure can. As long as we agree that this is not so much a statement about what is actually true out there, but rather the range of opinions among people who have studied the question a bit. If people give answers to a question in the range from ten to a hundred, that tells you something about their beliefs, at least.

What the resampling does is break up the possibly unconscious correlation between answers (“the $N\approx 1$ school” and “the $N\approx L$ school” come to mind). We use the ranges of answers as a crude approximation to what people of good will think are reasonable numbers.

You may say “yeah, but nobody is really an expert on these things anyway”. We think that is wrong. People have improved their estimates as new data arrives, there are reasons for the estimates and sometimes vigorous debate about them. We warmly recommend Vakoch, D. A., Dowd, M. F., & Drake, F. (2015). The Drake Equation. Cambridge, UK: Cambridge University Press, for a historical overview. But at the same time these estimates are wildly uncertain, and this is what we really care about. Good experts qualify the certainty of their predictions.

## But doesn’t resampling from admittedly overconfident literature constitute “garbage in, garbage out”?

Were we trying to get the true uncertainties (or even more hubristically, the true values) this would not work: we have after all good reasons to suspect these ranges are both biased and overconfidently narrow. But our point is not that the literature is right, but that even if one were to use the overly narrow and likely overly optimistic estimates as estimates of actual uncertainty the broad distribution will lead to our conclusions. Using the literature is the most conservative case.

Note that we do not base our later estimates on the literature estimate but our own estimates of scientific uncertainty. If they are GIGO it is at least our own garbage, not recycled garbage. (This reading mistake seems to have been made on Starts With a Bang).

# What did the literature resampling show?

An overview can be found in Supplement III. The most important point is just that even estimates of super-uncertain things like the probability of life lie in a surprisingly narrow range of values, far more narrow than is scientifically defensible. For example, $f_l$ has five estimates ranging from $10^{-30}$ to $10^{-5}$, and all the rest are in the range $10^{-3}$ to 1. $f_i$ is even worse, with one microscopic and nearly all the rest between one in a thousand to one.

It also shows that estimates that are likely biased towards optimism (because of publication bias) can be used to get a credence distribution that dissolves the paradox once they are interpreted as ranges. See the above figure, where we get about 30% chance of being alone in the Milky Way and 8% chance of being alone in the visible universe… but a mean corresponding to 27 million civilisations in the galaxy and a median of about a hundred.

There are interesting patterns in the data. When plotting the expected number of civilisations in the Milky Way based on estimates from different eras the number goes down with time: the community has clearly gradually become more pessimistic. There are some very pessimistic estimates, but even removing them doesn’t change the overall structure.

# What are our assumed uncertainties?

A key point in the paper is trying to quantify our uncertainties somewhat rigorously. Here is a quick overview of where I think we are, with the values we used in our synthetic model:

• $R_*$: the star formation rate in the Milky Way per year is fairly well constrained. The actual current uncertainty is likely less than 1 order of magnitude (it can vary over 5 orders of magnitude in other galaxies). In our synthetic model we put this parameter as log-uniform from 1 to 100.
• $f_p$: the fraction of systems with planets is increasingly clearly ≈1. We used log-uniform from 0.1 to 1.
• $n_e$: number of Earth-like planets in systems with planets.
• This ranges from rare earth arguments ($<10^{-12}$) to >1. We used log-uniform from 0.1 to 1 since recent arguments have shifted away from rare Earths, but we checked that adding it did not change the conclusions much.
• $f_l$: Fraction of Earthlike planets with life.
• This is very uncertain; see below for our arguments that the uncertainty ranges over perhaps 100 orders of magnitude.
• There is an absolute lower limit due to ergodic repetition: $f_l >10^{-10^{115}}$ – in an infinite universe there will eventually be randomly generated copies of Earth and even the entire galaxy (at huge distances from each other). Observer selection effects make using the earliness of life on Earth problematic.
• We used a log-normal rate of abiogenesis that was transformed to a fraction distribution.
• $f_i$: Fraction of lifebearing planets with intelligence/complex life.
• This is very uncertain; see below for our arguments that the uncertainty ranges over perhaps 100 orders of magnitude.
• One could argue there have been 5 billion species so far and only 1 intelligent, so we know $f_i>2\cdot 10^{-10}$. But one could argue that we should count assemblages of 10 million species, which gives a fraction 1/500 per assemblage. Observer selection effects may be distorting this kind of argument.
• We could have used a log-normal rate of complex life emergence that was transformed to a fraction distribution or a broad log-linear distribution. Since this would have made many graphs hard to interpret we used log-uniform from 0.001 to 1, not because we think this likely but just as a simple illustration (the effect of the full uncertainty is shown in Supplement II).
• $f_c$: Fraction of time when it is communicating.
• Very uncertain; humanity is 0.000615 so far. We used log-uniform from 0.01 to 1.
• $L$: Average lifespan of a civilisation.
• Fairly uncertain; the lower end is around 50 years (the upper limit comes from the applicability of the Drake equation: it assumes the galaxy is in a steady state, and if civilisations are long-lived enough they will still be accumulating since the universe is too young.)
• We used log-uniform from 100 to 10,000,000,000.

Note that this is to some degree a caricature of current knowledge, rather than an attempt to represent it perfectly. Fortunately our argument and conclusions are pretty insensitive to the details – it is the vast ranges of uncertainty that are doing the heavy lifting.
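For readers who want to poke at this themselves, here is a rough Monte Carlo sketch of a model with the ranges listed above. It is not the code behind the paper. The log-uniform ranges are the ones quoted in the list, but the width of the log-normal abiogenesis rate (`sigma_orders=50.0`, centred on a rate of 1) is an assumption made for illustration, and the helper names `log_uniform`, `life_fraction` and `sample_n` are invented here.

```python
import math
import random
import statistics

def log_uniform(rng, lo, hi):
    """Draw a value whose log10 is uniform between log10(lo) and log10(hi)."""
    return 10.0 ** rng.uniform(math.log10(lo), math.log10(hi))

def life_fraction(rng, sigma_orders=50.0):
    """Turn a log-normal rate into a fraction via 1 - exp(-rate).

    ASSUMPTION: log10(rate) ~ Normal(0, sigma_orders); the width is a
    placeholder chosen for illustration.
    """
    log10_rate = rng.gauss(0.0, sigma_orders)
    if log10_rate > 2.0:  # exp(-100) is negligible; this also avoids overflow
        return 1.0
    return -math.expm1(-(10.0 ** log10_rate))

def sample_n(rng):
    """One draw of N, the number of civilisations in the galaxy."""
    return (log_uniform(rng, 1, 100)        # R_*
            * log_uniform(rng, 0.1, 1)      # f_p
            * log_uniform(rng, 0.1, 1)      # n_e
            * life_fraction(rng)            # f_l
            * log_uniform(rng, 0.001, 1)    # f_i
            * log_uniform(rng, 0.01, 1)     # f_c
            * log_uniform(rng, 100, 1e10))  # L

rng = random.Random(42)
draws = [sample_n(rng) for _ in range(50_000)]
p_empty = sum(1 for n in draws if n < 1) / len(draws)
mean_n = statistics.fmean(draws)
median_n = statistics.median(draws)

print(f"P(N < 1) = {p_empty:.2f}")
print(f"mean N   = {mean_n:.3g}")
print(f"median N = {median_n:.3g}")
```

The exact numbers depend on the assumed width and on the seed. The qualitative pattern is what matters: a sizeable chance of an empty galaxy sitting next to a very large mean.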

## Abiogenesis

Why do we think the fraction of planets with life parameters could have a huge range?

First, instead of thinking in terms of the fraction of planets having life, consider a rate of life formation in suitable environments: what is the induced probability distribution? The emergence is a physical/chemical transition of some kind of primordial soup, and transition events occur in this medium at some rate per unit volume: $f_l\approx \lambda V t$ (as long as this is well below 1) where $V$ is the available volume and $t$ is the available time. High rates would imply that almost all suitable planets originate life, while low rates would imply that almost no suitable planets originate life.
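A minimal sketch of how a rate turns into a fraction, assuming origin-of-life events arrive as a Poisson process (an assumption added here for the example). Then the chance of at least one event is $1-e^{-\lambda V t}$, which is about $\lambda V t$ when that product is small and saturates at 1 when it is large. The function name `fraction_with_life` is made up for this illustration.

```python
import math

def fraction_with_life(log10_expected_events):
    """Fraction of suitable planets with at least one origin-of-life event.

    The argument is log10 of the expected number of events, lambda * V * t.
    """
    if log10_expected_events > 2:  # exp(-100) is negligible; avoids overflow
        return 1.0
    return -math.expm1(-(10.0 ** log10_expected_events))

for x in (-40, -20, -3, 0, 3, 20):
    print(f"lambda*V*t = 1e{x:+d}: fraction = {fraction_with_life(x):.3g}")
```

Spread the exponent over a hundred orders of magnitude and almost every value gives a fraction that is either essentially 0 or essentially 1. Only a narrow band of rates produces anything in between.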

The uncertainty regarding the length of time when it is possible is at least 3 orders of magnitude ($10^7-10^{10}$ years).

The uncertainty regarding volumes spans 20+ orders of magnitude – from entire oceans to brine pockets on ice floes.

Uncertainty regarding transition rates can span 100+ orders of magnitude! The reason is that this might involve combinatoric flukes (you need to get a fairly longish sequence of parts into the right sequence to get the right kind of replicator), or that it is like the protein folding problem where Levinthal’s paradox shows that it takes literally astronomical time to get entire oceans of copies of a protein to randomly find the correctly folded position (actual biological proteins “cheat” by being evolved to fold neatly and fast). Even chemical reaction rates span 100 orders of magnitude. On the other hand, spontaneous generation could conceivably be common and fast! So we should conclude that $\lambda$ has an uncertainty range of at least 100 orders of magnitude.

Actual abiogenesis will involve several steps. Some are easy, like generating simple organic compounds (plentiful in asteroids, comets and the Miller-Urey experiment). Some are likely tough. People often overlook that even how to get proteins and nucleic acids in a watery environment is somewhat of a mystery since these chains tend to hydrolyze; the standard explanation is to look for environments that have a wet-dry cycle allowing complexity to grow. But this means $V$ is much smaller than an ocean.

That we have tremendous uncertainty about abiogenesis does not mean we do not know anything. We know a lot. But at present we have no good scientific reasons to believe we know the rate of life formation per liter-second. That will hopefully change.

## Don’t creationists argue stuff like this?

There is a fair number of examples of creationists arguing that the origin of life must be super-unlikely and hence we must believe in their particular god.

The problem with this kind of argument is that it presupposes that there is only one planet, and somehow we got a one-in-a-zillion chance on that one. That is pretty unlikely. But the reality is that there are a zillion planets, so even if there is a one-in-a-zillion chance for each of them we should expect to see life somewhere… especially since being a living observer is a precondition for “seeing life”! Observer selection effects really matter.

We are also not arguing that life has to be super-unlikely. In the paper our distribution of life emergence rate actually makes it nearly universal 50% of the time – it includes the possibility that life will spontaneously emerge in any primordial soup puddle left alone for a few minutes. This is a possibility I doubt anybody believes in, but it could be that would-be new life is emerging right under our noses all the time, only to be outcompeted by the advanced life that already exists.

Creationists make a strong claim that they know $f_l \ll 1$; this is not really supported by what we know. But $f_l \ll 1$ is totally within possibility.

## Complex life

Even if you have life, it might not be particularly good at evolving. The reasoning is that it needs to have a genetic encoding system that is both rigid enough to function efficiently and fluid enough to allow evolutionary exploration.

All life on Earth shares almost exactly the same genetic systems, showing that only rare and minor changes have occurred in $\approx 10^{40}$ cell divisions. That is tremendously stable as a system. Nonetheless, it is fairly commonly believed that other genetic systems preceded the modern form. The transition to the modern form required major changes (think of upgrading an old computer from DOS to Windows… or worse, from CP/M to DOS!). It would be unsurprising if the rate was < 1 per $10^{100}$ cell divisions given the stability of our current genetic system – but of course, the previous system might have been super-easy to upgrade.

Modern genetics required >1/5 of the age of the universe to evolve intelligence. A genetic system like the one that preceded ours might both be stable over a googol cell divisions and evolve more slowly by a factor of 10, and run out the clock. Hence some genetic systems may be incapable of ever evolving intelligence.

This is related to a point made by Brandon Carter much earlier, where he pointed out that the timescales of getting life, evolving intelligence and how long biospheres last are independent and could be tremendously different – that life emerged early on Earth may have been a fluke due to the extreme difficulty of also getting intelligence within this narrow interval (on all the more likely worlds there are no observers to notice). If there are more difficult transitions, you get an even stronger observer selection effect.

Evolution goes down branches without looking ahead, and we can imagine that it could have an easier time finding inflexible coding systems (“B life”) unlike our own nice one (“A life”). If the rate of discovering B-life is $\lambda_B$ and the rate of discovering capable A-life is $\lambda_A$, then the fraction of A-life in the universe is $\lambda_A/(\lambda_A+\lambda_B)$, which is just $\lambda_A/\lambda_B$ when $\lambda_A \ll \lambda_B$ – and rates can differ by many orders of magnitude, producing a life-rich but evolution/intelligence-poor universe. Multiple step models add integer exponents to rates: these then multiply order of magnitude differences.

So we have good reasons to think there could be a hundred orders of magnitude uncertainty on the intelligence parameter, even without trying to say something about evolution of nervous systems.

# How much can we rule out aliens?

Humanity has not scanned that many stars, so obviously we have checked only a tiny part of the galaxy – and could have missed them even if we looked at the right spot. Still, we can model how this weak data updates our beliefs (see Supplement II).

The strongest argument against aliens is the Tipler-Hart argument that settling the Milky Way, even when you are expanding at low speed, will only take a fraction of its age. And once a civilisation is everywhere it is hard to have it go extinct everywhere – it will tend to persist even if local pieces crash. Since we do not seem to be in a galaxy paved over by an alien supercivilisation we have a very strong argument to assume a low rate of intelligence emergence. Yes, even if 99% of civilisations stay home or we could be in an alien zoo, you still get a massive update against a really settled galaxy. In our model the probability of less than one civilisation per galaxy went from 52% to 99.6% if one includes the basic settlement argument.

The G-hat survey of galaxies, looking for signs of K3 civilisations, did not find any. Again, maybe we missed something or most civilisations don’t want to re-engineer galaxies, but if we assume about half of them want to and have 1% chance of succeeding we get an update from 52% chance of less than one civilisation per galaxy to 66%.

Using models of us looking at about 1,000 stars or that we do not think there is any civilisation within 18 pc gives a milder update, from 52% to 53% and 57% respectively. These just rule out super-densely inhabited scenarios.

# So what? What is the use of this?

People like to invent explanations for the Fermi paradox that all would have huge implications for humanity if they were true – maybe we are in a cosmic zoo, maybe there are interstellar killing machines out there, maybe singularity is inevitable, maybe we are the first civilisation ever, maybe intelligence is a passing stage, maybe the aliens are sleeping… But if you are serious about thinking about the future of humanity you want to be rigorous about this. This paper shows that current uncertainties actually force us to be very humble about these possible explanations – we can’t draw strong conclusions from the empty sky yet.

But uncertainty can be reduced! We can learn more, and that will change our knowledge.

From a SETI perspective, this doesn’t say that SETI is unimportant or doomed to failure, but rather that if we ever see even the slightest hint of intelligence out there many parameters will move strongly. Including the all-important $L$.

From an astrobiology perspective, we hope we have pointed at some annoyingly uncertain factors and that this paper can get more people to work on reducing the uncertainty. Most astrobiologists we have talked with are aware of the uncertainty but do not see the weird knock-on-effects from it. Especially figuring out how we got our fairly good coding system and what the competing options are seems very promising.

Even if we are not sure we can also update our plans in the light of this. For example, in my tech report about settling the universe fast I pointed out that if one is uncertain about how much competition there might be for the universe one can use one’s probability estimates to decide on the range to aim for.

## Uncertainty matters

Perhaps the most useful insight is that uncertainty matters and we should learn to embrace it carefully rather than assume that apparently specific numbers are better.

Perhaps never in the history of science has an equation been devised yielding values differing by eight orders of magnitude. . . . each scientist seems to bring his own prejudices and assumptions to the problem.
History of Astronomy: An Encyclopedia, ed. by John Lankford, s.v. “SETI,” by Steven J. Dick, p. 458.

When Dick complained about the wide range of results from the Drake equation he likely felt it was too uncertain to give any useful result. But a difference of 8 orders of magnitude is in this case just a sign of downplaying our uncertainty and overestimating our knowledge! Things get much better when we look at what we know and don’t know, figuring out the implications from both.

Jill Tarter said the Drake equation was “a wonderful way to organize our ignorance”, which we think is closer to the truth than demanding a single number as an answer.

# Ah, but I already knew this!

We have encountered claims that “nobody” really is naive about using the Drake equation. Or at least not any “real” SETI and astrobiology people. Strangely enough people never seem to make this common knowledge visible, and a fair number of papers make very confident statements about “minimum” values for life probabilities that we think are far, far above the actual scientific support.

Sometimes we need to point out the obvious explicitly.

[Edit 2018-06-30: added the GIGO section]

# Survivorship curves and existential risk

In a discussion Dennis Pamlin suggested that one could make a mortality table/survival curve for our species subject to existential risk, just as one can do for individuals. This also allows demonstrations of how changes in risk affect the expected future lifespan. This post is a small internal FHI paper I did just playing around with survivorship curves and other tools of survival analysis to see what they add to considerations of existential risk. The outcome was more qualitative than quantitative: I do not think we know enough to make a sensible mortality table. But it does tell us a few useful things:

• We should try to reduce ongoing “state risks” as early as possible
• Discrete “transition risks” that do not affect state risks matter less; we may want to put them off indefinitely.
• Indefinite survival is possible if we make hazard decrease fast enough.

# Simple model

A first, very simple model: assume a fixed population and power-law sized disasters that randomly kill a number of people proportional to their size every unit of time (if there are survivors, then they repopulate until next timestep). Then the expected survival curve is an exponential decay.

This is in fact independent of the distribution, and just depends on the chance of exceedance. If disasters happen at a rate $\lambda$ and the probability of extinction $\Pr(X>\mathrm{population}) = p$, then the curve is $S(t) = \exp(-p \lambda t).$
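A quick simulation makes this concrete. It is a sketch that assumes disasters arrive as a Poisson process (exponential waiting times), and the values of `rate` and `p_extinct` are hypothetical numbers picked only so the loop runs quickly. The function names are invented for the example.

```python
import math
import random

def simulate_extinction_time(rng, rate, p_extinct):
    """Time until the first disaster that is large enough to cause extinction."""
    t = 0.0
    while True:
        t += rng.expovariate(rate)    # waiting time until the next disaster
        if rng.random() < p_extinct:  # this disaster exceeds the population
            return t

def empirical_survival(times, t):
    """Fraction of simulated histories still alive at time t."""
    return sum(1 for x in times if x > t) / len(times)

rate, p_extinct = 1.0, 0.05  # hypothetical values, for illustration only
rng = random.Random(0)
times = [simulate_extinction_time(rng, rate, p_extinct) for _ in range(20_000)]

for t in (5, 20, 50):
    simulated = empirical_survival(times, t)
    predicted = math.exp(-p_extinct * rate * t)
    print(f"t = {t:2d}: simulated {simulated:.3f}, exp(-p*lambda*t) = {predicted:.3f}")
```

Only the product $p\lambda$ shows up in the curve, which is why the size distribution of the disasters does not matter beyond its chance of exceedance.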

This can be viewed as a simple model of state risks, the ongoing background of risk to our species from e.g. asteroids and supernovas.

## Correlations

What if the population rebound is slower than the typical inter-disaster interval? During the rebound the population is more vulnerable to smaller disasters. However, if we average over longer time than the rebound time constant we end up with the same situation as before: an adjusted, slightly higher hazard, but still an exponential.

In ecology there has been a fair number of papers analyzing how correlated environmental noise affects extinction probability, generally concluding that correlated (“red”) noise is bad (e.g. (Ripa and Lundberg 1996), (Ovaskainen and Meerson 2010)) since the adverse conditions can be longer than the rebound time.

If events behave in a sufficiently correlated manner, then the basic survival curve may be misleading since it only shows the mean ensemble effect rather than the tail risks. Human societies are also highly path dependent over long timescales: our responses can create long memory effects, both positive and negative, and this can affect the risk autocorrelation.

## Population growth

If population increases exponentially at a rate $G$ and is reduced by disasters, then initially some instances will be wiped out, but many realizations achieve takeoff where they grow essentially forever. As the population becomes larger, risk declines as $\exp(- \alpha G t).$

This is somewhat similar to Stuart’s and my paper on indefinite survival using backups: when we grow fast enough there is a finite chance of surviving indefinitely. The growth may be in terms of individuals (making humanity more resilient to larger and larger disasters), or in terms of independent groups (making humanity more resilient to disasters affecting a location). If risks change in size in proportion to population or occur in different locations in a correlated manner this basic analysis may not apply.
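A short worked version of why a declining hazard leaves a finite chance of indefinite survival. It assumes, as a simplification added here, that the hazard rate itself is $h(t)=h_0 e^{-\alpha G t}$, with $h_0$ a hypothetical initial hazard:

$S(t)=\exp\left(-\int_0^t h_0 e^{-\alpha G s}\,ds\right)=\exp\left(-\frac{h_0}{\alpha G}\left(1-e^{-\alpha G t}\right)\right)$

$\lim_{t\to\infty}S(t)=\exp\left(-\frac{h_0}{\alpha G}\right)>0$

The total accumulated hazard is finite, so the survival curve levels off at a positive value instead of decaying to zero. Faster growth (larger $G$) raises that floor.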

# General cases

Overall, if there is a constant rate of risk, then we should expect exponential survival curves. If the rate grows or declines as a power of time, proportional to $t^{k-1}$, we get a Weibull distribution of time to extinction, which has a “stretched exponential” survival curve: $\exp(-(t/ \lambda)^k).$
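The link between a hazard rate and a survival curve is $S(t)=\exp(-\int_0^t h(s)\,ds)$, and it is easy to check numerically. The sketch below uses the standard Weibull hazard $h(t)=(k/\lambda)(t/\lambda)^{k-1}$; the `scale` and `shape` values are hypothetical, and the function names are made up for the example.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-integral of hazard from 0 to t), using the midpoint rule."""
    dt = t / steps
    cumulative = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative)

def weibull_hazard(scale, shape):
    """Hazard rate of a Weibull distribution with the given scale and shape."""
    return lambda s: (shape / scale) * (s / scale) ** (shape - 1)

scale, shape = 10.0, 1.5  # hypothetical values, for illustration only
t = 7.0

numeric = survival_from_hazard(weibull_hazard(scale, shape), t)
closed_form = math.exp(-((t / scale) ** shape))
constant = survival_from_hazard(lambda s: 0.1, t)

print(f"Weibull, numeric:      {numeric:.6f}")
print(f"Weibull, closed form:  {closed_form:.6f}")
print(f"constant hazard 0.1:   {constant:.6f} (exponential: {math.exp(-0.1 * t):.6f})")
```

Any other hazard function can be passed in the same way, which is handy for trying out shapes that have no closed form.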

If we think of risk increasing from some original level to a new higher level, then the survival curve will essentially be piece-wise exponential with a more or less softly interpolating “knee”.

## Transition risks

A transition risk is essentially an impulse of hazard. We can treat it as a Dirac delta function with some weight $w$ at a certain time $t$, in which case it just reduces the survival curve so $\frac{S(\mathrm{after }t)}{S(\mathrm{before }t)}=w$. If $t$ is randomly distributed it produces a softer decline, but with the same magnitude.

## Rectangular survival curves

Human individual survival curves are rectangularish because of exponentially increasing hazard plus some constant hazard (the Gompertz-Makeham law of mortality). The increasing hazard is due to ageing: old people are more vulnerable than young people.

Do we have any reason to believe in a similar increasing hazard for humanity? Considering the invention of new dangerous technologies as adding more state risk we should expect at least enough of an increase to get a more convex shape of the survival curve in the present era, possibly with transition risk steps added in the future. This was counteracted by the exponential growth of human population until recently.

## How do species survival curves look in nature?

There is “van Valen’s law of extinction” claiming the normal extinction rate remains constant at least within families, finding exponential survivorship curves (van Valen 1973). It is worth noting that the extinction rate is different for different ecological niches and types of organisms.

However, fits with Weibull distributions seem to work better for Cenozoic foraminifera than exponentials (Arnold, Parker and Hansard 1995), suggesting the probability of extinction increases with species age. The difference in shape is however relatively small (k≈1.2), making the probability increase from 0.08/Myr at 1 Myr to 0.17/Myr at 40 Myr. Other data hint at slightly slowing extinction rates for marine plankton (Cermeno 2011).
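The quoted numbers can be checked with a few lines of Python. This is only a sketch of the standard Weibull forms, $S(t)=\exp(-(t/\lambda)^k)$ with hazard $(k/\lambda)(t/\lambda)^{k-1}$; the function names are mine, and the scale is set to 1 because the ratio of hazards at two ages does not depend on it.

```python
import math

def weibull_survival(t, scale, k):
    """Survival function S(t) = exp(-(t/scale)**k)."""
    return math.exp(-((t / scale) ** k))

def weibull_hazard(t, scale, k):
    """Hazard -S'(t)/S(t) = (k/scale) * (t/scale)**(k-1)."""
    return (k / scale) * (t / scale) ** (k - 1)

k = 1.2  # shape value quoted in the text
# Ratio of the hazard at 40 Myr to the hazard at 1 Myr (scale cancels out).
ratio = weibull_hazard(40.0, 1.0, k) / weibull_hazard(1.0, 1.0, k)
print(round(ratio, 2))         # 2.09
print(round(0.08 * ratio, 2))  # 0.17, starting from 0.08/Myr at 1 Myr
```

So a shape parameter of 1.2 roughly doubles the extinction rate over 40 Myr, which matches the 0.08/Myr to 0.17/Myr figures above.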

In practice there are problems associated with speciation and time-varying extinction rates, not to mention biased data (Pease 1988). In the end, the best we can say at present appears to be that natural species survival is roughly exponentially distributed.

# Conclusions for xrisk research

Survival curves contain a lot of useful information. The median lifespan is easy to read off by checking the intersection with the 50% survival line. The life expectancy is the area under the curve.

In a semilog-diagram an exponentially declining survival probability is a line with negative slope. The slope is set by the hazard rate. Changes in hazard rate make the line a series of segments.
An early reduction in hazard (i.e. the line slope becomes flatter) clearly improves the outlook at a later time more than a later equal improvement: to have a better effect the late improvement needs to reduce hazard significantly more.

A transition risk causes a vertical displacement of the line (or curve) downwards: the weight determines the distance. From a given future time, it does not matter when the transition risk occurs as long as the subsequent hazard rate is not dependent on it. If the weight changes depending on when it occurs (hardware overhang, technology ordering, population) then the position does matter. If there is a risky transition that reduces state risk we should want it earlier if it does not become worse.

### Acknowledgments

Thanks to Toby Ord for pointing out a mistake in an earlier version.

# Appendix: survival analysis

The main object of interest is the survival function $S(t)=\Pr(T>t)$ where $T$ is a random variable denoting the time of death. In engineering it is commonly called the reliability function. It is declining over time, and will approach zero unless indefinite survival is possible with a finite probability.

The event density $f(t)=\frac{d}{dt}(1-S(t))$ denotes the rate of death per unit time.

The hazard function $\lambda(t)$ is the event rate at time $t$ conditional on survival until time $t$ or later. It is $\lambda(t) = - S'(t)/S(t)$. Note that unlike the event density function this does not have to decline as the number of survivors gets low: this is the overall force of mortality at a given time.

The expected future lifetime given survival to time $t_0$ is $\frac{1}{S(t_0)}\int_{t_0}^\infty S(t)dt.$ Note that for exponential survival curves (i.e. constant hazard) it remains constant.
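As a rough numerical illustration of the last formula, the sketch below integrates a survival function with the trapezoid rule. The rate 0.5, the truncation horizon and the helper name are assumptions made for the example; the Weibull curve with $k=2$ is included only to contrast with the constant-hazard case.

```python
import math

def expected_future_lifetime(survival, t0, horizon=100.0, steps=100_000):
    """Approximate (1/S(t0)) * integral of S from t0 to t0+horizon (trapezoid rule)."""
    dt = horizon / steps
    total = 0.5 * (survival(t0) + survival(t0 + horizon))
    for i in range(1, steps):
        total += survival(t0 + i * dt)
    return total * dt / survival(t0)

rate = 0.5  # hypothetical constant hazard

def exp_survival(t):
    return math.exp(-rate * t)

def weibull_survival(t):
    return math.exp(-(t ** 2))  # increasing hazard (shape k = 2, scale 1)

for t0 in (0.0, 1.0, 3.0):
    # Constant hazard: the value stays close to 1/rate = 2 whatever t0 is.
    print(t0, round(expected_future_lifetime(exp_survival, t0), 3))

# Increasing hazard: the expected future lifetime shrinks with age.
print(expected_future_lifetime(weibull_survival, 0.0) >
      expected_future_lifetime(weibull_survival, 1.0))
```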

# How much should we spread out across future scenarios?

Robin Hanson mentions that some people take him to task for working on one scenario (WBE) that might not be the most likely future scenario (“standard AI”); he responds by noting that there are perhaps 100 times more people working on standard AI than WBE scenarios, yet the probability of AI is likely not a hundred times higher than WBE. He also notes that there is a tendency for thinkers to clump onto a few popular scenarios or issues. However:

In addition, due to diminishing returns, intellectual attention to future scenarios should probably be spread out more evenly than are probabilities. The first efforts to study each scenario can pick the low hanging fruit to make faster progress. In contrast, after many have worked on a scenario for a while there is less value to be gained from the next marginal effort on that scenario.

This is very similar to my own thinking about research effort. Should we focus on things that are likely to pan out, or explore a lot of possibilities just in case one of the less obvious cases happens? Given that early progress is quick and easy, we can often get a noticeable fraction of whatever utility the topic has by just a quick dip. The effective altruist heuristic of looking at neglected fields also is based on this intuition.

# A model

But under what conditions does this actually work? Here is a simple model:

There are $N$ possible scenarios, one of which ($j$) will come about. They have probability $P_i$. We allocate a unit budget of effort to the scenarios: $\sum a_i = 1$. For the scenario that comes about, we get utility $\sqrt{a_j}$ (diminishing returns).

Here is what happens if we allocate in proportion to a power of the scenario probabilities, $a_i \propto P_i^\alpha$. $\alpha=0$ corresponds to even allocation, $\alpha=1$ to allocation proportional to the likelihood, and $\alpha>1$ to favoring the most likely scenarios. In the following I will run Monte Carlo simulations where the probabilities are randomly generated in each instantiation. The outer bluish envelope covers 95% of the outcomes, the inner one ranges from the lower to the upper quartile of the utility gained, and the red line is the expected utility.

This is the $N=2$ case: we have two possible scenarios with probability $p$ and $1-p$ (where $p$ is uniformly distributed in [0,1]). Just allocating evenly gives us $1/\sqrt{2}$ utility on average, but if we put in more effort on the more likely case we will get up to 0.8 utility. As we focus more and more on the likely case there is a corresponding increase in variance, since we may guess wrong and lose out. But 75% of the time we will do better than if we just allocated evenly. Still, allocating nearly everything to the most likely case means that one does lose out on a bit of hedging, so the expected utility declines slowly for large $\alpha$.
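A minimal Python sketch of the $N=2$ experiment, assuming the model exactly as stated above (utility $\sqrt{a_j}$, $a_i \propto P_i^\alpha$, $p$ uniform on [0,1]). The function names, trial count and seed are mine, and for each draw the code averages over which scenario happens analytically instead of sampling it.

```python
import random

def allocation(probs, alpha):
    """Effort shares a_i proportional to P_i**alpha, normalised to sum to 1."""
    weights = [p ** alpha for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

def expected_utility(probs, alpha):
    """Scenario j happens with probability P_j and then pays sqrt(a_j)."""
    shares = allocation(probs, alpha)
    return sum(p * a ** 0.5 for p, a in zip(probs, shares))

def mean_utility_two_scenarios(alpha, trials=50_000, seed=1):
    """Average expected utility over random p ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        p = rng.random()
        total += expected_utility([p, 1.0 - p], alpha)
    return total / trials

for alpha in (0, 1, 2, 4):
    print(alpha, round(mean_utility_two_scenarios(alpha), 3))
```

For $\alpha=0$ every draw gives exactly $1/\sqrt{2}$, and for $\alpha=1$ the average approaches $2\int_0^1 p^{3/2}dp = 0.8$, in line with the figures quoted above.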

The  $N=100$ case (where the probabilities are allocated based on a flat Dirichlet distribution) behaves similarly, but the expected utility is smaller since it is less likely that we will hit the right scenario.

# What is going on?

This doesn’t seem to fit Robin’s or my intuitions at all! The best we can say about uniform allocation is that it doesn’t produce much regret: whatever happens, we will have made some allocation to the possibility. For large N this actually works out better than the directed allocation for a sizable fraction of realizations, but on average we get less utility than betting on the likely choices.

The problem with the model is of course that we actually know the probabilities before making the allocation. In reality, we do not know the likelihood of AI, WBE or alien invasions. We have some information, and we do have priors (like Robin’s view that $P_{AI} < 100 P_{WBE}$), but we are not able to allocate perfectly.  A more plausible model would give us probability estimates instead of the actual probabilities.

# We know nothing

Let us start by looking at the worst possible case: we do not know what the true probabilities are at all. We can draw estimates from the same distribution – it is just that they are uncorrelated with the true situation, so they are just noise.

In this case uniform distribution of effort is optimal. Not only does it avoid regret, it has a higher expected utility than trying to focus on a few scenarios ($\alpha>0$). The larger N is, the less likely it is that we focus on the right scenario since we know nothing. The rationality of ignoring irrelevant information is pretty obvious.

Note that if we have to allocate a minimum effort to each investigated scenario we will be forced to effectively increase our $\alpha$ above 0. The above result gives the somewhat optimistic conclusion that the loss of utility compared to an even spread is rather mild: in the uniform case we have a pretty low amount of effort allocated to the winning scenario, so the low chance of being right in the nonuniform case is being balanced by having a slightly higher effort allocation on the selected scenarios. For high $\alpha$ there is a tail of rare big “wins” when we hit the right scenario that drags the expected utility upwards, even though in most realizations we bet on the wrong case. This is very much the hedgehog predictor story: occasionally they have analysed the scenario that comes about in great detail and get intensely lauded, despite looking at the wrong things most of the time.

# We know a bit

We can imagine that knowing more should allow us to gradually interpolate between the different results: the more you know, the more you should focus on the likely scenarios.

If we take the mean of the true probabilities with some randomly drawn probabilities (the “half random” case) the curve looks quite similar to the case where we actually know the probabilities: we get a maximum for $\alpha\approx 2$. In fact, we can mix in just a bit ($\beta$) of the true probability and get a fairly good guess where to allocate effort (i.e. we allocate effort as $a_i \propto (\beta P_i + (1-\beta)Q_i)^\alpha$ where $Q_i$ is uncorrelated noise probabilities). The optimal alpha grows roughly linearly with $\beta$, $\alpha_{opt} \approx 4\beta$ in this case.
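The mixing experiment can be sketched as below. This is my own reconstruction under stated assumptions: true probabilities $P$ and noise probabilities $Q$ are independent flat Dirichlet draws (generated by normalising exponential variates), and $N=10$, the trial count and the seed are arbitrary choices made to keep the run short, not the settings behind the plots.

```python
import random

def flat_dirichlet(n, rng):
    """Draw a probability vector uniformly from the simplex."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_utility(alpha, beta, n=10, trials=5_000, seed=1):
    """Average utility when effort follows (beta*P + (1-beta)*Q)**alpha."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_p = flat_dirichlet(n, rng)
        noise_q = flat_dirichlet(n, rng)
        estimate = [beta * p + (1.0 - beta) * q
                    for p, q in zip(true_p, noise_q)]
        weights = [e ** alpha for e in estimate]
        norm = sum(weights)
        # Scenario j occurs with its true probability and pays sqrt(a_j).
        total += sum(p * (w / norm) ** 0.5
                     for p, w in zip(true_p, weights))
    return total / trials

for beta in (0.0, 0.5, 1.0):
    row = [round(mean_utility(alpha, beta), 3) for alpha in (0, 1, 2, 4)]
    print(beta, row)
```

With $\beta=0$ the estimates are pure noise and the even spread ($\alpha=0$) should come out on top; with larger $\beta$ the best $\alpha$ in each row should move upwards.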

# We learn

Adding a bit of realism, we can consider a learning process: after allocating some effort $\gamma$ to the different scenarios we get better information about the probabilities, and can now reallocate. A simple model may be that the standard deviation of noise behaves as $1/\sqrt{\tilde{a}_i}$ where $\tilde{a}_i$ is the effort placed in exploring the probability of scenario $i$. So if we begin by allocating uniformly we will have noise at reallocation of the order of $1/\sqrt{\gamma/N}$. We can set $\beta(\gamma)=\sqrt{\gamma/N}/C$, where $C$ is some constant denoting how tough it is to get information. Putting this together with the above result $\alpha_{opt} \approx 4\beta$ we get $\alpha_{opt}(\gamma)=4\sqrt{\gamma/N}/C$. After this exploration, we use the remaining $1-\gamma$ effort to work on the actual scenarios.

This is surprisingly inefficient. The reason is that the expected utility declines as $\sqrt{1-\gamma}$ and the gain is just the utility difference between the uniform case $\alpha=0$ and optimal $\alpha_{opt}$, which we know is pretty small. If C is small (i.e. a small amount of effort is enough to figure out the scenario probabilities) there is an optimal nonzero  $\gamma$. This optimum $\gamma$ decreases as C becomes smaller. If C is large, then the best approach is just to spread efforts evenly.

# Conclusions

So, how should we focus? These results suggest that the key issue is knowing how little we know compared to what can be known, and how much effort it would take to know significantly more.

If there is little more that can be discovered about what scenarios are likely, because our state of knowledge is pretty good, the world is very random,  or improving knowledge about what will happen will be costly, then we should roll with it and distribute effort either among likely scenarios (when we know them) or spread efforts widely (when we are in ignorance).

If we can acquire significant information about the probabilities of scenarios, then we should do it – but not overdo it. If it is very easy to get information we need to just expend some modest effort and then use the rest to flesh out our scenarios. If it is doable but costly, then we may spend a fair bit of our budget on it. But if it is hard, it is better to go directly on the object level scenario analysis as above. We should not expect the improvement to be enormous.

Here I have used a square root diminishing return model. That drives some of the flatness of the optima: had I used a logarithm function things would have been even flatter, while if the returns diminish more mildly the gains of optimal effort allocation would have been more noticeable. Clearly, understanding the diminishing returns, number of alternatives, and cost of learning probabilities better matters for setting your strategy.

In the case of future studies we know the number of scenarios is very large. We know that the returns to forecasting efforts are strongly diminishing for most kinds of forecasts. We know that extra efforts in reducing uncertainty about scenario probabilities in e.g. climate models also have strongly diminishing returns. Together this suggests that Robin is right, and it is rational to stop clustering too hard on favorite scenarios. Insofar as we learn something useful from considering scenarios we should explore as many as feasible.

# Scientific progress goes zig-zag

I recently nerded out about high-energy proton interaction with matter, enjoying reading up on the Bethe equation at the Particle Data Group review and elsewhere. That got me to look around at the PDG website, which is full of awesome stuff – everything from math and physics reviews to data for the most obscure “particles” ever, plus tests of how conserved the conservation laws are.

The first thing that strikes the viewer of the PDG history plots is that the measured values have moved a fair bit, often ending up far outside the original error bars. Six of them have escaped their original error bars altogether. That doesn’t look very good for science!

Fortunately, it turns out that these error bars are not 95% confidence intervals (the most common form in many branches of science) but 68.3% confidence intervals (one standard deviation, if things are normal). That means having half of them out of range is entirely reasonable! On the other hand, most researchers don’t understand error bars (original paper), and we should be able to do much better.

The PDG state:

Sometimes large changes occur. These usually reflect the introduction of significant new data or the discarding of older data. Older data are discarded in favor of newer data when it is felt that the newer data have smaller systematic errors, or have more checks on systematic errors, or have made corrections unknown at the time of the older experiments, or simply have much smaller errors. Sometimes, the scale factor becomes large near the time at which a large jump takes place, reflecting the uncertainty introduced by the new and inconsistent data. By and large, however, a full scan of our history plots shows a dull progression toward greater precision at central values quite consistent with the first data points shown.

Overall, kudos to PDG for showing the history and making it clearer what is going on! But I do not agree it is a dull progression.

## Zigzag to truth

The locus classicus for histories of physical constants being not quite a monotonic march towards truth is Max Henrion and Baruch Fischhoff. Assessing uncertainty in physical constants. American Journal of Physics 54, 791 (1986); doi: 10.1119/1.14447. They discuss the problem of people being overconfident and badly calibrated, and then show the zigzagging approach to current values:

Note that the shifts were far larger than the estimated error bars. The dip in the 1930s and 40s even made some physicists propose that c could be changing over time. Overall Henrion and Fischhoff find that physicists have been rather overconfident in their tight error bounds on their measurements. The approach towards current estimates is anything but dull, and hides many amusing historical anecdotes.

Stories like this might have been helpful; it is notable that the PDG histories on the right, for newer constants, seem to stay closer to the present value than the longer ones to the left. Maybe this is just because they have not had the time to veer off yet, but one can be hopeful.

Still, even if people are improving this might not mean the conclusions stay stable or approach truth monotonically. A related issue is “negative learning”, where more data and improved models make the consensus view of a topic move in the wrong direction: Oppenheimer, M., O’Neill, B. C., & Webster, M. (2008). Negative learning. Climatic Change, 89(1-2), 155-172. Here the problem is not just that people are overconfident in how certain they can be about their conclusions, but also that there is a bit of group-think, plus that the models change in structure and are affected in different ways by the same data. They point out how estimates of ozone depletion oscillated, or the consensus on the stability of climate has shifted from oscillatory (before 1968) towards instability (68-82), towards stability (82-96), and now towards instability again (96-06). These problems are not due to mere irrationality, but the fact that as we learn more and build better models these incomplete but better models may still deviate strongly from the ground truth because they miss some key component.

## Noli fumare

This is related to what Nick Bostrom calls the “data fumes” problem. Early data will be fragmentary and explanations uncertain – but the data points and their patterns are very salient, just as the early models, since there is nothing else. So we begin to anchor on them. Then new data arrives and the models improve… and the old patterns are revealed as statistical noise, or bugs in the simulation or plotting routine. But since we anchored on them, we are unlikely to update as strongly towards the new most likely estimates. Worse, accommodating a new model takes mental work; our status quo bias will be pushing against the update. Even if we do accommodate the new state, things will likely change more – we may well end up either with a view anchored on early noise, or assume that the final state is far more uncertain than it actually is (since we weigh the early jumps strongly because of their saliency).

This is of course why most people prefer to believe a charismatic diet cultleader expert rather than trying to dig through 70 years of messy, conflicting dietary epidemiology.

Here is a simple example where an agent is trying to do a maximum likelihood estimation of a Gaussian distribution with mean 1 and variance 1, but is hamstrung by giving double weight to the first 9 data points:
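A minimal sketch of such an agent in Python. It tracks only the running estimate of the mean (the variance estimate is left out), and the sample size and seed are arbitrary choices of mine, not the ones behind the original plot.

```python
import random

def weighted_mean_estimates(samples, n_early=9, early_weight=2.0):
    """Running weighted mean; the first n_early points count early_weight times."""
    estimates = []
    weighted_sum = 0.0
    weight_total = 0.0
    for i, x in enumerate(samples):
        w = early_weight if i < n_early else 1.0
        weighted_sum += w * x
        weight_total += w
        estimates.append(weighted_sum / weight_total)
    return estimates

rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # mean 1, variance 1

anchored = weighted_mean_estimates(data)                  # double-weights early data
plain = weighted_mean_estimates(data, early_weight=1.0)   # ordinary running mean

for n in (9, 20, 100, 1000):
    print(n, round(anchored[n - 1], 3), round(plain[n - 1], 3))
```

After $n$ points the two estimates differ by $9(\bar{x}_{1:9}-\bar{x}_{1:n})/(n+9)$: the early noise fades only as $1/n$, so it can linger for a long time when the first few points happened to be unrepresentative.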

It is not hard to complicate the model with anchoring/recency/status quo bias (estimates get biased towards previous estimates), or that early data points are more polluted by differently distributed noise. Asymmetric error checking (you will look for bugs if results deviate from expectation and hence often find such bugs, but not look for bugs making your results closer to expectation) is another obvious factor for how data fumes can get integrated in models.

The problem with data fumes is that it is not easy to tell when you have stabilized enough to start trusting the data. It is even messier when the inputs are results generated by your own models or code. I like to approach it by using multiple models to guesstimate model error: for example, one mathematical model on paper and one Monte Carlo simulation – if they don’t agree, then I should disregard either answer and keep on improving.

Even when everything seems to be fine there may be a big crucial consideration one has missed. The Turing-Good estimator gives another way of estimating the risk of that: if you have acquired $N$ data points and seen $K$ big surprises (remember that the first data point counts as one!), then the probability of a new surprise for your next data point is $\approx K/N$. So if you expect $M$ data points in total, when $K(M-N)/N \ll 1$ you can start to trust the estimates… assuming surprises are uncorrelated etc. Which you will not be certain about. The progression towards greater precision may be anything but dull.
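As a worked example of this rule of thumb, here is the arithmetic in Python. The numbers (100 data points, 2 surprises, 150 points expected in total) are purely hypothetical, and the function names are mine.

```python
def surprise_probability(n_points, n_surprises):
    """Estimated chance that the next data point is a surprise, K/N."""
    return n_surprises / n_points

def expected_future_surprises(n_points, n_surprises, total_points):
    """Expected surprises among the remaining points, K(M-N)/N."""
    return n_surprises * (total_points - n_points) / n_points

# Hypothetical case: N = 100 points seen, K = 2 surprises, M = 150 in total.
print(surprise_probability(100, 2))            # 0.02
print(expected_future_surprises(100, 2, 150))  # 1.0
```

Here $K(M-N)/N = 1$, which is not much smaller than 1, so by this criterion it would be too early to trust the estimates.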

# Did amphetamines help Erdős?

During my work on the Paris talk I began to wonder whether Paul Erdős (who I used as an example of a respected academic who used cognitive enhancers) could actually have been shown to have benefited from his amphetamine use, which began in 1971 according to Hill (2004). One way of investigating is his publication record: how many papers did he produce per year before or after 1971? Here is a plot, based on Jerrold Grossman’s 2010 bibliography:

The green dashed line is the start of amphetamine use, and the red dashed line is the date of death. Yes, there is a fairly significant posthumous tail: old mathematicians never die, they just asymptote towards zero. Overall, the later part is more productive per year than the early part (before 1971 the mean and standard deviation were 14.6±7.5, after 24.4±16.1; a Kruskal-Wallis test rejects that they are the same distribution, p=2.2e-10).

This does not prove anything. After all, his academic network was growing and he moved from topic to topic, so we cannot prove any causal effect of the amphetamine: for all we know, it might have been holding him back.

One possible argument might be that he did not do his best work on amphetamine. To check this, I took the Wikipedia article that lists things named after Erdős, and tried to find years for the discovery/conjecture. These are marked with red crosses in the diagram, slightly jittered. We can see a few clusters that may correspond to creative periods: one in 35-41, one in 46-51, one in 56-60. After 1970 the distribution was more even and sparse. 76% of the most famous results were done before 1971; given that this is 60% of the entire career it does not look that unlikely to be due to chance (a binomial test gives p=0.06).

Again this does not prove anything. Maybe mathematics really is a young man’s game, and we should expect key results early. There may also have been more time to recognize and name results from the earlier career.

In the end, this is merely a statistical anecdote. It does show that one can be a productive, well-renowned (if eccentric) academic while on enhancers for a long time. But given the N=1, firm conclusions or advice are hard to draw.

Erdős’s friends worried about his drug use, and in 1979 Graham bet Erdős \$500 that he couldn’t stop taking amphetamines for a month. Erdős accepted, and went cold turkey for a complete month. Erdős’s comment at the end of the month was “You’ve showed me I’m not an addict. But I didn’t get any work done. I’d get up in the morning and stare at a blank piece of paper. I’d have no ideas, just like an ordinary person. You’ve set mathematics back a month.” He then immediately started taking amphetamines again. (Hill 2004)

# Quantifying busyness

## Tempus fugit

If I have one piece of advice to give to people, it is that they typically have way more time now than they will ever have in the future. Do not procrastinate, take chances when you see them – you might never have the time to do it later.

One reason is the gradual speeding up of subjective time as we age: one day is less time for a 40 year old than for a 20 year old, and way less than the eon it is to a 5 year old. Another is that there is a finite risk that opportunities will go away (including our own finite lifespans). The main reason is of course the planning fallacy: since we underestimate how long our tasks will take, our lives tend to crowd up. Agreeing to give a paper in several months’ time is easy, since there seems to be a lot of time to do it in between… which mysteriously disappears until you sit there doing an all-nighter. There is also the likely effect that as you grow in skill, reputation and career there will be more demands on your time. All in all, expect your time to grow in preciousness!

## Mining my calendar

I recently noted that my calendar had filled up several weeks in advance, something I think did not happen to this extent a few years back. A sign of a career taking off, worsening time management, or just bad memory? I decided to do some self-quantification using my Google calendar. I exported the calendar as an .ics file and made a simple parser in Matlab.

It is pretty clear from a scatter plot that most entries are for the near future – a few days or weeks ahead. Looking at a histogram shows that most are within a month (a few are in the past – I sometimes use my calendar to note when I have done something like an interview that I may want to remember later).

Plotting it as a log-log diagram suggests it is lighter-tailed than a power-law: there is a characteristic scale. And there are a few wobbles suggesting 1-week, 2-week and 3-week periodicities.

Am I getting busier? Plotting the mean and median distance to scheduled events, and the number of events per year, suggests yes. The median distance to the things I schedule seems to be creeping downwards, while the number of events per year has clearly doubled from about 400 in 2008 to 800 in 2014 (and extrapolating 2015 suggests about 1000 scheduled events).

Plotting the number of events I had per 14-day period also suggests that I have way more going on now than a few years ago. The peaks are getting higher and the mean period is more intense.

## When am I free?

A good measure of busyness would be the time horizon: how far ahead should you ask me for a meeting if you want to have a high chance of getting it?

One approach would be to look for the probability $Q(t)$ that a day $t$ days ahead is entirely empty. If the probability that I will fill in something $i$ days ahead is $P(i)$, then the chance for an empty day is $Q(t) = \prod_{i=t}^\infty (1-P(i))$. We can estimate $P(i)$ by doing a curve-fit (a second degree curve works well), but we can of course just estimate from the histogram counts: $\hat{P}(i)=N(i)/N$.

However, this method is slightly wrong. Some days are free, others have many different events. If I schedule twice as many events the chance of a free day should be lower. A better way of estimating $Q(t)$ is to think in terms of the rate of scheduling. We can view this as a Poisson process, where the rate of scheduling $\lambda(i)$ tells us how often I schedule something $i$ days ahead. An approximation is $\hat{\lambda}(i)=N(i)/T$, where $T$ is the time interval we base our estimate on. This way $Q(t) = \prod_{i=t}^\infty e^{-\lambda(i)}$.
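Both estimators are easy to sketch in Python. This is not the original Matlab code: `counts[i]` is assumed to hold $N(i)$, the number of events scheduled $i$ days ahead, the infinite product is truncated at the end of the histogram, and the example counts and the one-year interval are made up for illustration.

```python
import math

def free_day_probability_bernoulli(counts, t):
    """Q(t) = prod_{i>=t} (1 - N(i)/N), with N the total number of events."""
    total = sum(counts)
    q = 1.0
    for n_i in counts[t:]:
        q *= 1.0 - n_i / total
    return q

def free_day_probability_poisson(counts, t, interval_days):
    """Q(t) = prod_{i>=t} exp(-N(i)/T) = exp(-sum_{i>=t} N(i) / T)."""
    return math.exp(-sum(counts[t:]) / interval_days)

def planning_horizon(counts, interval_days, target):
    """Smallest lead time t (days) with Poisson Q(t) >= target, else None."""
    for t in range(len(counts) + 1):
        if free_day_probability_poisson(counts, t, interval_days) >= target:
            return t
    return None

# Hypothetical histogram: bookings thin out geometrically with lead time.
counts = [round(300 * 0.9 ** i) for i in range(120)]
interval_days = 365.0  # hypothetical: one year of calendar data

for target in (0.5, 0.75, 0.9):
    print(target, planning_horizon(counts, interval_days, target))
```

Unlike the first estimator, the Poisson version depends on the observation interval $T$: doubling the number of events in the same period lowers $Q(t)$, as it should.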

If we slice the data by year, then there seems to be a fairly clear trend towards the planning horizon growing – I have more and more events far into the future, and I have more to do. Oh, those halcyon days in 2007 when I presumably just lazed around…

If we plot when I have 50%, 75% and 90% chance of being free, the trend is even clearer. At present you need to ask about three weeks in advance to have a 50% chance of grabbing me, and 187 days in advance to be 90% certain (if you want an entire working week with 50% chance, this is close to where you should go). Back in 2008 the 50% point was about a week and the 90% point 1.5 months ahead. I have become around 3 times busier.

## Conclusions

So, I have become busier. This is of course no evidence of getting more done – a lot of events are pointless meetings, and who knows if I am doing anything helpful at the other events. Plus, I might actually be wasting my time doing statistics and blogging instead of working.

But the exercise shows that it is possible to automatically estimate necessary planning horizons. Maybe we should add this to calendar apps to help scheduling: my contact page or virtual secretary might give you an automatically updated estimate of how far ahead you need to schedule things to have a good chance of getting me. It doesn’t have to tell you my detailed schedule (in principle one could do a privacy attack on the schedule by asking for very specific dates and seeing if they were blocked).

We can also use this method to look at levels of busyness across organisations. Who has flexibility in their schedules, and who is so overloaded that they cannot be effectively involved in projects? In the past, tasks tended to be simple and the issue was just the amount of time people had. But today we work individually yet as part of teams, and coordination events (meetings, seminars, lectures) are the key links: figuring out how to schedule them right is important for effectiveness.

If team member $j$ has scheduling rates $\lambda_j(i)$ and they are uncorrelated (yeah, right), then $Q(t)=\prod_{i=t}^\infty e^{-\sum_j\lambda_j(i)}$. The most important lesson is that the chance of everybody being able to make it to any given meeting day declines exponentially with the number of people. If the $\lambda_j(i)$ decline exponentially with time (plausible in at least my case) then scheduling a meeting requires the time ahead to be proportional to the number of people involved: double the meeting size, at least double the planning horizon. So if you want nimble meetings, make them tiny.

In the end, I prefer to live by the advice my German teacher Ulla Landvik once gave me, glancing at the school clock: “I see we have 30 seconds left of the lesson. Let’s do this exercise – we have plenty of time!” Time not only flies, it can be stretched too.

Some further explorations.

Owen Cotton-Barratt pointed out that another measure of busyness might be the distance to the next free day. Plotting it shows a very bursty pattern, with noisy peaks. The mean time was about 2-3 days: even though a lot of the time the horizon is far away, often an empty day slips through too. It is just that it cannot be relied on.

Are there periodicities? The most obvious is the weekly dynamics: Thursdays are busiest, weekends least busy. I tend to do scheduling in a roughly similar manner, with Tuesdays as the top scheduling day.

Over the years, plotting the number of events per day (“event intensity”) it is also clear that there is a loose pattern. Back in 2008-2011 one can see a lower rate around day 75 – that is the break between Hilary and Trinity term here in Oxford. There is another trough around day 200-250, the summer break and the time before the Michaelmas term. However, this is getting filled up over time.

Making a periodogram produces an obvious peak for 7 days, and a loose yearly periodicity. Between them there is a bunch of harmonics. The funny thing is that the week periodicity is very strong but hard to see in the map above.

# Ebola and the dragon

Here is a reason not to worry too much about Ebola… yet. I took the WHO data on Ebola outbreaks and plotted it. The distribution is not power-law distributed (looks bent on a loglog scale) but is decently exponential (straight on a semilog scale). The probability goes down fast with size.

However, when we add the final toll from the current outbreak (1603 suspected cases with 887 fatalities as of August 1) it might turn out to be a “dragon-king” bucking the line: in that case we should expect that large international outbreaks follow an entirely new dynamic. This is mildly worrying. Still, it is early days.