Did amphetamines help Erdős?

During my work on the Paris talk I began to wonder whether Paul Erdős (whom I used as an example of a respected academic who used cognitive enhancers) could actually be shown to have benefited from his amphetamine use, which began in 1971 according to Hill (2004). One way of investigating this is his publication record: how many papers did he produce per year before and after 1971? Here is a plot, based on Jerrold Grossman’s 2010 bibliography:

Productivity of Paul Erdős over his life. Green dashed line: amphetamine use, red dashed line: death. Crosses mark named concepts.

The green dashed line is the start of amphetamine use, and the red dashed line is the date of death. Yes, there is a fairly significant posthumous tail: old mathematicians never die, they just asymptote towards zero. Overall, the later part is more productive per year than the early part (before 1971 the mean and standard deviation were 14.6±7.5, after 24.4±16.1; a Kruskal-Wallis test rejects that they are the same distribution, p=2.2e-10).
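For anyone wanting to redo the comparison, here is a minimal Matlab sketch of the test; the Poisson draws are synthetic stand-ins for the real papers-per-year counts from Grossman’s bibliography, so only the procedure (not the quoted p-value) carries over.

    % Compare papers/year before and after 1971 with a Kruskal-Wallis test.
    % Synthetic stand-in data: Poisson counts with the means quoted above.
    rng(1);
    papers_before = poissrnd(14.6, 1, 37);   % placeholder pre-1971 yearly counts
    papers_after  = poissrnd(24.4, 1, 26);   % placeholder 1971-1996 yearly counts
    counts = [papers_before, papers_after];
    group  = [zeros(1, numel(papers_before)), ones(1, numel(papers_after))];
    p = kruskalwallis(counts, group, 'off')  % p-value for equal distributions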

This does not prove anything. After all, his academic network was growing and he moved from topic to topic, so we cannot prove any causal effect of the amphetamine: for all we know, it might have been holding him back.

One possible argument might be that he did not do his best work on amphetamine. To check this, I took the Wikipedia article that lists things named after Erdős, and tried to find years for the discovery/conjecture. These are marked with red crosses in the diagram, slightly jittered. We can see a few clusters that may correspond to creative periods: one in 1935-41, one in 1946-51, one in 1956-60. After 1970 the distribution was more even and sparse. 76% of the most famous results were done before 1971; given that the pre-1971 period covers 60% of the entire career, this does not look that unlikely to be due to chance (a binomial test gives p=0.06).

Again this does not prove anything. Maybe mathematics really is a young man’s game, and we should expect key results early. There may also have been more time to recognize and name results from the earlier career.

In the end, this is merely a statistical anecdote. It does show that one can be a productive, renowned (if eccentric) academic while on enhancers for a long time. But given that N=1, firm conclusions or advice are hard to draw.

Erdős’s friends worried about his drug use, and in 1979 Graham bet Erdős $500 that he couldn’t stop taking amphetamines for a month. Erdős accepted, and went cold turkey for a complete month. Erdős’s comment at the end of the month was “You’ve showed me I’m not an addict. But I didn’t get any work done. I’d get up in the morning and stare at a blank piece of paper. I’d have no ideas, just like an ordinary person. You’ve set mathematics back a month.” He then immediately started taking amphetamines again. (Hill 2004)

Limits of morphological freedom

My talk “Morphological freedom: what are the limits to transforming the body?” was essentially a continuation of my original morphological freedom talk from 2001. Now with added philosophical understanding and linking to some of the responses to the original paper. Here is a quick summary:

Enhancement and extensions

I began with a few cases: Liz Parrish self-experimenting with gene therapy to slow her ageing, Paul Erdős using drugs for cognitive enhancement, Todd Huffman exploring the realm of magnetic vision using an implanted magnet, Neil Harbisson gaining access to the realm of color using sonification, Stelarc doing body modification and extension as performance art, and Erik “The Lizardman” Sprague transforming into a lizard as an existential project.

It is worth noting that several of these are not typical enhancements amplifying an existing ability, but are about gaining access to entirely new abilities (I call this “extension”). Their value is not instrumental, but lies in the exploration or self-transformation. They are by their nature subjective and divergent. While some argue enhancements will by their nature be convergent (I disagree), extensions definitely go in all directions – and in fact gain importance from being different.

Morphological freedom and its grounding

Morphological freedom, “The right to modify one’s body (or not modify) according to one’s desires”, can be derived from fundamental rights such as the right to life and the right to pursue happiness. If you are not free to control your own body, your right to life and freedom are vulnerable and contingent: hence you need to be allowed to control your body. But I argue this includes a right to change the body: morphological freedom.

One can argue about what rights are, or if they exist. If there are such things, there is however a fair consensus that life and liberty are on the list. Similarly, morphological freedom seems to be so intrinsically tied to personhood that it becomes inalienable: you cannot remove it from a person without removing an important aspect of what it means to be a person.

These arguments are about fundamental rights rather than civil and legal rights: while I think we should make morphological freedom legally protected, I do think there is more to it than just mutual agreement. Patrick Hopkins wrote an excellent paper analysing how morphological freedom could be grounded. He argued that there are three primary approaches: grounding it in individual autonomy, in human nature, or in human interests. Autonomy is very popular, but Hopkins thinks much of current discourse is a juvenile “I want to be allowed to do what I want” autonomy rather than the more rational or practical concepts of autonomy in deontological or consequentialist ethics. One pay-off is that these concepts do imply limits on morphological freedom: it may not be used to undermine one’s own autonomy. Grounding in human nature requires a view of human nature. Transhumanists and many bioconservatives actually find themselves allies against the relativists and constructivists who deny any nature: they merely disagree on what the sacrosanct parts of that nature are (and these define limits of morphological freedom). Transhumanists think most proposed enhancements are outside these parts, the conservatives think they cover nearly any enhancement. Finally, grounding in what makes humans truly flourish again produces some ethically relevant limits. However, the interest account has trouble with extensions: at best it can argue that we need exploration or curiosity.

One can motivate morphological freedom in many other ways. One is that we need to explore: both because there may be posthuman modes of existence of extremely high value, and because we cannot know the value of different changes without trying them – the world is too complex to be reliably predicted, and many valuable things are subjective in nature. One can also argue we have some form of duty to approach posthumanity, because this approach is intrinsically or instrumentally important (consider a transhumanist reading of Nietzsche, or cosmist ideas). This approach typically seems to require some non-person-affecting value. Another approach is to argue that morphological freedom is socially constructed within different domains; we have one kind of freedom in sport, another one in academia. I am not fond of this approach since it does not explain how to handle the creation of new domains or what to do between domains. Finally, there is the virtue approach: self-transformation can be seen as a virtue. On this view we are not only allowed to change ourselves, we ought to, since it is part of human excellence and authenticity.

Limits

Limits to morphological freedom can be roughly categorized as practical/prudential limits, issues of willingness to change/identity, ethical limits, and social limits.

Practical/prudential limits

Safety is clearly a constraint. If an enhancement is too dangerous, then the risk outweighs the benefit and it should not be done. This is tricky to evaluate for more subjective benefits. The real risk boundary might not be a risk/benefit trade-off, but whether risk is handled in a responsible manner. The difference between being a grinder and doing self-harm lies in whether one takes precautions and regards pain and harm as problems rather than as the point of the exercise.

There are also obvious technological and biological limits. I did not have the time to discuss them, but I think one can use heuristics like the evolutionary optimality challenge to make judgements about feasibility and safety.

Identity limits

Even in a world where anything could be changed with no risk, economic cost or outside influence, it is likely that many traits would remain stable. We express ourselves through what we transform ourselves into, and this implies that we will not change what we consider to be prior to that. The Riis, Simmons and Goodwin study showed that surveyed students were much less willing to enhance traits regarded as more relevant to personal identity than peripheral traits. Rather than “becoming more than you are”, the surveyed students were interested in being who they are – but better at it. Morphological freedom may hence be strongly constrained by the desire to maintain a variant of the present self.

Ethical limits

Besides the limits coming from the groundings discussed above, there are the standard constraints of not harming or otherwise infringing on the rights of others, capacity (what do we do about prepersons, children or the deranged?) and informed consent. The problem here is not any disagreement about the existence of the constraints, but where they actually lie and how they actually play out.

Social limits

There are obvious practical social limits to some forms of morphological freedom. Becoming a lizard affects your career choices and how people react to you – the fact that maybe it shouldn’t does not change the fact that it does.

There are also constraints from externalities: morphological freedom should not unduly externalize its costs on the rest of society.

My original paper has received a certain amount of flak from the direction of disability rights, since I argued morphological freedom is a negative right. You have a right to try to change yourself, but I do not need to help you – and vice versa. The criticism is that this is ableist: to be a true right there must be social support for achieving the inherent freedom. To some extent my libertarian leanings made me favour a negative right, but it was also the less radical choice: I am actually delighted that others think we need to reshape society to help people self-transform, a far more radical view. I have some misgivings about the politics of this: prioritization tends to be a nasty business, it means that costs will be socially externalized, and in the literature there seem to be some odd views about who gets to say which bodies are authentic or not. But I am all in favour of a “commitment to the value, standing, and social legibility of the widest possible (and an ever-expanding) variety of desired morphologies and lifeways.”

Another interesting discourse has been about the control of the body. While in medicine there has been much work to normalize the body (slowly shifting towards achieving functioning in one’s own life), in science the growth of ethics review has put more and more control in the hands of appointed experts, while in performance art almost anything goes (and attempts to control it would be censorship). As Goodall pointed out, many of the body-oriented art pieces are as much experiments in ethics as they are artistic experiments. They push the boundaries in important ways.

Touch the limits

In the end, I think this is an important realization: we do not fully know the moral limits of morphological freedom. We should not expect all of them to be knowable through prior reasoning. This is a domain where much is unknown and hard for humans to reason about. Hence we need experiments and exploration to learn them. We should support this exploration since there is much of value to be found, and because it embodies much of what humanity is about. Even when we do not know it yet.

Being reasonable

The ever readable Scott Alexander stimulated a post on Practical Ethics about defaults, status quo, and disagreements about sex. The gist of it: our culture sets defaults on who is reasonable or unreasonable when couples disagree, and these become particularly troubling when dealing with biomedical enhancements of love and sex. The defaults combine with status quo bias and our scepticism about biomedical interventions to cause biases that can block or push people towards certain interventions.

Packing my circles

One of the first fractals I ever saw was the Apollonian gasket, the shape that emerges if you keep drawing the circle internally tangent to three mutually tangent circles, filling in every gap that appears. It is somewhat similar to the Sierpinski triangle, but has a more organic flair. I can still remember opening my copy of Mandelbrot’s The Fractal Geometry of Nature and encountering this amazing shape. There are a lot of interesting things going on here.

Here is a simple algorithm for generating related circle packings, trading recursion for flexibility:

  1. Start with a domain and calculate the distance to the border for all interior points.
  2. Place a circle of radius \alpha d^* at the point with maximal distance d^*=\max d(x,y) from the border.
  3. Recalculate the distances, treating the new circle as a part of the border.
  4. Repeat (2-3) until the radius becomes smaller than some tolerance.

This is easily implemented in Matlab if we discretize the domain and use an array of distances d(x,y), which is then updated d(x,y) \leftarrow \min(d(x,y), D(x,y)) where D(x,y) is the distance to the circle. This trades exactness for some discretization error, but it can easily handle nearly arbitrary shapes.
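Here is a minimal Matlab sketch of that loop, assuming a square domain on a pixel grid (the grid size and tolerance are arbitrary choices of mine, and bwdist requires the Image Processing Toolbox):

    % Greedy circle packing: repeatedly inscribe a circle at the point
    % furthest from the current border, then treat it as new border.
    n = 400;                                   % grid resolution
    alpha = 1;                                 % radius factor; alpha<1 gives sponges
    tol = 1;                                   % stop when radii drop below one pixel
    [X, Y] = meshgrid(1:n, 1:n);
    mask = false(n); mask(2:end-1, 2:end-1) = true;   % the domain; any blob works
    d = double(bwdist(~mask));                 % distance from each point to the border
    circles = zeros(0, 3);                     % rows of [cx cy r]
    while true
        [dstar, idx] = max(d(:));              % point with maximal distance
        r = alpha * dstar;
        if r < tol, break; end
        cx = X(idx); cy = Y(idx);
        circles(end+1, :) = [cx cy r];         %#ok<AGROW>
        d = min(d, hypot(X - cx, Y - cy) - r); % treat the new circle as border
    end
    figure; hold on; axis equal; axis off
    t = linspace(0, 2*pi, 64);
    for i = 1:size(circles, 1)
        plot(circles(i,1) + circles(i,3)*cos(t), ...
             circles(i,2) + circles(i,3)*sin(t), 'k');
    end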

Apollonian circle packing in square.
Apollonian circle packing in blob.
Apollonian circle packing in heart.

It is interesting to note that the topology is Apollonian nearly everywhere: as soon as three circles form a curvilinear triangle the interior will be a standard gasket if \alpha=1.

Number of circles larger than a certain radius in packing in blob shape.

In the above pictures the first circle tends to dominate. In fact, the size distribution of circles is a power law: the number of circles larger than r grows as N(r)\propto r^{-\delta} as r approaches zero, with \delta \approx 1.3. This is unsurprising: given a generic curved triangle, the inscribed circle will be a fraction of the radii of the bordering circles. If one looks at integral circle packings it is possible to see that the curvatures of subsequent circles grow quadratically along each “horn”, but different “horns” have different growths. Because of the curvature the self-similarity is nontrivial: there is actually, as far as I know, still no analytic expression for the fractal dimension of the gasket. Still, one can show that the packing exponent \delta is the Hausdorff dimension of the gasket.
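Using the circles array from the sketch above, one can get a rough estimate of the packing exponent by fitting a line in log-log coordinates (the fitted value is sensitive to where one cuts off the tails, so treat it as indicative only):

    % Estimate delta from N(r), the number of circles with radius >= r.
    r = sort(circles(:, 3), 'descend');
    Nr = (1:numel(r))';              % sorted descending, so Nr(i) = #{radius >= r(i)}
    keep = r > 2 & r < max(r)/5;     % stay away from the pixel-scale and largest circles
    c = polyfit(log(r(keep)), log(Nr(keep)), 1);
    delta = -c(1)                    % should land in the vicinity of 1.3 for alpha=1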

Anyway, to make the first circle less dominant we can either place a non-optimal circle somewhere, or use a lower \alpha.

Apollonian packing in square with central circle of radius 1/6.

If we place a circle in the centre of a square with a radius smaller than the distance to the edge, it gets surrounded by larger circles.

Randomly started Apollonian packing.

If the circle is misaligned, it is no problem for the tiling: any discrepancy can be filled with sufficiently small circles. There is however room for arbitrariness: when a bow-tie-shaped region shows up there are often two possible ways of placing a maximal circle in it, and whichever gets selected breaks the symmetry, typically producing more arbitrary bow-ties. For “neat” arrangements with the right relationships between circle curvatures and positions this does not happen (they have circle chains corresponding to various integer curvature relationships), but the generic case is a mess. If we move the seed circle around, the rest of the arrangement shows both random jitter and occasional large-scale reorganizations.


When we let \alpha<1 we get sponge-like fractals: these are relatives of the Menger sponge and the Cantor set. The domain gets an infinity of circles punched out of it, their total area approaching the area of the domain, so the measure of what remains goes to zero.

Apollonian packing with alpha=0.5.

That these images have an organic look is not surprising. Vascular systems likely grow by finding the locations furthest away from existing vascularization, then filling in the gaps recursively (OK, things are a bit more complex).

Apollonian packing with alpha=1/4.
Apollonian packing with alpha=0.1.

How small is the wiki?

Recently I encountered a specialist Wiki. I pressed “random page” a few times, and got a repeat page after 5 tries. How many pages should I expect this small wiki to have?

We can compare this to the German tank problem. Note that it is different: in the tank problem we observe the maximum of a sample (as if the pages on the site were numbered), while here we observe the number of samples before a repetition.

We can of course use Bayes’ theorem for this. If I get a repeat after k random samples, the posterior distribution of N, the number of pages, is P(N|k) = P(k|N)P(N)/P(k).

If I randomly sample from N pages, the probability of getting a repeat on my second try is 1/N, on my third try 2/N, and so on: P(k|N)=(k-1)/N. Of course, there have to be more pages than k-1, otherwise a repeat must have happened before step k, so this is valid for k \leq N+1. Otherwise, P(k|N)=0 for k>N+1.

The prior P(N) needs to be decided. One approach is to assume that websites have a power-law distributed number of pages. The majority are tiny, and then there are huge ones like Wikipedia; the exponent is close to 1. This gives us P(N) = N^{-\alpha}/\zeta(\alpha). Note the appearance of the Riemann zeta function as a normalisation factor.

We can calculate P(k) by summing over the different possible N: P(k)=\sum_{N=1}^\infty P(k|N)P(N) = \frac{k-1}{\zeta(\alpha)}\sum_{N=k-1}^\infty N^{-(\alpha+1)} =\frac{k-1}{\zeta(\alpha)}(\zeta(\alpha+1)-\sum_{i=1}^{k-2}i^{-(\alpha+1)}).

Putting it all together we get P(N|k)=N^{-(\alpha+1)}/(\zeta(\alpha+1) -\sum_{i=1}^{k-2}i^{-(\alpha+1)}) for N\geq k-1. The posterior distribution of the number of pages is another power law. Note that the dependency on k is rather subtle: it is in the support of the distribution, and the upper limit of the partial sum.

What about the expected number of pages in the wiki? E(N|k)=\sum_{N=1}^\infty N P(N|k) = \sum_{N=k-1}^\infty N^{-\alpha}/(\zeta(\alpha+1) -\sum_{i=1}^{k-2}i^{-(\alpha+1)}) =\frac{\zeta(\alpha)-\sum_{i=1}^{k-2} i^{-\alpha}}{\zeta(\alpha+1)-\sum_{i=1}^{k-2}i^{-(\alpha+1)}}. The expectation is the ratio of the zeta functions of \alpha and \alpha+1, minus the first k-2 terms of their series.

Distribution of P(N|5) for \alpha=1.1.

So, what does this tell us about the wiki I started with? Assuming \alpha=1.1 (close to the behavior of big websites), it predicts E(N|k)\approx 21.28. If one assumes a higher \alpha=2 the number of pages would be 7 (which was close to the size of the wiki when I looked at it last night – it has grown enough that k was 13 when I tried again today).

Expected number of pages given k random views before a repeat.

So, can we derive a useful rule of thumb for the expected number of pages? Dividing by k shows that E(N|k) becomes roughly proportional to k, especially for larger \alpha:

E(N|k)/k as a function of k.

So a good rule of thumb is that if you get k pages before a repeat, expect between 2k and 4k pages on the site. However, remember that we are dealing with power-laws, so the variance can be surprisingly high.

 

Bayes’ Broadsword

Yesterday I gave a talk at the joint Bloomberg-London Futurist meeting “The state of the future” about the future of decisionmaking. Parts were updates on my policymaking 2.0 talk (turned into this chapter), but I added a bit more about individual decisionmaking, rationality and forecasting.

The big idea of the talk: ensemble methods really work in a lot of cases. Not always, not perfectly, but they should be among the first tools to consider when trying to make a robust forecast or decision. They are Bayes’ broadsword.


Forecasting

One of my favourite experts on forecasting is J. Scott Armstrong. He has stressed the importance of evidence based forecasting, including checking how well different methods work. The general answer is: not very well, yet people keep on using them. He has been pointing this out since the 70s. It also turns out that expertise only gets you so far: expert forecasts are not very reliable either, and the accuracy levels out quickly with increasing level of expertise. One implication is that one should at least get cheap experts since they are about as good as the pricey ones. It is also known that simple models for forecasting tend to be more accurate than complex ones, especially in complex and uncertain situations (see also Haldane’s “The Dog and the Frisbee”). Another important insight is that it is often better to combine different methods than try to select the one best method.

Another classic look at prediction accuracy is Philip Tetlock’s Expert Political Judgment (2005) where he looked at policy expert predictions. They were only slightly more accurate than chance, worse than basic extrapolation algorithms, and there was a negative link to fame: high profile experts have an incentive to be interesting and dramatic, but not right. However, he noticed some difference between “hedgehogs” (people with One Big Theory) and “foxes” (people using multiple theories), with the foxes outperforming hedgehogs.

OK, so in forecasting it looks like using multiple methods, theories and data sources (including experts) is a way to get better results.

Statistical machine learning

A standard problem in machine learning is to classify something into the right category from data, given a set of training examples. For example, given medical data such as age, sex, and blood test results, diagnose which disease a patient might suffer from. The key problem is that it is non-trivial to construct a classifier that works well on data different from the training data. It can work badly on new data, even if it works perfectly on the training examples. Two classifiers that perform equally well during training may perform very differently in real life, or even for different data.

The obvious solution is to combine several classifiers and average (or vote about) their decisions: ensemble based systems. This reduces the risk of making a poor choice, and can in fact improve overall performance if they can specialize for different parts of the data. This also has other advantages: very large datasets can be split into manageable chunks that are used to train different components of the ensemble, tiny datasets can be “stretched” by random resampling to make an ensemble trained on subsets, outliers can be managed by “specialists”, in data fusion different types of data can be combined, and so on. Multiple weak classifiers can be combined into a strong classifier this way.

The method benefits from having diverse classifiers that are combined: if they are too similar in their judgements, there is no advantage. Estimating the right weights to give to them is also important; otherwise a truly bad classifier may influence the output.

Iris data classified using an ensemble of classification methods (LDA, NBC, various kernels, decision tree). Note how the combination of classifiers also roughly indicates the overall reliability of classifications in a region.

The iconic demonstration of the power of this approach was the Netflix Prize, where different teams competed to make algorithms that predicted user ratings of films from previous ratings. As part of the rules the algorithms were made public, spurring innovation. When the competition concluded in 2009, the leading entries were all ensembles whose component algorithms came from past teams. The two big lessons were (1) that a combination of not just the best algorithms but also less accurate ones was the key to winning, and (2) that organic organization lets far better performance emerge than having strictly isolated teams.

Group cognition

Condorcet’s jury theorem is perhaps the classic result in group problem solving: if a group of people hold a majority vote, and each has a probability p>1/2 of voting for the correct choice, then the probability the group will vote correctly is higher than p and will tend to approach 1 as the size of the group increases. This presupposes that votes are independent, although stronger forms of the theorem have been proven. (In reality people may have different preferences, so there is no clear “right answer”.)

Probability that groups of different sizes will reach the correct decision as a function of the individual probability of voting right.
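The curve above is easy to compute directly; here is a minimal Matlab sketch (the group sizes are my choice, and odd sizes avoid ties):

    % Condorcet jury curves: probability that a majority of n independent
    % voters, each right with probability p, picks the correct option.
    p = 0:0.01:1;
    figure; hold on
    for n = [1 3 11 51 201]
        plot(p, 1 - binocdf((n - 1)/2, n, p));   % P(more than n/2 correct votes)
    end
    xlabel('individual probability of voting right');
    ylabel('probability the majority is right');
    legend('n=1', 'n=3', 'n=11', 'n=51', 'n=201', 'Location', 'northwest');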

By now the pattern is likely pretty obvious. Weak decision-makers (the voters) are combined through a simple procedure (the vote) into better decision-makers.

Group problem solving is known to be pretty good at smoothing out individual biases and errors. In The Wisdom of Crowds Surowiecki suggests that the ideal crowd for answering a question in a distributed fashion has diversity of opinion, independence (each member has an opinion not determined by the others’), decentralization (members can draw conclusions based on local knowledge), and the existence of a good aggregation process turning private judgements into a collective decision or answer.

Perhaps the grandest example of group problem solving is the scientific process, where peer review, replication, cumulative arguments, and other tools make error-prone and biased scientists produce a body of findings that over time robustly (if sometimes slowly) tends towards truth. This is anything but independent: sometimes a clever structure can improve performance. However, it can also induce all sorts of nontrivial pathologies – just consider the detrimental effects status games have on accuracy, or on which topics in science get attention.

Small group problem solving, on the other hand, is known to be great for verifiable solutions (everybody can see that a proposal solves the problem), but unfortunately suffers when dealing with “wicked problems” lacking good problem or solution formulation. Groups also have scaling issues: a team of N people needs to transmit information between all N(N-1)/2 pairs, which quickly becomes cumbersome.

One way of fixing these problems is using software and formal methods.

The Good Judgement Project (partially run by Tetlock and with Armstrong on the board of advisers) participated in the IARPA ACE program to try to improve intelligence forecasts. They used volunteers and checked their forecast accuracy (not just whether they got things right, but whether claims that something was 75% likely actually came true 75% of the time). This led to a plethora of fascinating results. First, accuracy scores based on the first 25 questions in the tournament predicted subsequent accuracy well: some people were consistently better than others, and this tended to remain constant. Training (such as debiasing techniques) and forming teams also improved performance. Most impressively, using the top 2% “superforecasters” in teams really outperformed the other variants. The superforecasters were a diverse group, smart but by no means geniuses, updating their beliefs frequently but in small steps.
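One standard way to score this kind of probabilistic accuracy is the Brier score (the mean squared difference between the stated probability and the 0/1 outcome). A toy Matlab sketch with synthetic forecasts rather than GJP data:

    % Brier score for probabilistic forecasts: 0 is perfect, 0.25 is what
    % you get from always saying 50%. Synthetic, deliberately miscalibrated data.
    rng(2);
    p_forecast = rand(1, 500);                  % stated probabilities of events
    outcome = rand(1, 500) < p_forecast.^1.3;   % true frequencies differ from the claims
    brier = mean((p_forecast - outcome).^2)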

The key to this success was that a computer- and statistics-aided process found the good forecasters and harnessed them properly (plus, the forecasts were on a shorter time horizon than the policy ones Tetlock analysed in his previous book: this both enables better forecasting and provides the all-important feedback on whether the forecasts worked).

Another good example is the Galaxy Zoo, an early crowd-sourcing project in galaxy classification (which in turn led to the Zooniverse citizen science project). It is not just that participants can act as weak classifiers and be combined through a majority vote into reliable classifiers of galaxy type. Since the type of some galaxies is agreed on by domain experts, these can be used to test the reliability of participants, producing better weightings. But it is possible to go further, and classify the biases of participants to create combinations that maximize the benefit, for example by using overly “trigger happy” participants to find possible rare things of interest, and then checking them using both conservative and neutral participants to become certain. Even better, this can be done dynamically as people slowly gain skill or change preferences.

The right kind of software and on-line “institutions” can shape people’s behavior so that they form more effective joint cognition than they ever could individually.

Conclusions

The big idea here is that it does not matter that individual experts, forecasting methods, classifiers or team members are fallible or biased, if their contributions can be combined in such a way that the overall output is robust and less biased. Ensemble methods are examples of this.

While just voting or weighing everybody equally is a decent start, performance can be significantly improved by linking it to how well the participants perform. Humans can easily be motivated by scoring (but look out for misalignment of incentives: the score must accurately reflect real performance and must not be gameable).

In any case, actual performance must be measured. If we cannot tell whether some method is more accurate than another, then either accuracy does not matter (because it cannot be distinguished or we do not really care), or we will not get the necessary feedback to improve it. It is known from the expertise literature that feedback is one of the key requirements for becoming an expert at a task.

Having a flexible structure that can change is a good approach to handling a changing world. If people have disincentives to change their mind or change teams, they will not update beliefs accurately.

I got a good question after the talk: if we are supposed to keep our models simple, how can we use these complicated ensembles? The answer is of course that there is a difference between using a complex and a complicated approach. The methods that tend to be fragile are the ones with too many free parameters, too much theoretical burden: they are the complex “hedgehogs”. But stringing together a lot of methods and weighting them appropriately merely produces a complicated model, a “fox”. Component hedgehogs are fine as long as they are weighed according to how well they actually perform.

(In fact, adding together many complex things can make the whole simpler. My favourite example is the fact that the Kolmogorov complexity of integers grows boundlessly on average, yet the complexity of the set of all integers is small – and actually smaller than some integers we can easily name. The whole can be simpler than its parts.)

In the end, we are trading Occam’s razor for a more robust tool: Bayes’ Broadsword. It might require far more strength (computing power/human interaction) to wield, but it has longer reach. And it hits hard.

Appendix: individual classifiers

I used Matlab to make the illustration of the ensemble classification. Here are some of the component classifiers. They are all based on the examples in the Matlab documentation. My ensemble classifier is merely a maximum vote between the component classifiers that assign a class to each point.
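For concreteness, here is a minimal reconstruction of such a maximum-vote ensemble on the built-in Fisher iris data, using three standard classifiers (naive Bayes, decision tree, LDA); it is a sketch in the same spirit, not the original script, and needs the Statistics and Machine Learning Toolbox:

    % Train a few component classifiers on two iris features, then let the
    % ensemble prediction be the majority vote over a grid of points.
    load fisheriris                      % meas (150x4), species (labels)
    X = meas(:, 3:4);                    % petal length and width, for a 2D plot
    [y, classes] = grp2idx(species);     % numeric class labels
    models = {fitcnb(X, y), fitctree(X, y), fitcdiscr(X, y)};

    [x1, x2] = meshgrid(linspace(min(X(:,1)), max(X(:,1)), 200), ...
                        linspace(min(X(:,2)), max(X(:,2)), 200));
    pts = [x1(:) x2(:)];
    votes = zeros(size(pts, 1), numel(models));
    for i = 1:numel(models)
        votes(:, i) = predict(models{i}, pts);   % each component's class label
    end
    ensemble = mode(votes, 2);           % maximum vote across the classifiers

    figure; hold on
    gscatter(pts(:,1), pts(:,2), classes(ensemble), [], '.', 3);
    gscatter(X(:,1), X(:,2), species, 'k', 'ox+');
    xlabel('petal length'); ylabel('petal width');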

Iris data classified using a naive Bayesian classifier assuming Gaussian distributions.
Iris data classified using a decision tree.
Iris data classified using Gaussian kernels.
Iris data classified using linear discriminant analysis.

 

All models are wrong, some are useful – but how can you tell?

Our whitepaper about the systemic risk of risk modelling is now out. The topic is how the risk modelling process can make things worse – and ways of improving things. Cognitive bias meets model risk and social epistemology.

The basic story is that in insurance (and many other domains) people use statistical models to estimate risk, and then use these estimates plus human insight to come up with prices and decisions. It is well known (at least in insurance) that there is a measure of model risk due to the models not being perfect images of reality; ideally the users will take this into account. However, in reality (1) people tend to be swayed by models, (2) they suffer from various individual and collective cognitive biases making their model usage imperfect and correlating their errors, and (3) the markets for models, industrial competition and regulation lead to fewer models being used than there could be. Together this creates a systemic risk: everybody makes correlated mistakes and decisions, which means that when a bad surprise happens – a big exogenous shock like a natural disaster or a burst of hyperinflation, or some endogenous trouble like a reinsurance spiral or financial bubble – the joint risk of a large chunk of the industry failing is much higher than it would have been if everybody had had independent, uncorrelated models. Cue bailouts or skyscrapers for sale.

Note that this is a generic problem. Insurance is just unusually self-aware about its limitations (a side effect of convincing everybody else that Bad Things Happen, not to mention seeing the rest of the financial industry running into major trouble). When we use models, the model itself (the statistics and software) is just one part: the data fed into the model, the processes of building and tuning the model, how people use it in their everyday work, how the output leads to decisions, and how the eventual outcomes become feedback to the people involved – all of these factors are important parts in making model use useful. If feedback is absent or too slow, people will not learn which behaviours are correct. If there are weak incentives to check errors of one type, but strong incentives for other errors, expect the system to become biased towards one side. It applies to climate models and military war-games too.

The key thing is to recognize that model usefulness is not something that is directly apparent: it requires a fair bit of expertise to evaluate, and that expertise is also not trivial to recognize or gain. We often compare models to other models rather than reality, and a successful career in predicting risk may actually be nothing more than good luck in avoiding rare but disastrous events.

What can we do about it? We suggest a scorecard as a first step: comparing oneself to some ideal modelling process is a good way of noticing where one could find room for improvement. The score does not matter as much as digging into one’s processes and seeing whether they have cruft that needs to be fixed – whether it is following standards mindlessly, employees not speaking up, basing decisions on single models rather than broader views of risk, or having regulators push one in the same direction as everybody else. Fixing it may of course be tricky: just telling people to be less biased or to do extra error checking will not work; it has to be integrated into the organisation. But recognizing that there may be a problem and getting people on board is a great start.

In the end, systemic risk is everybody’s problem.

Messages on plaques and disks

On Sky News I mildly disagree with Christopher Riley about whether we ought to add a short update message to the Voyager probes.

Representing ourselves

If we wanted to represent humanity most honestly to aliens, we would just give them a constantly updated full documentation of our cultures and knowledge. But that is not possible.

So in METI we may consider sending “a copy of the internet” as a massive snapshot of what we currently are, or, as the Voyager recording did, send a sample of what we are. In both cases it is a snapshot at a particular time: had we sent the message at some other time, the contents would have been different. The selection used is also a powerful shaper, with what is chosen as representative telling a particular story.

That we send a snapshot is not just a necessity, it may be a virtue. The full representation of what humanity is, is not so much a message as a gift with potentially tricky moral implications: imagine if we were given the record of an alien species, clearly sent with the intention that we ought to handle it according to some – to us unknowable – preferences. If we want to do some simple communication, essentially sending a postcard-like “here we are! This is what we think we are!” is the best we can do. A thick and complex message would obscure the actual meaning:

The spacecraft will be encountered and the record played only if there are advanced space-faring civilizations in interstellar space. But the launching of this ‘bottle’ into the cosmic ‘ocean’ says something very hopeful about life on this planet.
– Carl Sagan

It is a time capsule we send because we hope to survive and matter. If it becomes an epitaph of our species it is a decent epitaph. Anybody receiving it is a bonus.

Temporal preferences

How should we relate to this already made and launched message?

Clearly we want the message to persist, maybe be detected, and ideally understood. We do not want the message to be distorted by random chance (if it can be avoided) or by independent actors.

This is why I am not too keen on sending an addendum. One can change the meaning of a message with a small addition: “Haha, just kidding!” or “We were such tools in the 1970s!”

Note that while we have a present desire for a message (possibly the original) to reach the stars, the people who launched it in 1977 wanted their message to reach the stars: their preferences were clearly linked to what they selected. I think we have a moral duty to respect past preferences for information. I have expressed it elsewhere as a temporal golden rule: “treat the past as you want the future to treat you”. We would not want our message or amendments changed, so we had better be careful about past messages.

Additive additions

However, adding a careful footnote is not necessarily wrong, as long as it is in the spirit of the past message, adding to it.

So what kind of update would be useful?

We might want to add something that we have learned since the launch that aliens ought to know. For example, an important discovery. But this needs to be something that advanced aliens are unlikely to already know, which is tricky: they likely know about dark matter, that geopolitical orders can suddenly shift, or a proof of the Poincaré conjecture.

They have to be contingent, unique to humanity, and ideally universally significant. Few things are. Maybe that leaves us with adding the notes for some new catchy melody (“Gangnam style” or “Macarena”?) or a really neat mathematical insight (PCP theorem? Oops, it looks like Andrew Wiles’ Fermat proof is too large for the probe).

In the end, maybe just a “Still here, 38 years later” may be the best addition. Contingent, human, gives some data on the survival of intelligence in the universe.

Halloween explanation of Fermi question


John Harris proposed a radical solution to the KIC 8462852 problem: it is a Halloween pumpkin.

A full Dyson sphere does not have to be 100% opaque. It consists of independently orbiting energy collectors, presumably big flat surfaces. But such collectors can turn their thin side towards the star, letting starlight past. So with the right program, your Dyson sphere could project any pattern of light, like a lantern.

Of course, the real implication of this is that we should watch out for trick-or-treating alien super-civilizations. By using self-replicating Bracewell probes they could spread across the Milky Way within a few million years: they ought to be here by now. And in this scenario they are… they are just hiding until KIC 8462852 suddenly turns into a skull, and then the skies will be swarming with their saucers demanding we give them treats – or suffer their tricks…

There is just one problem: when is galactic Halloween? A galactic year is 250 million years. We have a 1/365 chance of being in the galactic “day” corresponding to Halloween (itself 680,000 years long). We might be in for a long night…

 

Likely not even a microDyson

Right now KIC 8462852 is really hot, and not just because it is an F3 V/IV type star: the light curve, as measured by Kepler, has irregular dips that look like something (or rather, several somethings) obscuring the star. The shapes of the dips are odd. The system is too old and IR-clean to have a remaining protoplanetary disk, dust clumps would coalesce, the aftermath of a giant planet impact is very unlikely (and hard to fit with the aperiodicity); maybe there is a storm of comets due to a recent stellar encounter, but comets are not very good at obscuring stars. So a lot of people on the net are quietly or not so quietly thinking that just maybe this is a Dyson sphere under construction.

I doubt it.

My basic argument is this: if a civilization builds a Dyson sphere, it is unlikely to remain small for a long period of time. Just as planetary collisions are so rare that we should not expect to see any in the Kepler field, the time it takes to make a Dyson sphere is so short that seeing one during construction is very unlikely.

Fast enshrouding

In my and Stuart Armstrong’s paper “Eternity in Six Hours” we calculated that disassembling Mercury to make a partial Dyson shell could be done in 31 years. We did not try to push things here: our aim was to show that using a small fraction of the resources in the solar system it is possible to harness enough energy to launch a massive space colonization effort (literally reaching every reachable galaxy, eventually each solar system). Using energy from the already built solar captors, more material is mined and launched, producing an exponential feedback loop. This was originally discussed by Robert Bradbury. The time to disassemble the terrestrial planets is not much longer than for Mercury, while the gas giants would take a few centuries.

If we imagine the history of an F5 star, 1,000 years is not much. Given the estimated mass of KIC 8462852 of 1.46 solar masses, it will have a main sequence lifespan of 4.1 billion years. The chance of seeing it while being enshrouded is one in 4.3 million. This is the same problem as with the giant impact theory.

A ruin?

An abandoned Dyson shell would likely start clumping together; this might at first sound like a promising – if depressing – explanation of the observation. But the timescale is likely faster than planetary formation timescales of 10^5-10^6 years – the pieces are in nearly identical orbits – so the probability problem remains.

But it is indeed more likely to see the decay of the shell than the construction by several orders of magnitude. Just like normal ruins hang around far longer than the time it took to build the original building.

Laid-back aliens?

Maybe the aliens are not pushing things? Obviously one can build a Dyson shell very slowly – in a sense we are doing it (and disassembling Earth to a tiny extent!) by launching satellites one by one. So if an alien civilization wanted to grow at a leisurely rate or just needed a bit of Dyson shell they could of course do it.

However, if you need something like 2.87\cdot 10^{19} Watt (a 100,000 km collector at 1 AU around the star) your demands are not modest. Freeman Dyson originally proposed the concept based on the observation that human energy needs were growing exponentially, and this was the logical endpoint. Even at a 1% growth rate a civilization quickly – in a few millennia – needs most of the star’s energy.
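A back-of-envelope Matlab check of that claim, with numbers of my own choosing (present human power use of roughly 2\cdot 10^{13} W and a stellar luminosity of about five suns):

    % Years of 1% exponential growth needed to go from current human power
    % use to the whole output of the star (rough, assumed numbers).
    P0 = 2e13;                    % W, rough present-day human power use
    Lstar = 5 * 3.8e26;           % W, assumed luminosity of KIC 8462852
    g = 0.01;                     % 1% annual growth
    years = log(Lstar / P0) / g   % comes out at roughly three millennia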

In order to get a reasonably high probability of seeing an incomplete shell we need to assume growth rates that are exceedingly small (on the order of less than a millionth per year). While it is not impossible, given how the trend seems to be towards more intense energy use in many systems and that entities with higher growth rates will tend to dominate a population, it seems rather unlikely. Of course, one can argue that we currently can more easily detect the rare laid-back civilizations than the ones that aggressively enshrouded their stars, but Dyson spheres do look pretty rare.

Other uses?

Dyson shells are not the only megastructures that could cause intriguing transits.

C. R. McInnes has a suite of fun papers looking at various kinds of light-related megastructures. One can sort asteroid material using light pressure, engineer climate, adjust planetary orbits, and of course travel using solar sails. Most of these are smallish compared to stars (and in many cases dust clouds), but they show some of the utility of obscuring objects.

Duncan Forgan has a paper on detecting stellar engines (Shkadov thrusters) using light curves; unfortunately the calculated curves do not fit KIC 8462852 as far as I can tell.

Luc Arnold analysed the light curves produced by various shapes of artificial objects. He suggested that one could make a weirdly shaped mask for signalling one’s presence using transits. In principle one could make nearly any shape, but for signalling, something unusual yet simple enough to be clearly artificial would make most sense: I doubt the KIC transits fit this.

More research is needed (duh)

In the end, we need more data. I suspect we will find that it is yet another odd natural phenomenon or coincidence. But it makes sense to watch, just in case.

Were we to learn that there is (or was) a technological civilization acting on a grand scale it would be immensely reassuring: we would know intelligent life could survive for at least some sizeable time. This is the opposite side of the Great Filter argument for why we should hope not to see any extraterrestrial life: life without intelligence is evidence for intelligence either being rare or transient, but somewhat non-transient intelligence in our backyard (just 1,500 light-years away!) is evidence that it is neither rare nor transient. Which is good news, unless we fancy ourselves as unique and burdened by being stewards of the entire reachable universe.

But I think we will instead learn that the ordinary processes of astrophysics can produce weird transit curves, perhaps due to weird objects (remember when we thought hot Jupiters were exotic?). The universe is full of strange things, which makes me happy I live in it.

[An edited version of this post can be found at The Conversation: What are the odds of an alien megastructure blocking light from a distant star? ]