Thanks for the razor, Bill!

Thermocouple, Piotr KowalskiI like the idea of a thanksgiving day, leaving out all the Americana turkeys, problematic immigrant-native relations and family logistics: just the moment to consider what really matters to you and why life is good. And giving thanks for intellectual achievements and tools makes eminent sense: This thanksgiving Sean Carroll gave thanks for the Fourier transform.

Inspired by this, I want to give thanks for Occam’s razor.

These days a razor in philosophy denotes a rule of thumb that allows one to eliminate something unnecessary or unlikely. Occam’s was the first: William of Ockham (ca. 1285-1349) stated “Pluralitas non est ponenda sine neccesitate” (“plurality should not be posited without necessity.”) Today we usually phrase it as “the simplest theory that fits is best”.

Principles of parsimony have been suggested for a long time; Aristotle had one, so did Maimonides and various other medieval thinkers. But let’s give Bill from Ockham the name in the spirit of Stigler’s law of eponymy.

Of course, it is not always easy to use. Is the many worlds interpretation of quantum mechanics possible to shave away? It posits an infinite number of worlds that we cannot interact with… except that it does so by taking the quantum mechanics formalism seriously (each possible world is assigned a probability) and not adding extra things like wavefunction collapse or pilot waves. In many ways it is conceptually simpler: just because there are a lot of worlds doesn’t mean they are wildly different. Somebody claiming there is a spirit world is doubling the amount of stuff in the universe, but that there is a lot of ordinary worlds is not too different from the existence of a lot of planets.

Simplicity is actually quite complicated. One can argue about which theory has the fewest and most concise basic principles, but also the number of kinds of entities postulated by the theory. Not to mention why one should go for parsimony at all.

In my circles, we like to think of the principle in terms of Bayesian statistics and computational complexity. The more complex a theory is, the better it can typically fit known data – but it will also generalize worse to new data, since it overfits the first set of data points. Parsimonious theories have fewer degrees of freedom, so they cannot fit as well as complex theories, but they are less sensitive to noise and generalize better. One can operationalize the optimal balance using various statistical information criteria (AIC = minimize the information lost when fitting, BIC = maximize likeliehood of the model). And Solomonoff gave a version of the razor in theoretical computer science: for computable sequences of bits there exists a unique (up to choice of Turing machine) prior that promotes sequences generated by simple programs and has awesome powers of inference.

But in day-to-day life Occam works well, especially with a maximum probability principle (you are more likely to see likely things than unlikely; if you see hoofprints in the UK, think horses not zebras). A surprising number of people fall for the salient stories inherent in unlikely scenarios and then choose to ignore Occam (just think of conspiracy theories). If the losses from low-probability risks are great enough one should rationally focus on them, but then one must check one’s priors for such risks. Starting out with a possibilistic view that anything is possible (and hence have roughly equal chance) means that one becomes paranoid or frozen with indecision. Occam tells you to look for the simple, robust ways of reasoning about the world. When they turn out to be wrong, shift gears and come up with the next simplest thing.

Simplicity might sometimes be elegant, but that is not why we should choose it. To me it is the robustness that matters: given our biased, flawed thought processes and our limited and noisy data, we should not build too elaborate castles on those foundations.