What is the smallest positive integer that will never be used?

A great question from twitter:

This is a bit like the “what is the smallest uninteresting number?” paradox, but not paradoxical: we do not have to name it (and hence make it thought about/interesting) to reason about it.

I will first give a somewhat rough probabilistic bound, and then a much easier argument for the scale of this number. TL;DR: the number is likely smaller than 10^{173}.

Probabilistic bound

If we think about k numbers with frequencies N(x), N(x) approaches some probability distribution p(x). To simplify things we assume p(x) is a decreasing function of x; this is not strictly true (see below) but likely good enough.

If we denote the cumulative distribution function P(x)=\Pr[X<x] we can use the k:th order statistics to calculate the distribution of the maximum of the numbers: F_{(k)}(x) = [P(x)]^{k}. We are interested in the point where it becomes is likely that the number x has not come up despite the trials, which is somewhere above the median of the maximum: F_{(k)}(x^*)=1/2.

What shape does p(x) have? (Dorogovtsev, Mendes, Oliveira 2005) investigated numbers online and found a complex, non-monotonic shape. Obviously dates close to the present are overrepresented, as are prices (ending in .99 or .95), postal codes and other patterns. Numbers in exponential notation stretch very far up. But mentions of explicit numbers generally tend to follow N(x)\sim 1/\sqrt{x}, a very flat power-law.

So if we have k uses we should expect roughly x<k^2 since much larger x are unlikely to occur even once in the sample. We can hence normalize to get p(x)=\frac{1}{2(k-1)}\frac{1}{\sqrt{x}}. This gives us P(x)=(\sqrt{x}-1)/(k-1), and hence F_{(k)}(x)=[(\sqrt{x}-1)/(k-1)]^k. The median of the maximum becomes x^* = ((k-1)2^{-1/k}+1)^2 \approx k^2 - 2k \ln(2). We are hence not entirely bumping into the k^2 ceiling, but we are close – a more careful argument is needed to take care of this.

So, how large is $k$ today? Dorogovtsev et al. had on the order of k=10^{12}, but that was just searchable WWW pages back in 2005. Still, even those numbers contain numbers that no human ever considered since many are auto-generated. So guessing x^* \approx 10^{24} is likely not too crazy. So by this argument, there are likely 24 digit numbers that nobody ever considered.

Consider a number…

Another approach is to assume each human considers a number about once a minute throughout their lifetime (clearly an overestimate given childhood, sleep, innumeracy etc. but we are mostly interested in orders of magnitude anyway and making an upper bound) which we happily assume to be about a century, giving a personal k across a life as about 10^{8}. There has been about 100 billion people, so humanity has at most considered 10^{19} numbers. This would give an estimate using my above formula as x^* \approx 10^{38}.

But that assumes “random” numbers, and is a very loose upper bound, merely describing a “typical” small unconsidered number. Were we to systematically think through the numbers from 1 and onward we would have the much lower x^* \approx 10^{19}. Just 19 digits!

One can refine this a bit: if we have time T and generate new numbers at a rate r per second, then k=rT and we will at most get k numbers. Hence the smallest number never considered has to be at most k+1.

Seth Lloyd estimated that the observable universe cannot have performed more than 10^{120} operations on 10^{90} bits. If each of those operations was a consideration of a number we get a bound on the first unconsidered number as <10^{120}.

This can be used to consider the future too. Computation of our kind can continue until proton decay in \sim 10^{36} years or so, giving a bound of 10^{173} if we use Lloyd’s formula. That one uses the entire observable universe; if we instead consider our own future light cone the number is going to be much smaller.

But the conclusion is clear: if you write a 173 digit number with no particular pattern of digits (a bit more than two standard lines of typing), it is very likely that this number would never have shown up across the history of the entire universe except for your action. And there is a smaller number that nobody – no human, no alien, no posthuman galaxy brain in the far future – will ever consider.

 

 

Shakespearian numbers

Number lineDuring a recent party I got asked the question “Since \pi has an infinite decimal expansion, does that mean the collected works of Shakespeare (suitably encoded) are in it somewhere?”

My first response was to point out that infinite decimal expressions are not enough: obviously 1/3=0.33333\ldots is a Shakespeare-free number (unless we have a bizarre encoding of the works in the form of all threes). What really matters is whether the number is suitably random. In mathematics this is known as the question about whether pi is a normal number.

If it is normal, then by the infinite monkey theorem then Shakespeare will almost surely be in the number. We actually do not know whether pi is normal, but it looks fairly likely. But that is not enough for a mathematician. A good overview of the problem can be found in a popular article by Bailey and Borwein. (Yep, one of the Borweins)

Where are the Shakespearian numbers?

This led to a second issue: what is the distribution of the Shakespeare-containing numbers?

We can encode Shakespeare in many ways. As an ASCII text the works take up 5.3 MB. One can treat this as a sequence of 7-bit characters and the works as  37,100,000 bits, or 11,168,212 decimal digits. A simple code where each pair of digits encode a character would encode 10,600,000 digits. This allows just a 100 character alphabet rather than a 127 character alphabet, but is likely OK for Shakespeare: we can use the ASCII code minus 32, for example.

If we denote the encoded works of Shakespeare by [Shakespeare], all numbers of the form 0.[Shakespeare]xxxxx\ldots are Shakespeare-containing.

They form a rather tiny interval: since the works start with ‘The’, [Shakespeare] starts as “527269…” and the interval lies inside the interval [0. 527269000\ldots , 0.52727], a mere millionth of [0,1]. The actual interval is even shorter.

But outside that interval there are numbers of the form 0.y[Shakespeare]xxxx\ldots , where y is a digit different from the starting digit of [Shakespeare] and x anything else. So there are 9 such second level intervals, each ten times thinner than the first level interval.

This pattern continues, with the intervals at each level ten times thinner but also 9 times as numerous. This is fairly similar to the Cantor set and gives rise to a fractal. But since the intervals are very tiny it is hard to see.

One way of visualizing this is to assume the weird encoding [Shakespeare]=3, so all numbers containing the digit 3 in the decimal expansion are Shakespearian and the rest are Shakespeare-free.

Distribution of Shakespeare-free numbers in the unit interval, assuming Shakespeare's collected works are encoded as the digit "3".
Distribution of Shakespeare-free numbers in the unit interval, assuming Shakespeare’s collected works are encoded as the digit “3”.

The fractal dimension of this Shakespeare-free set is \log(9)/\log(10)\approx 0.9542. This is less than 1: most points are Shakespearian and in one of the intervals, but since they are thin compared to the line the Shakespeare-free set is nearly one dimensional. Like the Cantor set, each Shakespeare-free number is isolated from any other Shakespeare-free number: there is always some Shakespearian numbers between them.

In the case of the full 5.3MB [Shakespeare] the interval length is around 10^{-10,600,000}. The fractal dimension of the Shakespeare-free set is \log(10^{10,600,000} - 1)/\log(10^{10,600,600}) \approx 1-\epsilon, for some tiny \epsilon \approx 10^{-10,600,000}.  It is very nearly an unbroken line… except for that nearly every point actually does contain Shakespeare.

We have been looking at the unit interval. We can of course look at the entire real line too, but the pattern is similar: just magnify the unit interval pattern by 10, 100, 1000, … times. Somewhere around  $10^{10,600,000}$ there are the numbers that have an integer part equal to [Shakespeare]. And above them are the intervals that start with his works followed by something else, a decimal point and then any decimals. And beyond them there are the [Shakespeare][Shakespeare]xxx\ldots numbers…

Shakespeare is common

One way of seeing that Shakespearian numbers are the generic case is to imagine choosing a number randomly. It has probability S of being in the level 1 interval of Shakespearian numbers. If not, then it will be in one of the 9 intervals 1/10 long that don’t start with the correct first digit, where the probability of starting with Shakespeare in the second digit is S. If that was all there was, the total probability would be S+(9/10)S+(9/10^2)S+\ldots = 10S<1. But the 1/10 interval around the first Shakespearian interval also counts: a number that has the right first digit but wrong second digit can still be Shakespearian. So it will add probability.

Another way of thinking about it is just to look at the initial digits: the probability of starting with [Shakespeare] is S, the probability of starting with [Shakespeare] in position 2 is (1-S)S (the first factor is the probability of not having Shakespeare first), and so on. So the total probability of finding Shakespeare is S + (1-S)S + (1-S)^2S + (1-S)^3S + \ldots = S/(1-(1-S))=1. So nearly all numbers are Shakespearian.

This might seem strange, since any number you are likely to mention is very likely Shakespeare-free. But this is just like the case of transcendental, normal or uncomputable numbers: they are actually the generic case in the reals, but most everyday numbers belong to the algebraic, non-normal and computable numbers.

It is also worth remembering that while all normal numbers are (almost surely) Shakespearian, there are non-normal Shakespearian numbers. For example, the fractional number 0.[Shakespeare]000\ldots is non-normal but Shakespearian. So is 0.[Shakespeare][Shakespeare][Shakespeare]\ldots We can throw in arbitrary finite sequences of digits between the Shakespeares, biasing numbers as close or far as we want from normality. There is a number 0.[Shakespeare]3141592\ldots that has the digits of \pi plus Shakespeare. And there is a number that looks like \pi until Graham’s number digits, then has a single Shakespeare and then continues. Shakespeare can hide anywhere.

In things of great receipt with case we prove,
Among a number one is reckoned none.
Then in the number let me pass untold,
Though in thy store’s account I one must be
-Sonnet 136