What is the smallest positive integer that will never be used?

A great question from twitter:

This is a bit like the “what is the smallest uninteresting number?” paradox, but not paradoxical: we do not have to name it (and hence make it thought about/interesting) to reason about it.

I will first give a somewhat rough probabilistic bound, and then a much easier argument for the scale of this number. TL;DR: the number is likely smaller than 10^{173}.

Probabilistic bound

If we think about k numbers with frequencies N(x), N(x) approaches some probability distribution p(x). To simplify things we assume p(x) is a decreasing function of x; this is not strictly true (see below) but likely good enough.

If we denote the cumulative distribution function P(x)=\Pr[X<x] we can use the k:th order statistics to calculate the distribution of the maximum of the numbers: F_{(k)}(x) = [P(x)]^{k}. We are interested in the point where it becomes is likely that the number x has not come up despite the trials, which is somewhere above the median of the maximum: F_{(k)}(x^*)=1/2.

What shape does p(x) have? (Dorogovtsev, Mendes, Oliveira 2005) investigated numbers online and found a complex, non-monotonic shape. Obviously dates close to the present are overrepresented, as are prices (ending in .99 or .95), postal codes and other patterns. Numbers in exponential notation stretch very far up. But mentions of explicit numbers generally tend to follow N(x)\sim 1/\sqrt{x}, a very flat power-law.

So if we have k uses we should expect roughly x<k^2 since much larger x are unlikely to occur even once in the sample. We can hence normalize to get p(x)=\frac{1}{2(k-1)}\frac{1}{\sqrt{x}}. This gives us P(x)=(\sqrt{x}-1)/(k-1), and hence F_{(k)}(x)=[(\sqrt{x}-1)/(k-1)]^k. The median of the maximum becomes x^* = ((k-1)2^{-1/k}+1)^2 \approx k^2 - 2k \ln(2). We are hence not entirely bumping into the k^2 ceiling, but we are close – a more careful argument is needed to take care of this.

So, how large is $k$ today? Dorogovtsev et al. had on the order of k=10^{12}, but that was just searchable WWW pages back in 2005. Still, even those numbers contain numbers that no human ever considered since many are auto-generated. So guessing x^* \approx 10^{24} is likely not too crazy. So by this argument, there are likely 24 digit numbers that nobody ever considered.

Consider a number…

Another approach is to assume each human considers a number about once a minute throughout their lifetime (clearly an overestimate given childhood, sleep, innumeracy etc. but we are mostly interested in orders of magnitude anyway and making an upper bound) which we happily assume to be about a century, giving a personal k across a life as about 10^{8}. There has been about 100 billion people, so humanity has at most considered 10^{19} numbers. This would give an estimate using my above formula as x^* \approx 10^{38}.

But that assumes “random” numbers, and is a very loose upper bound, merely describing a “typical” small unconsidered number. Were we to systematically think through the numbers from 1 and onward we would have the much lower x^* \approx 10^{19}. Just 19 digits!

One can refine this a bit: if we have time T and generate new numbers at a rate r per second, then k=rT and we will at most get k numbers. Hence the smallest number never considered has to be at most k+1.

Seth Lloyd estimated that the observable universe cannot have performed more than 10^{120} operations on 10^{90} bits. If each of those operations was a consideration of a number we get a bound on the first unconsidered number as <10^{120}.

This can be used to consider the future too. Computation of our kind can continue until proton decay in \sim 10^{36} years or so, giving a bound of 10^{173} if we use Lloyd’s formula. That one uses the entire observable universe; if we instead consider our own future light cone the number is going to be much smaller.

But the conclusion is clear: if you write a 173 digit number with no particular pattern of digits (a bit more than two standard lines of typing), it is very likely that this number would never have shown up across the history of the entire universe except for your action. And there is a smaller number that nobody – no human, no alien, no posthuman galaxy brain in the far future – will ever consider.

 

 

How small is the wiki?

Recently I encountered a specialist Wiki. I pressed “random page” a few times, and got a repeat page after 5 tries. How many pages should I expect this small wiki to have?

We can compare this to the German tank problem. Note that it is different; in the tank problem we have a maximum sample (maybe like the web pages on the site were numbered), while here we have number of samples before repetition.

We can of course use Bayes theorem for this. If I get a repeat after k random samples, the posterior distribution of N, the number of pages, is P(N|k) = P(k|N)P(N)/P(k).

If I randomly sample from N pages, the probability of getting a repeat on my second try is 1/N, on my third try 2/N, and so on: P(k|N)=(k-1)/N. Of course, there has to be more pages than k-1, otherwise a repeat must have happened before step k, so this is valid for k \leq N+1. Otherwise, P(k|N)=0 for k>N+1.

The prior P(N) needs to be decided. One approach is to assume that websites have a power-law distributed number of pages. The majority are tiny, and then there are huge ones like Wikipedia; the exponent is close to 1. This gives us P(N) = N^{-\alpha}/\zeta(\alpha). Note the appearance of the Riemann zeta function as a normalisation factor.

We can calculate P(k) by summing over the different possible N: P(k)=\sum_{N=1}^\infty P(k|N)P(N) = \frac{k-1}{\zeta(\alpha)}\sum_{N=k-1}^\infty N^{-(\alpha+1)} =\frac{k-1}{\zeta(\alpha)}(\zeta(\alpha+1)-\sum_{i=1}^{k-2}i^{-(\alpha+1)}).

Putting it all together we get P(N|k)=N^{-(\alpha+1)}/(\zeta(\alpha+1) -\sum_{i=1}^{k-2}i^{-(\alpha+1)}) for N\geq k-1. The posterior distribution of number of pages is another power-law. Note that the dependency on k is rather subtle: it is in the support of the distribution, and the upper limit of the partial sum.

What about the expected number of pages in the wiki? E(N|k)=\sum_{N=1}^\infty N P(N|k) = \sum_{N=k-1}^\infty N^{-\alpha}/(\zeta(\alpha+1) -\sum_{i=1}^{k-2}i^{-(\alpha+1)}) =\frac{\zeta(\alpha)-\sum_{i=1}^{k-2} i^{-\alpha}}{\zeta(\alpha+1)-\sum_{i=1}^{k-2}i^{-(\alpha+1)}}. The expectation is the ratio of the zeta functions of \alpha and \alpha+1, minus the first k-2 terms of their series.

Distribution of P(N|5) for alpha=1.1.
Distribution of P(N|5) for [latex]\alpha=1.1[/latex].

So, what does this tell us about the wiki I started with? Assuming \alpha=1.1 (close to the behavior of big websites), it predicts E(N|k)\approx 21.28. If one assumes a higher \alpha=2 the number of pages would be 7 (which was close to the size of the wiki when I looked at it last night – it has grown enough today for k to equal 13 when I tried it today).

Expected number of pages given k random views before a repeat.
Expected number of pages given k random views before a repeat.

So, can we derive a useful rule of thumb for the expected number of pages? Dividing by k shows that E(N|k) approaches proportionality, especially for larger \alpha:

<img src='http://s0.wp.com/latex.php?latex=E%28N%7Ck%29%2Fk&bg=ffffff&fg=000000&s=0' alt='E(N|k)/k' title='E(N|k)/k' class='latex' /> as a function of k.
E(N|k)/k as a function of k.

So a good rule of thumb is that if you get k pages before a repeat, expect between 2k and 4k pages on the site. However, remember that we are dealing with power-laws, so the variance can be surprisingly high.

 

Rational fractal distributions

Up among the peaksMost of the time we encounter probability distributions over the reals, the positive reals, or integers. But one can use the rational numbers as a probability space too.

Recently I found the paper Vladimir Trifonov, Laura Pasqualucci, Riccardo Dalla-Favera & Raul Rabadan. Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data. Scientific Reports 1:191 DOI: 10.1038/srep00191. They discuss the distribution of ratios of the number of reads from the same spot of DNA that come from each chromosome in a pair: the number of reads is an integer, so the ratio is rational. They get a peaky, self-similar distribution empirically, and the paper explains why. 

If you take positive independent integers from some distribution f(n) and generate ratios q=a/(a+b), then those ratios will have a distribution that is a convolution over the rational numbers: g(q) = g(a/(a+b)) = \sum_{m=0}^\infty \sum_{n=0}^\infty f(m) g(n) \delta \left(\frac{a}{a+b} - \frac{m}{m+n} \right ) = \sum_{t=0}^\infty f(ta)f(tb)

One can of course do the same for non-independent and different distributions of the integers. Oh, and by the way: this whole thing has little to do with ratio distributions (alias slash distributions), which is what happens in the real case.

The authors found closed form solutions for integers distributed as a power-law with an exponential cut-off and for the uniform distribution; unfortunately the really interesting case, the Poisson distribution, doesn’t seem to have a neat closed form solution.

In the case of a uniform distributions on the set \{1,2,\ldots , L\} they get g(a/(a+b)) = (1/L^2) \lfloor L/\max(a,b) \rfloor .

The rational distribution g(a/(a+b))=1/max(a,b) of Trifonov et al.
The rational distribution g(a/(a+b))=1/max(a,b) of Trifonov et al.

They note that this is similar to Thomae’s function, a somewhat well-known (and multiply named) counterexample in real analysis. That function is defined as f(p/q)=1/q (where the fraction is in lowest terms). In fact, both graphs have the same fractal dimension of 1.5.

It is easy to generate other rational distributions this way. Using a power law as an input produces a sparser pattern, since the integers going into the ratio tend to be small numbers, putting more probability at simple ratios:

The rational distribution g(a/(a+b))=C(ab)^-2 of Trifonov et al.
The rational distribution g(a/(a+b))=C(ab)^-2 (rational convolution of two index -2 power-law distributed integers).

If we use exponential distributions the pattern is fairly similar, but we can of course change the exponent to get something that ranges over a lot of numbers, putting more probability at nonsimple ratios p/q where p+q \gg 1:

The rational distribution of two convolved Exp[0.1] distributions.
The rational distribution of two convolved Exp[0.1] distributions.
Not everything has to be neat and symmetric. Taking the ratio of two unequal Poisson distributions can produce a rather appealing pattern:

Rational distribution of ratio between a Poisson[10] and a Poisson[5] variable.
Rational distribution of ratio between a Poisson[10] and a Poisson[5] variable.
Of course, full generality would include ratios of non-positive numbers. Taking ratios of normal variates rounded to the nearest integer produces a fairly sparse distribution since high numerators or denominators are rare.

Rational distribution of normal variates rounded to nearest integer.
Rational distribution of a/(a+b) ratios of normal variates rounded to nearest integer.

But multiplying the variates by 10 produces a nice distribution.

Rational distribution of ratios of normal variates multiplied by 10 and rounded.
Rational distribution of a/(a+b) ratios of normal variates that have been multiplied by 10 and rounded.

This approaches the Chauchy distribution as the discretisation gets finer. But note the fun microstructure (very visible in the Poisson case above too), where each peak at a simple ratio is surrounded by a “moat” of low probability. This is reminiscent of the behaviour of roots of random polynomials with integer coefficients (see also John Baez page on the topic).

The rational numbers do tend to induce a fractal recursive structure on things, since most measures on them will tend to put more mass at simple ratios than at complex ratios, but when plotting the value of the ratio everything gets neatly folded together. The lower approximability of numbers near the simple ratios produce moats. Which also suggests a question to ponder further: what role does the über-unapproximable golden ratio have in distributions like these?