Adding cooks to the broth

If there are k key ideas needed to produce some important goal (like AI), there is a constant probability per researcher-year of coming up with an idea, and the researcher works for y years, what is the probability of success? And how does it change if we add more researchers to the team?

The most obvious approach is to think of this as y Bernoulli trials with probability p of success, quickly concluding that the number of successes n at the end of y years will be distributed as \mathrm{Pr}(n)=\binom{y}{n}p^n(1-p)^{y-n}. Unfortunately, then the actual answer to the question will be \mathrm{Pr}(n\geq k) = \sum_{n=k}^y \binom{y}{n}p^n(1-p)^{y-n} which is a real mess…

A somewhat cleaner way of thinking of the problem is to go into continuous time, treating it as a homogeneous Poisson process. There is a rate \lambda of good ideas arriving to a researcher, but they can happen at any time. The time between two ideas will be exponentially distributed with parameter \lambda. So the time t until a researcher has k ideas will be the sum of k exponentials, which is a random variable distributed as the Erlang distribution: f(t; k,\lambda)=\lambda^k t^{k-1} e^{-\lambda t} / (k-1)!.

Just as in the discrete case, one can make a crude argument that if y is bigger than the mean k/\lambda (or k/p) we will have a good chance of reaching the goal. Unfortunately the variance scales as k/\lambda^2 – if the problems are hard, there is a significant risk of being unlucky for a long time. We have to consider the entire distribution.

Unfortunately the cumulative density function in this case is \mathrm{Pr}(t<y)=1-\sum_{n=0}^{k-1} e^{-\lambda y} (\lambda y)^n / n! which is again not very nice for algebraic manipulation. Still, we can plot it easily.

Before we do that, let us add extra researchers. If there are N researchers, equally good, contributing to the idea generation, what is the new rate of ideas per year? Since we have assumed independence and a Poisson process, it just multiplies the rate by a factor of N. So we replace \lambda with \lambda N everywhere and get the desired answer.
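In code, the whole calculation is one line thanks to the Erlang/Poisson duality: reaching k ideas before time y is the same as a Poisson(\lambda N y) variable being at least k. Here is a minimal Python sketch (my own illustration, not the code behind the plots below; the function name and parameter values are just for the example):

from scipy.stats import poisson

def p_success(lam, N, k=10, y=10):
    """Probability that N researchers, each with idea rate lam per year,
    produce at least k ideas within y years."""
    return poisson.sf(k - 1, mu=lam * N * y)   # P(Poisson(lam*N*y) >= k)

for N in [1, 2, 5, 10]:
    print(N, p_success(lam=0.09, N=N))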

This is a plot of the case k=10, y=10.

What we see is that for each number of scientists it is a sigmoid curve: if the discovery probability is too low there is hardly any chance of success, when it becomes comparable to k/Ny the success probability rises, and sufficiently above that we can be almost certain the project will succeed (the yellow plateau). Conversely, adding extra researchers has decreasing marginal returns when approaching the plateau: they make an already almost certain project even more certain. But they do have increasing marginal returns close to the dark blue “floor”: here the chances of success are small, but extra minds increase them a lot.

We can for example plot the ratio of success probability for \lambda=0.09 to the one researcher case as we add researchers:

Even with 10 researchers the success probability is just 40%, but clearly the benefit of adding extra researchers is positive. The curve is not quite exponential; it slackens off and will eventually become a big sigmoid. But the overall lesson seems to hold: if the project is a longshot, adding extra brains makes it roughly exponentially more likely to succeed.

It is also worth recognizing that in this model time is on par with discovery rate and number of researchers: what matters is the product \lambda y N and how it compares to k.

This all assumes that ideas arrive independently, and that there are no overheads for having a large team. In reality these things are far more complex. For example, sometimes you need to have idea 1 or 2 before idea 3 becomes possible: that makes the time t_3 of that idea distributed as an exponential plus the distribution of \mathrm{min}(t_1,t_2). If the first two ideas are independent and exponential with rates \lambda, \mu, then the minimum is distributed as an exponential with rate \lambda+\mu. If idea 3 instead requires both of them, we get a non-exponential waiting time \mathrm{max}(t_1,t_2) (the pdf is \lambda e^{-\lambda t} + \mu e^{-\mu t} - (\lambda+\mu)e^{-(\lambda+\mu)t}). Some discoveries or bureaucratic scalings may change the rates. One can construct complex trees of intellectual pathways, unfortunately quickly making the distributions impossible to write out (but still easy to run Monte Carlo on). However, as long as the probabilities and the induced correlations are small, I think we can linearise and keep the overall guess that extra minds are exponentially better.
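When the distributions get unwieldy, a tiny Monte Carlo sketch does the job. Here is one for the dependency example above (my own illustration; the rates are arbitrary):

import numpy as np

rng = np.random.default_rng(0)

def time_to_idea3(lam=0.1, mu=0.2, nu=0.15, require_both=False, n=100_000):
    # times until ideas 1 and 2 arrive
    t1 = rng.exponential(1 / lam, n)
    t2 = rng.exponential(1 / mu, n)
    # idea 3 only becomes possible once its prerequisite(s) exist
    start = np.maximum(t1, t2) if require_both else np.minimum(t1, t2)
    return start + rng.exponential(1 / nu, n)

print(np.mean(time_to_idea3() < 10))                    # idea 3 within 10 years, needs 1 or 2
print(np.mean(time_to_idea3(require_both=True) < 10))   # needs both 1 and 2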

In short: if the cooks are unlikely to succeed at making the broth, adding more is a good idea. If they already have a good chance, consider managing them better.

Julia lace gaskets

While surfing the web I came across a neat Julia set, defined by iterating z_{n+1}=f(z_n)=z_n^2+c/z_n^3 for some complex constant c. Here are some typical pictures, and two animations: one moving around a circle in the c-plane, one moving slowly down from c=1 to c=0.
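The pictures are ordinary escape-time renderings. A bare-bones Python sketch (mine, with an arbitrary choice of c, resolution and colouring, so it will not exactly reproduce the images) looks like this:

import numpy as np
import matplotlib.pyplot as plt

def julia(c, extent=1.5, res=800, max_iter=100, escape=1e6):
    x = np.linspace(-extent, extent, res)
    z = x[None, :] + 1j * x[:, None]
    steps = np.full(z.shape, max_iter)
    for i in range(max_iter):
        alive = steps == max_iter
        with np.errstate(all="ignore"):             # z=0 maps to infinity, which is fine
            z[alive] = z[alive] ** 2 + c / z[alive] ** 3
        escaped = alive & (~np.isfinite(z) | (np.abs(z) > escape))
        steps[escaped] = i
    return steps                                    # iterations until escape

plt.imshow(julia(0.05), cmap="magma")
plt.axis("off")
plt.show()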


The points behind the set

What is going on?

The first step in analysing fractals like this is to find the fixed points and their preimages. z=\infty is clearly mapped to itself. The z^2 term will tend to make large magnitude iterates approach infinity, so it is an attractive fixed point.

z=0 is a preimage of infinity: iterates falling on zero will be mapped onto infinity. Nearby points will also end up attracted to infinity, so we have a basin of attraction to infinity around the origin. Preimages of the origin will be mapped to infinity in two steps: 0=z^2+c/z^3 has the solutions z=(-c)^{1/5} – this is where the pentagonal symmetry comes from, since these five points are symmetric. Their preimages and so on will also be mapped to infinity, so we have a hierarchy of basins of attraction sending points away, forming a gasket-like structure. The Julia set consists of the points that never get mapped away, the boundary of this hierarchy of basins.

The other fixed points are defined by z=z^2+c/z^3, which can be rearranged into z^5-z^4+c=0. They don’t have any neat expression and actually do not affect the big picture dynamics as much. The main reason seems to be that they are unstable. However, their location and the derivative close to them affect the shapes in the Julia set as we will see. Their preimages will be surrounded by the same structures (scaled and rotated) as they have.

Below are examples with preimages of zero marked as white circles, fixed points as red crosses, and critical points as black squares.


The set behind the points

A simple way of mapping the dynamics is to look at the (generalized) Mandelbrot set for the function, taking a suitable starting point z_0=(3c/2)^{1/5} and mapping out its fate in the c-plane. Why that particular point? Because it is one of the critical points where f'(z)=0, and a theorem by Julia and Fatou tells us that its fate indicates whether the Julia set is filled or dust-like: bounded orbits of the critical points of a map imply a connected Julia set. When c is in the Mandelbrot set the Julia image has “thick” regions with finite area that do not escape to infinity. When c is outside, then most points end up at infinity, and what remains is either dust or a thin gasket with no area.
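A sketch of that computation (my own, with arbitrary resolution and escape criterion): for every c, start at the critical point and see how quickly, if at all, the orbit escapes.

import numpy as np
import matplotlib.pyplot as plt

def mandelbrot(extent=1.0, res=600, max_iter=80, escape=1e6):
    x = np.linspace(-extent, extent, res)
    c = x[None, :] + 1j * x[:, None]
    z = (1.5 * c) ** 0.2                            # principal branch of the critical point
    steps = np.full(c.shape, max_iter)
    for i in range(max_iter):
        alive = steps == max_iter
        with np.errstate(all="ignore"):
            z[alive] = z[alive] ** 2 + c[alive] / z[alive] ** 3
        escaped = alive & (~np.isfinite(z) | (np.abs(z) > escape))
        steps[escaped] = i
    return steps

plt.imshow(mandelbrot(), cmap="viridis")
plt.axis("off")
plt.show()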


The set is much smaller than the vanilla f(z)=z^2+c Mandelbrot, with a cuspy main body surrounded by a net reminiscent of the gaskets in the Julia set. It also has satellite vanilla Mandelbrots, which is not surprising at all: the square term tends to dominate in many locations. As one zooms into the region near the origin a long spar covered in Mandelbrot sets runs towards the origin, surrounded by lacework.


One surprising thing is that the spar does not reach the origin – it stops at c=4.5 \cdot 10^{-5}. Looking at the dynamics, above this point the iterates of the critical point jump around in the interval [0,1], forming a typical Feigenbaum cascade of period doubling as you go out along the spar (just like on the spar of the vanilla Mandelbrot set). But at this location points are now mapped outside the interval, running off to infinity: one of the critical points breaches a basin boundary, causing iterates to run off and the earlier separate basins to merge. Below this point the dynamics is almost completely dominated by the squaring, turning the Julia set into a product of a Cantor set and a circle (a bit wobbly for higher c; it is all very similar to KAM tori). The empty spaces correspond to the regions where preimages of zero throw points to infinity, while along the wobbly circles points get their argument angles multiplied by two for every iteration by the dominant quadratic term: they are basically shift maps. For c=0 it is just the filled unit disk.


So when we allow c to move around a circle as in the animations, the part that passes through the Mandelbrot set has thick regions that thin as we approach the edge of the set. Since the edge is highly convoluted the passage can be quite complex (especially if the circle is “tangent” to it) and the regions undergo complex twisting and implosions/explosions. During the rest of the orbit the preimages just quietly rotate, forming a fractal gasket. A gasket that sometimes looks like a model of the hyperbolic plane, since each preimage has five other preimages, naturally forming an exponential hierarchy that has to be squeezed into a finite roughly circular space.

A prime minimal surface

A short while ago I mentioned lacunary functions, complex functions that are analytic inside a disc but cannot be continued outside it. Then I remembered the wonderful Weierstrass-Enneper representation formula, which assigns a minimal surface to (nearly) any pair of complex functions. What happens when you make a minimal surface from a lacunary function?

I was not the first with this idea. In fact, F.F. de Brito used this back in 1992 to demonstrate that there exist complete embedded minimal surfaces in 3-space that are contained between two planes.


Here is the surface defined by the function g(z)=\sum_{p\, \mathrm{prime}} z^p, the Taylor series that includes only terms with prime exponents, combined with f(z)=1.


Close to zero, the surface is flat. Away from zero it begins to wobble as increasingly high powers in the series begin to dominate. It behaves very much like a higher-degree Enneper surface, but with a wobble that is composed of smaller wobbles. It is cool to consider that this apparently irregular pattern corresponds to the apparently irregular pattern of all primes.

Some math for Epiphany

Analytic functions helping you out

Recently I chatted with a mathematician friend about generating functions in combinatorics. Normally they are treated as a neat symbolic trick: you have a sequence a_n (typically how many there are of some kind of object of size n), you formally define a function f(z)=\sum_{n=0}^\infty a_n z^n, you derive some constraints on the function, and from this you get a formula for the a_n or other useful data. Convergence does not matter, since this is purely symbolic. We used this in our paper counting tie knots. It is a delightful way of solving recurrence relations or bundling up the moments of probability distributions.

I innocently wondered if the function (especially its zeroes and poles) held any interesting information. My friend told me that there was analytic combinatorics: you can actually take f(z) seriously as a (complex) function and use the powerful machinery of complex analysis to calculate the asymptotic behavior of the a_n from the location and type of the “dominant” singularities. He pointed me at the excellent notes from a Princeton course linked to the textbook by Philippe Flajolet and Robert Sedgewick. They show a procedure for taking combinatorial objects, converting them symbolically into generating functions, and then getting their asymptotic behavior from the properties of the functions. This is extraordinarily neat, both in terms of efficiency and in linking different branches of math.
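As a toy version of the idea (my own sketch, not an example taken from the course notes): the generating function of the Fibonacci numbers, z/(1-z-z^2), has its dominant pole at z=(\sqrt{5}-1)/2=1/\varphi, and the contribution from that pole already gives the familiar asymptotics F_n \approx \varphi^n/\sqrt{5}.

import numpy as np

phi = (1 + np.sqrt(5)) / 2

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# dominant-pole approximation vs. the exact values
for n in [10, 20, 30]:
    print(n, fib(n), phi ** n / np.sqrt(5))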

Plot of z/(1-z-z^2), the generating function of the Fibonacci numbers. It has poles at (\sqrt{5}-1)/2 (the dominant pole, giving the overall asymptotic growth of the Fibonacci numbers) and -(\sqrt{5}+1)/2, which does not contribute much to the asymptotic behavior.

In our case, one can show nearly by inspection that the number of Fink-Mao tie knots grows with the number of moves as \sim 2^n, while the number of single-tuck tie knots grows as \sim \sqrt{6}^n.

Analytic functions behaving badly

The second piece of math I found this weekend was about random Taylor series and lacunary functions.

If f(z)=\sum_{n=0}^\infty X_n z^n where X_n are independent random numbers, what kind of functions do we get? Trying it with complex Gaussian X_n produces a disk of convergence with some nondescript function on the inside.

Plot of function with a Gaussian Taylor series. Color corresponds to stereographic mapping of the complex plane to a sphere, with infinity being white and zeros black. The domain of convergence is the unit circle.

Replacing the complex Gaussian with a real one, or uniform random numbers, or even power-law numbers gives the same behavior. They all seem to have radius 1. This is not just a vanilla disk of convergence (where an analytic function reaches a pole or singularity somewhere on the boundary but is otherwise fine and continuable), but a natural boundary – that is, a boundary so dense with poles or singularities that continuation beyond it is not possible at all.

The locus classicus about random Taylor series is apparently Kahane, J.-P. (1985), Some Random Series of Functions. 2nd ed., Cambridge University Press, Cambridge.

A naive handwave argument is that for |z|<1 we have an exponentially decaying sequence of z^n, so if the X_n have some finite average size E(X) and not too divergent variance we should expect convergence, while outside the unit circle any nonzero E(X) will allow it to diverge. We can even invoke the Markov inequality P(X>t) \leq E(X)/t to argue that a series \sum X_n f(n) would converge if \sum f(n)/n converges. However, this is not correct enough for proper mathematics. One entirely possible Gaussian outcome is X_n=1,10,100,1000,\ldots or worse. We need to speak of probabilistic convergence.

Andrés E. Caicedo has a good post about how to approach it properly. The “trick” is the awesome Kolmogorov zero-one law, which implies that since the radius of convergence depends on the entire sequence X_n rather than on any finite subset (and the X_n are all independent), it will almost surely be a constant.
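A quick numerical sketch (mine; the maximum over the tail of the sequence is only a crude finite-sample stand-in for the limsup in the Cauchy-Hadamard formula 1/R=\limsup |X_n|^{1/n}) shows the estimated radius hugging 1 for very different coefficient distributions:

import numpy as np

rng = np.random.default_rng(1)
n = np.arange(1, 20001)
tail = n >= 1000          # look far out in the sequence

for name, x in [
    ("complex Gaussian", rng.normal(size=n.size) + 1j * rng.normal(size=n.size)),
    ("uniform", rng.uniform(size=n.size)),
    ("power law", rng.pareto(1.5, size=n.size)),
]:
    radius = 1.0 / np.max(np.abs(x[tail]) ** (1.0 / n[tail]))
    print(name, radius)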

This kind of natural boundary disk of convergence may look odd to beginning students of complex analysis: after all, none of the functions we normally encounter behave like this. Except that this is of course selection bias. If you look at the example series for lacunary functions they all look like fairly reasonable sparse Taylor series like z+z^4+z^8+z^{16}+z^{32}+\ldots. In calculus we are used to worrying that the coefficients in front of the z-terms of a series don’t diminish fast enough: having fewer nonzero terms seems entirely innocuous. But as Hadamard showed, it is enough that the size of the gaps grows geometrically for the function to get a natural boundary (in fact, even denser series do this – for example series with only prime exponents). The same is true for Fourier series. Weierstrass’ famous continuous but nowhere differentiable function is lacunary (in his 1880 paper on analytic continuation he gives the example \sum a_n z^{b^n} of an uncontinuable function). In fact, as Emile Borel found and Steinhaus eventually proved in a stricter sense, in general (“almost surely”) a Taylor series isn’t continuable because of natural boundaries.

The function \sum_p z^p, where p runs over the primes.

One could of course try to combine the analytic combinatorics with the lacunary stuff. In a sense a lacunary generating function is a worst case scenario for the singularity-measuring methods used in analytic combinatorics, since you get an infinite number of singularities at a finite and equal distance and now have to average them together somehow. Intuitively this case seems to correspond to counting something that becomes rarer at a geometric rate or faster. But the Borel-Steinhaus results suggest that even objects that do not become rare could have nasty natural boundaries – if the numbers a_n were due to something close enough to random we should expect estimating asymptotics to be hard. The funniest example I can think of is the number of roots of Chaitin-style Diophantine equations, where for each n it is an independent arithmetic fact whether there are any: this is hardcore random, and presumably the exact asymptotic growth rate will be uncomputable both practically and theoretically.

Did amphetamines help Erdős?

During my work on the Paris talk I began to wonder whether Paul Erdős (who I used as an example of a respected academic who used cognitive enhancers) could be shown to have actually benefited from his amphetamine use, which began in 1971 according to Hill (2004). One way of investigating this is his publication record: how many papers did he produce per year before and after 1971? Here is a plot, based on Jerrold Grossman’s 2010 bibliography:

Productivity of Paul Erdős over his life. Green dashed line: amphetamine use, red dashed line: death. Crosses mark named concepts.

The green dashed line is the start of amphetamine use, and the red dashed line is the date of death. Yes, there is a fairly significant posthumous tail: old mathematicians never die, they just asymptote towards zero. Overall, the later part is more productive per year than the early part (before 1971 the mean and standard deviation were 14.6±7.5, after 24.4±16.1; a Kruskal-Wallis test rejects that they are the same distribution, p=2.2e-10).
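For the record, the comparison boils down to something like this sketch (the yearly counts are not reproduced here; the file name is a placeholder for data extracted from Grossman’s bibliography, assumed to have columns year and papers):

import numpy as np
from scipy.stats import kruskal

data = np.genfromtxt("erdos_papers_per_year.csv", delimiter=",", names=True)
before = data["papers"][data["year"] < 1971]
after = data["papers"][data["year"] >= 1971]

print("before 1971: %.1f +/- %.1f" % (before.mean(), before.std()))
print("after 1971:  %.1f +/- %.1f" % (after.mean(), after.std()))
print(kruskal(before, after))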

This does not prove anything. After all, his academic network was growing and he moved from topic to topic, so we cannot prove any causal effect of the amphetamine: for all we know, it might have been holding him back.

One possible argument might be that he did not do his best work on amphetamine. To check this, I took the Wikipedia article that lists things named after Erdős, and tried to find years for the discovery/conjecture. These are marked with red crosses in the diagram, slightly jittered. We can see a few clusters that may correspond to creative periods: one in 1935–41, one in 1946–51, one in 1956–60. After 1970 the distribution was more even and sparse. 76% of the most famous results were done before 1971; given that this period covers 60% of the entire career, it does not look that unlikely to be due to chance (a binomial test gives p=0.06).

Again this does not prove anything. Maybe mathematics really is a young man’s game, and we should expect key results early. There may also have been more time to recognize and name results from the earlier career.

In the end, this is merely a statistical anecdote. It does show that one can be a productive, well-renowned (if eccentric) academic while on enhancers for a long time. But given the N=1, firm conclusions or advice are hard to draw.

Erdős’s friends worried about his drug use, and in 1979 Graham bet Erdős $500 that he couldn’t stop taking amphetamines for a month. Erdős accepted, and went cold turkey for a complete month. Erdős’s comment at the end of the month was “You’ve showed me I’m not an addict. But I didn’t get any work done. I’d get up in the morning and stare at a blank piece of paper. I’d have no ideas, just like an ordinary person. You’ve set mathematics back a month.” He then immediately started taking amphetamines again. (Hill 2004)

Packing my circles

One of the first fractals I ever saw was the Apollonian gasket, the shape that emerges if you draw the circle internally tangent to three other tangent circles. It is somewhat similar to the Sierpinski triangle, but has a more organic flair. I can still remember opening my copy of Mandelbrot’s The Fractal Geometry of Nature and encountering this amazing shape. There are a lot of interesting things going on here.

Here is a simple algorithm for generating related circle packings, trading recursion for flexibility:

  1. Start with a domain and calculate the distance to the border for all interior points.
  2. Place a circle of radius \alpha d^* at the point with maximal distance d^*=\max d(x,y) from the border.
  3. Recalculate the distances, treating the new circle as a part of the border.
  4. Repeat (2-3) until the radius becomes smaller than some tolerance.

This is easily implemented in Matlab if we discretize the domain and use an array of distances d(x,y), which is then updated d(x,y) \leftarrow \min(d(x,y), D(x,y)) where D(x,y) is the distance to the circle. This trades exactness for some discretization error, but it can easily handle nearly arbitrary shapes.
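Here is a rough Python equivalent of that approach (a sketch of the algorithm above, not the code that produced the figures), using a Euclidean distance transform for the distance map:

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import distance_transform_edt

def pack_circles(mask, alpha=1.0, tol=2.0):
    """mask is True inside the domain; returns a list of (x, y, r) in pixels."""
    d = distance_transform_edt(mask)         # distance to the border
    yy, xx = np.indices(mask.shape)
    circles = []
    while True:
        cy, cx = np.unravel_index(np.argmax(d), d.shape)
        r = alpha * d[cy, cx]
        if r < tol:
            break
        circles.append((cx, cy, r))
        # treat the new circle as part of the border
        d = np.minimum(d, np.hypot(xx - cx, yy - cy) - r)
    return circles

n = 512
mask = np.ones((n, n), dtype=bool)
mask[0, :] = mask[-1, :] = mask[:, 0] = mask[:, -1] = False   # square domain
for x, y, r in pack_circles(mask):
    plt.gca().add_patch(plt.Circle((x, y), r, fill=False))
plt.xlim(0, n); plt.ylim(0, n); plt.gca().set_aspect("equal"); plt.axis("off")
plt.show()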

Apollonian circle packing in square.
Apollonian circle packing in blob.
Apollonian circle packing in heart.

It is interesting to note that the topology is Apollonian nearly everywhere: as soon as three circles form a curvilinear triangle the interior will be a standard gasket if \alpha=1.

Number of circles larger than a certain radius in packing in blob shape.

In the above pictures the first circle tends to dominate. In fact, the size distribution of circles is a power law: the number of circles larger than r grows as N(r)\propto r^{-\delta} as we approach zero, with \delta \approx 1.3. This is unsurprising: given a generic curved triangle, the inscribed circle will be a fraction of the radii of the bordering circles. If one looks at integral circle packings it is possible to see that the curvatures of subsequent circles grow quadratically along each “horn”, but different “horns” have different growths. Because of the curvature the self-similarity is nontrivial: there is actually, as far as I know, still no analytic expression for the fractal dimension of the gasket. Still, one can show that the packing exponent \delta is the Hausdorff dimension of the gasket.

Anyway, to make the first circle less dominant we can either place a non-optimal circle somewhere, or use lower \alpha.

Apollonian packing in square with central circle of radius 1/6.

If we place a circle in the centre of a square with a radius smaller than the distance to the edge, it gets surrounded by larger circles.

Randomly started Apollonian packing.

If the circle is misaligned, it is no problem for the tiling: any discrepancy can be filled with sufficiently small circles. There is however room for arbitrariness: when a bow-tie-shaped region shows up there are often two possible ways of placing a maximal circle in it, and whichever gets selected breaks the symmetry, typically producing more arbitrary bow-ties. For “neat” arrangements with the right relationships between circle curvatures and positions this does not happen (they have circle chains corresponding to various integer curvature relationships), but the generic case is a mess. If we move the seed circle around, the rest of the arrangement shows both random jitter and occasional large-scale reorganizations.

When we let \alpha<1 we get sponge-like fractals: these are relatives of the Menger sponge and the Cantor set. The domain gets an infinity of circles punched out of itself, with a total area approaching the area of the domain, so the measure of what remains goes to zero.

Apollonian packing with alpha=0.5.

That these images have an organic look is not surprising. Vascular systems likely grow by finding the locations furthest away from existing vascularization, then filling in the gaps recursively (OK, things are a bit more complex).

Apollonian packing with alpha=1/4.
Apollonian packing with alpha=0.1.

How small is the wiki?

Recently I encountered a specialist Wiki. I pressed “random page” a few times, and got a repeat page after 5 tries. How many pages should I expect this small wiki to have?

We can compare this to the German tank problem. Note that it is different; in the tank problem we have a maximum sample (as if the web pages on the site were numbered), while here we have the number of samples before a repetition.

We can of course use Bayes theorem for this. If I get a repeat after k random samples, the posterior distribution of N, the number of pages, is P(N|k) = P(k|N)P(N)/P(k).

If I randomly sample from N pages, the probability of getting a repeat on my second try is 1/N, on my third try 2/N, and so on: P(k|N)=(k-1)/N (ignoring the requirement that no repeat happened earlier, which is a good approximation as long as k is small compared to N). Of course, there have to be more pages than k-1, otherwise a repeat must have happened before step k, so this is valid for k \leq N+1. Otherwise, P(k|N)=0 for k>N+1.

The prior P(N) needs to be decided. One approach is to assume that websites have a power-law distributed number of pages. The majority are tiny, and then there are huge ones like Wikipedia; the exponent is close to 1. This gives us P(N) = N^{-\alpha}/\zeta(\alpha). Note the appearance of the Riemann zeta function as a normalisation factor.

We can calculate P(k) by summing over the different possible N: P(k)=\sum_{N=1}^\infty P(k|N)P(N) = \frac{k-1}{\zeta(\alpha)}\sum_{N=k-1}^\infty N^{-(\alpha+1)} =\frac{k-1}{\zeta(\alpha)}(\zeta(\alpha+1)-\sum_{i=1}^{k-2}i^{-(\alpha+1)}).

Putting it all together we get P(N|k)=N^{-(\alpha+1)}/(\zeta(\alpha+1) -\sum_{i=1}^{k-2}i^{-(\alpha+1)}) for N\geq k-1. The posterior distribution of number of pages is another power-law. Note that the dependency on k is rather subtle: it is in the support of the distribution, and the upper limit of the partial sum.

What about the expected number of pages in the wiki? E(N|k)=\sum_{N=1}^\infty N P(N|k) = \sum_{N=k-1}^\infty N^{-\alpha}/(\zeta(\alpha+1) -\sum_{i=1}^{k-2}i^{-(\alpha+1)}) =\frac{\zeta(\alpha)-\sum_{i=1}^{k-2} i^{-\alpha}}{\zeta(\alpha+1)-\sum_{i=1}^{k-2}i^{-(\alpha+1)}}. The expectation is the ratio of the zeta functions of \alpha and \alpha+1, minus the first k-2 terms of their series.
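The expression is easy to evaluate numerically (a sketch of the formula above; the function name and defaults are my own):

import numpy as np
from scipy.special import zeta

def expected_pages(k, alpha=1.1):
    i = np.arange(1, k - 1)                        # the first k-2 terms
    num = zeta(alpha) - np.sum(i ** (-alpha))
    den = zeta(alpha + 1.0) - np.sum(i ** (-(alpha + 1.0)))
    return num / den

print(expected_pages(5, alpha=2.0))   # the alpha=2 case discussed below gives about 7 pages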

Distribution of P(N|5) for \alpha=1.1.

So, what does this tell us about the wiki I started with? Assuming \alpha=1.1 (close to the behavior of big websites), it predicts E(N|k)\approx 21.28. If one assumes a higher \alpha=2 the number of pages would be 7 (which was close to the size of the wiki when I looked at it last night – it had grown enough that k equalled 13 when I tried again today).

Expected number of pages given k random views before a repeat.

So, can we derive a useful rule of thumb for the expected number of pages? Dividing by k shows that E(N|k) becomes roughly proportional to k, especially for larger \alpha:

E(N|k)/k as a function of k.

So a good rule of thumb is that if you get k pages before a repeat, expect between 2k and 4k pages on the site. However, remember that we are dealing with power-laws, so the variance can be surprisingly high.


Tying up loose ties

Our paper on tie knot classification is finally officially published: Hirsch D, Markström I, Patterson ML, Sandberg A, Vejdemo-Johansson M. (2015) More ties than we thought. PeerJ Computer Science 1:e2.

Besides the paper and its supplementary code, there is also a random tie knot generator and a tutorial by Mikael about how to read the notation.

The classification of tie knots is not in itself important, but having a nice notation helps for specifying how to tie them. And the links to languages and finite state machines are cool. The big research challenge is understanding how knot façades are to be modelled and judged.

Continued integrals

Many of the most awesome formulas you meet when getting into mathematics are continued fractions like

\Phi = 1+\frac{1}{1+\frac{1}{1+\frac{1}{\ldots}}}

and nested radicals like

2 = \sqrt{2 + \sqrt{2 + \sqrt{2 + \ldots}}}.

What about nested/continued integrals? Here is a simple one:

e^x=1+x+\int x+\left(\int x+\left(\int x+\left(\ldots\right)dx\right)dx\right)dx.

The way to see this is to recognize that the x in the first integral is going to integrate to x^2/2, the x in the second will be integrated twice \int x^2/2 dx = x^3/3!, and so on.

In general additive integrals of this kind turn into sums (assuming convergence, handwave, handwave…):

I(x)=\int f(x)+\left(\int f(x)+\left(\int f(x)+\left(\ldots\right)dx\right)dx\right)dx = \sum_{n=1}^\infty \int^n f(x) dx.

On the other hand, I'(x)=f(x)+I(x).
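One can also check such formulas numerically by treating the nested integral as a fixed point of the functional T[I](x)=\int_0^x (f(t)+I(t)) dt and iterating it on a grid. A quick sketch (mine), using f(x)=x so that the fixed point should be e^x-1-x as above:

import numpy as np
from scipy.integrate import cumulative_trapezoid

x = np.linspace(0.0, 2.0, 2001)
f = x
I = np.zeros_like(x)
for _ in range(60):                                   # Picard-style iteration
    I = cumulative_trapezoid(f + I, x, initial=0.0)

print(np.max(np.abs(I - (np.exp(x) - 1 - x))))        # only discretization error remains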

So if we insert f_k(x)=\sin(kx) we get the sum I_k(x)=-\cos(kx)/k-\sin(kx)/k^2+\cos(kx)/k^3+\sin(kx)/k^4-\cos(kx)/k^5-\ldots. For x=0 we end up with I_k(0)=\sum_{n=0}^\infty 1/k^{4n+3} - \sum_{n=0}^\infty 1/k^{4n+1}. The differential equation has solution I_k(x)=ce^x-\sin(kx)/(k^2+1) - k\cos(kx)/(k^2+1). Setting k=0 the integral is clearly zero, so c=0. Tying it together we get:

\sum_{n=0}^\infty 1/k^{4n+3}-\sum_{n=0}^\infty 1/k^{4n+1}=-k/(k^2+1) (valid for |k|>1, where the sums converge).

Things are trickier when the integrals are multiplicative, like I(x)=\int x \int x \int x \ldots dx dx dx. However, we can turn it into a differential equation: I'(x)=x I(x) which has the well known solution I(x)=ce^{x^2/2}. Same thing for f_k(x)=\sin(kx), giving us I_k(x)=ce^{-\cos(kx)/k}. Since we are running indefinite integrals we get those pesky constants.

Plugging in f(x)=1/x gives I(x)=cx. If we set c=1 we get the mildly amusing and in retrospect obvious formula

x=\int \frac{\int \frac{\int \frac{\ldots}{x} dx}{x} dx}{x} dx.

We can of course mess things up further, like I(x)=\int\sqrt{\int\sqrt{\int\sqrt{\ldots} dx} dx} dx, where the differential equation becomes I'^2=I with the solution I(x)=(1/4)(c^2 + 2cx + x^2). A surprisingly simple solution to a weird-looking integral. In a similar vein:

2\cot^{-1}(e^{c-x})=\int\sin\left(\int\sin\left(\int\sin\left(\ldots\right)dx\right) dx\right) dx

-\log(c-x)=\int \exp\left(\int \exp\left(\int \exp\left(\ldots \right) dx \right) dx \right) dx

1/(c-x)=\int \left(\int \left(\int \left(\ldots \right)^2 dx \right)^2 dx \right)^2 dx

And if you want a real mind-bender, use the Lambert W function:

I(x)=\int W\left(\int W\left(\int W\left(\ldots \right) dx \right) dx \right) dx, then x=\int_1^{I(x)}1/W(t) dt + c.

(that is, you get an implicit but well-defined expression for the (x,I(x)) values. With Lambert, the x and y axes always tend to switch places).

[And yes, convergence is handwavy in this essay. I think the best way of approaching it is to view the values of these integrals as the functions that are invariant under the functional formed by the integral and the repeated function: whether nearby functions are attracted to it (or not) under repeated application of the functional depends on the case.]

Gamma function surfaces

The gamma function has a long and interesting history (check out the excellent review by Davis (1963)), but one application does not seem to have shown up: minimal surfaces.

A minimal surface is one where the average curvature is always zero; it bends equally in two opposite directions. This is equivalent to having the (locally) minimal area given its boundary: such surfaces are commonly seen as soap films stretched from frames. There exists a rich theory for them, linking them to complex analysis through the Enneper-Weierstrass representation: if you have a meromorphic function g and an analytic function f such that fg^2 is holomorphic, then

X(z)=\Re\left(\int_{z_0}^z f(1-g^2)/2 dz\right)
Y(z)=\Re\left(\int_{z_0}^z if(1+g^2)/2 dz\right)
Z(z)=\Re\left(\int_{z_0}^z fg dz\right)

produces a minimal surface (X(z),Y(z),Z(z)).

When plugging in the hyperbolic tangent as g and using f=1 I got a new and rather nifty surface a few years back. What about plugging in the gamma function? Let f=1, g=\Gamma(z).

We integrate from the regular point z_0=1 to different points z in the complex plane. Let us start with the simple case of \Re(z)>1/2.
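Numerically this is straightforward (a sketch of the setup, not polished plotting code): evaluate the three Weierstrass-Enneper integrands along straight lines from z_0=1, which stay clear of the poles as long as \Re(z)>1/2.

import numpy as np
from scipy.special import gamma

def we_surface(z, z0=1.0, steps=400):
    zf = np.ravel(z)
    t = np.linspace(0.0, 1.0, steps)[:, None]
    path = z0 + t * (zf[None, :] - z0)                 # straight-line contour to each z
    g = gamma(path)                                    # f = 1, g = Gamma
    def integrate(h):                                  # \int_{z0}^{z} h dw along the line
        return np.trapz(h, dx=1.0 / (steps - 1), axis=0) * (zf - z0)
    X = np.real(integrate((1 - g ** 2) / 2))
    Y = np.real(integrate(1j * (1 + g ** 2) / 2))
    Z = np.real(integrate(g))
    return X.reshape(np.shape(z)), Y.reshape(np.shape(z)), Z.reshape(np.shape(z))

reZ, imZ = np.meshgrid(np.linspace(0.5, 3.5, 60), np.linspace(-8, 8, 160))
X, Y, Z = we_surface(reZ + 1j * imZ)
# X, Y, Z can now be handed to any 3D surface plotter.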

Gamma function minimal surface for z in 0.5<Re(z)<3.5, -8<Im(z)<8. Color denotes Re(z).

The surface is a billowing strip, and as we include z with larger and larger real parts the amplitude of the oscillations grows rapidly, making it self-intersect. The behaviour is somewhat similar to the Catalan minimal surface, except that we only get one period. If we go to larger imaginary parts the surface approaches a horizontal plane. OK, the surface is a plane with some wild waves, right?

Not so fast: we have not looked at the mess for Re(z)<0. First, let’s examine the area around the z=0 singularity. Since the integrand blows up close to it, it produces a surface expanding towards infinity – very similar to a catenoid. Indeed, catenoid ends tend to show up where there are poles. But this one doesn’t close exactly: for Re(z)<0 there is some overshoot, producing a self-intersecting plane-like strip.

Gamma function minimal surface close to the z=0 singularity. Colour denotes Re(z). Integration contours from 1 to z run clockwise for Im(z)<0 and counterclockwise for Im(z)>0.

The problem is of course the singularities: when integrating in the complex plane we need to avoid them, and depending on the direction we go around them we can get a complex phase that gives us an entirely different value of the function. In this case the branch cut corresponds to the real line: integrating clockwise or counter-clockwise around z=0 to the same z gives different values. In fact, a clockwise turn adds [3.6268i, 3.6268, 6.2832i] (which looks like 2\pi\gamma – a rather neat residue!) to the coordinates: a translation in the positive y-direction. If we extend the surface by going an extra turn clockwise or counterclockwise a number of times, we get copies that attach seamlessly.

Gamma minimal surface extended by integration paths between the -1 and 0 singularities (blue patches).


Gamma minimal surface patch that can be repeated by translation along the y-axis. Colour denotes Re(z).

OK, we have a surface with some planar strips that turn wobbly and self-intersecting in the x-direction, with elliptic catenoid ends repeating along the y-direction due to the z=0 singularity. Going down the negative x-direction things look planar between the catenoids… except of course for the catenoids due to all the other singularities at z=-1,-2,\ldots. They also introduce residues along the y-direction, but different ones from those of the z=0 singularity – their extensions of the surface will be out of phase with each other, making the fully extended surface fantastically self-intersecting and confusing.

Gamma function minimal surface extended by integrating around poles.

So, I think we have a simple answer to why the gamma function minimal surface is not well known: it is simply too messy and self-intersecting.

Of course, there may be related nifty surfaces. 1/\Gamma(z) is nicely behaved and looks very much like the Enneper surface near zero, with “wings” that oscillate ever more wildly as we move towards the negative reals. No doubt there are other beautiful things to look for in the vicinity.

Minimal surface based on 1/gamma(z).