# Infinite Newton

Apropos Newton’s method in the complex plane, what happens when the degree of the polynomial goes to infinity?

# Towards infinity

Obviously there will be more zeros, so there will be more attractors and we should expect the boundaries of the basins of attraction to become messier. But it is not entirely clear where the action will be, so it would be useful to compress the entire complex plane into a convenient square.

How do you depict the entire complex plane? While I have always liked the Riemann sphere here I tried mapping $x+yi$ to $[\tanh(x),\tanh(y)]$. The origin is unchanged, and infinity becomes the edges of the square $[-1,1]\times [-1,1]$. This is not a conformal map, so things will get squished near the edges.

For color, I used $(1/2)+(1/2)[\tanh(|z-1|-1), \tanh(|z+1|-1), \tanh(|z-i|-1)]$ to map complex coordinates to RGB. This makes the color depend on the distance to 1, -1 and i, making infinity white and zero some drab color (the -1 terms at the end determines the overall color range).

Here is the animated result:

What is going on? As I scale up the size of the leading term from zero, the root created by adding that term moves in from infinity towards the center, making the new basin of attraction grow. This behavior has been described in this post on dancing zeros. The zeros also tend to cluster towards the unit circle, crowding together and distributing themselves evenly. That distribution is the reason for the the colorful “flowers” that surround white spots (poles of the Newton formula, corresponding to zeros of the derivative of the polynomial). The central blob is just the attractor of the most “solid” zero, corresponding to the linear and constant terms of the polynomial.

The jostling is amusing: it looks like the roots do repel each other. This is presumably because close roots require a sharp turn of the function, but the “turning radius” is set by the coefficients that tend to be of order unity. Getting degenerate roots requires coefficients to be in a set of measure zero, so it is rare. Near-degenerate roots exist in a positive measure set surrounding that set, but it is still “small” compared to the general case.

# At infinity

So what happens if we let the degree go to infinity? As I previously mentioned, the generic behaviour of $\sum_{n=1}^\infty a_n z^n$ where $a_n$ is independent random numbers is a lacunary function. So we should not expect anything outside the unit circle. Inside the circle there will be poles, so there will be copies of the undefined outside region (because of Great Picard’s Theorem (meromorphic version)). Since the function is analytic these copies will be conformal mappings of the exterior and hence roughly circular. There will also be zeros, and these will have their own basins of attraction. A few of the central ones dominate, but there is an infinite number of attractors as we approach the circular border which is crammed with poles and zeros.

Since we now know we will only deal with the unit disk, we can avoid transforming the entire plane and enjoy the results:

What happens here is that the white regions represents places where points get mapped onto the undefined outside, while the colored fractal regions are the attraction basins for the zeros. And between them there is a truly wild boundary. In the vanilla $z^3+1$ fractal every point on the boundary is a meeting point of the three basins, a tri-point. Here there is an infinite number of attractors: the boundary consists of points where an infinite number of different attractors meet.

# Uriel’s stacking problem

In Scott Alexander’s kabbalistic sf story Unsong, the archangel Uriel works on a problem while other things are going on in heaven:

All the angels listened in rapt attention except Uriel, who was sort of half-paying attention while trying to balance several twelve-dimensional shapes on top of each other.

There was utter silence throughout the halls of Heaven, except a brief curse as Uriel’s hyperdimensional tower collapsed on itself and he picked up the pieces to try to rebuild it.

A great clamor arose from all the heavenly hosts, save Uriel, who took advantage of the brief lapse to conjure a parchment and pen and start working on a proof about the optimal configuration of twelve-dimensional shapes.

## A polytope on a plane

This got me thinking about the stability of stacking polytopes. That seemed complicated (I am no archangel) so I started toying with the stability of polytopes on a flat surface.

(Terminology note: I will consistently use “face” to denote the D-1 dimensional elements that bound the polytope, although “facet” is in some use.)

A face of a 3D polyhedron is stable if the polyhedron can rest on it without tipping over. This means that the projection of the center of mass onto the plane containing the face is inside the polygon. The platonic polyhedra are stable on all faces, but it is not hard to make a few faces unstable by moving a vertex far away from the center. A polyhedron has at least one stable face (if it did not, it would be a perpetual motion device: every tip will move the center of mass downwards, but there is a bound on how low it can go. A uni-stable or monostatic polyhedron has  just one stable face. It is an unsolved problem what the simplest uni-stable 3D polyhedron is, with the current record 14 faces. Also, it seems unclear whether there are monostatic simplices in dimension 9 (they exist in 10 or more dimensions, but not in 8 or fewer).

So, how many faces of a polytope will typically be unstable?

I wrote a Matlab script to generate random convex polytopes by selecting N points randomly on the surface of a D-dimensional sphere and calculating their convex hull. Using a Delaunay decomposition I can split them into simplices, which allow me to calculate the center of mass. The center of mass of a simplex is just the average of the corners $\vec{x^c_j}=\sum_{i=1}^N \vec{x}_{ij}$, and the center of mass of the polyhedron is just the sum of the simplex centers of mass weighted by their volumes: $\vec{x^p} = \sum_{j=1} V_j \vec{x^c_j}$. The volume of a simplex is $V_j=(1/D!)\mathrm{det}(X_j)$ where $X_j=[x_{1j};x_{2j};\ldots;x_{Dj}]$, the matrix made by sticking together the coordinate vectors of a simplex. Once we know this we can project the center of mass onto the plane of a face by finding its nullspace (the higher dimensional counterpart to a normal) $\vec{p}= \vec{x^p} -(\vec{x^p}\cdot \vec{n})\vec{n}$. Finally, to check whether the projection is inside the face, we can look at the matrix A where each column is the coordinates of one of the faces minus $\vec{p}$ and the final row just ones, and solve for Ax=b where b is zero except for a one in the last row (I found this neat algorithm due to elisbben on stack overflow).  If the answer vector is all positive, then the point is inside the face. Repeat for all the faces.

Whew. This math is of course really simple to do in Matlab.

The 12 dimensional case is a bit messier:

So, what is the average fraction of stable faces on a 3D polyhedron?

It tends to converge to 50%. Doing this in higher dimensions shows the same kind of convergence, although to lower fractions.

It looks like the fraction of stable faces declines exponentially with dimensionality.

Does this mean that for a sufficiently high dimension it is likely that a random polytope is unistable? The answer is no: the number of faces increases pretty exponentially (as $2^{1.7680D}$), but the number of stable faces also increases exponentially with D (as $latex 2^{0.9273 D}$).

This was based on runs with N=100. Obviously things go much faster if you select a lower N, such as 30. However, as you approach N=D the polytopes become more and more simplex-like, and simplices tend to both have fewer faces and be less stable in high dimensions, so the exponential growth stops. This actually happens far below D; for N=30 the effect is felt already in 11 dimensions. The face growth rates were also lower, with coefficients 1.1621 and 0.4730.

(There are some asymptotic formulas known for the growth of the number of faces for random convex hulls; they grow linearly with N but at an accelerating rate with D.)

Stuart Armstrong gave me a very heuristic argument for why there would be so many unstable faces. Consider building up the polytope vertex by vertex, essentially just adding together the simplices from the Delaunay decomposition. If you start from a stable state, eventually you will likely end up with an unstable face. Adding the next vertex will add a simplex to the polyhedron, and the center of mass will move in the direction of the new simplex. To have the face become stable again the shift in center of mass needs to be large enough along the directions parallel to the face to bring the projection back inside the face. But in high dimensional spaces there are many directions you can move in: the probability of a random vector being nearly parallel to another vector is very low. Hence, the next step and the following are likely to preserve the instability. So high dimensional polytopes are likely to have many unstable faces even if they are nicely inscribed in spheres.

The number of steps the polytope rolls over  until finding a stable face is also limited: the “drainage basin” of a stable face is a tree, with a branching degree set by D-1 (if faces are D-simplexes). So the number of steps will scale as $\log_{D-1}(2^{(1.7680 - 0.9273)D})=0.8407 D \ln(2) / \ln(D-1) \propto D/\ln(D)$. Even high-dimensional polytopes will stop flipping quickly in general. (A unistable polytope on the other hand can run through at least half of its faces, so there are some very slow ones too).

The expected minimum distance between two points on this kind of random polytope scales as $N^{-2/D}$ (if they were optimally distributed it would be $N^{-1/D}$). At the same time, if N is relatively small compared to D (the polytope is simplex-like), the average diameter (the longest edge) of each face seems to approach $\sqrt{\pi}$. Why? I think this is because  $\Gamma(1/2)$, the mean of a flipped k=2 Weibull distribution that shows up because of extreme value theory. Meanwhile the average and median cord length between random points on hyperspheres tends towards $\sqrt{2}$. Faces hence tends to be fairly wide unless N is large compared to D, but there will typically always be a few very narrow ones that are tricky to balance on.

## Stacking no-slip polytopes

If you put a polytope on top of another one (assuming no slipping) at first it seems you need to use a stable face of the top polytope, but this is not enough nor necessary.

Since the underlying face is likely tilted from the horizontal, the vertical projection of the center of mass has to be within the top face. The upper polytope can be rotated, moving the projection point. The tilt angle $\theta$ (or rather, tilt angles – we are doing this in higher dimensions, remember?) generates a hypersphere of radius $d \tan(\theta)$ around the normal projection point (which is at distance d from the center of mass) where the vertical projection can intersect the face. Only parts of the hypersphere surface that are inside the face represent orientations that are stable. Even an unstable face can (sometimes) be stabilized if you turn it so that the tilted projection is inside, but for sufficiently high angles the hypersphere will be bigger than the face and it cannot be stable.

Having the top polytope stay in place is the first requirement. The second is that the bottom polytope should not become unstable. The new center of mass is moved to a point somewhere along the connecting line between the individual centers of mass of the polytopes, with exact position dependent on their volume ratio (note that turning the top polytope can move the center of mass too). This moves the projection point along the plane of the bottom face, and if it gets outside that face the assembly will tip over.

One can imagine this as adding random (D-1)-dimensional vectors of length 1/N until they reach the edge of the face. I am a bit uncertain about the properties of such random walks (all works on decreasing step size walks I have seen have been in 1D). The harmonic random walk in 1D apparently converges with probability 1, so I think the (D-1)-dimensional one also does it since the distance from the origin to the walker will be smaller than if the walker just kept to a 1D line. Since the expected distance traversed in 1D is $latex E[|X|] \approx 1.0761$ this is actually not a very extreme  shift. Given the surprisingly large diameters of the faces if $N \ll D$ the first condition might be tougher to meet than the second, but this is just a guess.

The no slipping constraint is important. If the polytopes are frictionless, then any transverse force will move them. Hence only polytopes that have some parallel top and bottom stable faces can be stacked, and the problem becomes simpler. There are still surprises there, though: even stacks of rectangular blocks can do surprising things. The block stacking problem also demonstrates that one can have 1/N overhangs (counting downwards), enabling arbitrarily large total overhangs without tipping over. With polytopes with shapes that act as counterweights the overhangs can be even larger.

## Uriel’s stacking problems

This leads to what we might call “Uriel’s stacking problem”: given a collection of no-slip convex D-dimensional polytopes, what is the tallest tower that can be constructed from them?

I suspect that this problem is NP-hard. It sounds very much like a knapsack problem, but there is a dependency on previous steps when you add a new polytope that seem to make it harder. It seems that it would not be too difficult to fool a greedy algorithm just trying to put the next polytope on the most topmost face into adding one that makes subsequent steps too unstable, forcing backtracking.

Another related problem: if the polytopes are random convex hulls of N points, what is the distribution of maximum tower heights? What if we just try random stacking?

And finally, what is the maximum overhang that can be done by stacking polytopes from a given set?

# Simple instability

As a side effect of a chat about dynamical systems models of metabolic syndrome, I came up with the following nice little toy model showing two kinds of instability: instability because of insufficient dampening, and instability because of too slow dampening.

$x'(t) = Ax(t)/N - px(t-\tau) -x(t)^3$

Where $x$ is a N-dimensional vector, A is a $N \times N$ matrix with Gaussian random numbers, and $p \geq 0, \tau \geq 0$ constants. The last term should strictly speaking be written as $||x(t)||^2 x(t)$ but I am lazy.

The first term causes chaos, as we will see below. The 1/N factor is just to compensate for the N terms. The middle term represents dampening trying to force the system to the origin, but acting with a delay $\tau$. The final term keeps the dynamics bounded: as $||x||$ becomes large this term will dominate and bring back the trajectory to the vicinity of the origin. However, it is a soft spring that has little effect close to the origin.

## Chaos

Let us consider the obvious fixed point $x=0$. Is it stable? If we calculate the Jacobian matrix there it becomes $J = A/N - pI$. First, consider the case where $p=0$. The eigenvalues of J will be the ones of a random Gaussian matrix with no symmetry conditions. If it had been symmetric, then Wigner’s semicircle rule implies that they would tend to be distributed as $P(\lambda)=(2/\pi)\sqrt{1-\lambda^2}$ as $N \rightarrow \infty$. However, it turns out that this is true for the non-symmetric Gaussian case too. (and might be true for any i.i.d. random numbers). This means that about half of them will have a positive real part, and that implies that the fixed point is unstable: for $p=0$ the system will be orbiting the origin in some fashion, and generically this means a chaotic attractor.

## Stability

If $p$ grows the diagonal elements of J will become more and more negative. If they are really negative then we essentially have a matrix with a negative diagonal and some tiny off-diagonal terms: the eigenvalues will almost be the diagonal ones, and they are all negative. The origin is a stable attractive fixed point in this limit.

In between, if we plot the eigenvalues as a function of $p$, we see that the semicircle just linearly moves towards the negative side and when all of it passes over, we shift from the chaotic dynamics to the fixed point. Exactly when this happens depends on the particular A we are looking at and its largest eigenvalue (which is distributed as the Tracy-Widom distribution), but it is generally pretty sharp for large N.

## Delay

But what if $\tau$ becomes large? In this case the force moving the trajectory towards the origin will no longer be based on where it is right now, but on where it was $\tau$ seconds earlier. If $p$ is small, then this is just minor noise/bias (and the dynamics is chaotic anyway). If it is large, then the trajectory will be pushed in some essentially random direction: we get instability again.

A (very slightly) more stringent way of thinking of it is to plug in $x_j(t)=c_j e^{i\lambda_j t}$ into the equation. To simplify, let’s throw away the cubic term since we want to look at behavior close to zero, and let’s use a coordinate system where the matrix is a diagonal matrix $\Lambda$. Then for $p=0$ we get $\lambda_j = \Lambda_j$, that is, the origin is a fixed point that repels or attracts trajectories depending on its eigenvalues (and we know from above that we can be pretty confident some are positive, so it is unstable overall). For $p>0$ we get $\lambda_j + pe^{-i\lambda_j \tau} = \Lambda_j$. Taylor expansion to the first order and rearranging gives us $\lambda_j \approx(\Lambda_j - p)/(1 - i p \tau)$. The  numerator means that as $p$ grows, each eigenvalue will eventually get a negative real part: that particular direction of dynamics becomes stable and attracted to the origin. But the denominator can sabotage this: it $p \tau$ gets large enough it can move the eigenvalue anywhere, causing instability.

So there you are: if you try to keep a system stable, make sure the force used is up to the task so the inherent recalcitrance cannot overwhelm it, and make sure the direction actually corresponds to the current state of the system.

# Dancing zeros

Playing with Matlab, I plotted the location of the zeros of a polynomial with normally distributed coefficients in the complex plane. It was nearly a circle:

This did not surprise me that much, since I have already toyed with the distribution of zeros of polynomials with coefficients in {-1,0,+1}, producing some neat distributions close to the unit circle (see also John Baez). A quick google found (Hughes & Nikeghbali 2008): under very general circumstances polynomial zeros tend towards the unit circle. One can heuristically motivate it in a lot of ways.

As you add more and more terms to the polynomial the zeros approach the unit circle. Each new term perturbs them a bit: at first they move around a lot as the degree goes up, but they soon stabilize into robust positions (“young” zeros move more than “old” zeros). This seems to be true regardless of whether the coefficients set in “little-endian” or “big-endian” fashion.

But then I decided to move things around: what if the coefficient on the leading term changed? How would the zeros move? I looked at the polynomial $P_\theta(z)=e^{i\theta} z^n + c_{n-1}z^{n-1}+\ldots+c_1 z + c_0$ where $c_i$ were from some suitable random sequence and $\theta$ could run around $[0,2\pi]$. Since the leading coefficient would start and end up back at 1, I knew all zeros would return to their starting position. But in between, would they jump around discontinuously or follow orderly paths?

Continuity is actually guaranteed, as shown by (Harris & Martin 1987). As you change the coefficients continuously, the zeros vary continuously too. In fact, for polynomials without multiple zeros, the zeros vary analytically with the coefficients.

As $\theta$ runs from 0 to $2\pi$ the roots move along different orbits. Some end up permuted with each other.

For low degrees, most zeros participate in a large cycle. Then more and more zeros emerge inside the unit circle and stay mostly fixed as the polynomial changes. As the degree increases they congregate towards the unit circle, while at least one large cycle wraps most of them, often making snaking detours into the zeros near the unit circle and then broad bows outside it.

In the above example, there is a 21-cycle, as well as a 2-cycle around 2 o’clock. The other zeros stay mostly put.

The real question is what determines the cycles? To understand that, we need to change not just the argument but the magnitude of $c_n$.

What happens if we slowly increase the magnitude of the leading term, letting $c_n = re^{i\theta}$ for a r that increases from zero? It turns out that a new zero of the function zooms in from infinity towards the unit circle. A way of seeing this is to look at the polynomial as $P_n(z) = c_n z^n + P_{n-1}(z)$: the second term is nonzero and large in most places, so if $c_n$ is small the $z^n$ factor must be large (and opposite) to outweigh it and cause a zero. The exception is of course close to the zeros of $P_{n-1}(z)$, where the perturbation just moves them a tiny bit: there is a counterpart for each of the $n-1$ zeros of $P_{n-1}(z)$ among the zeros of $P_{n}(z)$. While the new root is approaching from outside, if we play with $\theta$ it will make a turn around the other zeros: it is alone in its orbit, which also encapsulates all the other zeros. Eventually it will start interacting with them, though.

If you instead start out with a large leading term, $|c_n| \gg |c_i|$, then the polynomial is essentially $P_n(z)=c_nz^n+[\mathrm{small stuff}]$ and the zeros the n-th roots of $-[\mathrm{small stuff}]/c_n$. All zeros belong to the same roughly circular orbit, moving together as $\theta$ makes a rotation. But as $|c_n|$ decreases the shared orbit develops bulges and dents, and some zeros pinch off from it into their own small circles. When does the pinching off happen? That corresponds to when two zeros coincide during the orbit: one continues on the big orbit, the other one settles down to be local. This is the one case where the analyticity of how they move depending on $c_n$ breaks down. They still move continuously, but there is a sharp turn in their movement direction. Eventually we end up in the small term case, with a single zero on a large radius orbit as $|c_n| \rightarrow 0$.

This pinching off scenario also suggests why it is rare to find shared orbits in general: they occur if two zeros coincide but with others in between them (e.g. if we number them along the orbit, $z_1=z_k$, with $z_2$ to $z_{k-1}$ separate). That requires a large pinch in the orbit, but since it is overall pretty convex and circle-like this is unlikely.

Allowing $|c_n|$ to run from $\infty$ to 0 and $\theta$ over $[0,2\pi]$ would cover the entire complex plane (except maybe the origin): for each z, there is some $c_n$ where $c_nz^n+\ldots+c_1z+c_0=0$. This is fairly obviously $c_n = f(z) = -P_{n-1}(z)/z^n$. This function has a central pole, surrounded by zeros corresponding to the zeros of $P_{n-1}(z)$. The orbits we have drawn above correspond to level sets $|f(z)|=\mathrm{const}$, and the pinching off to saddle points of this surface. To get a multi-zero orbit several zeros need to be close together enough to cause a broad valley.

There you have it, a rough theory of dancing zeros.

## References

Harris, G., & Martin, C. (1987). Shorter notes: The roots of a polynomial vary continuously as a function of the coefficients. Proceedings of the American Mathematical Society, 390-392.
Hughes, C. P., & Nikeghbali, A. (2008). The zeros of random polynomials cluster uniformly near the unit circle. Compositio Mathematica, 144(03), 734-746.

# A prime minimal surface

A short while ago I mentioned lacunary functions, complex functions that are analytic inside a disc but cannot be continued outside it. Then I remembered the wonderful Weierstrass-Enneper representation formula, which assigns a minimal surface to (nearly) any pair of complex functions. What happens when you make a minimal surface from a lacunary function?

I was not first with this idea. In fact, F.F. de Brito used this back in 1992 to demonstrate that there exist complete embedded minimal surfaces in 3-space that are contained between two planes.

Here is the surface defined by the function $g(z)=\sum_{p \mathrm{is prime}} z^p$, the Taylor series that only includes all prime powers, combined with $f(z)=1$.

Close to zero, the surface is flat. Away from zero it begins to wobble as increasingly high powers in the series begin to dominate. It behaves very much like a higher-degree Enneper surface, but with a wobble that is composed of smaller wobbles. It is cool to consider that this apparently irregular pattern corresponds to the apparently irregular pattern of all primes.

# Some math for Epiphany

## Analytic functions helping you out

Recently I chatted with a mathematician friend about generating functions in combinatorics. Normally they are treated as a neat symbolic trick: you have a sequence $a_n$ (typically how many there are of some kind of object of size $n$), you formally define a function $f(z)=\sum_{n=0}^\infty a_n z^n$, you derive some constraints on the function, and from this you get a formula for the $a_n$ or other useful data. Convergence does not matter, since this is purely symbolic. We used this in our paper counting tie knots. It is a delightful way of solving recurrence relations or bundle up moments of probability distributions.

I innocently wondered if the function (especially its zeroes and poles) held any interesting information. My friend told me that there was analytic combinatorics: you can actually take $f(z)$ seriously as a (complex) function and use the powerful machinery of complex analysis to calculate asymptotic behavior for the $a_n$ from the location and type of the “dominant” singularities. He pointed me at the excellent course notes from a course at Princeton linked to the textbook by Philippe Flajolet and Robert Sedgewick. They show a procedure for taking combinatorial objects, converting them symbolically into generating functions, and then get their asymptotic behavior from the properties of the functions. This is extraordinarily neat, both in terms of efficiency and in linking different branches of math.

In our case, one can show nearly by inspection that the number of Fink-Mao tie knots grow with the number of moves as $\sim 2^n$, while single tuck tie knots grow as $\sim \sqrt{6}^n$.

The second piece of math I found this weekend was about random Taylor series and lacunary functions.

If $f(z)=\sum_{n=0}^\infty X_n z^n$ where $X_n$ are independent random numbers, what kind of functions do we get? Trying it with complex Gaussian $X_n$ produces a disk of convergence with some nondescript function on the inside.

Replacing the complex Gaussian with a real one, or uniform random numbers, or even power-law numbers gives the same behavior. They all seem to have radius 1. This is not just a vanilla disk of convergence (where an analytic function reaches a pole or singularity somewhere on the boundary but is otherwise fine and continuable), but a natural boundary – that is, a boundary so dense with poles or singularities that continuation beyond it is not possible at all.

The locus classicus about random Taylor series is apparently Kahane, J.-P. (1985), Some Random Series of Functions. 2nd ed., Cambridge University Press, Cambridge.

A naive handwave argument is that for $|z|<1$ we have an exponentially decaying sequence of $z^n$, so if the $X_n$ have some finite average size $E(X)$ and not too divergent variance we should expect convergence, while outside the unit circle any nonzero $E(X)$ will allow it to diverge. We can even invoke the Markov inequality $P(X>t) \leq E(X)/t$ to argue that a series $\sum X_n f(n)$ would converge if $\sum f(n)/n$ converges. However, this is not correct enough for proper mathematics. One entirely possible Gaussian outcome is $X_n=1,10,100,1000,\ldots$ or worse. We need to speak of probabilistic convergence.

Andrés E. Caicedo has a good post about how to approach it properly. The “trick” is the awesome Kolmogorov zero-one law that implies that since the radius of convergence depends on the entire series X_n rather than any finite subset (and they are all independent) it will be a constant.

This kind of natural boundary disk of convergence may look odd to beginning students of complex analysis: after all, none of the functions we normally encounter behave like this. Except that this is of course selection bias. If you look at the example series for lacunary functions they all look like fairly reasonable sparse Taylor series like $z+z^4+z^8+z^16+^32+\lddots$. In calculus we are used to worrying that the coefficients in front of the z-terms of a series don’t diminish fast enough: having fewer nonzero terms seems entirely innocuous. But as Hadamard showed, it is enough that the size of the gaps grow geometrically for the function to get a natural boundary (in fact, even denser series do this – for example having just prime powers). The same is true for Fourier series. Weierstrass’ famous continuous but nowhere differentiable function is lacunary (in his 1880 paper on analytic continuation he gives the example $\sum a_n z^{b^n}$ of an uncontinuable function). In fact, as Emile Borel found and Steinhardt eventually proved in a stricter sense, in general (“almost surely”) a Taylor series isn’t continuable because of boundaries.

One could of course try to combine the analytic combinatorics with the lacunary stuff. In a sense a lacunary generating function is a worst case scenario for the singularity-measuring methods used in analytical combinatorics since you get an infinite number of them at a finite and equal distance, and now have to average them together somehow. Intuitively this case seems to correspond to counting something that becomes rarer at a geometric rate or faster. But the Borel-Steinhardt results suggest that even objects that do not become rare could have nasty natural boundaries – if the number $a_n$ were due to something close enough to random we should expect estimating asymptotics to be hard. The funniest example I can think of is the number of roots of Chaitin-style Diophantine equations where for each $n$ it is an independent arithmetic fact whether there are any: this is hardcore random, and presumably the exact asymptotic growth rate will be uncomputable both practically and theoretically.

# Did amphetamines help Erdős?

During my work on the Paris talk I began to wonder whether Paul Erdős (who I used as an example of a respected academic who used cognitive enhancers) could actually have been shown to have benefited from his amphetamine use, which began in 1971 according to Hill (2004). One way of investigating is his publication record: how many papers did he produce per year before or after 1971? Here is a plot, based on Jerrold Grossman’s 2010 bibliography:

The green dashed line is the start of amphetamine use, and the red dashed life is the date of death. Yes, there is a fairly significant posthumous tail: old mathematicians never die, they just asymptote towards zero. Overall, the later part is more productive per year than the early part (before 1971 the mean and standard deviation was 14.6±7.5, after 24.4±16.1; a Kruskal-Wallis test rejects that they are the same distribution, p=2.2e-10).

This does not prove anything. After all, his academic network was growing and he moved from topic to topic, so we cannot prove any causal effect of the amphetamine: for all we know, it might have been holding him back.

One possible argument might be that he did not do his best work on amphetamine. To check this, I took the Wikipedia article that lists things named after Erdős, and tried to find years for the discovery/conjecture. These are marked with red crosses in the diagram, slightly jittered. We can see a few clusters that may correspond to creative periods: one in 35-41, one in 46-51, one in 56-60. After 1970 the distribution was more even and sparse. 76% of the most famous results were done before 1971; given that this is 60% of the entire career it does not look that unlikely to be due to chance (a binomial test gives p=0.06).

Again this does not prove anything. Maybe mathematics really is a young man’s game, and we should expect key results early. There may also have been more time to recognize and name results from the earlier career.

In the end, this is merely a statistical anecdote. It does show that one can be a productive, well-renowned (if eccentric) academic while on enhancers for a long time. But given the N=1, firm conclusions or advice are hard to draw.

Erdős’s friends worried about his drug use, and in 1979 Graham bet Erdős \$500 that he couldn’t stop taking amphetamines for a month. Erdős accepted, and went cold turkey for a complete month. Erdős’s comment at the end of the month was “You’ve showed me I’m not an addict. But I didn’t get any work done. I’d get up in the morning and stare at a blank piece of paper. I’d have no ideas, just like an ordinary person. You’ve set mathematics back a month.” He then immediately started taking amphetamines again. (Hill 2004)

# Packing my circles

One of the first fractals I ever saw was the Apollonian gasket, the shape that emerges if you draw the circle internally tangent to three other tangent circles. It is somewhat similar to the Sierpinski triangle, but has a more organic flair. I can still remember opening my copy of Mandelbrot’s The Fractal Geometry of Nature and encountering this amazing shape. There is a lot of interesting things going on here.

Here is a simple algorithm for generating related circle packings, trading recursion for flexibility:

1. Start with a domain and calculate the distance to the border for all interior points.
2. Place a circle of radius $\alpha d^*$ at the point with maximal distance $d^*=\max d(x,y)$ from the border.
3. Recalculate the distances, treating the new circle as a part of the border.
4. Repeat (2-3) until the radius becomes smaller than some tolerance.

This is easily implemented in Matlab if we discretize the domain and use an array of distances $d(x,y)$, which is then updated $d(x,y) \leftarrow \min(d(x,y), D(x,y))$ where $D(x,y)$ is the distance to the circle. This trades exactness for some discretization error, but it can easily handle nearly arbitrary shapes.

It is interesting to note that the topology is Apollonian nearly everywhere: as soon as three circles form a curvilinear triangle the interior will be a standard gasket if $\alpha=1$.

In the above pictures the first circle tends to dominate. In fact, the size distribution of circles is a power law: the number of circles larger than r grows as $N(r)\propto r^-\delta$ as we approach zero, with $\delta \approx 1.3$. This is unsurprising: given a generic curved triangle, the inscribed circle will be a fraction of the radii of the bordering circles. If one looks at integral circle packings it is possible to see that the curvatures of subsequent circles grow quadratically along each “horn”, but different “horns” have different growths. Because of the curvature the self-similarity is nontrivial: there is actually, as far as I know, still no analytic expression of the fractal dimension of the gasket. Still, one can show that the packing exponent $\delta$ is the Hausdorff dimension of the gasket.

Anyway, to make the first circle less dominant we can either place a non-optimal circle somewhere, or use lower $\alpha$.

If we place a circle in the centre of a square with a radius smaller than the distance to the edge, it gets surrounded by larger circles.

If the circle is misaligned, it is no problem for the tiling: any discrepancy can be filled with sufficiently small circles. There is however room for arbitrariness: when a bow-tie-shaped region shows up there are often two possible ways of placing a maximal circle in it, and whichever gets selected breaks the symmetry, typically producing more arbitrary bow-ties. For “neat” arrangements with the right relationships between circle curvatures and positions this does not happen (they have circle chains corresponding to various integer curvature relationships), but the generic case is a mess. If we move the seed circle around, the rest of the arrangement both show random jitter and occasional large-scale reorganizations.

When we let $\alpha<1$ we get sponge-like fractals: these are relatives to the Menger sponge and the Cantor set. The domain gets an infinity of circles punched out of itself, with a total area approaching the area of the domain, so the total measure goes to zero.

That these images have an organic look is not surprising. Vascular systems likely grow by finding the locations furthest away from existing vascularization, then filling in the gaps recursively (OK, things are a bit more complex).

# Continued integrals

Many of the most awesome formulas you meet when getting into mathematics are continued fractions like

$\Phi = 1+\frac{1}{1+\frac{1}{1+\frac{1}{\ldots}}}$

$2 = \sqrt{2 + \sqrt{2 + \sqrt{2 + \ldots}}}$.

What about nested/continued integrals? Here is a simple one:

$e^x=1+x+\int x+\left(\int x+\left(\int x+\left(\ldots\right)dx\right)dx\right)dx$.

The way to see this is to recognize that the x in the first integral is going to integrate to $x^2/2$, the x in the second will be integrated twice $\int x^2/2 dx = x^3/3!$, and so on.

In general additive integrals of this kind turn into sums (assuming convergence, handwave, handwave…):

$I(x)=\int f(x)+\left(\int f(x)+\left(\int f(x)+\left(\ldots\right)dx\right)dx\right)dx = \sum_{n=1}^\infty \int^n f(x) dx$.

On the other hand, $I'(x)=f(x)+I(x)$.

So if we insert $f_k(x)=\sin(kx)$ we get the sum $I_k(x)=-\cos(kx)/k-\sin(kx)/k^2+\cos(kx)/k^3+\sin(x)/k^4-\cos(kx)/k^5-\ldots$. For $x=0$ we end up with $I_k(0)=\sum_{n=0}^\infty 1/k^{4n+2} - \sum_{n=0}^\infty 2/k^{4n+1}$. The differential equation has solution $I_k(x)=ce^x-\sin(kx)/(k^2+1) - k\cos(kx)/(k^2+1)$. Setting $k=0$ the integral is clearly zero, so $c=0$. Tying it together we get:

$\sum_{n=0}^\infty 1/k^{4n+2}-\sum_{n=0}^\infty 1/k^{4n+1}=-k/(k^2+1)$.

Things are trickier when the integrals are multiplicative, like $I(x)=\int x \int x \int x \ldots dx dx dx$. However, we can turn it into a differential equation: $I'(x)=x I(x)$ which has the well known solution $I(x)=ce^{x^2/2}$. Same thing for $f_k(x)=\sin(kx)$, giving us $I_k(x)=ce^{-\cos(kx)/k}$. Since we are running indefinite integrals we get those pesky constants.

Plugging in $f(x)=1/x$ gives $I(x)=cx$. If we set $c=1$ we get the mildly amusing and in retrospect obvious formula

$x=\int \frac{\int \frac{\int \frac{\ldots}{x} dx}{x} dx}{x} dx$.

We can of course mess things up further, like $I(x)=\int\sqrt{\int\sqrt{\int\sqrt{\ldots} dx} dx} dx$, where the differential equation becomes $I'^2=I$ with the solution $I(x)=(1/4)(c^2 + 2cx + x^2)$. A surprisingly simple solution to a weird-looking integral. In a similar vein:

$2\cot^{-1}(e^{c-x})=\int\sin\left(\int\sin\left(\int\sin\left(\ldots\right)dx\right) dx\right) dx$

$-\log(c-x)=\int \exp\left(\int \exp\left(\int \exp\left(\ldots \right) dx \right) dx \right) dx$

$1/(c-x)=\int \left(\int \left(\int \left(\ldots \right)^2 dx \right)^2 dx \right)^2 dx$

And if you want a real mind-bender, use the Lambert W function:

$I(x)=\int W\left(\int W\left(\int W\left(\ldots \right) dx \right) dx \right) dx$, then $x=\int_1^{I(x)}1/W(t) dt + c$.

(that is, you get an implicit but well defined expression for the (x,I(x)) values. With Lambert, the x and y axes always tend to switch place).

[And yes, convergence is handwavy in this essay. I think the best way of approaching it is to view the values of these integrals as the functions invariant under the functional consisting of the integral and its repeated function: whether nearby functions are attracted to it (or not) under repeated application of the functional depends on the case. ]

# Gamma function surfaces

The gamma function has a long and interesting history (check out (Davis 1963) excellent review), but one application does not seem to have shown up: minimal surfaces.

A minimal surface is one where the average curvature is always zero; it bends equally in two opposite directions. This is equivalent to having the (locally) minimal area given its boundary: such surfaces are commonly seen as soap films stretched from frames. There exists a rich theory for them, linking them to complex analysis through the Enneper-Weierstrass representation: if you have a meromorphic function g and an analytic function f such that $fg^2$ is holomorphic, then

$X(z)=\Re\left(\int_{z_0}^z f(1-g^2)/2 dz\right)$
$Y(z)=\Re\left(\int_{z_0}^z if(1+g^2)/2 dz\right)$
$Z(z)=\Re\left(\int_{z_0}^z fg dz\right)$

produces a minimal surface $(X(z),Y(z),Z(z))$.

When plugging in the hyperbolic tangent as g and using f=1 I got a new and rather nifty surface a few years back. What about plugging in the gamma function? Let $f=1, g=\Gamma(z)$.

We integrate from the regular point $z_0=1$ to different points $z$ in the complex plane. Let us start with the simple case of $\Re(z)>1/2$.

The surface is a billowing strip, and as we include z with larger and larger real parts the amplitude of the oscillations grow rapidly, making it self-intersect. The behaviour is somewhat similar to the Catalan minimal surface, except that we only get one period. If we go to larger imaginary parts the surface approaches a horizontal plane. OK, the surface is a plane with some wild waves, right?

Not so fast, we have not looked at the mess for Re(z)<0. First, let’s examine the area around the z=0 singularity. Since the values of the integrand blows up close to it, they produce a surface expanding towards infinity – very similar to a catenoid. Indeed, catenoid ends tend to show up where there are poles. But this one doesn’t close exactly: for re(z)<0 there is some overshoot producing a self-intersecting plane-like strip.

The problem is of course the singularity: when integrating in the complex plane we need to avoid them, and depending on the direction we go around them we can get a complex phase that gives us an entirely different value of the function. In this case the branch cut corresponds to the real line: integrating clockwise or counter-clockwise around z=0 to the same z gives different values. In fact, a clockwise turn adds [3.6268i, 3.6268, 6.2832i] (which looks like $\gamma\pi$ – a rather neat residue!) to the coordinates: a translation in the positive y-direction. If we extend the surface by going an extra turn clockwise or counterclockwise a number of times, we get copies that attach seamlessly.

OK, we have a surface with some planar strips that turn wobbly and self-intersecting in the x-direction, with elliptic catenoid ends repeating along the y-direction due to the z=0 singularity. Going down the negative x-direction things look plane between the catenoids… except of course for the catenoids due to all the other singularities for $z=-1,-2,\ldots$. They also introduce residues along the y-direction, but different ones from the z=0 – their extensions of the surface will be out of phase with each other, making the fully extended surface fantastically self-intersecting and confusing.

So, I think we have a simple answer to why the gamma function minimal surface is not well known: it is simply too messy and self-intersecting.

Of course, there may be related nifty surfaces. $1/\Gamma(z)$ is nicely behaved and looks very much like the Enneper surface near zero, with “wings” that oscillate ever more wildly as we move towards the negative reals. No doubt there are other beautiful things to look for in the vicinity.