Infinite Newton

Apropos Newton’s method in the complex plane, what happens when the degree of the polynomial goes to infinity?

Towards infinity

Obviously there will be more zeros, so there will be more attractors and we should expect the boundaries of the basins of attraction to become messier. But it is not entirely clear where the action will be, so it would be useful to compress the entire complex plane into a convenient square.

How do you depict the entire complex plane? While I have always liked the Riemann sphere here I tried mapping x+yi to [\tanh(x),\tanh(y)]. The origin is unchanged, and infinity becomes the edges of the square [-1,1]\times [-1,1]. This is not a conformal map, so things will get squished near the edges.

For color, I used (1/2)+(1/2)[\tanh(|z-1|-1), \tanh(|z+1|-1), \tanh(|z-i|-1)] to map complex coordinates to RGB. This makes the color depend on the distance to 1, -1 and i, making infinity white and zero some drab color (the -1 terms at the end determines the overall color range).

Here is the animated result:

What is going on? As I scale up the size of the leading term from zero, the root created by adding that term moves in from infinity towards the center, making the new basin of attraction grow. This behavior has been described in this post on dancing zeros. The zeros also tend to cluster towards the unit circle, crowding together and distributing themselves evenly. That distribution is the reason for the the colorful “flowers” that surround white spots (poles of the Newton formula, corresponding to zeros of the derivative of the polynomial). The central blob is just the attractor of the most “solid” zero, corresponding to the linear and constant terms of the polynomial.

The jostling is amusing: it looks like the roots do repel each other. This is presumably because close roots require a sharp turn of the function, but the “turning radius” is set by the coefficients that tend to be of order unity. Getting degenerate roots requires coefficients to be in a set of measure zero, so it is rare. Near-degenerate roots exist in a positive measure set surrounding that set, but it is still “small” compared to the general case.

At infinity

So what happens if we let the degree go to infinity? As I previously mentioned, the generic behaviour of \sum_{n=1}^\infty a_n z^n where a_n is independent random numbers is a lacunary function. So we should not expect anything outside the unit circle. Inside the circle there will be poles, so there will be copies of the undefined outside region (because of Great Picard’s Theorem (meromorphic version)). Since the function is analytic these copies will be conformal mappings of the exterior and hence roughly circular. There will also be zeros, and these will have their own basins of attraction. A few of the central ones dominate, but there is an infinite number of attractors as we approach the circular border which is crammed with poles and zeros.

Since we now know we will only deal with the unit disk, we can avoid transforming the entire plane and enjoy the results:

Attractors for random 10,000-degree polynomial.
Attractors for random 10,000-degree polynomial.
Attractors for random 10,000-degree polynomial.
Attractors for random 10,000-degree polynomial.

What happens here is that the white regions represents places where points get mapped onto the undefined outside, while the colored fractal regions are the attraction basins for the zeros. And between them there is a truly wild boundary. In the vanilla z^3+1 fractal every point on the boundary is a meeting point of the three basins, a tri-point. Here there is an infinite number of attractors: the boundary consists of points where an infinite number of different attractors meet.

Newtonmas fractals: conquering the second dimension!

Perturbed Newton "classic", epsilon=-3.75.
Perturbed Newton “classic”, epsilon=-3.75.

It is Newtonmas, so time to invent some new fractals.

Complex iteration of Newton’s method for finding zeros of a function is a well-known way of getting lovely filigree Julia sets: the basins of attraction of the different zeros have fractal borders.

But what if we looked at real functions? If we use a single function f(x,y) the zeros will typically form a curve in the plane. In order to get discrete zeros we typically need to have two functions to produce a zero set. We can think of it as a map from R2 to R2 F(x)=[f_1(x_1,x_2), f_2(x_1,x_2)] where the x’es are 2D vectors. In this case Newton’s method turns into solving the linear equation system J(x_n)(x_{n+1}-x_n)=-F(x_n) where J(x_n) is the Jacobian matrix (J_{ij}=dF_i/dx_j) and x_n now denotes the n’th iterate.

The simplest case of nontrivial dynamics of the method is for third degree polynomials, and we want the x- and y-directions to interact a bit, so a first try is F=[x^3-x-y, y^3-x-y]. Below is a plot of the first and second components (red and green), as well as a blue plane for zero values. The zeros of the function are the three points where red, green and blue meet.

We have three zeros, one at x=y=-\sqrt{2}, one at x=y=0, and one at x=y=\sqrt{2}. The middle one has a region of troublesomely similar function values – the red and green surfaces are tangent there.

Plot of x^3-x-y (red), y^3-x-y (green) and z=0 (blue). The zeros found using Newton's method are the points where red, green and blue meet.
Plot of x^3-x-y (red), y^3-x-y (green) and z=0 (blue). The zeros found using Newton’s method are the points where red, green and blue meet.

The resulting fractal has a decided modernist bent to it, all hyperbolae and swooshes:

Behavior of Newton's method in 2D for F=[x^3-x-y, y^3-x-y]. Color denotes value of x+y, with darkening for slow convergence.
Behavior of Newton’s method in 2D for F=[x^3-x-y, y^3-x-y]. Color denotes value of x+y, with darkening for slow convergence.
The troublesome region shows up, as well as the red and blue regions where iterates run off to large or small values: the three roots are green shades.

Why is the style modernist?

In complex iterations you typically multiply with complex numbers, and if they have an imaginary component (they better have, to be complex!) that introduces a rotation or twist. Hence baroque filaments are ubiquitous, and you get the typical complex “style”.

But here we are essentially multiplying with a real matrix. For a real 2×2 matrix to be a rotation matrix it has to have a pair of imaginary eigenvalues, and for it to at least twist things around the trace needs to be small enough compared to the determinant so that there are complex eigenvalues: T^2/4<D (where T=a+d and D=ad-bc if the matrix has the usual [a b; c d] form). So if the trace and determinant are randomly chosen, we should expect a majority of cases to be non-rotational.

Moreover, in this particular case, the Jacobian tends to be diagonally dominant (quadratic terms on the diagonal) and that makes the inverse diagonally dominant too: the trace will be big, and hence the chance of seeing rotation goes down even more. The two “knots” where a lot of basins of attraction come together are the points where the trace does vanish, but since the Jacobian is also symmetric there will not be any rotation anyway. Double guarantee.

Can we make a twisty real Newton fractal? If we start with a vanilla rotation matrix and try to find a function that produces it the simplest case is F=[x \cos(\theta) + y \sin(\theta), x\sin(\theta)+y\cos(\theta)]. This is of course just a rotation by the angle theta, and it does not have very interesting zeros.

To get something fractal we need more zeros, and a few zeros in the derivatives too (why? because they cause iterates to be thrown away far from where they were, ensuring a complicated structure of the basin boundaries). One attempt is F=[ (x^3-x-y) \cos(\theta) -(y^3-x-y) \sin(\theta), (x^3-x-y) \sin(\theta)+(y^3-y-x) \cos(\theta) ]. The result is fun, but still far from baroque:

Basins of attraction for Netwon's method of twisted cubic. theta=1.
Basins of attraction for Netwon’s method of twisted cubic. theta=1.
Basins of attraction for Netwon's method of twisted cubic. theta=0.1.
Basins of attraction for Netwon’s method of twisted cubic. theta=0.1.

The problem might be that the twistiness is not changing. So we can make \theta=x to make the dynamics even more complex:

Basins of attraction with rotation proportional to x.
Basins of attraction with rotation proportional to x.

Quite lovely, although still not exactly what I wanted (sounds like a Christmas present).

Back to the classics?

It might be easier just to hide the complex dynamics in an apparently real function like F=[x^3-3xy^2-1, 3x^2y-y^3] (which produces the archetypal f(z)=z^3-1 Newton fractal).

Newton fractal for F=[x^3-3xy^2-1, 3x^2y-y^3].
Newton fractal for F=[x^3-3xy^2-1, 3x^2y-y^3]. Red and blue circles mark regions where iterates venture far from the origin.
It is interesting to see how much perturbing it causes a modernist shift. If we use F=[x^3-3xy^2-1 + \epsilon x, 3x^2y-y^3], then for \epsilon=1 we get:

Perturbed z^3-1 Newton iteration, epsilon=1.
Perturbed z^3-1 Newton iteration, epsilon=1.

As we make the function more perturbed, it turns more modernist, undergoing a topological crisis for epsilon between 3.5 and 4:

Perturbed z^3-1 Newton iteration, epsilon=2.
Perturbed z^3-1 Newton iteration, epsilon=2.
Perturbed z^3-1 Newton iteration, epsilon=3.
Perturbed z^3-1 Newton iteration, epsilon=3.
Perturbed z^3-1 Newton iteration, epsilon=3.5.
Perturbed z^3-1 Newton iteration, epsilon=3.5.
Perturbed z^3-1 Newton iteration, epsilon=4.
Perturbed z^3-1 Newton iteration, epsilon=4.

In the end, we can see that the border between classic baroque complex fractals and the modernist swooshy real fractals is fuzzy. Or, indeed, fractal.

Packing my circles

One of the first fractals I ever saw was the Apollonian gasket, the shape that emerges if you draw the circle internally tangent to three other tangent circles. It is somewhat similar to the Sierpinski triangle, but has a more organic flair. I can still remember opening my copy of Mandelbrot’s The Fractal Geometry of Nature and encountering this amazing shape. There is a lot of interesting things going on here.

Here is a simple algorithm for generating related circle packings, trading recursion for flexibility:

  1. Start with a domain and calculate the distance to the border for all interior points.
  2. Place a circle of radius \alpha d^* at the point with maximal distance d^*=\max d(x,y) from the border.
  3. Recalculate the distances, treating the new circle as a part of the border.
  4. Repeat (2-3) until the radius becomes smaller than some tolerance.

This is easily implemented in Matlab if we discretize the domain and use an array of distances d(x,y), which is then updated d(x,y) \leftarrow \min(d(x,y), D(x,y)) where D(x,y) is the distance to the circle. This trades exactness for some discretization error, but it can easily handle nearly arbitrary shapes.

Apollonian circle packing in square.
Apollonian circle packing in square.
Apollonian circle packing in blob.
Apollonian circle packing in blob.
Apollonian circle packing in heart.
Apollonian circle packing in heart.

It is interesting to note that the topology is Apollonian nearly everywhere: as soon as three circles form a curvilinear triangle the interior will be a standard gasket if \alpha=1.

Number of circles larger than a certain radius in packing in blob shape.
Number of circles larger than a certain radius in packing in blob shape.

In the above pictures the first circle tends to dominate. In fact, the size distribution of circles is a power law: the number of circles larger than r grows as N(r)\propto r^-\delta as we approach zero, with \delta \approx 1.3. This is unsurprising: given a generic curved triangle, the inscribed circle will be a fraction of the radii of the bordering circles. If one looks at integral circle packings it is possible to see that the curvatures of subsequent circles grow quadratically along each “horn”, but different “horns” have different growths. Because of the curvature the self-similarity is nontrivial: there is actually, as far as I know, still no analytic expression of the fractal dimension of the gasket. Still, one can show that the packing exponent \delta is the Hausdorff dimension of the gasket.

Anyway, to make the first circle less dominant we can either place a non-optimal circle somewhere, or use lower \alpha.

Apollonian packing in square with central circle of radius 1/6.
Apollonian packing in square with central circle of radius 1/6.

If we place a circle in the centre of a square with a radius smaller than the distance to the edge, it gets surrounded by larger circles.

Randomly started Apollonian packing.
Randomly started Apollonian packing.

If the circle is misaligned, it is no problem for the tiling: any discrepancy can be filled with sufficiently small circles. There is however room for arbitrariness: when a bow-tie-shaped region shows up there are often two possible ways of placing a maximal circle in it, and whichever gets selected breaks the symmetry, typically producing more arbitrary bow-ties. For “neat” arrangements with the right relationships between circle curvatures and positions this does not happen (they have circle chains corresponding to various integer curvature relationships), but the generic case is a mess. If we move the seed circle around, the rest of the arrangement both show random jitter and occasional large-scale reorganizations.


When we let \alpha<1 we get sponge-like fractals: these are relatives to the Menger sponge and the Cantor set. The domain gets an infinity of circles punched out of itself, with a total area approaching the area of the domain, so the total measure goes to zero.

Apollonian packing with alpha=0.5.
Apollonian packing with alpha=0.5.

That these images have an organic look is not surprising. Vascular systems likely grow by finding the locations furthest away from existing vascularization, then filling in the gaps recursively (OK, things are a bit more complex).

Apollonian packing with alpha=1/4.
Apollonian packing with alpha=1/4.
Apollonian packing with alpha=0.1.
Apollonian packing with alpha=0.1.

Gamma function fractals

Another of my favourite functions if the Gamma function, \Gamma(z)=\int_0^\infty t^{z-1}e^{-t} dt, the continuous generalization of the factorial. While it grows rapidly for positive reals, it has fun poles for the negative integers and is generally complex. What happens when you iterate it?

First I started by just applying it to different starting points, z_{n+1} = \Gamma(z_n). The result is a nice fractal, with some domains approaching 1, and others running off to infinity.
Gamma function Julia set
Here I color points that go to infinity in green shades on the number of iterations before they become very large, and the points approaching 1 by |z_{30}-1|. Zooming in a bit more reveals neat self-similar patterns with alternating “beans”:
Gamma function Julia set detailGamma function Julia set zoom
In the outside regions we have thin tendrils stretching towards infinity. These are familiar to anybody who has been iterating exponentials or trigonometric functions: the combination of oscillation and (super)exponential growth leads to the pattern.

OK,that was a Julia set (different starting points, same formula). What about a counterpart to the Mandelbrot set? I looked at z_{n+1}=\Gamma(cz_n) where c is the control parameter. I start with z_0=1 and iterate:
Gamma function Mandelbrot set

Zooming in shows the same kind of motif copies of Julia sets as we see in the quadratic Mandelbrot set:
gammamandel2+izgammamandel1+1.2igammamandel1.5+1.2i
In fact, zooming in as above in the counterpart to the “seahorse valley” shows a remarkable similarity.

Shakespearian numbers

Number lineDuring a recent party I got asked the question “Since \pi has an infinite decimal expansion, does that mean the collected works of Shakespeare (suitably encoded) are in it somewhere?”

My first response was to point out that infinite decimal expressions are not enough: obviously 1/3=0.33333\ldots is a Shakespeare-free number (unless we have a bizarre encoding of the works in the form of all threes). What really matters is whether the number is suitably random. In mathematics this is known as the question about whether pi is a normal number.

If it is normal, then by the infinite monkey theorem then Shakespeare will almost surely be in the number. We actually do not know whether pi is normal, but it looks fairly likely. But that is not enough for a mathematician. A good overview of the problem can be found in a popular article by Bailey and Borwein. (Yep, one of the Borweins)

Where are the Shakespearian numbers?

This led to a second issue: what is the distribution of the Shakespeare-containing numbers?

We can encode Shakespeare in many ways. As an ASCII text the works take up 5.3 MB. One can treat this as a sequence of 7-bit characters and the works as  37,100,000 bits, or 11,168,212 decimal digits. A simple code where each pair of digits encode a character would encode 10,600,000 digits. This allows just a 100 character alphabet rather than a 127 character alphabet, but is likely OK for Shakespeare: we can use the ASCII code minus 32, for example.

If we denote the encoded works of Shakespeare by [Shakespeare], all numbers of the form 0.[Shakespeare]xxxxx\ldots are Shakespeare-containing.

They form a rather tiny interval: since the works start with ‘The’, [Shakespeare] starts as “527269…” and the interval lies inside the interval [0. 527269000\ldots , 0.52727], a mere millionth of [0,1]. The actual interval is even shorter.

But outside that interval there are numbers of the form 0.y[Shakespeare]xxxx\ldots , where y is a digit different from the starting digit of [Shakespeare] and x anything else. So there are 9 such second level intervals, each ten times thinner than the first level interval.

This pattern continues, with the intervals at each level ten times thinner but also 9 times as numerous. This is fairly similar to the Cantor set and gives rise to a fractal. But since the intervals are very tiny it is hard to see.

One way of visualizing this is to assume the weird encoding [Shakespeare]=3, so all numbers containing the digit 3 in the decimal expansion are Shakespearian and the rest are Shakespeare-free.

Distribution of Shakespeare-free numbers in the unit interval, assuming Shakespeare's collected works are encoded as the digit "3".
Distribution of Shakespeare-free numbers in the unit interval, assuming Shakespeare’s collected works are encoded as the digit “3”.

The fractal dimension of this Shakespeare-free set is \log(9)/\log(10)\approx 0.9542. This is less than 1: most points are Shakespearian and in one of the intervals, but since they are thin compared to the line the Shakespeare-free set is nearly one dimensional. Like the Cantor set, each Shakespeare-free number is isolated from any other Shakespeare-free number: there is always some Shakespearian numbers between them.

In the case of the full 5.3MB [Shakespeare] the interval length is around 10^{-10,600,000}. The fractal dimension of the Shakespeare-free set is \log(10^{10,600,000} - 1)/\log(10^{10,600,600}) \approx 1-\epsilon, for some tiny \epsilon \approx 10^{-10,600,000}.  It is very nearly an unbroken line… except for that nearly every point actually does contain Shakespeare.

We have been looking at the unit interval. We can of course look at the entire real line too, but the pattern is similar: just magnify the unit interval pattern by 10, 100, 1000, … times. Somewhere around  $10^{10,600,000}$ there are the numbers that have an integer part equal to [Shakespeare]. And above them are the intervals that start with his works followed by something else, a decimal point and then any decimals. And beyond them there are the [Shakespeare][Shakespeare]xxx\ldots numbers…

Shakespeare is common

One way of seeing that Shakespearian numbers are the generic case is to imagine choosing a number randomly. It has probability S of being in the level 1 interval of Shakespearian numbers. If not, then it will be in one of the 9 intervals 1/10 long that don’t start with the correct first digit, where the probability of starting with Shakespeare in the second digit is S. If that was all there was, the total probability would be S+(9/10)S+(9/10^2)S+\ldots = 10S<1. But the 1/10 interval around the first Shakespearian interval also counts: a number that has the right first digit but wrong second digit can still be Shakespearian. So it will add probability.

Another way of thinking about it is just to look at the initial digits: the probability of starting with [Shakespeare] is S, the probability of starting with [Shakespeare] in position 2 is (1-S)S (the first factor is the probability of not having Shakespeare first), and so on. So the total probability of finding Shakespeare is S + (1-S)S + (1-S)^2S + (1-S)^3S + \ldots = S/(1-(1-S))=1. So nearly all numbers are Shakespearian.

This might seem strange, since any number you are likely to mention is very likely Shakespeare-free. But this is just like the case of transcendental, normal or uncomputable numbers: they are actually the generic case in the reals, but most everyday numbers belong to the algebraic, non-normal and computable numbers.

It is also worth remembering that while all normal numbers are (almost surely) Shakespearian, there are non-normal Shakespearian numbers. For example, the fractional number 0.[Shakespeare]000\ldots is non-normal but Shakespearian. So is 0.[Shakespeare][Shakespeare][Shakespeare]\ldots We can throw in arbitrary finite sequences of digits between the Shakespeares, biasing numbers as close or far as we want from normality. There is a number 0.[Shakespeare]3141592\ldots that has the digits of \pi plus Shakespeare. And there is a number that looks like \pi until Graham’s number digits, then has a single Shakespeare and then continues. Shakespeare can hide anywhere.

In things of great receipt with case we prove,
Among a number one is reckoned none.
Then in the number let me pass untold,
Though in thy store’s account I one must be
-Sonnet 136

Rational fractal distributions

Up among the peaksMost of the time we encounter probability distributions over the reals, the positive reals, or integers. But one can use the rational numbers as a probability space too.

Recently I found the paper Vladimir Trifonov, Laura Pasqualucci, Riccardo Dalla-Favera & Raul Rabadan. Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data. Scientific Reports 1:191 DOI: 10.1038/srep00191. They discuss the distribution of ratios of the number of reads from the same spot of DNA that come from each chromosome in a pair: the number of reads is an integer, so the ratio is rational. They get a peaky, self-similar distribution empirically, and the paper explains why. 

If you take positive independent integers from some distribution f(n) and generate ratios q=a/(a+b), then those ratios will have a distribution that is a convolution over the rational numbers: g(q) = g(a/(a+b)) = \sum_{m=0}^\infty \sum_{n=0}^\infty f(m) g(n) \delta \left(\frac{a}{a+b} - \frac{m}{m+n} \right ) = \sum_{t=0}^\infty f(ta)f(tb)

One can of course do the same for non-independent and different distributions of the integers. Oh, and by the way: this whole thing has little to do with ratio distributions (alias slash distributions), which is what happens in the real case.

The authors found closed form solutions for integers distributed as a power-law with an exponential cut-off and for the uniform distribution; unfortunately the really interesting case, the Poisson distribution, doesn’t seem to have a neat closed form solution.

In the case of a uniform distributions on the set \{1,2,\ldots , L\} they get g(a/(a+b)) = (1/L^2) \lfloor L/\max(a,b) \rfloor .

The rational distribution g(a/(a+b))=1/max(a,b) of Trifonov et al.
The rational distribution g(a/(a+b))=1/max(a,b) of Trifonov et al.

They note that this is similar to Thomae’s function, a somewhat well-known (and multiply named) counterexample in real analysis. That function is defined as f(p/q)=1/q (where the fraction is in lowest terms). In fact, both graphs have the same fractal dimension of 1.5.

It is easy to generate other rational distributions this way. Using a power law as an input produces a sparser pattern, since the integers going into the ratio tend to be small numbers, putting more probability at simple ratios:

The rational distribution g(a/(a+b))=C(ab)^-2 of Trifonov et al.
The rational distribution g(a/(a+b))=C(ab)^-2 (rational convolution of two index -2 power-law distributed integers).

If we use exponential distributions the pattern is fairly similar, but we can of course change the exponent to get something that ranges over a lot of numbers, putting more probability at nonsimple ratios p/q where p+q \gg 1:

The rational distribution of two convolved Exp[0.1] distributions.
The rational distribution of two convolved Exp[0.1] distributions.
Not everything has to be neat and symmetric. Taking the ratio of two unequal Poisson distributions can produce a rather appealing pattern:

Rational distribution of ratio between a Poisson[10] and a Poisson[5] variable.
Rational distribution of ratio between a Poisson[10] and a Poisson[5] variable.
Of course, full generality would include ratios of non-positive numbers. Taking ratios of normal variates rounded to the nearest integer produces a fairly sparse distribution since high numerators or denominators are rare.

Rational distribution of normal variates rounded to nearest integer.
Rational distribution of a/(a+b) ratios of normal variates rounded to nearest integer.

But multiplying the variates by 10 produces a nice distribution.

Rational distribution of ratios of normal variates multiplied by 10 and rounded.
Rational distribution of a/(a+b) ratios of normal variates that have been multiplied by 10 and rounded.

This approaches the Chauchy distribution as the discretisation gets finer. But note the fun microstructure (very visible in the Poisson case above too), where each peak at a simple ratio is surrounded by a “moat” of low probability. This is reminiscent of the behaviour of roots of random polynomials with integer coefficients (see also John Baez page on the topic).

The rational numbers do tend to induce a fractal recursive structure on things, since most measures on them will tend to put more mass at simple ratios than at complex ratios, but when plotting the value of the ratio everything gets neatly folded together. The lower approximability of numbers near the simple ratios produce moats. Which also suggests a question to ponder further: what role does the über-unapproximable golden ratio have in distributions like these?