# Newtonmass fractals 2018

It is Newtonmass, and that means doing some fun math. I try to invent a new fractal or something similar every year: this is the result for 2018.

The Newton fractal is an old classic. Newton’s method for root finding iterates an initial guess $z_0$ to find a better approximation $z_{n+1}=z_{n}-f(z_{n})/f'(z_{n})$. This will work as long as $f'(z)\neq 0$, but which root one converges to can be sensitively dependent on initial conditions. Plotting which root a given initial value ends up with gives the Newton fractal.

The Newton-Gauss method is a method for minimizing the total squared residuals $S(\beta)=\sum_{i=1}^m r_i^2(\beta)$ when some function dependent on n-dimensional parameters $\beta$ is fitted to $m$ data points $r_i(\beta)=f(x_i;\beta)-y_i$. Just like the root finding method it iterates towards a minimum of $S(\beta)$: $\beta_{n+1} = \beta_n - (J^t J)^{-1}J^t r(\beta)$ where $J$ is the Jacobian $J_{ij}=\frac{\partial r_i}{\partial \beta_j}$. This is essentially Newton’s method but in a multi-dimensional form.

So we can make fractals by trying to fit (say) a function consisting of the sum of two Gaussians with different means (but fixed variances) to a random set of points. So we can set $f(x;\beta_1,\beta_2)=(1/\sqrt{2\pi})[e^{-(x-\beta_1)^2/2}+(1/4)e^{-(x-\beta_1)^2/8}]$ (one Gaussian with variance 1 and one with 1/4 – the reason for this is not to make the diagram boringly symmetric as for the same variance case). Plotting the location of the final $\beta(50)$ (by stereographically mapping it onto a unit sphere in (r,g,b) space) gives a nice fractal:

It is a bit modernistic-looking. As I argued in 2016, this is because the generic local Jacobian of the dynamics doesn’t have much rotation.

As more and more points are added the attractor landscape becomes simpler, since it is hard for the Gaussians to “stick” to some particular clump of points and the gradients become steeper.

This fractal can obviously be generalized to more dimensions by using more parameters for the Gaussians, or more Gaussians etc.

The fractality is guaranteed by the generic property of systems with several attractors that points at the border of two basins of attraction will tend to find their ways to other attractors than the two equally balanced neighbors. Hence a cut transversally across the border will find a Cantor-set of true basin boundary points (corresponding to points that eventually get mapped to a singular Jacobian in the iteration formula, like the boundary of the Newton fractal is marked by points mapped to $f'(z_n)=0$ for some n) with different basins alternating.

Merry Newtonmass!

# Fractals and Steiner chains

I recently came across this nice animated gif of a fractal based on a Steiner chain, due to Eric Martin Willén. I immediately wanted to replicate it.

### Make Steiner chains easily

First, how do you make a Steiner chain? It is easy using inversion geometry. Just decide on the number of circles tangent to the inner circle ($n$). Then the ratio of the radii of the inner and outer circle will be $r/R = (1-\sin(\pi/n))/(1+\sin(\pi/n))$. The radii of the circles in the ring will be $(R-r)/2$ and their centres are located at distance $(R+r)/2$ from the origin. This produces a staid concentric arrangement. Now invert with relation to an arbitrary circle: all the circles are mapped to other circles, their tangencies preserved. Voila! A suitably eccentric Steiner chain to play with.

Since the original concentric chain obviously can be rotated continuously without losing touch with the inner and outer circle, this also generates a continuous family of circles after the inversion. This is why Steiner’s porism is true: if you can make the initial chain, you get an infinite number of other chains with the same number of circles.

### Iterated function systems with circle maps

The fractal works by putting copies of the whole set of circles in the chain into each circle, recursively. I remap the circles so that the outer circle becomes the unit circle, and then it is easy to see that for a given small circle with (complex) centre $z$ and radius $r$ the map $f(w)=(w+z)r$ maps the interior of the unit circle to it. Use the ease of rotating the original concentric ring to produce an animation, and we can reconstruct the fractal.

Done.

Except… it feels a bit dry.

Ever since I first encountered iterated function systems in the 1980s I have felt they tend towards a geometric aesthetics that is not me, ferns notwithstanding. A lot has to do with the linearity of the transformations. One can of course add rotations, which cheers up the fractal a bit.

But still, I love the nonlinearity and harmony of conformal mappings.

### Inversion makes things better!

Enter the circle inversion fractals. They are the sets of the plane that map to themselves when being inverted in any and all of a set of generating circles (or, equivalently, the limit set of points under these inversions). As a rule of thumb, when the circles do not touch the fractal will be Cantor/Fatou-style fractal dust. When the circles are tangent the fractal will pass through the point of tangency. If three circles are tangent the fractal will contain a circle passing through these points. Since Steiner chains have lots of tangencies, we should get a lot of delicious fractals by using them as generators.

I use nearly the same code I used for the elliptic inversion fractals, mostly because I like the colours. The “real” fractal is hidden inside the nested circles, composed of an infinite Apollonian gasket of circles.

Note how the fractal extends outside the generators, forming a web of circles. Convergence is slow near tangent points, making it “fuzzy”. While it is easy to see the circles that belong to the invariant set that are empty, there are also circles going through the foci inside the coloured disks, touching the more obvious circles near those fuzzy tangent points. There is a lot going on here.

But we can complicate things by allowing the chain to slide and see how the fractal changes.

This is pretty neat.

# Håkan’s surface

Here is a minimal surface based on the Weierstrass-Enneper representation $f(z)=1, g(z)=\tanh^2(z)$. Written explicitly as a function from the complex number z to 3-space it is $\Re([-\tanh(z)(\mathrm{sech}^2(z)-4)/6,i(6z+\tanh(z)(\mathrm{sech}^2(z)-4))/6,z-\tanh(z)])$.

It is based on my old tanh surface, but has a wilder style. It gets helped by the fact that my triangulation in the picture is pretty jagged. On one hand it has two flat ends, but also a infinite number of catenoid openings (only two shown here).

I call it Håkan’s surface, since I came up with it on my dear husband’s birthday. Happy birthday, Håkan!

# An elliptic remark

I recently returned to toying around with circle and sphere inversion fractals, that is, fractal sets that are invariant under inversion in a given set of circles or spheres.

That got me thinking: can you invert points in other things than circles? Of course you can! José L. Ramírez has written a nice overview of inversion in ellipses. Basically a point $P$ is projected to another point $P'$ so that $||P-O||\cdot ||P'-O||=||Q-O||^2$ where $O$ is the centre of the ellipse and $Q$ is the point where the ray between $O, P', P$ intersects the ellipse.

In Cartesian coordinates, for an ellipse centered on the origin and with semimajor and minor axes $a,b$, the inverse point of $P=(u,v)$ is $P'=(x,y)$ where $x=\frac{a^2b^2u}{b^2u^2+a^2v^2}$ and $y=\frac{a^2b^2v}{b^2u^2+a^2v^2}$. Basically this is a squashed version of the circle formula.

Many of the properties remain the same. Lines passing through the centre of the ellipse are unchanged. Other lines get mapped to ellipses; if they intersect the inversion ellipse the new ellipse also intersect it at those points. Hence tangent lines are mapped to tangent ellipses. Ellipses with parallel axes and equal eccentricities map onto other ellipses (or lines if they intersect the centre of the inversion ellipse). Other conics get turned into cubics; for example a hyperbola gets mapped to a lemniscate. (See also this paper for more examples).

Now, from a fractal standpoint this means that if you have a set of ellipses that are tangent you should expect a fractal passing through their points of tangency. Basically all of the standard circle inversion fractals hence have elliptic counterparts. Here is the result for a ring of 4 or 6 mutually tangent ellipses:

These pictures were generated by taking points in the plane and inverting them with randomly selected ellipses; as the process continues they get attracted to the invariant set (this is basically a standard iterated function system). It also has the known problem of finding the points at the tangencies, since the iteration has to loop consistently between inverting in the two ellipses to get there, but it is likely that a third will be selected at some point.

One approach is to deliberately recurse downward to find the points using a depth first search. We can take look at where each ellipse is mapped by each of the inversions, and since the fractal is inside each of the mapped ellipses we can then continue mapping the chain of mapped ellipses, getting nice bounds on where it is going (as long as everything is shrinking: this is guaranteed as long as it is mappings from the outside to the inside of the generating ellipses, but if they were to overlap things can explode). Doing this for just one step reveals one reason for the quirky shapes above: some of the ellipses get mapped into crescents or pears, adding a lot of bends:

Now, continuing this process makes a nested structure where the invariant set is hidden inside all the other mapped ellipses.

It is still hard to reach the tangent points, but at least now they are easier to detect. They are also numerically tough: most points on the ellipse circumferences are mapped away from them towards the interior of the generating ellipse. Still, if we view the mapped ellipses as uncertainties and shade them in we can get a very pleasing map of the invariant set:

Here are a few other nice fractals based on these ideas:

Using a mix of circles and ellipses produces a nice mix of the regularity of the circle-based Apollonian gaskets and the swooshy, Hénon fractal shape the ellipses induce.

# Appendix: Matlab code

% center=[-1 -1 2 1; -1 1 1 2; 1 -1 1 2; 1 1 2 1];
% center(:,3:4)=center(:,3:4)*(2/3);
%
%center=[-1 -1 2 1; -1 1 1 2; 1 -1 1 2; 1 1 2 1; 3 1 1 2; 3 -1 2 1];
%center(:,3:4)=center(:,3:4)*(2/3);
%center(:,1)=center(:,1)-1;
%
% center=[-1 -1 2 1; -1 1 1 2; 1 -1 1 2; 1 1 2 1];
% center(:,3:4)=center(:,3:4)*(2/3);
% center=[center; 0 0 .51 .51];
%
% egg
% center=[0 0 0.6666 1; 2 2 2 2; -2 2 2 2; -2 -2 2 2; 2 -2 2 2];
%
% double
%r=0.5;
%center=[-r 0 r r; r 0 r r; 2 2 2 2; -2 2 2 2; -2 -2 2 2; 2 -2 2 2];
%
% Double egg
center=[0.3 0 0.3 0.845; -0.3 0 0.3 0.845; 2 2 2 2; -2 2 2 2; -2 -2 2 2; 2 -2 2 2];
%
M=size(center,1); % number of ellipses
N=100; % points on fill curves
X=randn(N+1,2);
clf
hold on
tt=2*pi*(0:N)/N;
alpha 0.2
for i=1:M
X(:,1)=center(i,1)+center(i,3)*cos(tt);
X(:,2)=center(i,2)+center(i,4)*sin(tt);
plot(X(:,1),X(:,2),'k');
for j=1:M
if (i~=j)
recurseDown(X,[i j],10,center)
drawnow
end
end
end


## Recursedown.m

function recurseDown(X,ellword,maxlevel,center)
i=ellword(end); % invert in latest ellipse
%
% Perform inversion
C=center(i,1:2);
A2=center(i,3).^2;
B2=center(i,4).^2;
Y(:,1)=X(:,1)-C(:,1);
Y(:,2)=X(:,2)-C(:,2);
X(:,1)=C(:,1)+A2.*B2.*Y(:,1)./(B2.*Y(:,1).^2+A2.*Y(:,2).^2);
X(:,2)=C(:,2)+A2.*B2.*Y(:,2)./(B2.*Y(:,1).^2+A2.*Y(:,2).^2);
%
if (norm(max(X)-min(X))<0.005) return; end
%
co=hsv(size(center,1));
coco=mean([1 1 1; 1 1 1; co(ellword,:)]);
%
%    plot(X(:,1),X(:,2),'Color',coco)
fill(X(:,1),X(:,2),coco,'FaceAlpha',.2,'EdgeAlpha',0)
%
if (length(ellword)<maxlevel)
for j=1:size(center,1)
if (j~=i)
recurseDown(X,[ellword j],maxlevel,center)
end
end
end


# Calabi-Yau and Hanson’s surfaces

I have a glass cube on my office windowsill containing a slice of a Calabi-Yau manifold, one of Bathsheba Grossman’s wonderful creations. It is an intricate, self-intersecting surface with lots of unexpected symmetries. A visiting friend got me into trying to make my own version of the surface.

First, what is the equation for it? Grossman-Hanson’s explanation is somewhat involved, but basically what we are seeing is a 2D slice through a 6-dimensional manifold in a projective space expressed as the 4D manifold $z_1^5+z_2^5=1$, where the variables are complex. Hanson shows that this is a kind of complex superquadric in this paper. This leads to the formulae:

$z_1(\theta,\xi,k_1)=e^{2\pi i k_1 / n}\cosh(\theta+\xi i)^{2/n}$

$z_2(\theta,\xi,k_2)=e^{2 \pi i k_2 / n}\sinh(\theta+\xi i)^{2/n}/i$

where the k’s run through $0 \leq k \leq (n-1)$. Each pair $k_1,k_2$ corresponds to one patch of what is essentially a complex catenoid. This is still a 4D object. To plot it, we plot the points

$(\Re(z_1),\Re(z_2),\cos(\alpha)\Im(z_1)+\sin(\alpha)\Im(z_2))$

where $\alpha$ is some suitable angle to tilt the projection into 3-space. Hanson’s explanation is very clear; I originally reverse-engineered the same formula from the code at Ziyi Zhang’s site.

The result is pretty nifty. It is tricky to see how it hangs together in 2D; rotating it in 3D helps a bit. It is composed of 16 identical patches:

The boundary of the patches meet other patches except along two open borders (corresponding to large or small values of $\theta$): these form the edges of the manifold and strictly speaking I ought to have rendered them to infinity. That would have made it unbounded and somewhat boring to look at: four disks meeting at an angle, with the interesting part hidden inside. By marking the edges we can see that the boundary are four linked wobbly circles:

A surface bounded by a knot or a link is called a Seifert surface. While these surfaces look a lot like minimal surfaces they are not exactly minimal when I estimate the mean curvature (it should be exactly zero); while this could be because of lack of numerical precision I think it is real: while minimal surfaces are Ricci-flat, the converse is not necessarily true.

Changing N produces other surfaces. N=2 is basically a catenoid (tilted and self-intersecting). As N increases it becomes more like a barrel or a pufferfish, with one direction dominated by circular saddle regions, one showing a meshwork of spaces reminiscent of spacefilling minimal surfaces, and one a lot of overlapping “barbs”.

Note that just like for minimal surfaces one can multiply $z_1, z_2$ by $e^{i\omega}$ to get another surface in an associate family. In this case it circulates the patches along their circles without changing the surface much.

Hanson also notes that by changing the formula to $z_1^{n_1}+z_2^{n_2}=1$ we can get boundaries that are torus-knot-like. This leads to the formulae:

$z_1(\theta,\xi,k_1)=e^{2\pi i k_1 / n_1}\cosh(\theta+\xi i)^{2/n_1}$

$z_2(\theta,\xi,k_2)=e^{2 \pi i k_2 / n_2}\sinh(\theta+\xi i)^{2/n_2}/i$

## Appendix: Matlab code

%% Initialization edge=0; % Mark edge? coloring=1; % Patch coloring type n=4; s=0.1; % Gridsize alp=1; ca=cos(alp); sa=sin(alp); % Projection [theta,xi]=meshgrid(-1.5:s:1.5,1*(pi/2)*(0:1:16)/16); z=theta+xi*i; % Color scheme tt=2*pi*(1:200)'/200; co=.5+.5*[cos(tt) cos(tt+1) cos(tt+2)]; colormap(co) %% Plot clf hold on for k1=0:(n-1) for k2=0:(n-1) z1=exp(k1*2*pi*i/n)*cosh(z).^(2/n); z2=exp(k2*2*pi*i/n)*(1/i)*sinh(z).^(2/n); X=real(z1); Y=real(z2); Z=ca*imag(z1)+sa*imag(z2); if (coloring==0) surf(X,Y,Z); else switch (coloring) case 1 C=z1*0+(k1+k2*n); % Color by patch case 2 C=abs(z1); case 3 C=theta; case 4 C=xi; case 5 C=angle(z1); case 6 C=z1*0+1; end h=surf(X,Y,Z,C); set(h,'EdgeAlpha',0.4) end if (edge>0) plot3(X(:,end),Y(:,end),Z(:,end),'r','LineWidth',2) plot3(X(:,1),Y(:,1),Z(:,1),'r','LineWidth',2) end end end view([2 3 1]) camlight h=camlight('left'); set(h,'Color',[1 1 1]*.5) axis equal axis vis3d axis off 

# Infinite Newton

Apropos Newton’s method in the complex plane, what happens when the degree of the polynomial goes to infinity?

# Towards infinity

Obviously there will be more zeros, so there will be more attractors and we should expect the boundaries of the basins of attraction to become messier. But it is not entirely clear where the action will be, so it would be useful to compress the entire complex plane into a convenient square.

How do you depict the entire complex plane? While I have always liked the Riemann sphere here I tried mapping $x+yi$ to $[\tanh(x),\tanh(y)]$. The origin is unchanged, and infinity becomes the edges of the square $[-1,1]\times [-1,1]$. This is not a conformal map, so things will get squished near the edges.

For color, I used $(1/2)+(1/2)[\tanh(|z-1|-1), \tanh(|z+1|-1), \tanh(|z-i|-1)]$ to map complex coordinates to RGB. This makes the color depend on the distance to 1, -1 and i, making infinity white and zero some drab color (the -1 terms at the end determines the overall color range).

Here is the animated result:

What is going on? As I scale up the size of the leading term from zero, the root created by adding that term moves in from infinity towards the center, making the new basin of attraction grow. This behavior has been described in this post on dancing zeros. The zeros also tend to cluster towards the unit circle, crowding together and distributing themselves evenly. That distribution is the reason for the the colorful “flowers” that surround white spots (poles of the Newton formula, corresponding to zeros of the derivative of the polynomial). The central blob is just the attractor of the most “solid” zero, corresponding to the linear and constant terms of the polynomial.

The jostling is amusing: it looks like the roots do repel each other. This is presumably because close roots require a sharp turn of the function, but the “turning radius” is set by the coefficients that tend to be of order unity. Getting degenerate roots requires coefficients to be in a set of measure zero, so it is rare. Near-degenerate roots exist in a positive measure set surrounding that set, but it is still “small” compared to the general case.

# At infinity

So what happens if we let the degree go to infinity? As I previously mentioned, the generic behaviour of $\sum_{n=1}^\infty a_n z^n$ where $a_n$ is independent random numbers is a lacunary function. So we should not expect anything outside the unit circle. Inside the circle there will be poles, so there will be copies of the undefined outside region (because of Great Picard’s Theorem (meromorphic version)). Since the function is analytic these copies will be conformal mappings of the exterior and hence roughly circular. There will also be zeros, and these will have their own basins of attraction. A few of the central ones dominate, but there is an infinite number of attractors as we approach the circular border which is crammed with poles and zeros.

Since we now know we will only deal with the unit disk, we can avoid transforming the entire plane and enjoy the results:

What happens here is that the white regions represents places where points get mapped onto the undefined outside, while the colored fractal regions are the attraction basins for the zeros. And between them there is a truly wild boundary. In the vanilla $z^3+1$ fractal every point on the boundary is a meeting point of the three basins, a tri-point. Here there is an infinite number of attractors: the boundary consists of points where an infinite number of different attractors meet.

# Newtonmas fractals: conquering the second dimension!

It is Newtonmas, so time to invent some new fractals.

Complex iteration of Newton’s method for finding zeros of a function is a well-known way of getting lovely filigree Julia sets: the basins of attraction of the different zeros have fractal borders.

But what if we looked at real functions? If we use a single function $f(x,y)$ the zeros will typically form a curve in the plane. In order to get discrete zeros we typically need to have two functions to produce a zero set. We can think of it as a map from R2 to R2 $F(x)=[f_1(x_1,x_2), f_2(x_1,x_2)]$ where the x’es are 2D vectors. In this case Newton’s method turns into solving the linear equation system $J(x_n)(x_{n+1}-x_n)=-F(x_n)$ where $J(x_n)$ is the Jacobian matrix ($J_{ij}=dF_i/dx_j$) and $x_n$ now denotes the n’th iterate.

The simplest case of nontrivial dynamics of the method is for third degree polynomials, and we want the x- and y-directions to interact a bit, so a first try is $F=[x^3-x-y, y^3-x-y]$. Below is a plot of the first and second components (red and green), as well as a blue plane for zero values. The zeros of the function are the three points where red, green and blue meet.

We have three zeros, one at $x=y=-\sqrt{2}$, one at $x=y=0$, and one at $x=y=\sqrt{2}.$ The middle one has a region of troublesomely similar function values – the red and green surfaces are tangent there.

The resulting fractal has a decided modernist bent to it, all hyperbolae and swooshes:

The troublesome region shows up, as well as the red and blue regions where iterates run off to large or small values: the three roots are green shades.

# Why is the style modernist?

In complex iterations you typically multiply with complex numbers, and if they have an imaginary component (they better have, to be complex!) that introduces a rotation or twist. Hence baroque filaments are ubiquitous, and you get the typical complex “style”.

But here we are essentially multiplying with a real matrix. For a real 2×2 matrix to be a rotation matrix it has to have a pair of imaginary eigenvalues, and for it to at least twist things around the trace needs to be small enough compared to the determinant so that there are complex eigenvalues: $T^2/4 (where $T=a+d$ and $D=ad-bc$ if the matrix has the usual $[a b; c d]$ form). So if the trace and determinant are randomly chosen, we should expect a majority of cases to be non-rotational.

Moreover, in this particular case, the Jacobian tends to be diagonally dominant (quadratic terms on the diagonal) and that makes the inverse diagonally dominant too: the trace will be big, and hence the chance of seeing rotation goes down even more. The two “knots” where a lot of basins of attraction come together are the points where the trace does vanish, but since the Jacobian is also symmetric there will not be any rotation anyway. Double guarantee.

Can we make a twisty real Newton fractal? If we start with a vanilla rotation matrix and try to find a function that produces it the simplest case is $F=[x \cos(\theta) + y \sin(\theta), x\sin(\theta)+y\cos(\theta)]$. This is of course just a rotation by the angle theta, and it does not have very interesting zeros.

To get something fractal we need more zeros, and a few zeros in the derivatives too (why? because they cause iterates to be thrown away far from where they were, ensuring a complicated structure of the basin boundaries). One attempt is $F=[ (x^3-x-y) \cos(\theta)$ $-(y^3-x-y) \sin(\theta),$ $(x^3-x-y) \sin(\theta)+(y^3-y-x) \cos(\theta) ]$. The result is fun, but still far from baroque:

The problem might be that the twistiness is not changing. So we can make $\theta=x$ to make the dynamics even more complex:

Quite lovely, although still not exactly what I wanted (sounds like a Christmas present).

# Back to the classics?

It might be easier just to hide the complex dynamics in an apparently real function like $F=[x^3-3xy^2-1, 3x^2y-y^3]$ (which produces the archetypal $f(z)=z^3-1$ Newton fractal).

It is interesting to see how much perturbing it causes a modernist shift. If we use $F=[x^3-3xy^2-1 + \epsilon x, 3x^2y-y^3]$, then for $\epsilon=1$ we get:

As we make the function more perturbed, it turns more modernist, undergoing a topological crisis for epsilon between 3.5 and 4:

In the end, we can see that the border between classic baroque complex fractals and the modernist swooshy real fractals is fuzzy. Or, indeed, fractal.

# How much should we spread out across future scenarios?

Robin Hanson mentions that some people take him to task for working on one scenario (WBE) that might not be the most likely future scenario (“standard AI”); he responds by noting that there are perhaps 100 times more people working on standard AI than WBE scenarios, yet the probability of AI is likely not a hundred times higher than WBE. He also notes that there is a tendency for thinkers to clump onto a few popular scenarios or issues. However:

In addition, due to diminishing returns, intellectual attention to future scenarios should probably be spread out more evenly than are probabilities. The first efforts to study each scenario can pick the low hanging fruit to make faster progress. In contrast, after many have worked on a scenario for a while there is less value to be gained from the next marginal effort on that scenario.

This is very similar to my own thinking about research effort. Should we focus on things that are likely to pan out, or explore a lot of possibilities just in case one of the less obvious cases happens? Given that early progress is quick and easy, we can often get a noticeable fraction of whatever utility the topic has by just a quick dip. The effective altruist heuristic of looking at neglected fields also is based on this intuition.

# A model

But under what conditions does this actually work? Here is a simple model:

There are $N$ possible scenarios, one of which ($j$) will come about. They have probability $P_i$. We allocate a unit budget of effort to the scenarios: $\sum a_i = 1$. For the scenario that comes about, we get utility $\sqrt{a_j}$ (diminishing returns).

Here is what happens if we allocate proportional to a power of the scenarios, $a_i \propto P_i^\alpha$. $\alpha=0$ corresponds to even allocation, 1 proportional to the likelihood, >1 to favoring the most likely scenarios. In the following I will run Monte Carlo simulations where the probabilities are randomly generated each instantiation. The outer bluish envelope represents the 95% of the outcomes, the inner ranges from the lower to the upper quartile of the utility gained, and the red line is the expected utility.

This is the $N=2$ case: we have two possible scenarios with probability $p$ and $1-p$ (where $p$ is uniformly distributed in [0,1]). Just allocating evenly gives us $1/\sqrt{2}$ utility on average, but if we put in more effort on the more likely case we will get up to 0.8 utility. As we focus more and more on the likely case there is a corresponding increase in variance, since we may guess wrong and lose out. But 75% of the time we will do better than if we just allocated evenly. Still, allocating nearly everything to the most likely case means that one does lose out on a bit of hedging, so the expected utility declines slowly for large $\alpha$.

The  $N=100$ case (where the probabilities are allocated based on a flat Dirichlet distribution) behaves similarly, but the expected utility is smaller since it is less likely that we will hit the right scenario.

# What is going on?

This doesn’t seem to fit Robin’s or my intuitions at all! The best we can say about uniform allocation is that it doesn’t produce much regret: whatever happens, we will have made some allocation to the possibility. For large N this actually works out better than the directed allocation for a sizable fraction of realizations, but on average we get less utility than betting on the likely choices.

The problem with the model is of course that we actually know the probabilities before making the allocation. In reality, we do not know the likelihood of AI, WBE or alien invasions. We have some information, and we do have priors (like Robin’s view that $P_{AI} < 100 P_{WBE}$), but we are not able to allocate perfectly.  A more plausible model would give us probability estimates instead of the actual probabilities.

# We know nothing

Let us start by looking at the worst possible case: we do not know what the true probabilities are at all. We can draw estimates from the same distribution – it is just that they are uncorrelated with the true situation, so they are just noise.

In this case uniform distribution of effort is optimal. Not only does it avoid regret, it has a higher expected utility than trying to focus on a few scenarios ($\alpha>0$). The larger N is, the less likely it is that we focus on the right scenario since we know nothing. The rationality of ignoring irrelevant information is pretty obvious.

Note that if we have to allocate a minimum effort to each investigated scenario we will be forced to effectively increase our $\alpha$ above 0. The above result gives the somewhat optimistic conclusion that the loss of utility compared to an even spread is rather mild: in the uniform case we have a pretty low amount of effort allocated to the winning scenario, so the low chance of being right in the nonuniform case is being balanced by having a slightly higher effort allocation on the selected scenarios. For high $\alpha$ there is a tail of rare big “wins” when we hit the right scenario that drags the expected utility upwards, even though in most realizations we bet on the wrong case. This is very much the hedgehog predictor story: ocasionally they have analysed the scenario that comes about in great detail and get intensely lauded, despite looking at the wrong things most of the time.

# We know a bit

We can imagine that knowing more should allow us to gradually interpolate between the different results: the more you know, the more you should focus on the likely scenarios.

If we take the mean of the true probabilities with some randomly drawn probabilities (the “half random” case) the curve looks quite similar to the case where we actually know the probabilities: we get a maximum for $\alpha\approx 2$. In fact, we can mix in just a bit ($\beta$) of the true probability and get a fairly good guess where to allocate effort (i.e. we allocate effort as $a_i \propto (\beta P_i + (1-\beta)Q_i)^\alpha$ where $Q_i$ is uncorrelated noise probabilities). The optimal alpha grows roughly linearly with $\beta$, $\alpha_{opt} \approx 4\beta$ in this case.

# We learn

Adding a bit of realism, we can consider a learning process: after allocating some effort $\gamma$ to the different scenarios we get better information about the probabilities, and can now reallocate. A simple model may be that the standard deviation of noise behaves as $1/\sqrt{\tilde{a}_i}$ where $\tilde{a}_i$ is the effort placed in exploring the probability of scenario $i$. So if we begin by allocating uniformly we will have noise at reallocation of the order of $1/\sqrt{\gamma/N}$. We can set $\beta(\gamma)=\sqrt{\gamma/N}/C$, where $C$ is some constant denoting how tough it is to get information. Putting this together with the above result we get $\alpha_{opt}(\gamma)=\sqrt{2\gamma/NC^2}$. After this exploration, now we use the remaining $1-\gamma$ effort to work on the actual scenarios.

This is surprisingly inefficient. The reason is that the expected utility declines as $\sqrt{1-\gamma}$ and the gain is just the utility difference between the uniform case $\alpha=0$ and optimal $\alpha_{opt}$, which we know is pretty small. If C is small (i.e. a small amount of effort is enough to figure out the scenario probabilities) there is an optimal nonzero  $\gamma$. This optimum $\gamma$ decreases as C becomes smaller. If C is large, then the best approach is just to spread efforts evenly.

# Conclusions

So, how should we focus? These results suggest that the key issue is knowing how little we know compared to what can be known, and how much effort it would take to know significantly more.

If there is little more that can be discovered about what scenarios are likely, because our state of knowledge is pretty good, the world is very random,  or improving knowledge about what will happen will be costly, then we should roll with it and distribute effort either among likely scenarios (when we know them) or spread efforts widely (when we are in ignorance).

If we can acquire significant information about the probabilities of scenarios, then we should do it – but not overdo it. If it is very easy to get information we need to just expend some modest effort and then use the rest to flesh out our scenarios. If it is doable but costly, then we may spend a fair bit of our budget on it. But if it is hard, it is better to go directly on the object level scenario analysis as above. We should not expect the improvement to be enormous.

Here I have used a square root diminishing return model. That drives some of the flatness of the optima: had I used a logarithm function things would have been even flatter, while if the returns diminish more mildly the gains of optimal effort allocation would have been more noticeable. Clearly, understanding the diminishing returns, number of alternatives, and cost of learning probabilities better matters for setting your strategy.

In the case of future studies we know the number of scenarios are very large. We know that the returns to forecasting efforts are strongly diminishing for most kinds of forecasts. We know that extra efforts in reducing uncertainty about scenario probabilities in e.g. climate models also have strongly diminishing returns. Together this suggests that Robin is right, and it is rational to stop clustering too hard on favorite scenarios. Insofar we learn something useful from considering scenarios we should explore as many as feasible.

# Uriel’s stacking problem

In Scott Alexander’s kabbalistic sf story Unsong, the archangel Uriel works on a problem while other things are going on in heaven:

All the angels listened in rapt attention except Uriel, who was sort of half-paying attention while trying to balance several twelve-dimensional shapes on top of each other.

There was utter silence throughout the halls of Heaven, except a brief curse as Uriel’s hyperdimensional tower collapsed on itself and he picked up the pieces to try to rebuild it.

A great clamor arose from all the heavenly hosts, save Uriel, who took advantage of the brief lapse to conjure a parchment and pen and start working on a proof about the optimal configuration of twelve-dimensional shapes.

## A polytope on a plane

This got me thinking about the stability of stacking polytopes. That seemed complicated (I am no archangel) so I started toying with the stability of polytopes on a flat surface.

(Terminology note: I will consistently use “face” to denote the D-1 dimensional elements that bound the polytope, although “facet” is in some use.)

A face of a 3D polyhedron is stable if the polyhedron can rest on it without tipping over. This means that the projection of the center of mass onto the plane containing the face is inside the polygon. The platonic polyhedra are stable on all faces, but it is not hard to make a few faces unstable by moving a vertex far away from the center. A polyhedron has at least one stable face (if it did not, it would be a perpetual motion device: every tip will move the center of mass downwards, but there is a bound on how low it can go. A uni-stable or monostatic polyhedron has  just one stable face. It is an unsolved problem what the simplest uni-stable 3D polyhedron is, with the current record 14 faces. Also, it seems unclear whether there are monostatic simplices in dimension 9 (they exist in 10 or more dimensions, but not in 8 or fewer).

So, how many faces of a polytope will typically be unstable?

I wrote a Matlab script to generate random convex polytopes by selecting N points randomly on the surface of a D-dimensional sphere and calculating their convex hull. Using a Delaunay decomposition I can split them into simplices, which allow me to calculate the center of mass. The center of mass of a simplex is just the average of the corners $\vec{x^c_j}=\sum_{i=1}^N \vec{x}_{ij}$, and the center of mass of the polyhedron is just the sum of the simplex centers of mass weighted by their volumes: $\vec{x^p} = \sum_{j=1} V_j \vec{x^c_j}$. The volume of a simplex is $V_j=(1/D!)\mathrm{det}(X_j)$ where $X_j=[x_{1j};x_{2j};\ldots;x_{Dj}]$, the matrix made by sticking together the coordinate vectors of a simplex. Once we know this we can project the center of mass onto the plane of a face by finding its nullspace (the higher dimensional counterpart to a normal) $\vec{p}= \vec{x^p} -(\vec{x^p}\cdot \vec{n})\vec{n}$. Finally, to check whether the projection is inside the face, we can look at the matrix A where each column is the coordinates of one of the faces minus $\vec{p}$ and the final row just ones, and solve for Ax=b where b is zero except for a one in the last row (I found this neat algorithm due to elisbben on stack overflow).  If the answer vector is all positive, then the point is inside the face. Repeat for all the faces.

Whew. This math is of course really simple to do in Matlab.

The 12 dimensional case is a bit messier:

So, what is the average fraction of stable faces on a 3D polyhedron?

It tends to converge to 50%. Doing this in higher dimensions shows the same kind of convergence, although to lower fractions.

It looks like the fraction of stable faces declines exponentially with dimensionality.

Does this mean that for a sufficiently high dimension it is likely that a random polytope is unistable? The answer is no: the number of faces increases pretty exponentially (as $2^{1.7680D}$), but the number of stable faces also increases exponentially with D (as $latex 2^{0.9273 D}$).

This was based on runs with N=100. Obviously things go much faster if you select a lower N, such as 30. However, as you approach N=D the polytopes become more and more simplex-like, and simplices tend to both have fewer faces and be less stable in high dimensions, so the exponential growth stops. This actually happens far below D; for N=30 the effect is felt already in 11 dimensions. The face growth rates were also lower, with coefficients 1.1621 and 0.4730.

(There are some asymptotic formulas known for the growth of the number of faces for random convex hulls; they grow linearly with N but at an accelerating rate with D.)

Stuart Armstrong gave me a very heuristic argument for why there would be so many unstable faces. Consider building up the polytope vertex by vertex, essentially just adding together the simplices from the Delaunay decomposition. If you start from a stable state, eventually you will likely end up with an unstable face. Adding the next vertex will add a simplex to the polyhedron, and the center of mass will move in the direction of the new simplex. To have the face become stable again the shift in center of mass needs to be large enough along the directions parallel to the face to bring the projection back inside the face. But in high dimensional spaces there are many directions you can move in: the probability of a random vector being nearly parallel to another vector is very low. Hence, the next step and the following are likely to preserve the instability. So high dimensional polytopes are likely to have many unstable faces even if they are nicely inscribed in spheres.

The number of steps the polytope rolls over  until finding a stable face is also limited: the “drainage basin” of a stable face is a tree, with a branching degree set by D-1 (if faces are D-simplexes). So the number of steps will scale as $\log_{D-1}(2^{(1.7680 - 0.9273)D})=0.8407 D \ln(2) / \ln(D-1) \propto D/\ln(D)$. Even high-dimensional polytopes will stop flipping quickly in general. (A unistable polytope on the other hand can run through at least half of its faces, so there are some very slow ones too).

The expected minimum distance between two points on this kind of random polytope scales as $N^{-2/D}$ (if they were optimally distributed it would be $N^{-1/D}$). At the same time, if N is relatively small compared to D (the polytope is simplex-like), the average diameter (the longest edge) of each face seems to approach $\sqrt{\pi}$. Why? I think this is because  $\Gamma(1/2)$, the mean of a flipped k=2 Weibull distribution that shows up because of extreme value theory. Meanwhile the average and median cord length between random points on hyperspheres tends towards $\sqrt{2}$. Faces hence tends to be fairly wide unless N is large compared to D, but there will typically always be a few very narrow ones that are tricky to balance on.

## Stacking no-slip polytopes

What about stacking polytopes?

If you put a polytope on top of another one (assuming no slipping) at first it seems you need to use a stable face of the top polytope, but this is not enough nor necessary.

Since the underlying face is likely tilted from the horizontal, the vertical projection of the center of mass has to be within the top face. The upper polytope can be rotated, moving the projection point. The tilt angle $\theta$ (or rather, tilt angles – we are doing this in higher dimensions, remember?) generates a hypersphere of radius $d \tan(\theta)$ around the normal projection point (which is at distance d from the center of mass) where the vertical projection can intersect the face. Only parts of the hypersphere surface that are inside the face represent orientations that are stable. Even an unstable face can (sometimes) be stabilized if you turn it so that the tilted projection is inside, but for sufficiently high angles the hypersphere will be bigger than the face and it cannot be stable.

Having the top polytope stay in place is the first requirement. The second is that the bottom polytope should not become unstable. The new center of mass is moved to a point somewhere along the connecting line between the individual centers of mass of the polytopes, with exact position dependent on their volume ratio (note that turning the top polytope can move the center of mass too). This moves the projection point along the plane of the bottom face, and if it gets outside that face the assembly will tip over.

One can imagine this as adding random (D-1)-dimensional vectors of length 1/N until they reach the edge of the face. I am a bit uncertain about the properties of such random walks (all works on decreasing step size walks I have seen have been in 1D). The harmonic random walk in 1D apparently converges with probability 1, so I think the (D-1)-dimensional one also does it since the distance from the origin to the walker will be smaller than if the walker just kept to a 1D line. Since the expected distance traversed in 1D is $latex E[|X|] \approx 1.0761$ this is actually not a very extreme  shift. Given the surprisingly large diameters of the faces if $N \ll D$ the first condition might be tougher to meet than the second, but this is just a guess.

The no slipping constraint is important. If the polytopes are frictionless, then any transverse force will move them. Hence only polytopes that have some parallel top and bottom stable faces can be stacked, and the problem becomes simpler. There are still surprises there, though: even stacks of rectangular blocks can do surprising things. The block stacking problem also demonstrates that one can have 1/N overhangs (counting downwards), enabling arbitrarily large total overhangs without tipping over. With polytopes with shapes that act as counterweights the overhangs can be even larger.

## Uriel’s stacking problems

This leads to what we might call “Uriel’s stacking problem”: given a collection of no-slip convex D-dimensional polytopes, what is the tallest tower that can be constructed from them?

I suspect that this problem is NP-hard. It sounds very much like a knapsack problem, but there is a dependency on previous steps when you add a new polytope that seem to make it harder. It seems that it would not be too difficult to fool a greedy algorithm just trying to put the next polytope on the most topmost face into adding one that makes subsequent steps too unstable, forcing backtracking.

Another related problem: if the polytopes are random convex hulls of N points, what is the distribution of maximum tower heights? What if we just try random stacking?

And finally, what is the maximum overhang that can be done by stacking polytopes from a given set?

# Simple instability

As a side effect of a chat about dynamical systems models of metabolic syndrome, I came up with the following nice little toy model showing two kinds of instability: instability because of insufficient dampening, and instability because of too slow dampening.

$x'(t) = Ax(t)/N - px(t-\tau) -x(t)^3$

Where $x$ is a N-dimensional vector, A is a $N \times N$ matrix with Gaussian random numbers, and $p \geq 0, \tau \geq 0$ constants. The last term should strictly speaking be written as $||x(t)||^2 x(t)$ but I am lazy.

The first term causes chaos, as we will see below. The 1/N factor is just to compensate for the N terms. The middle term represents dampening trying to force the system to the origin, but acting with a delay $\tau$. The final term keeps the dynamics bounded: as $||x||$ becomes large this term will dominate and bring back the trajectory to the vicinity of the origin. However, it is a soft spring that has little effect close to the origin.

## Chaos

Let us consider the obvious fixed point $x=0$. Is it stable? If we calculate the Jacobian matrix there it becomes $J = A/N - pI$. First, consider the case where $p=0$. The eigenvalues of J will be the ones of a random Gaussian matrix with no symmetry conditions. If it had been symmetric, then Wigner’s semicircle rule implies that they would tend to be distributed as $P(\lambda)=(2/\pi)\sqrt{1-\lambda^2}$ as $N \rightarrow \infty$. However, it turns out that this is true for the non-symmetric Gaussian case too. (and might be true for any i.i.d. random numbers). This means that about half of them will have a positive real part, and that implies that the fixed point is unstable: for $p=0$ the system will be orbiting the origin in some fashion, and generically this means a chaotic attractor.

## Stability

If $p$ grows the diagonal elements of J will become more and more negative. If they are really negative then we essentially have a matrix with a negative diagonal and some tiny off-diagonal terms: the eigenvalues will almost be the diagonal ones, and they are all negative. The origin is a stable attractive fixed point in this limit.

In between, if we plot the eigenvalues as a function of $p$, we see that the semicircle just linearly moves towards the negative side and when all of it passes over, we shift from the chaotic dynamics to the fixed point. Exactly when this happens depends on the particular A we are looking at and its largest eigenvalue (which is distributed as the Tracy-Widom distribution), but it is generally pretty sharp for large N.

## Delay

But what if $\tau$ becomes large? In this case the force moving the trajectory towards the origin will no longer be based on where it is right now, but on where it was $\tau$ seconds earlier. If $p$ is small, then this is just minor noise/bias (and the dynamics is chaotic anyway). If it is large, then the trajectory will be pushed in some essentially random direction: we get instability again.

A (very slightly) more stringent way of thinking of it is to plug in $x_j(t)=c_j e^{i\lambda_j t}$ into the equation. To simplify, let’s throw away the cubic term since we want to look at behavior close to zero, and let’s use a coordinate system where the matrix is a diagonal matrix $\Lambda$. Then for $p=0$ we get $\lambda_j = \Lambda_j$, that is, the origin is a fixed point that repels or attracts trajectories depending on its eigenvalues (and we know from above that we can be pretty confident some are positive, so it is unstable overall). For $p>0$ we get $\lambda_j + pe^{-i\lambda_j \tau} = \Lambda_j$. Taylor expansion to the first order and rearranging gives us $\lambda_j \approx(\Lambda_j - p)/(1 - i p \tau)$. The  numerator means that as $p$ grows, each eigenvalue will eventually get a negative real part: that particular direction of dynamics becomes stable and attracted to the origin. But the denominator can sabotage this: it $p \tau$ gets large enough it can move the eigenvalue anywhere, causing instability.

So there you are: if you try to keep a system stable, make sure the force used is up to the task so the inherent recalcitrance cannot overwhelm it, and make sure the direction actually corresponds to the current state of the system.