Gamma function fractals

Another of my favourite functions if the Gamma function, \Gamma(z)=\int_0^\infty t^{z-1}e^{-t} dt, the continuous generalization of the factorial. While it grows rapidly for positive reals, it has fun poles for the negative integers and is generally complex. What happens when you iterate it?

First I started by just applying it to different starting points, z_{n+1} = \Gamma(z_n). The result is a nice fractal, with some domains approaching 1, and others running off to infinity.
Gamma function Julia set
Here I color points that go to infinity in green shades on the number of iterations before they become very large, and the points approaching 1 by |z_{30}-1|. Zooming in a bit more reveals neat self-similar patterns with alternating “beans”:
Gamma function Julia set detailGamma function Julia set zoom
In the outside regions we have thin tendrils stretching towards infinity. These are familiar to anybody who has been iterating exponentials or trigonometric functions: the combination of oscillation and (super)exponential growth leads to the pattern.

OK,that was a Julia set (different starting points, same formula). What about a counterpart to the Mandelbrot set? I looked at z_{n+1}=\Gamma(cz_n) where c is the control parameter. I start with z_0=1 and iterate:
Gamma function Mandelbrot set

Zooming in shows the same kind of motif copies of Julia sets as we see in the quadratic Mandelbrot set:
In fact, zooming in as above in the counterpart to the “seahorse valley” shows a remarkable similarity.

Shakespearian numbers

Number lineDuring a recent party I got asked the question “Since \pi has an infinite decimal expansion, does that mean the collected works of Shakespeare (suitably encoded) are in it somewhere?”

My first response was to point out that infinite decimal expressions are not enough: obviously 1/3=0.33333\ldots is a Shakespeare-free number (unless we have a bizarre encoding of the works in the form of all threes). What really matters is whether the number is suitably random. In mathematics this is known as the question about whether pi is a normal number.

If it is normal, then by the infinite monkey theorem then Shakespeare will almost surely be in the number. We actually do not know whether pi is normal, but it looks fairly likely. But that is not enough for a mathematician. A good overview of the problem can be found in a popular article by Bailey and Borwein. (Yep, one of the Borweins)

Where are the Shakespearian numbers?

This led to a second issue: what is the distribution of the Shakespeare-containing numbers?

We can encode Shakespeare in many ways. As an ASCII text the works take up 5.3 MB. One can treat this as a sequence of 7-bit characters and the works as  37,100,000 bits, or 11,168,212 decimal digits. A simple code where each pair of digits encode a character would encode 10,600,000 digits. This allows just a 100 character alphabet rather than a 127 character alphabet, but is likely OK for Shakespeare: we can use the ASCII code minus 32, for example.

If we denote the encoded works of Shakespeare by [Shakespeare], all numbers of the form 0.[Shakespeare]xxxxx\ldots are Shakespeare-containing.

They form a rather tiny interval: since the works start with ‘The’, [Shakespeare] starts as “527269…” and the interval lies inside the interval [0. 527269000\ldots , 0.52727], a mere millionth of [0,1]. The actual interval is even shorter.

But outside that interval there are numbers of the form 0.y[Shakespeare]xxxx\ldots , where y is a digit different from the starting digit of [Shakespeare] and x anything else. So there are 9 such second level intervals, each ten times thinner than the first level interval.

This pattern continues, with the intervals at each level ten times thinner but also 9 times as numerous. This is fairly similar to the Cantor set and gives rise to a fractal. But since the intervals are very tiny it is hard to see.

One way of visualizing this is to assume the weird encoding [Shakespeare]=3, so all numbers containing the digit 3 in the decimal expansion are Shakespearian and the rest are Shakespeare-free.

Distribution of Shakespeare-free numbers in the unit interval, assuming Shakespeare's collected works are encoded as the digit "3".
Distribution of Shakespeare-free numbers in the unit interval, assuming Shakespeare’s collected works are encoded as the digit “3”.

The fractal dimension of this Shakespeare-free set is \log(9)/\log(10)\approx 0.9542. This is less than 1: most points are Shakespearian and in one of the intervals, but since they are thin compared to the line the Shakespeare-free set is nearly one dimensional. Like the Cantor set, each Shakespeare-free number is isolated from any other Shakespeare-free number: there is always some Shakespearian numbers between them.

In the case of the full 5.3MB [Shakespeare] the interval length is around 10^{-10,600,000}. The fractal dimension of the Shakespeare-free set is \log(10^{10,600,000} - 1)/\log(10^{10,600,600}) \approx 1-\epsilon, for some tiny \epsilon \approx 10^{-10,600,000}.  It is very nearly an unbroken line… except for that nearly every point actually does contain Shakespeare.

We have been looking at the unit interval. We can of course look at the entire real line too, but the pattern is similar: just magnify the unit interval pattern by 10, 100, 1000, … times. Somewhere around  $10^{10,600,000}$ there are the numbers that have an integer part equal to [Shakespeare]. And above them are the intervals that start with his works followed by something else, a decimal point and then any decimals. And beyond them there are the [Shakespeare][Shakespeare]xxx\ldots numbers…

Shakespeare is common

One way of seeing that Shakespearian numbers are the generic case is to imagine choosing a number randomly. It has probability S of being in the level 1 interval of Shakespearian numbers. If not, then it will be in one of the 9 intervals 1/10 long that don’t start with the correct first digit, where the probability of starting with Shakespeare in the second digit is S. If that was all there was, the total probability would be S+(9/10)S+(9/10^2)S+\ldots = 10S<1. But the 1/10 interval around the first Shakespearian interval also counts: a number that has the right first digit but wrong second digit can still be Shakespearian. So it will add probability.

Another way of thinking about it is just to look at the initial digits: the probability of starting with [Shakespeare] is S, the probability of starting with [Shakespeare] in position 2 is (1-S)S (the first factor is the probability of not having Shakespeare first), and so on. So the total probability of finding Shakespeare is S + (1-S)S + (1-S)^2S + (1-S)^3S + \ldots = S/(1-(1-S))=1. So nearly all numbers are Shakespearian.

This might seem strange, since any number you are likely to mention is very likely Shakespeare-free. But this is just like the case of transcendental, normal or uncomputable numbers: they are actually the generic case in the reals, but most everyday numbers belong to the algebraic, non-normal and computable numbers.

It is also worth remembering that while all normal numbers are (almost surely) Shakespearian, there are non-normal Shakespearian numbers. For example, the fractional number 0.[Shakespeare]000\ldots is non-normal but Shakespearian. So is 0.[Shakespeare][Shakespeare][Shakespeare]\ldots We can throw in arbitrary finite sequences of digits between the Shakespeares, biasing numbers as close or far as we want from normality. There is a number 0.[Shakespeare]3141592\ldots that has the digits of \pi plus Shakespeare. And there is a number that looks like \pi until Graham’s number digits, then has a single Shakespeare and then continues. Shakespeare can hide anywhere.

In things of great receipt with case we prove,
Among a number one is reckoned none.
Then in the number let me pass untold,
Though in thy store’s account I one must be
-Sonnet 136

My Newtonmass Fractal


I like the hyperbolic tangent function. It is useful for making sigmoid curves for neurons and fitting growth rates, it enables a cute minimal surface. So of course it should be iterated to make fractals! And there is no better way to celebrate Newtonmass than to make fractals!

As iteration formula I choose z_{n+1} = f(z_n) = \tanh(cz_n) , where c is a multiplicative constant. Iterating some number like 1 and plotting its fate produces the following “Mandelbrot set” in the c-plane – the colours here do not denote the time until escape to infinity but rather where in the complex plane the point ended up, as a function of c. In a normal Mandelbrot set infinity is an attractive fixed point; here it is just one place in the (extended) complex plane like any other.

"Mandelbrot set" for the hyperbolic tanh function tanh(cz).
“Mandelbrot set” for the hyperbolic tanh function tanh(cz).

The pinkish surroundings of the pattern represent points attracted to the positive solution of z=\tanh(cz). There is of course a corresponding negative solution since tanh is antisymmetric: if z is an attractive fixed point or cycle, so is -z. So the dynamics is always bistable.

Incidentally, the color scheme is achieved by doing a stereographic projection of the complex plane onto a sphere, which is then fitted into the RBG cube. Infinity corresponds to (0.5,0.5,1) and zero to (0.5,0.5,0) – the brownish middle of the Mandelbrot set, where points are attracted towards zero for small c.


Sphere used to stereographically map complex numbers to colors.
Sphere used to stereographically map complex numbers to colors.

Another property of tanh is that the function has singularities wherever z=\pm \pi n i / 2 c for integer n>0. Since Great Picard’s Theorem, that means that in the vicinity of those points it takes on nearly all other values in the complex plane. So whatever the pattern of the corresponding Julia set is, it will repeat itself near there (including images of the image, and so on).This means that despite most z points being attracted towards zero for c-values inside the unit circle, there will be a complex stitching of undefined points since they will be mapped to infinity, or are preimages of points that get mapped there.

Zoom into the tanh Mandelbrot set, showing chaotic regions with interspersed periodic regions.
Zoom into the tanh Mandelbrot set, showing chaotic regions with interspersed periodic regions.

Zooming into the messy regions shows that they are full of circle-cusp areas where there is a periodic attractor cycle. Between them are the regions where most of the z-plane where the Julia sets live is just pure chaos. Thanks to various classic theorems in the theory of complex iteration we know that if the Julia set has non-empty interior it is the entire complex plane.

Walking around the outside edge of the boring brown circle gives a fun sequence of patterns. At c=1 there are two real fixed points and a straight line border along the imaginary axis. This line of course contains the singularity points where things get sent to infinity, and near them the preimages of all the other singularities on the line: dramatic, but visually uninteresting.

Tanh 'Julia set' for c=1.
Tanh ‘Julia set’ for c=1.

As we move along the circle towards more imaginary c, there is a twisting of the border since each multiplication by c corresponds to a twist: it is now a fractal spiral covered by little spirals. As the twisting gets stronger, the spirals get bigger and wilder (especially when we are very close to the unit circle, where the dynamics has a lot of intermittency: the iterates almost but not quite gets stuck close to certain points, speed away, and then return to make rather elliptic spirals).

Tanh 'Julia set' for c=1.1*exp(0.23*i).
Tanh ‘Julia set’ for c=1.1*exp(0.23*i).
Tanh 'Julia set' for c=1.1*exp(0.5*i).
Tanh ‘Julia set’ for c=1.1*exp(0.5*i).
Tanh 'Julia set' for c=1.1*exp(0.55*i).
Tanh ‘Julia set’ for c=1.1*exp(0.55*i).

When we advance towards a cuspy border in the c-plane we see the spirals unfold into long twisty tentacles just before touching, turning into borders between chains of periodic domains.

Tanh 'Julia set' for c=1.1*exp(0.6*i).
Tanh ‘Julia set’ for c=1.1*exp(0.6*i).

But then the periodic domains start to snake out, filling the plane wildly.

Tanh 'Julia set' for c=1.1*exp(0.6594*i).
Tanh ‘Julia set’ for c=1.1*exp(0.6594*i).

until we get a plane-filling, ergodic Julia set with no discernible structure. For some c-values there are complex tesselations of basins of attraction, and quite often some places are close enough to weakly repelling fixed points to produce small circular false basins of attraction where divergence is slow.


Tanh 'Julia set' for c=1.1*exp(0.66*i).
Tanh ‘Julia set’ for c=1.1*exp(0.66*i).

One way of visualizing this is to make a bifurcation diagram like we do for real iteration. Following a curve r e^{i\theta} we plot where iterates end up projected along some line (for example their real or imaginary part, or some combination). To make structure stand out a bit more I decided to color points after where in the whole plane they are, producing a colorful diagram for r=1.1:


(I have some others on Flickr for the imaginary axis, r=1.25 and r=1.5).

Another, more fun way is to turn them into animated gifs. Since Flickr doesn’t handle them well, I have stored them locally instead:

  • Growth of the Mandelbrot set – shows the behaviour of test iterates in the c-plane near the edge. Note the intermittent spirals.
  • Unit circle – following the unit circle.
  • Tanh 1.0 – the same as above, but inverted coordinates: z=\infty is at the center, zero outside the borders.
  • Tanh 1.1 – r=1.1.
  • Tanh 1.5 – r=1.5.
  • Tanh 2.5 – r=2.5.
  • Tanh 5.0 – r=5.0. Rather sedate except for a brief window near \theta=\pi/2.

Note how spirals unfold until they touch each other, forming periodic domains or exploding across the entire plane, making a chaotic full-plane attractor… which often blinks into complex patterns of periodic domains only to return to chaos.



Thanks for the razor, Bill!

Thermocouple, Piotr KowalskiI like the idea of a thanksgiving day, leaving out all the Americana turkeys, problematic immigrant-native relations and family logistics: just the moment to consider what really matters to you and why life is good. And giving thanks for intellectual achievements and tools makes eminent sense: This thanksgiving Sean Carroll gave thanks for the Fourier transform.

Inspired by this, I want to give thanks for Occam’s razor.

These days a razor in philosophy denotes a rule of thumb that allows one to eliminate something unnecessary or unlikely. Occam’s was the first: William of Ockham (ca. 1285-1349) stated “Pluralitas non est ponenda sine neccesitate” (“plurality should not be posited without necessity.”) Today we usually phrase it as “the simplest theory that fits is best”.

Principles of parsimony have been suggested for a long time; Aristotle had one, so did Maimonides and various other medieval thinkers. But let’s give Bill from Ockham the name in the spirit of Stigler’s law of eponymy.

Of course, it is not always easy to use. Is the many worlds interpretation of quantum mechanics possible to shave away? It posits an infinite number of worlds that we cannot interact with… except that it does so by taking the quantum mechanics formalism seriously (each possible world is assigned a probability) and not adding extra things like wavefunction collapse or pilot waves. In many ways it is conceptually simpler: just because there are a lot of worlds doesn’t mean they are wildly different. Somebody claiming there is a spirit world is doubling the amount of stuff in the universe, but that there is a lot of ordinary worlds is not too different from the existence of a lot of planets.

Simplicity is actually quite complicated. One can argue about which theory has the fewest and most concise basic principles, but also the number of kinds of entities postulated by the theory. Not to mention why one should go for parsimony at all.

In my circles, we like to think of the principle in terms of Bayesian statistics and computational complexity. The more complex a theory is, the better it can typically fit known data – but it will also generalize worse to new data, since it overfits the first set of data points. Parsimonious theories have fewer degrees of freedom, so they cannot fit as well as complex theories, but they are less sensitive to noise and generalize better. One can operationalize the optimal balance using various statistical information criteria (AIC = minimize the information lost when fitting, BIC = maximize likeliehood of the model). And Solomonoff gave a version of the razor in theoretical computer science: for computable sequences of bits there exists a unique (up to choice of Turing machine) prior that promotes sequences generated by simple programs and has awesome powers of inference.

But in day-to-day life Occam works well, especially with a maximum probability principle (you are more likely to see likely things than unlikely; if you see hoofprints in the UK, think horses not zebras). A surprising number of people fall for the salient stories inherent in unlikely scenarios and then choose to ignore Occam (just think of conspiracy theories). If the losses from low-probability risks are great enough one should rationally focus on them, but then one must check one’s priors for such risks. Starting out with a possibilistic view that anything is possible (and hence have roughly equal chance) means that one becomes paranoid or frozen with indecision. Occam tells you to look for the simple, robust ways of reasoning about the world. When they turn out to be wrong, shift gears and come up with the next simplest thing.

Simplicity might sometimes be elegant, but that is not why we should choose it. To me it is the robustness that matters: given our biased, flawed thought processes and our limited and noisy data, we should not build too elaborate castles on those foundations.


Anthropic negatives

Inverted cumulusStuart Armstrong has come up with another twist on the anthropic shadow phenomenon. If existential risk needs two kinds of disasters to coincide in order to kill everybody, then observers will notice the disaster types to be anticorrelated.

The minimal example would be if each risk had 50% independent chance of happening: then the observable correlation coefficient would be -0.5 (not -1, since there is 1/3 chance to get neither risk; the possible outcomes are: no event, risk A, and risk B). If the probability of no disaster happening is N/(N+2) and the risks are equal 1/(N+2), then the correlation will be -1/(N+1).

I tried a slightly more elaborate model. Assume X and Y to be independent power-law distributed disasters (say war and pestillence outbreaks), and that if X+Y is larger than seven billion no observers will remain to see the outcome. If we ramp up their size (by multiplying X and Y with some constant) we get the following behaviour (for alpha=3):

(Top) correlation between observed power-law distributed independent variables multiplied by an increasing multiplier, where observation is contingent on their sum being smaller than 7 billion. Each point corresponds to 100,000 trials. (Bottom) Fraction of trials where observers were wiped out.
(Top) correlation between observed power-law distributed independent variables multiplied by an increasing multiplier, where observation is contingent on their sum being smaller than 7 billion. Each point corresponds to 100,000 trials. (Bottom) Fraction of trials where observers were wiped out.

As the situation gets more deadly the correlation becomes more negative. This also happens when allowing the exponent run from the very fat (alpha=1) to the thinner (alpha=3):

(top) Correlation between observed independent power-law distributed variables  (where observability requires their sum to be smaller than seven billion) for different exponents. (Bottom) fraction of trials ending in existential disaster. Multiplier=500 million.
(top) Correlation between observed independent power-law distributed variables (where observability requires their sum to be smaller than seven billion) for different exponents. (Bottom) fraction of trials ending in existential disaster. Multiplier=500 million.

The same thing also happens if we multiply X and Y.

I like the phenomenon: it gives us a way to look for anthropic effects by looking for suspicious anticorrelations. In particular, for the same variable the correlation ought to shift from near zero for small cases to negative for large cases. One prediction might be that periods of high superpower tension would be anticorrelated with mishaps in the nuclear weapon control systems. Of course, getting the data might be another matter. We might start by looking at extant companies with multiple risk factors like insurance companies and see if capital risk becomes anticorrelated with insurance risk at the high end.

Rational fractal distributions

Up among the peaksMost of the time we encounter probability distributions over the reals, the positive reals, or integers. But one can use the rational numbers as a probability space too.

Recently I found the paper Vladimir Trifonov, Laura Pasqualucci, Riccardo Dalla-Favera & Raul Rabadan. Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data. Scientific Reports 1:191 DOI: 10.1038/srep00191. They discuss the distribution of ratios of the number of reads from the same spot of DNA that come from each chromosome in a pair: the number of reads is an integer, so the ratio is rational. They get a peaky, self-similar distribution empirically, and the paper explains why. 

If you take positive independent integers from some distribution f(n) and generate ratios q=a/(a+b), then those ratios will have a distribution that is a convolution over the rational numbers: g(q) = g(a/(a+b)) = \sum_{m=0}^\infty \sum_{n=0}^\infty f(m) g(n) \delta \left(\frac{a}{a+b} - \frac{m}{m+n} \right ) = \sum_{t=0}^\infty f(ta)f(tb)

One can of course do the same for non-independent and different distributions of the integers. Oh, and by the way: this whole thing has little to do with ratio distributions (alias slash distributions), which is what happens in the real case.

The authors found closed form solutions for integers distributed as a power-law with an exponential cut-off and for the uniform distribution; unfortunately the really interesting case, the Poisson distribution, doesn’t seem to have a neat closed form solution.

In the case of a uniform distributions on the set \{1,2,\ldots , L\} they get g(a/(a+b)) = (1/L^2) \lfloor L/\max(a,b) \rfloor .

The rational distribution g(a/(a+b))=1/max(a,b) of Trifonov et al.
The rational distribution g(a/(a+b))=1/max(a,b) of Trifonov et al.

They note that this is similar to Thomae’s function, a somewhat well-known (and multiply named) counterexample in real analysis. That function is defined as f(p/q)=1/q (where the fraction is in lowest terms). In fact, both graphs have the same fractal dimension of 1.5.

It is easy to generate other rational distributions this way. Using a power law as an input produces a sparser pattern, since the integers going into the ratio tend to be small numbers, putting more probability at simple ratios:

The rational distribution g(a/(a+b))=C(ab)^-2 of Trifonov et al.
The rational distribution g(a/(a+b))=C(ab)^-2 (rational convolution of two index -2 power-law distributed integers).

If we use exponential distributions the pattern is fairly similar, but we can of course change the exponent to get something that ranges over a lot of numbers, putting more probability at nonsimple ratios p/q where p+q \gg 1:

The rational distribution of two convolved Exp[0.1] distributions.
The rational distribution of two convolved Exp[0.1] distributions.
Not everything has to be neat and symmetric. Taking the ratio of two unequal Poisson distributions can produce a rather appealing pattern:

Rational distribution of ratio between a Poisson[10] and a Poisson[5] variable.
Rational distribution of ratio between a Poisson[10] and a Poisson[5] variable.
Of course, full generality would include ratios of non-positive numbers. Taking ratios of normal variates rounded to the nearest integer produces a fairly sparse distribution since high numerators or denominators are rare.

Rational distribution of normal variates rounded to nearest integer.
Rational distribution of a/(a+b) ratios of normal variates rounded to nearest integer.

But multiplying the variates by 10 produces a nice distribution.

Rational distribution of ratios of normal variates multiplied by 10 and rounded.
Rational distribution of a/(a+b) ratios of normal variates that have been multiplied by 10 and rounded.

This approaches the Chauchy distribution as the discretisation gets finer. But note the fun microstructure (very visible in the Poisson case above too), where each peak at a simple ratio is surrounded by a “moat” of low probability. This is reminiscent of the behaviour of roots of random polynomials with integer coefficients (see also John Baez page on the topic).

The rational numbers do tend to induce a fractal recursive structure on things, since most measures on them will tend to put more mass at simple ratios than at complex ratios, but when plotting the value of the ratio everything gets neatly folded together. The lower approximability of numbers near the simple ratios produce moats. Which also suggests a question to ponder further: what role does the über-unapproximable golden ratio have in distributions like these?

Mathematical anti-beauty

Browsing Mindfuck Math I came across a humorous Venn diagram originally from It got me to look up the Borwein integral. Wow. A kind of mathematical anti-beauty.

“As we all know”, sinc(x)=sin(x)/x for x\neq 0 and defined to be 1 for x=0. It is not that uncommon as a function. Now look at the following series of integrals:

\int_0^{\infty} sinc(x) dx = \pi/2 ,

\int_0^{\infty} sinc(x) sinc(x/3) dx = \pi/2 ,

\int_0^\infty sinc(x) sinc(x/3) sinc(x/5) dx = \pi/2 .

The pattern continues:

\int_0^\infty sinc(x) sinc(x/3) sinc(x/5) \cdots sinc(x/13) dx = \pi/2 .

And then…

\int_0^\infty sinc(x) sinc(x/3) sinc(x/5) \cdots sinc(x/13) sinc(x/15) dx

It is around 0.499999999992646 \pi – nearly a half, but not quite.

What is going on here? Borwein & Borwein give proofs, but they are not entirely transparent. Basically the reason is that 1/3+1/5+…1/13 < 1, while 1/3+1/5+…1/13 + 1/15 > 1, but why this changes things is clear as mud. Thankfully Hanspeter Schmid has a very intuitive explanation that makes what is going on possible to visualize. At least if you like convolutions.

In any case, there is something simultaneously ugly and exciting when neat patterns in math just ends for no apparent reason.

Another good example is the story of the Doomsday conjecture. Gwern tells the story well, based on Klarreich: a certain kind of object is found in dimension 2, 6, 14, 30 and 62… aha! They are conjectured to occur in all  2^n-2  dimensions. A branch of math was built on this conjecture… and then the pattern failed in dimension 254. Oops. 

It is a bit like the opposite case of the number of regular convex polytopes in different dimensions: 1, infinity, 5, 6, 3, 3, 3, 3… Here the series start out crazy, and then becomes very regular.

Another dimensional “imperfection” is the behaviour of the volume of the n-sphere: V_n(r)=\frac{\pi^{n/2}r^n}{\Gamma(1+n/2)}

Volume of unit hyperspheres as a function of dimension

The volume of a unit sphere increases with dimension until n \approx 5.25 , and then decreases. Leaving the non-intuitiveness of why volumes would shrink aside, the real oddness is that the maximum is for a non-integer dimension. We might argue that the formula is needlessly general and only the integer values count, but many derivations naturally bring in the Gamma function and hence the possibility of non-integer values.

Another association is to this integral problem: given a set of integers x_i, is the integral \int_0^\pi \prod_i \sin(x_i \theta) d\theta = 0 ? As shown in Moore and Mertens, this is NP-complete. Here the strangeness is that integrals normally are pretty well behaved. It seems absurd that a particular not very scary trigonometric integral should require exponential work to analyze. But in fact, multivariate integrals are NP-hard to approximate, and calculating the volume of a n-dimensional polytope is actually #P-complete.

We tend to assume that mathematics is smoother and more regular than reality. Everything is regular and exceptionless because it is generated by universal rules… except when it isn’t. The rules often act as constraints, and when they do not mesh exactly odd things happen. Similarly we may assume that we know what problems are hard or not, but this is an intuition built in our own world rather than the world of mathematics. Finally, some mathematical truths maybe just are. As Gregory Chaitin has argued, some things in math are irreducible; there is no real reason (at least in the sense of a comprehensive explanation) for why they are true.

Mathematical anti-beauty can be very deep. Maybe it is like the insects, rot and other memento mori in classical still life paintings: a deviation from pleasantness and harmony that adds poignancy and a bit of drama. Or perhaps more accurately, it is wabi-sabi.