Gamma function surfaces

The gamma function has a long and interesting history (check out (Davis 1963) excellent review), but one application does not seem to have shown up: minimal surfaces.

A minimal surface is one where the average curvature is always zero; it bends equally in two opposite directions. This is equivalent to having the (locally) minimal area given its boundary: such surfaces are commonly seen as soap films stretched from frames. There exists a rich theory for them, linking them to complex analysis through the Enneper-Weierstrass representation: if you have a meromorphic function g and an analytic function f such that $fg^2$ is holomorphic, then

$X(z)=\Re\left(\int_{z_0}^z f(1-g^2)/2 dz\right)$
$Y(z)=\Re\left(\int_{z_0}^z if(1+g^2)/2 dz\right)$
$Z(z)=\Re\left(\int_{z_0}^z fg dz\right)$

produces a minimal surface $(X(z),Y(z),Z(z))$ .

When plugging in the hyperbolic tangent as g and using f=1 I got a new and rather nifty surface a few years back. What about plugging in the gamma function? Let $f=1, g=\Gamma(z)$ .

We integrate from the regular point $z_0=1$ to different points $z$ in the complex plane. Let us start with the simple case of $\Re(z)>1/2$ .

Gamma function minimal surface for z in 0.5<Re(z)<3.5, -8<Im(z)<8. Color denotes Re(z). — Gamma function minimal surface for z in 0.5<Re(z)<3.5, -8<Im(z)

The surface is a billowing strip, and as we include z with larger and larger real parts the amplitude of the oscillations grow rapidly, making it self-intersect. The behaviour is somewhat similar to the Catalan minimal surface, except that we only get one period. If we go to larger imaginary parts the surface approaches a horizontal plane. OK, the surface is a plane with some wild waves, right?

Not so fast, we have not looked at the mess for Re(z)<0. First, let’s examine the area around the z=0 singularity. Since the values of the integrand blows up close to it, they produce a surface expanding towards infinity – very similar to a catenoid. Indeed, catenoid ends tend to show up where there are poles. But this one doesn’t close exactly: for re(z)<0 there is some overshoot producing a self-intersecting plane-like strip.

Gamma function minimal surface close to the z=0 singularity. Colour denotes Re(z). Integration contours from 1 to z run clockwise for Im(z)0. — Gamma function minimal surface close to the z=0 singularity. Colour denotes Re(z). Integration contours from 1 to z run clockwise for Im(z)<0 and counterclockwise for Im(z)>0.

The problem is of course the singularity: when integrating in the complex plane we need to avoid them, and depending on the direction we go around them we can get a complex phase that gives us an entirely different value of the function. In this case the branch cut corresponds to the real line: integrating clockwise or counter-clockwise around z=0 to the same z gives different values. In fact, a clockwise turn adds [3.6268i, 3.6268, 6.2832i] (which looks like $\gamma\pi$ – a rather neat residue!) to the coordinates: a translation in the positive y-direction. If we extend the surface by going an extra turn clockwise or counterclockwise a number of times, we get copies that attach seamlessly.

Gamma minimal surface extended by integration paths between the -1 and 0 singularities (blue patches).

Gamma minimal surface patch that can be repeated by translation along the y-axis. Colour denotes Re(z).

OK, we have a surface with some planar strips that turn wobbly and self-intersecting in the x-direction, with elliptic catenoid ends repeating along the y-direction due to the z=0 singularity. Going down the negative x-direction things look plane between the catenoids… except of course for the catenoids due to all the other singularities for $z=-1,-2,\ldots$ . They also introduce residues along the y-direction, but different ones from the z=0 – their extensions of the surface will be out of phase with each other, making the fully extended surface fantastically self-intersecting and confusing.

Gamma function minimal surface extended by integrating around poles.

So, I think we have a simple answer to why the gamma function minimal surface is not well known: it is simply too messy and self-intersecting.

Of course, there may be related nifty surfaces. $1/\Gamma(z)$ is nicely behaved and looks very much like the Enneper surface near zero, with “wings” that oscillate ever more wildly as we move towards the negative reals. No doubt there are other beautiful things to look for in the vicinity.

Canine mechanics and banking

There are some texts that are worth reading, even if you are outside the group they are intended for. Here is one that I think everybody should read at least the first half of:

Andrew G Haldane and Vasileios Madouros: The dog and the frisbee

Haldane, the Executive Director for Financial Stability at Bank of England, brings up the topic is how to act in situations of uncertainty, and the role of our models of reality in making the right decision. How complex should they be in the face of a complex reality? The answer, based on the literature on heuristics, biases and modelling, and the practical world of financial disasters, is simple: they should be simple.

Using too complex models means that they tend to overfit scarce data, weight data randomly, require significant effort to set up – and tends to promote overconfidence. As Haldane then moves on to his own main topic, banking regulation. Complex regulations – which are in a sense models of how banks ought to act – have the same problem, and also act as incentives for playing the rules to gain advantage. The end result is an enormous waste of everybody’s time and effort that does not give the desired reduction of banking risk.

It is striking how many people have been seduced by the siren call of complex regulation or models, thinking their ability to include every conceivable special case is a sign of strength. Finance and finance regulation are full of smart people who make the same mistake, as is science. If there is one thing I learned in computational biology is that your model better produce more nontrivial results than the number of parameters it has.

But coming up with simple rules or models is not easy: knowing what to include and what not to include requires expertise and effort. In many ways this may be why people like complex models, since there are no tricky judgement calls.

Gamma function fractals

Another of my favourite functions if the Gamma function, $\Gamma(z)=\int_0^\infty t^{z-1}e^{-t} dt$ , the continuous generalization of the factorial. While it grows rapidly for positive reals, it has fun poles for the negative integers and is generally complex. What happens when you iterate it?

First I started by just applying it to different starting points, $z_{n+1} = \Gamma(z_n)$ . The result is a nice fractal, with some domains approaching 1, and others running off to infinity.

Here I color points that go to infinity in green shades on the number of iterations before they become very large, and the points approaching 1 by $|z_{30}-1|$ . Zooming in a bit more reveals neat self-similar patterns with alternating “beans”:

In the outside regions we have thin tendrils stretching towards infinity. These are familiar to anybody who has been iterating exponentials or trigonometric functions: the combination of oscillation and (super)exponential growth leads to the pattern.

OK,that was a Julia set (different starting points, same formula). What about a counterpart to the Mandelbrot set? I looked at $z_{n+1}=\Gamma(cz_n)$ where c is the control parameter. I start with $z_0=1$ and iterate:

Zooming in shows the same kind of motif copies of Julia sets as we see in the quadratic Mandelbrot set:

In fact, zooming in as above in the counterpart to the “seahorse valley” shows a remarkable similarity.

Energy requirements of the singularity

After a recent lecture about the singularity I got asked about its energy requirements. It is a good question. As my inquirer pointed out, humanity uses more and more energy and it generally has an environmental cost. If it keeps on growing exponentially, something has to give. And if there is a real singularity, how do you handle infinite energy demands?

First I will look at current trends, then different models of the singularity.

I will not deal directly with environmental costs here. They are relative to some idea of a value of an environment, and there are many ways to approach that question.

Current trends

Current computers are energy hogs. Currently general purpose computing consumes about one Petawatt-hour per year, with the entire world production somewhere above 22 Pwh. While large data centres may be obvious, the vast number of low-power devices may be an even more significant factor; up to 10% of our electricity use may be due to ICT.

Together they perform on the order of $10^{20}$ operations per second, or somewhere in the zettaFLOPS range.

Koomey’s law states that the number of computations per joule of energy dissipated has been doubling approximately every 1.57 years. This might speed up as the pressure to make efficient computing for wearable devices and large data centres makes itself felt. Indeed, these days performance per watt is often more important than performance per dollar.

Meanwhile, general-purpose computing capacity has a growth rate of 58% per annum, doubling every 18 months. Since these trends cancel rather neatly, the overall energy need is not changing significantly.

The push for low-power computing may make computing greener, and it might also make other domains more efficient by moving tasks to the virtual world, making them efficient and allowing better resource allocation. On the other hand, as things become cheaper and more efficient usage tends to go up, sometimes outweighing the gain. Which trend wins out in the long run is hard to predict.

Semilog plot of global energy consumption over time. — Semilog plot of global energy (all types) consumption over time.

Looking at overall energy use trends it looks like overall energy use increases exponentially (but has stayed at roughly the same per capita level since the 1970s). In fact, plotting it on a semilog graph suggests that it is increasing faster than exponential (otherwise it would be a straight line). This is presumably due to a combination of population increase and increased energy use. The best fit exponential has a doubling time of 44.8 years.

Electricity use is also roughly exponential, with a doubling time of 19.3 years. So we might be shifting more and more to electricity, and computing might be taking over more and more of that.

Extrapolating wildly, we would need the total solar input on Earth in about 300 years and the total solar luminosity in 911 years. In about 1,613 years we would have used up the solar system’s mass energy. So, clearly, long before then these trends will break one way or another.

Physics places a firm boundary due to the Landauer principle: in order to erase on bit of information $k T \ln(2)$ joules of energy have to be dissipated. Given current efficiency trends we will reach this limit around 2048.

The principle can be circumvented using reversible computation, either classical or quantum. But as I often like to point out, it still bites in the form of the need for error correction (erasing accidentally flipped bits) and formatting new computational resources (besides the work in turning raw materials into bits). We should hence expect a radical change in computation within a few decades, even if the cost per computation and second continues to fall exponentially.

What kind of singularity?

But how many joules of energy does a technological singularity actually need? It depends on what kind of singularity. In my own list of singularity meanings we have the following kinds:

A. Accelerating change
B. Self improving technology
C. Intelligence explosion
D. Emergence of superintelligence
E. Prediction horizon
F. Phase transition
G. Complexity disaster
H. Inflexion point
I. Infinite progress

Case A, acceleration, at first seems to imply increasing energy demands, but if efficiency grows faster they could of course go down.

Eric Chaisson has argued that energy rate density, how fast and densely energy get used (watts per kilogram), might be an indicator of complexity and growing according to a universal tendency. By this account, we should expect the singularity to have an extreme energy rate density – but it does not have to be using enormous amounts of energy if it is very small and light.

He suggests energy rate density may increase as Moore’s law, at least in our current technological setting. If we assume this to be true, then we would have $\Phi(t) = \exp(kt) = P(t)/M(t)$ , where $P(t)$ is the power of the system and $M(t)$ is the mass of the system at time t. One can maintain exponential growth by reducing the mass as well as increasing the power.

However, waste heat will need to be dissipated. If we use the simplest model where a radius R system with density $\rho$ radiates it away into space, then the temperature will be $T=[\rho \Phi R/3 \sigma]^{1/4}$ , or, if we have a maximal acceptable temperature, $R < 3\sigma T^4 / \rho \Phi$ . So the system needs to become smaller as $\Phi$ increases. If we use active heat transport instead (as outlined in my previous post), covering the surface with heat pipes that can remove X watts/square meter, then $R < 3 X$ $/ \Phi \rho$ . Again, the radius will be inversely proportional to $\Phi$ . This is similar to our current computers, where the CPU is a tiny part surrounded by cooling and energy supply.

If we assume the waste heat is just due to erasing bits, the rate of computation will be $I = P/kT \ln(2) = \Phi M / kT\ln(2) = [4 \pi \rho /3 k \ln(2)] \Phi R^3 / T$ bits per second. Using the first cooling model gives us $I \propto T^{11}/ \Phi^2$ – a massive advantage for running extremely hot and dense computation. In the second cooling model $I \propto \Phi^{-2}$ : in both cases higher energy rate densities make it harder to compute when close to the thermodynamic limit. Hence there might be an upper limit to how much we may want to push $\Phi$ .

Also, a system with mass M will use up its own mass-energy in time $Mc^2/P = c^2/\Phi$ : the higher the rate, the faster it will run out (and it is independent of size!). If the system is expanding at speed v it will gain and use up mass at a rate $M'= 4\pi\rho v t^2 - M\Phi(t)/c^2$ ; if $\Phi$ grows faster than quadratic with time it will eventually run out of mass to use. Hence the exponential growth must eventually reduce simply because of the finite lightspeed.

The Chaisson scenario does not suggest a “sustainable” singularity. Rather, it suggests a local intense transformation involving small, dense nuclei using up local resources. However, such local “detonations” may then spread, depending on the long-term goals of involved entities.

Cases B, C, D (intelligence explosions, superintelligence) have an unclear energy profile. We do not know how complex code would become or what kind of computational search is needed to get to superintelligence. It could be that it is more a matter of smart insights, in which case the needs are modest, or a huge deep learning-like project involving massive amounts of data sloshing around, requiring a lot of energy.

Case E, a prediction horizon, is separate from energy use. As this essay shows, there are some things we can say about superintelligent computational systems based on known physics that likely remains valid no matter what.

Case F, phase transition, involves a change in organisation rather than computation, for example the formation of a global brain out of previously uncoordinated people. However, this might very well have energy implications. Physical phase transitions involve discontinuities of the derivatives of the free energy. If the phases have different entropies (first order transitions) there has to be some addition or release of energy. So it might actually be possible that a societal phase transition requires a fixed (and possibly large) amount of energy to reorganize everything into the new order.

There are also second order transitions. These are continuous do not have a latent heat, but show divergent susceptibilities (how much the system responds to an external forcing). These might be more like how we normally imagine an ordering process, with local fluctuations near the critical point leading to large and eventually dominant changes in how things are ordered. It is not clear to me that this kind of singularity would have any particular energy requirement.

Case G, complexity disaster, is related to superexponential growth, such as the city growth model of Bettancourt, West et al. or the work on bubbles and finite time singularities by Didier Sornette. Here the rapid growth rate leads to a crisis, or more accurately a series of crises increasingly rapidly succeeding each other until a final singularity. Beyond that the system must behave in some different manner. These models typically predict rapidly increasing resource use (indeed, this is the cause of the crisis sequence as one kind of growth runs into resource scaling problems and is replaced with another one), although as Sornette points out the post-singularity state might well be a stable non-rivalrous knowledge economy.

Case H, an inflexion point, is very vanilla. It would represent the point where our civilization is halfway from where we started to where we are going. It might correspond to “peak energy” where we shift from increasing usage to decreasing usage (for whatever reason), but it does not have to. It could just be that we figure out most physics and AI in the next decades, become a spacefaring posthuman civilization, and expand for the next few billion years, using ever more energy but not having the same intense rate of knowledge growth as during the brief early era when we went from hunter gatherers to posthumans.

Case I, infinite growth, is not normally possible in the physical universe. Information can as far as we know not be stored beyond densities set by the Bekenstein bound ( $I \leq k_I MR$ where $k_I\approx 2.577\cdot 10^{43}$ bits per kg per meter), and we only have access to a volume $4 \pi c^3 t^3/3$ with mass density $\rho$ , so the total information growth must be bounded by $I \leq 4 \pi k_I c^4 \rho t^4/3$ . It grows quickly, but still just polynomially.

The exception to the finitude of growth is if we approach the boundaries of spacetime. Frank J. Tipler’s omega point theory shows how information processing could go infinite in a finite (proper) time in the right kind of collapsing universe with the right kind of physics. It doesn’t look like we live in one, but the possibility is tantalizing: could we arrange the right kind of extreme spacetime collapse to get the right kind of boundary for a mini-omega? It would be way beyond black hole computing and never be able to send back information, but still allow infinite experience. Most likely we are stuck in finitude, but it won’t hurt poking at the limits.

Conclusions

Indefinite exponential growth is never possible for physical properties that have some resource limitation, whether energy, space or heat dissipation. Sooner or later they will have to shift to a slower rate of growth – polynomial for expanding organisational processes (forced to this by the dimensionality of space, finite lightspeed and heat dissipation), and declining growth rate for processes dependent on a non-renewable resource.

That does not tell us much about the energy demands of a technological singularity. We can conclude that it cannot be infinite. It might be high enough that we bump into the resource, thermal and computational limits, which may be what actually defines the singularity energy and time scale. Technological singularities may also be small, intense and localized detonations that merely use up local resources, possibly spreading and repeating. But it could also turn out that advanced thinking is very low-energy (reversible or quantum) or requires merely manipulation of high level symbols, leading to a quiet singularity.

My own guess is that life and intelligence will always expand to fill whatever niche is available, and use the available resources as intensively as possible. That leads to instabilities and depletion, but also expansion. I think we are – if we are lucky and wise – set for a global conversion of the non-living universe into life, intelligence and complexity, a vast phase transition of matter and energy where we are part of the nucleating agent. It might not be sustainable over cosmological timescales, but neither is our universe itself. I’d rather see the stars and planets filled with new and experiencing things than continue a slow dance into the twilight of entropy.

…contemplate the marvel that is existence and rejoice that you are able to do so. I feel I have the right to tell you this because, as I am inscribing these words, I am doing the same.
– Ted Chiang, Exhalation

Maps of mindspace

The ever awesome Scott Alexander made a map of the rationalist blogosphere (webosphere? infosphere?) that I just saw (hat tip to Waldemar Ingdahl). Besides having plenty of delightful xkcd-style in-jokes, it is also useful by showing me parts of my intellectual neighbourhood I did not know well and might want to follow (want to follow, but probably can’t follow because of time constraints).

He starts out with pointing at some other concept maps like that, both the classic xkcd one and Julia Galef’s map of Bay Area memespace, which was a pleasant surprise to me. The latter explains the causal/influence links between communities in a very clear way.

One can of course quibble endlessly on what is left in or out (I loved the comments about the apparent lack of dragons on the rationalist map), but the two maps also show two different approaches to relatedness. In the rationalist map distance is based on some form of high-dimensional similarity, crunching it down to 2D using an informal version of a Kohonen map. Bodies of water can be used to “cheat” and add discontinuities/tears. In the memespace map the world is a network of causal/influence links, and the overall similarities between linked groups can be slight even when they share core memes. Here the cheating consists of leaving out broad links (Burning Man is mentioned; it would connect many nodes weakly to each other). In both cases what is left out is important, just as the choice of resolution. Good maps show the information the creator wants to show, and communicates it well.

It is tempting to write endless posts about good mindspace maps and how they work, what they can and cannot show, and various design choices. There are quite a few out there. Some are network layouts made automatically, typically from co-citations. Others are designed by hand. Some are artworks in themselves. I don’t have the time today. But starting the day with two delightful ones that trigger much new thoughts and planning is a good way of starting the day.

Just how efficient can a Jupiter brain be?

Large information processing objects have some serious limitations due to signal delays and heat production.

Latency

Consider a spherical “Jupiter-brain” of radius $R$ . It will take maximally $2R/c$ seconds to signal across it, and the average time between two random points (selected uniformly) will be $36R/35 c$ .

Whether this is too much depends on the requirements of the system. Typically the relevant question is if the transmission latency $L$ is long compared to the processing time $t$ of the local processing. In the case of the human brain delays range between a few milliseconds up to 100 milliseconds, and neurons have typical frequencies up to maximally 100 Hz. The ratio $L/t$ between transmission time and a “processing cycle” will hence be between 0.1-10, i.e. not far from unity. In a microprocessor the processing time is on the order of $10^{-9}$ s and delays across the chip (assuming 10% c signals) $\approx 3\cdot 10^{-10}$ s, $L/t\approx 0.3$ .

If signals move at lightspeed and the system needs to maintain a ratio close to unity, then the maximal size will be $R < tc/2$ (or $tc/4$ if information must also be sent back after a request). For nanosecond cycles this is on the order of centimeters, for femtosecond cycles 0.1 microns; conversely, for a planet-sized system (R=6000 km) $t=0.04$ s, 25 Hz.

The cycle size is itself bounded by lightspeed: a computational element such as a transistor needs to have a radius smaller than the time it takes to signal across it, otherwise it would not function as a unitary element. Hence it must be of size $r < c t$ or, conversely, the cycle time must be slower than $r/c$ seconds. If a unit volume performs $C$ computations per second close to this limit, $C=(c/r)(1/r)^3$ , or $C=c/r^4$ . (More elaborate analysis can deal with quantum limitations to processing, but this post will be classical.)

This does not mean larger systems are impossible, merely that the latency will be long compared to local processing (compare the Web). It is possible to split the larger system into a hierarchy of subsystems that are internally synchronized and communicate on slower timescales to form a unified larger system. It is sometimes claimed that very fast solid state civilizations will be uninterested in the outside world since it both moves immeasurably slowly and any interaction will take a long time as measured inside the fast civilization. However, such hierarchical arrangements may be both very large and arbitrarily slow: the civilization as a whole may find the universe moving at a convenient speed, despite individual members finding it frozen.

Waste heat dissipation

Information processing leads to waste heat production at some rate $P$ Watts per cubic meter.

Passive cooling

If the system just cools by blackbody radiation, the maximal radius for a given maximal temperature $T$ is

$R = \frac{3 \sigma T^4}{P}$

where $\sigma \approx$ $5.670\cdot 10^{-8}$ is the Stefan–Boltzmann constant. This assumes heat is efficiently distributed in the interior.

If it does $C$ computations per volume per second, the total computations are $4 \pi R^3 C / 3=108 \pi \sigma^3 T^{12} C /P^3$ – it really pays off being able to run it hot!

Still, molecular matter will melt above 3600 K, giving a max radius of around $29,000/P$ km. Current CPUs have power densities somewhat below 100 Watts per cm $^2$ ; if we assume 100 W per cubic centimetre $P=10^8$ and $R<29$ cm! If we assume a power dissipation similar to human brains $P=1.43\cdot 10^4$ the the max size becomes 2 km. Clearly the average power density needs to be very low to motivate a large system.

Using quantum dot logic gives a power dissipation of 61,787 W/m^3 and a radius of 470 meters. However, by slowing down operations by a factor $\sqrt{f}$ the energy needs decrease by the factor $f$ . A reduction of speed to 3% gives a reduction of dissipation by a factor $10^{-3}$ , enabling a 470 kilometre system. Since the total computations per second for the whole system scales with the size as $R^3 \sqrt{f}$ $= \sqrt{f}/P^3$ $= f^{-2.5}$ slow reversible computing produces more computations per second in total than hotter computing. The slower clockspeed also makes it easier to maintain unitary subsystems. The maximal size of each such system scales as $r=1/\sqrt{f}$ , and the total amount of computation inside them scales as $r^3=f^{-1.5}$ . In the total system the number of subsystems change as $(R/r)^3 = f^{-3/2}$ : although they get larger, the whole system grows even faster and becomes less unified.

The limit of heat emissions is set by the Landauer principle: we need to pay at least $k_B T\ln(2)$ Joules for each erased bit. So $I$ the number of bit erasures per second and cubic meter will be less than $P/k_B T\ln(2)$ . To get a planet-sized system P will be around 1-10 W, implying $I < 6.7\cdot 10^{19-20}$ for a hot 3600 K system, and $I < 8.0\cdot 10^{22-23}$ for a cold 3 K system.

Active cooling

Passive cooling just uses the surface area of the system to radiate away heat to space. But we can pump coolants from the interior to the surface, and we can use heat radiators much larger than the surface area. This is especially effective for low temperatures, where radiation cooling is very weak and heat flows normally gentle (remember, they are driven by temperature differences: not much room for big differences when everything is close to 0 K).

If we have a sphere with radius R with internal volume $V(R)$ of heat-emitting computronium, the surface must have $PV(R)/X$ area devoted to cooling pipes to get rid of the heat, where $X$ is the amount of Watts of heat that can b carried away by a square meter of piping. This can be formulated as the differential equation:

$V'(R)= 4\pi R^2 - PV(R)/X$ .

The solution is

$V(R)=4 \pi ( (P/X)^2R^2 - 2 (P/X) R - 2 \exp(-(P/X)R) + 2) (X^3/P^3)$ .

This grows as $R^2$ for larger $R$ . The average computronium density across the system falls as $1/R$ as the system becomes larger.

If we go for a cooling substance with great heat capacity per mass at 25 degrees C, hydrogen has 14.30 J/g/K. But in terms of volume water is better at 4.2 J/cm $^3$ /K. However, near absolute zero heat capacities drop down towards zero and there are few choices of fluids. One neat possibility is superfluid cooling. They carry no thermal energy – they can however transport heat by being converted into normal fluid and have a frictionless countercurrent bringing back superfluid from the cold end. The rate is limited by the viscosity of the normal fluid, and apparently there are critical velocities of the order of mm/s. A CERN paper gives the formula $Q=[A \rho_n / \rho_s^3 S^4 T^3 \Delta T ]^{1/3}$ for the heat transport rate per square meter, where $A$ is 800 ms/kg at 1.8K, $\rho_n$ is the density of normal fluid, $\rho_s$ the superfluid, $S$ is the entropy per unit mass. Looking at it as a technical coolant gives a steady state heat flux along a pipe around 1.2 W/cm $^2$ in a 1 meter pipe for a 1.9-1.8K difference in temperature. There are various nonlinearities and limitations due to the need to keep things below the lambda point. Overall, this produces a heat transfer coefficient of about $1.2\cdot 10^{4}$ , in line with the range 10,000-100,000 W/m^2/K found in forced convection (liquid metals have maximal transfer ability).

So if we assume about 1 K temperature difference, then for quantum dots at full speed $P/X=61787/10^5=0.61787$ we have a computational volume for a one km system 7.7 million cubic meters of computronium, or about 0.001 of the total volume. Slowing it down to 3% (reducing emissions by 1000) boosts the density to 86%. At this intensity a 1000 km system would look the same as the previous low-density one.

Conclusion

If the figure of merit is just computational capacity, then obviously a larger computer is always better. But if it matters that parts stay synchronized, then there is a size limit set by lightspeed. Smaller components are better in this analysis, which leaves out issues of error correction – below a certain size level thermal noise, quantum tunneling and cosmic rays will start to induce errors. Handling high temperatures well pays off enormously for a computer not limited by synchronization or latency in terms of computational power; after that, reducing volume heat production has a higher influence on total computation than actual computation density.

Active cooling is better than passive cooling, but the cost is wasted volume, which means longer signal delays. In the above model there is more computronium at the centre than at the periphery, somewhat ameliorating the effect (the mean distance is just 0.03R). However, this ignores the key issue of wiring, which is likely to be significant if everything needs to be connected to everything else.

In short, building a Jupiter-sized computer is tough. Asteroid-sized ones are far easier. If we ever find or build planet-sized systems they will either be reversible computing, or mostly passive storage rather than processing. Processors by their nature tend to be hot and small.

[Addendum: this article has been republished in H+ Magazine thanks to Peter Rothman. ]

What use is a Doomsday Clock?

I blog at The Conversation about the Doomsday Clock. How well can we predict our closeness to the end of the world? And does it really matter?

Fair brains?

Yesterday I gave a lecture at the London Futurists, “What is a fair distribution of brains?”:

My slides can be found here (PDF, 4.5 Mb).

My main take-home messages were:

Cognitive enhancement is potentially very valuable to individuals and society, both in pure economic terms but also for living a good life. Intelligence protects against many bad things (from ill health to being a murder victim), increases opportunity, and allows you to create more for yourself and others. Cognitive ability interacts in a virtuous cycle with education, wealth and social capital.

That said, intelligence is not everything. Non-cognitive factors like motivation are also important. And societies that leave out people – due to sexism, racism, class divisions or other factors – will lose out on brain power. Giving these groups education and opportunities is a very cheap way of getting a big cognitive capital boost for society.

I was critiqued for talking about “cognitive enhancement” when I could just have talked about “cognitive change”. Enhancement has a built in assumption of some kind of improvement. However, a talk about fairness and cognitive change becomes rather anaemic: it just becomes a talk about what opportunities we should give people, not whether these changes affect their relationship in a morally relevant way.

Distributive justice

Theories of distributive justice typically try to answer: what goods are to be distributed, among whom, and what is the proper distribution? In our case it would be cognitive enhancements, and the interested parties are at least existing people but could include future generations (especially if we use genetic means).

Egalitarian theories argue that there has to be some form of equality, either equality of opportunity (everybody gets to enhance if they want), equality of outcome (everybody equally smart). Meritocratic theories would say the enhancement should be distributed by merit, presumably mainly to those who work hard at improving themselves or have already demonstrated great potential. Conversely, need-based theories and prioritarians argue we should prioritize those who are worst off or need the enhancement the most. Utilitarian justice requires the maximization of the total or average welfare across all relevant individuals.

Most of these theories agree with Rawls that impartiality is important: it should not matter who you are. Rawls famously argued for two principles of justice: (1) “Each person is to have an equal right to the most extensive total system of equal basic liberties compatible with a similar system of liberty for all.”, and (2) “Social and economic inequalities are to be arranged so that they are both (a) to the greatest benefit of the least advantaged, consistent with the just savings principle, and (b) attached to offices and positions open to all under conditions of fair equality of opportunity.”

It should be noted that a random distribution is impartial: if we cannot afford to give enhancement to everybody, we could have a lottery (meritocrats, prioritarians and utilitarians might want this lottery to be biased by some merit/need weighting, or to be just between the people relevant for getting the enhancement, while egalitarians would want everybody to be in).

Why should we even care about distributive justice? One argument is that we all have individual preferences and life goals we seek to achieve; if all relevant resources are in the hands of a few, there will be less preference satisfaction than if everybody had enough. In some cases there might be satiation, where we do not need more than a certain level of stuff to be satisfied and the distribution of the rest becomes irrelevant, but given the unbounded potential ambitions and desires of people it is unlikely to apply generally.

Many unequal situations are not seen as unjust because that is just the way the world is: it is a brute biological fact that males on average live shorter than females, and that there is a random distribution of cognitive ability. But if we change the technological conditions, these facts become possible to change: now we can redistribute stuff to affect them. Ironically, transhumanism hopes/aims to change conditions so that some states, which are at present not unjust, will become unjust!

Some enhancements are absolute: they help you or society no matter what others do, others are merely positional. Positional enhancements are a zero-sum game. However, doing the reversal test demonstrates that cognitive ability has absolute components: a world where everybody got a bit more stupid is not a better world, despite the unchanged relative rankings. There is more accidents and mistakes, more risk that some joint threat cannot be handled, and many life projects become harder and impossible to achieve. And the Flynn effect demonstrates that we are unlikely to be at some particular optimum right now.

The Rawlsian principles are OK with enhancement of the best-off if that helps the worst-off. This is not unreasonable for cognitive enhancement: the extreme high performers have a disproportionate output (patents, books, lectures) that benefit the rest of society, and the network effects of a generally smarter society might benefit everyone living in it. However, less cognitively able people are also less able to make use of opportunities created by this: intelligence is fundamentally a limit to equality of opportunity, and the more you have, the more you are able to select what opportunities and projects to aim for. So a Rawlsian would likely be fairly keen on giving more enhancement to the worst off.

Would a world where everybody had same intelligence be better than the current one? Intuitively it seems emotionally neutral. The reason is that we have conveniently and falsely talked about intelligence as one thing. As several audience members argued, there are many parts of intelligence. Even if one does not buy Gardner’s multiple intelligence theory, it is clear that there are different styles of problem-solving and problem-posing. This is true even if measurements of the magnitude of mental abilities are fairly correlated. A world where everybody thought in the same way would be a bad place. We might not want bad thinking, but there are many forms of good thinking. And we benefit from diversity of thinking styles. Different styles of cognition can make the world more unequal but not more unjust.

Inequality over time

As I have argued before, enhancements in the forms of gadgets and pills are likely to come down in price and become easy to distribute, while service-based enhancements are more problematic since they will tend to remain expensive. Modelling the spread of enhancement suggests that enhancements that start out expensive but then become cheaper first leads to a growth of inequality and then a decrease. If there is a levelling off effect where it becomes harder to enhance beyond a certain point this eventually leads to a more cognitively equal society as everybody catches up and ends up close to the efficiency boundary.

When considering inequality across time we should likely accept early inequality if it leads to later equality. After all, we should not treat spatially remote people differently from nearby people, and the same is true across time. As Claudio Tamburrini said, “Do not sacrifice poor of the future for the poor of the present.”

The risk is if there is compounding: enhanced people can make more money, and use that to enhance themselves or their offspring more. I seriously doubt this works for biomedical enhancement since there are limits to what biological brains can do (and human generation times are long compared to technology change), but it may be risky in regards to outsourcing cognition to machines. If you can convert capital into cognitive ability by just buying more software, then things could become explosive if the payoffs from being smart in this way are large. However, then we are likely to have an intelligence explosion anyway, and the issue of social justice takes back seat compared to the risks of a singularity. Another reason to think it is not strongly compounding is that geniuses are not all billionaires, and billionaires – while smart – are typically not the very most intelligent people. Pickety’s argument actually suggests that it is better to have a lot of money than a lot of brains since you can always hire smart consultants.

Francis Fukuyama famously argued that enhancement was bad for society because it risks making people fundamentally unequal. However, liberal democracy is already based on idea of common society of unequal individuals – they are different in ability, knowledge and power, yet treated fairly and impartially as “one man, one vote”. There is a difference between moral equality and equality measured in wealth, IQ or anything else. We might be concerned about extreme inequalities in some of the latter factors leading to a shift in moral equality, or more realistically, that those factors allow manipulation of the system to the benefit of the best off. This is why strengthening the “dominant cooperative framework” (to use Allen Buchanan’s term) is important: social systems are resilient, and we can make them more resilient to known or expected future challenges.

Conclusions

My main conclusions were:

Enhancing cognition can make society more or less unequal. Whether this is unjust depends both on the technology, one’s theory of justice, and what policies are instituted.
Some technologies just affect positional goods, and they make everybody worse off. Some are win-win situations, and I think much of intelligence enhancement is in this category.
Cognitive enhancement is likely to individually help the worst off, but make the best off compete harder.
Controlling mature technologies is hard, since there are both vested interests and social practices around them. We have an opportunity to affect the development of cognitive enhancement now, before it becomes very mainstream and hard to change.
Strengthening the “dominant cooperative framework” of society is a good idea in any case.
Individual morphological freedom must be safeguarded.
Speeding up progress and diffusion is likely to reduce inequality over time – and promote diversity.
Different parts of the world likely to approach CE differently and at different speeds.

As transhumanists, what do we want?

The transhumanist declaration makes wide access a point, not just on fairness or utilitarian grounds, but also for learning more. We have a limited perspective and cannot know well beforehand were the best paths are, so it is better to let people pursue their own inquiry. There may also be intrinsic values in freedom, autonomy and open-ended life projects: not giving many people the chance to this may lose much value.

Existential risk overshadows inequality: achieving equality by dying out is not a good deal. So if some enhancements increases existential risk we should avoid them. Conversely, if enhancements look like they reduce existential risk (maybe some moral or cognitive enhancements) they may be worth pursuing even if they are bad for (current) inequality.

We will likely end up with a diverse world that will contain different approaches, none universal. Some areas will prohibit enhancement, others allow it. No view is likely to become dominant quickly (without rather nasty means or some very surprising philosophical developments). That strongly speaks for the need to construct a tolerant world system.

If we have morphological freedom, then preventing cognitive enhancement needs to point at a very clear social harm. If the social harm is less than existing practices like schooling, then there is no legitimate reason to limit enhancement. There are also costs of restrictions: opportunity costs, international competition, black markets, inequality, losses in redistribution and public choice issues where regulators become self-serving. Controlling technology is like controlling art: it is an attempt to control human creativity and exploration, and should be done very cautiously.

Threat reduction Thursday

Today seems to have been “doing something about risk”-day. Or at least, “let’s investigate risk so we know what we ought to do”-day.

First, the World Economic Forum launched their 2015 risk perception report. (Full disclosure: I am on the advisory committee)

Second, Elon Musk donated $10M to AI safety research. Yes, this is quite related to the FLI open letter.

Today has been a good day. Of course, it will be an even better day if and when we get actual results in risk mitigation.

Don’t be evil and make things better

I am one of the signatories of an open letter calling for a stronger aim at socially beneficial artificial intelligence.

It might seem odd to call for something like that: who in their right mind would not want AI to be beneficial? But when we look at the field (and indeed, many other research fields) the focus has traditionally been on making AI more capable. Besides some pure research interest and no doubt some “let’s create life”-ambition, the bulk of motivation has been to make systems that do something useful (or push in the direction of something useful).

“Useful” is normally defined in term of performing some task – translation, driving, giving medical advice – rather than having a good impact on the world. Better done tasks are typically good locally – people get translations more cheaply, things get transported, advice may be better – but have more complex knock-on effects: fewer translators, drivers or doctors needed, or that their jobs get transformed, plus potential risks from easy (but possibly faulty) translation, accidents and misuse of autonomous vehicles, or changes in liability. Way messier. Even if the overall impact is great, socially disruptive technologies that appear surprisingly fast can cause trouble, emergent misbehaviour and bad design choices can lead to devices that amplify risk (consider high frequency trading, badly used risk models, or anything that empowers crazy people). Some technologies may also lend themselves to centralizing power (surveillance, autonomous weapons) but reduce accountability (learning algorithms internalizing discriminatory assumptions in an opaque way). These considerations should of course be part of any responsible engineering and deployment, even if handling them is by no means solely the job of the scientist or programmer. Doing it right will require far more help from other disciplines.

The most serious risks come from the very smart systems that may emerge further into the future: they either amplify human ability in profound ways, or they are autonomous themselves. In both cases they make achieving goals easier, but do not have any constraints on what goals are sane, moral or beneficial. Solving the problem of how to keep such systems safe is a hard problem we ought to start on early. One of the main reasons for the letter is that so little effort has gone into better ways of controlling complex, adaptive and possibly self-improving technological systems. It makes sense even if one doesn’t worry about superintelligence or existential risk.

This is why we have to change some research priorities. In many cases it is just putting problems on the agenda as useful to work on: they are important but understudied, and a bit of investigation will likely go a long way. In some cases it is more a matter of signalling that different communities need to talk more to each other. And in some instances we really need to have our act together before big shifts occur – if unemployment soars to 50%, engineering design-ahead enables big jumps in tech capability, brains get emulated, or systems start self-improving we will not have time to carefully develop smart policies.

My experience with talking to the community is that there is not a big split between AI and AI safety practitioners: they roughly want the same thing. There might be a bigger gap between the people working on the theoretical, far out issues and the people working on the applied here-and-now stuff. I suspect they can both learn from each other. More research is, of course, needed.

Andart II

Part of Anders' Exoself

Gamma function surfaces

Canine mechanics and banking

Gamma function fractals

Energy requirements of the singularity

Current trends

What kind of singularity?

Conclusions

Maps of mindspace

Just how efficient can a Jupiter brain be?

Latency

Waste heat dissipation

Passive cooling

Active cooling

Conclusion

What use is a Doomsday Clock?

Fair brains?

Distributive justice

Inequality over time

Conclusions

As transhumanists, what do we want?

Threat reduction Thursday

Don’t be evil and make things better