Just how efficient can a Jupiter brain be?

Large information processing objects have some serious limitations due to signal delays and heat production.

Latency

Consider a spherical “Jupiter-brain” of radius R. It will take at most 2R/c seconds to signal across it, and the average signalling time between two random points (selected uniformly) will be 36R/(35c).
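The 36R/35 figure for the mean distance between two uniformly chosen points in a ball can be sanity-checked numerically. The following is a minimal Monte Carlo sketch for a unit ball (R = 1); the function names, sample count and seed are illustrative choices, not part of the original argument.

```python
import math
import random


def random_point_in_unit_ball(rng):
    """Rejection sampling: draw from the enclosing cube until the point is inside the ball."""
    while True:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        if p[0] ** 2 + p[1] ** 2 + p[2] ** 2 <= 1.0:
            return p


def mean_distance_in_unit_ball(samples=100_000, seed=1):
    """Monte Carlo estimate of the mean distance between two uniform points (R = 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += math.dist(random_point_in_unit_ball(rng), random_point_in_unit_ball(rng))
    return total / samples


estimate = mean_distance_in_unit_ball()
exact = 36 / 35  # analytic value in units of R
print(f"estimate = {estimate:.4f}, 36/35 = {exact:.4f}")
```

Dividing the distance by c gives the corresponding mean signalling time.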

Whether this is too much depends on the requirements of the system. Typically the relevant question is if the transmission latency L is long compared to the processing time t of the local processing. In the case of the human brain delays range between a few milliseconds up to 100 milliseconds, and neurons have typical frequencies up to maximally 100 Hz. The ratio L/t between transmission time and a “processing cycle” will hence be between 0.1-10, i.e. not far from unity. In a microprocessor the processing time is on the order of 10^{-9} s and delays across the chip (assuming 10% c signals) \approx 3\cdot 10^{-10} s, L/t\approx 0.3.

If signals move at lightspeed and the system needs to maintain a ratio close to unity, then the maximal size will be R < tc/2 (or tc/4 if information must also be sent back after a request). For nanosecond cycles this is on the order of centimeters, for femtosecond cycles 0.1 microns; conversely, for a planet-sized system (R=6000 km) t=0.04 s, 25 Hz.
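As a rough worked example of the R < tc/2 bound, the sketch below converts between cycle time and maximal radius. The helper names and the `round_trip` flag are hypothetical, introduced only for illustration.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s


def max_radius(cycle_time_s, round_trip=False):
    """Largest radius (m) for which crossing the system takes about one cycle.

    One-way signalling gives R < t*c/2; a request plus reply gives R < t*c/4.
    """
    divisor = 4.0 if round_trip else 2.0
    return C_LIGHT * cycle_time_s / divisor


def min_cycle_time(radius_m, round_trip=False):
    """Shortest cycle time (s) that keeps latency/cycle near unity for a given radius."""
    factor = 4.0 if round_trip else 2.0
    return factor * radius_m / C_LIGHT


for label, t in [("nanosecond", 1e-9), ("femtosecond", 1e-15)]:
    print(f"{label} cycle: R < {max_radius(t):.3g} m")

t_planet = min_cycle_time(6.0e6)  # R = 6000 km
print(f"planet-sized: t = {t_planet:.3f} s, about {1 / t_planet:.0f} Hz")
```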

The cycle time is itself bounded by lightspeed: a computational element such as a transistor needs to have a radius smaller than the distance a signal can cross during one cycle, otherwise it would not function as a unitary element. Hence it must be of size r < c t or, conversely, the cycle time must be longer than r/c seconds. If a unit volume performs C computations per second close to this limit, C=(c/r)(1/r)^3, or C=c/r^4. (More elaborate analysis can deal with quantum limitations to processing, but this post will be classical.)
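To make the r^{-4} scaling concrete, here is a small sketch of the classical bound C = c/r^4. The function name and the example element sizes are assumptions made for illustration only.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s


def max_computation_density(element_size_m):
    """Upper bound on computations per cubic meter per second, C = c / r^4.

    Each element cycles at most c/r times per second, and about 1/r^3
    elements fit into a cubic meter.
    """
    cycles_per_second = C_LIGHT / element_size_m
    elements_per_m3 = 1.0 / element_size_m ** 3
    return cycles_per_second * elements_per_m3


for r in (1e-6, 1e-8, 1e-9):  # illustrative element sizes in meters
    print(f"r = {r:.0e} m -> C < {max_computation_density(r):.2e} per m^3 per s")
```

Halving the element size raises the bound sixteenfold, which is why smaller components look so attractive in this purely classical analysis.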

This does not mean larger systems are impossible, merely that the latency will be long compared to local processing (compare the Web). It is possible to split the larger system into a hierarchy of subsystems that are internally synchronized and communicate on slower timescales to form a unified larger system. It is sometimes claimed that very fast solid state civilizations will be uninterested in the outside world since it both moves immeasurably slowly and any interaction will take a long time as measured inside the fast civilization. However, such hierarchical arrangements may be both very large and arbitrarily slow: the civilization as a whole may find the universe moving at a convenient speed, despite individual members finding it frozen.

Waste heat dissipation

Information processing leads to waste heat production at some rate P Watts per cubic meter.

Passive cooling

If the system just cools by blackbody radiation, the maximal radius for a given maximal temperature T is

R = \frac{3 \sigma T^4}{P}

where \sigma \approx 5.670\cdot 10^{-8} is the Stefan–Boltzmann constant. This assumes heat is efficiently distributed in the interior.

If it does C computations per volume per second, the total computations are 4 \pi R^3 C / 3=36 \pi \sigma^3 T^{12} C /P^3 – it really pays off being able to run it hot!

Still, molecular matter will melt above 3600 K, giving a max radius of around 29,000/P km. Current CPUs have power densities somewhat below 100 Watts per cm^2; if we assume 100 W per cubic centimetre, P=10^8 and R<29 cm! If we assume a power dissipation similar to human brains, P=1.43\cdot 10^4, then the max size becomes 2 km. Clearly the average power density needs to be very low to motivate a large system.
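The passive cooling limit R = 3\sigma T^4/P is easy to evaluate directly. The sketch below does so for the two power densities mentioned above; the function names are hypothetical, and the total computation rate simply multiplies the sphere volume by an assumed computation density C.

```python
import math

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def passive_max_radius(power_density, temperature):
    """Largest radius (m) a sphere can have if it only cools by blackbody radiation.

    Balances P * (4/3) pi R^3 against sigma * T^4 * 4 pi R^2, giving R = 3 sigma T^4 / P.
    """
    return 3.0 * SIGMA * temperature ** 4 / power_density


def total_computation_rate(power_density, temperature, computation_density):
    """Total computations per second of a maximal passively cooled sphere."""
    radius = passive_max_radius(power_density, temperature)
    return 4.0 / 3.0 * math.pi * radius ** 3 * computation_density


T_MAX = 3600.0  # K, the melting limit assumed in the text
print(f"CPU-like   (P = 1e8 W/m^3):    R = {passive_max_radius(1e8, T_MAX):.3f} m")
print(f"brain-like (P = 1.43e4 W/m^3): R = {passive_max_radius(1.43e4, T_MAX):.0f} m")
```

Because the total rate goes as T^{12}, doubling the allowed temperature multiplies it by 4096 in this model.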

Using quantum dot logic gives a power dissipation of 61,787 W/m^3 and a radius of 470 meters. However, by slowing down operations by a factor \sqrt{f} the energy needs decrease by the factor f. A reduction of speed to 3% gives a reduction of dissipation by a factor 10^{-3}, enabling a 470 kilometre system. Since the total computations per second for the whole system scales with the size as R^3 \sqrt{f} \propto \sqrt{f}/P^3 \propto f^{-2.5}, slow reversible computing produces more computations per second in total than hotter computing. The slower clockspeed also makes it easier to maintain unitary subsystems. The maximal size of each such system scales as r \propto 1/\sqrt{f}, and the total amount of computation inside them scales as r^3 \propto f^{-1.5}. In the total system the number of subsystems changes as (R/r)^3 \propto f^{-3/2}: although they get larger, the whole system grows even faster and becomes less unified.
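The scaling argument can be tabulated with a few lines of code. This is only a sketch of the proportionalities stated above, with all quantities expressed relative to the full-speed system (f = 1); the function and key names are illustrative.

```python
def slowdown_scaling(f):
    """Relative scalings when power dissipation is multiplied by f (f < 1 means cooler).

    Assumes, as in the text, that speed scales as sqrt(f) and that the
    passively cooled radius scales as 1/P, i.e. as 1/f.
    """
    speed = f ** 0.5                       # clock speed relative to full speed
    system_radius = 1.0 / f                # R ~ 1/P
    total_rate = system_radius ** 3 * speed  # R^3 * sqrt(f) ~ f^-2.5
    subsystem_radius = 1.0 / speed         # unitary subsystem size r ~ 1/sqrt(f)
    n_subsystems = (system_radius / subsystem_radius) ** 3  # ~ f^-1.5
    return {
        "speed": speed,
        "system_radius": system_radius,
        "total_rate": total_rate,
        "subsystem_radius": subsystem_radius,
        "n_subsystems": n_subsystems,
    }


for name, value in slowdown_scaling(1e-3).items():
    print(f"{name:>17}: {value:.3g}")
```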

The limit of heat emissions is set by the Landauer principle: we need to pay at least k_B T\ln(2) Joules for each erased bit. So the number of bit erasures per second and cubic meter, I, will be less than P/(k_B T\ln(2)). To get a planet-sized system P will be around 1-10 W per cubic meter, implying I < 2.9\cdot 10^{19-20} for a hot 3600 K system, and I < 3.5\cdot 10^{22-23} for a cold 3 K system.
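The Landauer bound on the erasure rate is a one-line calculation. The sketch below evaluates P/(k_B T ln 2) for the hot and cold cases at 1 W and 10 W per cubic meter; the function name is a hypothetical helper.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def max_erasure_rate(power_density, temperature):
    """Upper bound on bit erasures per second per cubic meter (Landauer limit)."""
    energy_per_bit = K_B * temperature * math.log(2)  # minimum joules per erased bit
    return power_density / energy_per_bit


for temperature in (3600.0, 3.0):
    for power in (1.0, 10.0):
        rate = max_erasure_rate(power, temperature)
        print(f"T = {temperature:6.0f} K, P = {power:4.0f} W/m^3: I < {rate:.2e} bits/s/m^3")
```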

Active cooling

Passive cooling just uses the surface area of the system to radiate away heat to space. But we can pump coolants from the interior to the surface, and we can use heat radiators much larger than the surface area. This is especially effective for low temperatures, where radiation cooling is very weak and heat flows are normally gentle (remember, they are driven by temperature differences: not much room for big differences when everything is close to 0 K).

If we have a sphere with radius R with internal volume V(R) of heat-emitting computronium, the surface must have PV(R)/X area devoted to cooling pipes to get rid of the heat, where X is the amount of Watts of heat that can be carried away by a square meter of piping. This can be formulated as the differential equation:

V'(R)= 4\pi R^2 - PV(R)/X.

The solution is

V(R)=4 \pi ( (P/X)^2R^2 - 2 (P/X) R - 2 \exp(-(P/X)R) + 2) (X^3/P^3).

This grows as R^2 for larger R. The average computronium density across the system falls as 1/R as the system becomes larger.

If we go for a cooling substance with great heat capacity per mass at 25 degrees C, hydrogen has 14.30 J/g/K. But in terms of volume water is better at 4.2 J/cm^3/K. However, near absolute zero heat capacities drop down towards zero and there are few choices of fluids. One neat possibility is superfluid cooling. Superfluids carry no thermal energy – they can however transport heat by being converted into normal fluid and have a frictionless countercurrent bringing back superfluid from the cold end. The rate is limited by the viscosity of the normal fluid, and apparently there are critical velocities of the order of mm/s. A CERN paper gives the formula Q=[A \rho_n / \rho_s^3 S^4 T^3 \Delta T ]^{1/3} for the heat transport rate per square meter, where A is 800 m s/kg at 1.8K, \rho_n is the density of normal fluid, \rho_s the superfluid, S is the entropy per unit mass. Looking at it as a technical coolant gives a steady state heat flux along a pipe around 1.2 W/cm^2 in a 1 meter pipe for a 1.9-1.8K difference in temperature. There are various nonlinearities and limitations due to the need to keep things below the lambda point. Overall, this produces a heat transfer coefficient of about 1.2\cdot 10^{4}, in line with the range 10,000-100,000 W/m^2/K found in forced convection (liquid metals have maximal transfer ability).

So if we assume about 1 K temperature difference, then for quantum dots at full speed P/X=61787/10^5=0.61787, and a one km system has a computational volume of about 20 million cubic meters of computronium, or about 0.005 of the total volume. Slowing it down to 3% (reducing emissions by 1000) boosts the density to 86%. At this intensity a 1000 km system would look the same as the previous low-density one.
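The closed-form solution of the cooling-pipe equation can be evaluated directly. The sketch below writes a = P/X (in units of 1/m) and computes both the computronium volume and its share of the enclosing sphere; the function names are hypothetical, and the formula is the solution with V(0) = 0.

```python
import math


def computronium_volume(radius, a):
    """Volume V(R) solving V' = 4 pi R^2 - a V with V(0) = 0, where a = P/X (1/m).

    Note: for a*R much smaller than 1 the bracket suffers from cancellation,
    so this direct form is only a sketch for moderate or large a*R.
    """
    x = a * radius
    return 4.0 * math.pi * (x * x - 2.0 * x + 2.0 - 2.0 * math.exp(-x)) / a ** 3


def computronium_fraction(radius, a):
    """Share of the sphere's volume that is computronium rather than cooling pipes."""
    sphere = 4.0 / 3.0 * math.pi * radius ** 3
    return computronium_volume(radius, a) / sphere


A_FULL = 61787.0 / 1e5    # quantum dots at full speed, X = 1e5 W/m^2
A_SLOW = A_FULL / 1000.0  # dissipation reduced by a factor 1000

print(f"full speed, R = 1 km:    {computronium_fraction(1e3, A_FULL):.4f}")
print(f"slowed,     R = 1 km:    {computronium_fraction(1e3, A_SLOW):.4f}")
print(f"slowed,     R = 1000 km: {computronium_fraction(1e6, A_SLOW):.4f}")
```

The fraction depends only on the product a*R, which is why the slowed 1000 km system has the same computronium share as the full-speed 1 km one. With the slowed value of a and R = 1 km the fraction comes out near the 86% quoted above.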

Conclusion

If the figure of merit is just computational capacity, then obviously a larger computer is always better. But if it matters that parts stay synchronized, then there is a size limit set by lightspeed. Smaller components are better in this analysis, which leaves out issues of error correction – below a certain size level thermal noise, quantum tunneling and cosmic rays will start to induce errors. Handling high temperatures well pays off enormously for a computer not limited by synchronization or latency in terms of computational power; after that, reducing volume heat production has a higher influence on total computation than actual computation density.

Active cooling is better than passive cooling, but the cost is wasted volume, which means longer signal delays. In the above model there is more computronium at the centre than at the periphery, somewhat ameliorating the effect (the mean distance is just 0.03R). However, this ignores the key issue of wiring, which is likely to be significant if everything needs to be connected to everything else.

In short, building a Jupiter-sized computer is tough. Asteroid-sized ones are far easier. If we ever find or build planet-sized systems they will either be reversible computing, or mostly passive storage rather than processing. Processors by their nature tend to be hot and small.

[Addendum: this article has been republished in H+ Magazine thanks to Peter Rothman. ]

Fair brains?

Yesterday I gave a lecture at the London Futurists, “What is a fair distribution of brains?”:

My slides can be found here (PDF, 4.5 Mb).

My main take-home messages were:

Cognitive enhancement is potentially very valuable to individuals and society, both in pure economic terms but also for living a good life. Intelligence protects against many bad things (from ill health to being a murder victim), increases opportunity, and allows you to create more for yourself and others. Cognitive ability interacts in a virtuous cycle with education, wealth and social capital.

That said, intelligence is not everything. Non-cognitive factors like motivation are also important. And societies that leave out people – due to sexism, racism, class divisions or other factors – will lose out on brain power. Giving these groups education and opportunities is a very cheap way of getting a big cognitive capital boost for society.

I was critiqued for talking about “cognitive enhancement” when I could just have talked about “cognitive change”. Enhancement has a built in assumption of some kind of improvement. However, a talk about fairness and cognitive change becomes rather anaemic: it just becomes a talk about what opportunities we should give people, not whether these changes affect their relationship in a morally relevant way.

Distributive justice

Theories of distributive justice typically try to answer: what goods are to be distributed, among whom, and what is the proper distribution? In our case it would be cognitive enhancements, and the interested parties are at least existing people but could include future generations (especially if we use genetic means).

Egalitarian theories argue that there has to be some form of equality, either equality of opportunity (everybody gets to enhance if they want) or equality of outcome (everybody equally smart). Meritocratic theories would say the enhancement should be distributed by merit, presumably mainly to those who work hard at improving themselves or have already demonstrated great potential. Conversely, need-based theories and prioritarians argue we should prioritize those who are worst off or need the enhancement the most. Utilitarian justice requires the maximization of the total or average welfare across all relevant individuals.

Most of these theories agree with Rawls that impartiality is important: it should not matter who you are. Rawls famously argued for two principles of justice: (1) “Each person is to have an equal right to the most extensive total system of equal basic liberties compatible with a similar system of liberty for all.”, and (2) “Social and economic inequalities are to be arranged so that they are both (a) to the greatest benefit of the least advantaged, consistent with the just savings principle, and (b) attached to offices and positions open to all under conditions of fair equality of opportunity.”

It should be noted that a random distribution is impartial: if we cannot afford to give enhancement to everybody, we could have a lottery (meritocrats, prioritarians and utilitarians might want this lottery to be biased by some merit/need weighting, or to be just between the people relevant for getting the enhancement, while egalitarians would want everybody to be in).

Why should we even care about distributive justice? One argument is that we all have individual preferences and life goals we seek to achieve; if all relevant resources are in the hands of a few, there will be less preference satisfaction than if everybody had enough. In some cases there might be satiation, where we do not need more than a certain level of stuff to be satisfied and the distribution of the rest becomes irrelevant, but given the unbounded potential ambitions and desires of people it is unlikely to apply generally.

Many unequal situations are not seen as unjust because that is just the way the world is: it is a brute biological fact that males on average live shorter than females, and that there is a random distribution of cognitive ability. But if we change the technological conditions, these facts become possible to change: now we can redistribute stuff to affect them. Ironically, transhumanism hopes/aims to change conditions so that some states, which are at present not unjust, will become unjust!

Some enhancements are absolute: they help you or society no matter what others do; others are merely positional. Positional enhancements are a zero-sum game. However, doing the reversal test demonstrates that cognitive ability has absolute components: a world where everybody got a bit more stupid is not a better world, despite the unchanged relative rankings. There are more accidents and mistakes, more risk that some joint threat cannot be handled, and many life projects become harder or impossible to achieve. And the Flynn effect demonstrates that we are unlikely to be at some particular optimum right now.

The Rawlsian principles are OK with enhancement of the best-off if that helps the worst-off. This is not unreasonable for cognitive enhancement: the extreme high performers have a disproportionate output (patents, books, lectures) that benefit the rest of society, and the network effects of a generally smarter society might benefit everyone living in it. However, less cognitively able people are also less able to make use of opportunities created by this: intelligence is fundamentally a limit to equality of opportunity, and the more you have, the more you are able to select what opportunities and projects to aim for. So a Rawlsian would likely be fairly keen on giving more enhancement to the worst off.

Would a world where everybody had the same intelligence be better than the current one? Intuitively it seems emotionally neutral. The reason is that we have conveniently and falsely talked about intelligence as one thing. As several audience members argued, there are many parts of intelligence. Even if one does not buy Gardner’s multiple intelligence theory, it is clear that there are different styles of problem-solving and problem-posing. This is true even if measurements of the magnitude of mental abilities are fairly correlated. A world where everybody thought in the same way would be a bad place. We might not want bad thinking, but there are many forms of good thinking. And we benefit from diversity of thinking styles. Different styles of cognition can make the world more unequal but not more unjust.

Inequality over time

As I have argued before, enhancements in the forms of gadgets and pills are likely to come down in price and become easy to distribute, while service-based enhancements are more problematic since they will tend to remain expensive. Modelling the spread of enhancement suggests that enhancements that start out expensive but then become cheaper first lead to a growth of inequality and then a decrease. If there is a levelling off effect where it becomes harder to enhance beyond a certain point, this eventually leads to a more cognitively equal society as everybody catches up and ends up close to the efficiency boundary.

When considering inequality across time we should likely accept early inequality if it leads to later equality. After all, we should not treat spatially remote people differently from nearby people, and the same is true across time. As Claudio Tamburrini said, “Do not sacrifice poor of the future for the poor of the present.”

The risk is if there is compounding: enhanced people can make more money, and use that to enhance themselves or their offspring more. I seriously doubt this works for biomedical enhancement since there are limits to what biological brains can do (and human generation times are long compared to technology change), but it may be risky in regards to outsourcing cognition to machines. If you can convert capital into cognitive ability by just buying more software, then things could become explosive if the payoffs from being smart in this way are large. However, then we are likely to have an intelligence explosion anyway, and the issue of social justice takes a back seat compared to the risks of a singularity. Another reason to think it is not strongly compounding is that geniuses are not all billionaires, and billionaires – while smart – are typically not the very most intelligent people. Piketty’s argument actually suggests that it is better to have a lot of money than a lot of brains since you can always hire smart consultants.

Francis Fukuyama famously argued that enhancement was bad for society because it risks making people fundamentally unequal. However, liberal democracy is already based on the idea of a common society of unequal individuals – they are different in ability, knowledge and power, yet treated fairly and impartially as “one man, one vote”. There is a difference between moral equality and equality measured in wealth, IQ or anything else. We might be concerned about extreme inequalities in some of the latter factors leading to a shift in moral equality, or more realistically, that those factors allow manipulation of the system to the benefit of the best off. This is why strengthening the “dominant cooperative framework” (to use Allen Buchanan’s term) is important: social systems are resilient, and we can make them more resilient to known or expected future challenges.

Conclusions

My main conclusions were:

  • Enhancing cognition can make society more or less unequal. Whether this is unjust depends both on the technology, one’s theory of justice, and what policies are instituted.
  • Some technologies just affect positional goods, and they make everybody worse off. Some are win-win situations, and I think much of intelligence enhancement is in this category.
  • Cognitive enhancement is likely to individually help the worst off, but make the best off compete harder.
  • Controlling mature technologies is hard, since there are both vested interests and social practices around them. We have an opportunity to affect the development of cognitive enhancement now, before it becomes very mainstream and hard to change.
  • Strengthening the “dominant cooperative framework” of society is a good idea in any case.
  • Individual morphological freedom must be safeguarded.
  • Speeding up progress and diffusion is likely to reduce inequality over time – and promote diversity.
  • Different parts of the world are likely to approach CE differently and at different speeds.

As transhumanists, what do we want?

The transhumanist declaration makes wide access a point, not just on fairness or utilitarian grounds, but also for learning more. We have a limited perspective and cannot know well beforehand where the best paths are, so it is better to let people pursue their own inquiry. There may also be intrinsic values in freedom, autonomy and open-ended life projects: not giving many people the chance to do this may lose much value.

Existential risk overshadows inequality: achieving equality by dying out is not a good deal. So if some enhancements increase existential risk we should avoid them. Conversely, if enhancements look like they reduce existential risk (maybe some moral or cognitive enhancements) they may be worth pursuing even if they are bad for (current) inequality.

We will likely end up with a diverse world that will contain different approaches, none universal. Some areas will prohibit enhancement, others allow it. No view is likely to become dominant quickly (without rather nasty means or some very surprising philosophical developments). That strongly speaks for the need to construct a tolerant world system.

If we have morphological freedom, then preventing cognitive enhancement needs to point at a very clear social harm. If the social harm is less than existing practices like schooling, then there is no legitimate reason to limit enhancement.  There are also costs of restrictions: opportunity costs, international competition, black markets, inequality, losses in redistribution and public choice issues where regulators become self-serving. Controlling technology is like controlling art: it is an attempt to control human creativity and exploration, and should be done very cautiously.

Threat reduction Thursday

Today seems to have been “doing something about risk”-day. Or at least, “let’s investigate risk so we know what we ought to do”-day.
First, the World Economic Forum launched their 2015 risk perception report. (Full disclosure: I am on the advisory committee)
Second, Elon Musk donated $10M to AI safety research. Yes, this is quite related to the FLI open letter.
Today has been a good day. Of course, it will be an even better day if and when we get actual results in risk mitigation.

Don’t be evil and make things better

I am one of the signatories of an open letter calling for a stronger aim at socially beneficial artificial intelligence.

It might seem odd to call for something like that: who in their right mind would not want AI to be beneficial? But when we look at the field (and indeed, many other research fields) the focus has traditionally been on making AI more capable. Besides some pure research interest and no doubt some “let’s create life”-ambition, the bulk of motivation has been to make systems that do something useful (or push in the direction of something useful).

“Useful” is normally defined in terms of performing some task – translation, driving, giving medical advice – rather than having a good impact on the world. Better done tasks are typically good locally – people get translations more cheaply, things get transported, advice may be better – but have more complex knock-on effects: fewer translators, drivers or doctors needed, or that their jobs get transformed, plus potential risks from easy (but possibly faulty) translation, accidents and misuse of autonomous vehicles, or changes in liability. Way messier. Even if the overall impact is great, socially disruptive technologies that appear surprisingly fast can cause trouble, and emergent misbehaviour and bad design choices can lead to devices that amplify risk (consider high frequency trading, badly used risk models, or anything that empowers crazy people). Some technologies may also lend themselves to centralizing power (surveillance, autonomous weapons) but reduce accountability (learning algorithms internalizing discriminatory assumptions in an opaque way). These considerations should of course be part of any responsible engineering and deployment, even if handling them is by no means solely the job of the scientist or programmer. Doing it right will require far more help from other disciplines.

The most serious risks come from the very smart systems that may emerge further into the future: they either amplify human ability in profound ways, or they are autonomous themselves. In both cases they make achieving goals easier, but do not have any constraints on what goals are sane, moral or beneficial. Solving the problem of how to keep such systems safe is a hard problem we ought to start on early. One of the main reasons for the letter is that so little effort has gone into better ways of controlling complex, adaptive and possibly self-improving technological systems. It makes sense even if one doesn’t worry about superintelligence or existential risk.

This is why we have to change some research priorities. In many cases it is just putting problems on the agenda as useful to work on: they are important but understudied, and a bit of investigation will likely go a long way. In some cases it is more a matter of signalling that different communities need to talk more to each other. And in some instances we really need to have our act together before big shifts occur – if unemployment soars to 50%, engineering design-ahead enables big jumps in tech capability, brains get emulated, or systems start self-improving we will not have time to carefully develop smart policies.

My experience with talking to the community is that there is not a big split between AI and AI safety practitioners: they roughly want the same thing. There might be a bigger gap between the people working on the theoretical, far out issues and the people working on the applied here-and-now stuff. I suspect they can both learn from each other. More research is, of course, needed.

Existential risk and hope

Toby and Owen started 2015 by defining existential hope, the opposite of existential risk.

In their report “Existential Risk and Existential Hope: Definitions” they look at definitions of existential risk. The initial definition was just the extinction of humanity, but that leaves out horrible scenarios where humanity suffers indefinitely, or situations where there is a tiny chance of humanity escaping. Chisholming their way through successive definitions they end up with:

An existential catastrophe is an event which causes the loss of most expected value.

They also get the opposite:

An existential eucatastrophe is an event which causes there to be much more expected value after the event than before.

So besides existential risk, where the value of our future can be lost, there is existential hope: the chance that our future is much greater than we expect. Just as we should work hard to avoid existential threats, we should explore to find potential eucatastrophes that vastly enlarge our future.

Infinite hope or fear

One problem with the definitions I can see is that expectations can be undefined or infinite, making “loss of most expected value” undefined. That would require potentially unbounded value, and that the probability of reaching a certain level has a sufficiently heavy tail. I guess most people would suspect the unbounded potential to be problematic, but at least some do think there could be infinite value somewhere in existence (I think this is what David Deutsch believes). The definition ought to work regardless of what kind of value structure exists in the universe.

There are a few approaches in Nick’s “Infinite ethics” paper. However, there might be simpler approaches based on stochastic dominance. Cutting off the upper half of a Cauchy distribution does change the situation despite the expectation remaining undefined (and in this case, changes the balance between catastrophe and eucatastrophe completely). It is clear that there is now more probability on the negative side: one can do a (first order) stochastic ordering of the distributions, even though the expectations diverge.
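The stochastic ordering can be illustrated with the cumulative distribution functions. The sketch below assumes a standard Cauchy distribution and takes "cutting off the upper half" to mean removing all outcomes above the median (zero) and renormalizing; both are assumptions for illustration, as are the function names. One distribution first-order dominates another when its CDF is never larger.

```python
import math


def cauchy_cdf(x):
    """CDF of the standard Cauchy distribution."""
    return 0.5 + math.atan(x) / math.pi


def truncated_cauchy_cdf(x):
    """CDF after removing the upper half (x > 0) and renormalizing the rest."""
    if x > 0:
        return 1.0
    return min(1.0, 2.0 * cauchy_cdf(x))


def first_order_dominates(cdf_a, cdf_b, grid):
    """True if A first-order stochastically dominates B on the grid (F_A <= F_B everywhere)."""
    return all(cdf_a(x) <= cdf_b(x) for x in grid)


grid = [x / 10.0 for x in range(-1000, 1001)]  # checks from -100 to 100 only
full_dominates_truncated = first_order_dominates(cauchy_cdf, truncated_cauchy_cdf, grid)
truncated_dominates_full = first_order_dominates(truncated_cauchy_cdf, cauchy_cdf, grid)
print(full_dominates_truncated, truncated_dominates_full)
```

The comparison never needs an expectation, so it still works when the mean is undefined. A grid check is of course only a numerical illustration, not a proof.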

There are many kinds of stochastic orderings; which ones make sense likely depends on the kind of value one uses to evaluate the world. Toby and Owen point out that this is what actually does the work in the definitions: without a somewhat precise value theory existential risk and hope will not be well defined. Just as there may be unknown threats and opportunities, there might be surprise twists in what is valuable – we might in the fullness of time discover that some things that looked innocuous or worthless were actually far more weighty than we thought, perhaps so much that they were worth the world.

 

 

Born this way

On Practical Ethics I blog about the ethics of attempts to genetically select sexual preferences.

Basically, it can only tilt probabilities and people develop preferences in individual and complex ways. I am not convinced selection is inherently bad, but it can embody bad societal norms. However, those norms are better dealt with on a societal/cultural level than by trying to regulate technology. This essay is very much a tie-in with our brave new love paper.

Shakespearian numbers

During a recent party I got asked the question “Since \pi has an infinite decimal expansion, does that mean the collected works of Shakespeare (suitably encoded) are in it somewhere?”

My first response was to point out that infinite decimal expressions are not enough: obviously 1/3=0.33333\ldots is a Shakespeare-free number (unless we have a bizarre encoding of the works in the form of all threes). What really matters is whether the number is suitably random. In mathematics this is known as the question about whether pi is a normal number.

If it is normal, then by the infinite monkey theorem Shakespeare will almost surely be in the number. We actually do not know whether pi is normal, but it looks fairly likely. But that is not enough for a mathematician. A good overview of the problem can be found in a popular article by Bailey and Borwein. (Yep, one of the Borweins)

Where are the Shakespearian numbers?

This led to a second issue: what is the distribution of the Shakespeare-containing numbers?

We can encode Shakespeare in many ways. As an ASCII text the works take up 5.3 MB. One can treat this as a sequence of 7-bit characters and the works as 37,100,000 bits, or 11,168,212 decimal digits. A simple code where each pair of digits encodes a character would take 10,600,000 digits. This allows just a 100 character alphabet rather than a 128 character alphabet, but is likely OK for Shakespeare: we can use the ASCII code minus 32, for example.

If we denote the encoded works of Shakespeare by [Shakespeare], all numbers of the form 0.[Shakespeare]xxxxx\ldots are Shakespeare-containing.

They form a rather tiny interval: since the works start with ‘The’, [Shakespeare] starts as “527269…” and the interval lies inside the interval [0.527269000\ldots , 0.52727], a mere millionth of [0,1]. The actual interval is even shorter.
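The two-digits-per-character code mentioned above (ASCII code minus 32) is simple to write down. This is a minimal sketch with a hypothetical `encode` helper; it reproduces the “527269” prefix for ‘The’.

```python
def encode(text):
    """Encode printable ASCII text as decimal digits, two digits per character.

    Each character maps to its ASCII code minus 32, so space becomes 00
    and '~' (code 126) becomes 94.
    """
    digits = []
    for ch in text:
        code = ord(ch) - 32
        if not 0 <= code <= 99:
            raise ValueError(f"character {ch!r} does not fit in two digits")
        digits.append(f"{code:02d}")
    return "".join(digits)


prefix = encode("The")
print(prefix)                 # 527269
print("0." + prefix + "...")  # lower end of the first-level interval
```

A 5.3 million character text encoded this way takes 10.6 million digits, matching the figure above.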

But outside that interval there are numbers of the form 0.y[Shakespeare]xxxx\ldots , where y is a digit different from the starting digit of [Shakespeare] and x anything else. So there are 9 such second level intervals, each ten times thinner than the first level interval.

This pattern continues, with the intervals at each level ten times thinner but also 9 times as numerous. This is fairly similar to the Cantor set and gives rise to a fractal. But since the intervals are very tiny it is hard to see.

One way of visualizing this is to assume the weird encoding [Shakespeare]=3, so all numbers containing the digit 3 in the decimal expansion are Shakespearian and the rest are Shakespeare-free.

Distribution of Shakespeare-free numbers in the unit interval, assuming Shakespeare’s collected works are encoded as the digit “3”.

The fractal dimension of this Shakespeare-free set is \log(9)/\log(10)\approx 0.9542. This is less than 1: most points are Shakespearian and in one of the intervals, but since they are thin compared to the line the Shakespeare-free set is nearly one dimensional. Like the Cantor set, each Shakespeare-free number is isolated from any other Shakespeare-free number: there are always some Shakespearian numbers between them.
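For the toy encoding [Shakespeare]=3 the dimension can be checked by brute force. The sketch below counts the n-digit prefixes that avoid the digit 3 (each corresponds to an interval of length 10^{-n} still containing Shakespeare-free numbers) and forms a box-counting estimate; the function names are illustrative.

```python
import math
from itertools import product


def count_free_prefixes(n_digits, forbidden="3"):
    """Number of n-digit decimal prefixes that do not contain the forbidden digit."""
    return sum(
        1
        for digits in product("0123456789", repeat=n_digits)
        if forbidden not in digits
    )


def box_counting_dimension(n_digits):
    """log(number of covering intervals) / log(1 / interval length) at depth n."""
    return math.log(count_free_prefixes(n_digits)) / (n_digits * math.log(10))


for n in range(1, 6):
    free = count_free_prefixes(n)
    print(f"n = {n}: {free} free prefixes, covered length {free / 10 ** n:.4f}")

print(f"dimension estimate: {box_counting_dimension(5):.4f}")
```

The covered length (9/10)^n shrinks towards zero as n grows, while the dimension estimate stays at \log(9)/\log(10).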

In the case of the full 5.3 MB [Shakespeare] the interval length is around 10^{-10,600,000}. The fractal dimension of the Shakespeare-free set is \log(10^{10,600,000} - 1)/\log(10^{10,600,000}) \approx 1-\epsilon, for some tiny \epsilon \approx 10^{-10,600,000}. It is very nearly an unbroken line… except that nearly every point actually does contain Shakespeare.

We have been looking at the unit interval. We can of course look at the entire real line too, but the pattern is similar: just magnify the unit interval pattern by 10, 100, 1000, … times. Somewhere around 10^{10,600,000} there are the numbers that have an integer part equal to [Shakespeare]. And above them are the intervals that start with his works followed by something else, a decimal point and then any decimals. And beyond them there are the [Shakespeare][Shakespeare]xxx\ldots numbers…

Shakespeare is common

One way of seeing that Shakespearian numbers are the generic case is to imagine choosing a number randomly. It has probability S of being in the level 1 interval of Shakespearian numbers. If not, then it will be in one of the 9 intervals 1/10 long that don’t start with the correct first digit, where the probability of starting with Shakespeare in the second digit is S. If that was all there was, the total probability would be S+(9/10)S+(9/10)^2S+\ldots = 10S<1. But the 1/10 interval around the first Shakespearian interval also counts: a number that has the right first digit but wrong second digit can still be Shakespearian. So it will add probability.

Another way of thinking about it is just to look at the initial digits: the probability of starting with [Shakespeare] is S, the probability of starting with [Shakespeare] in position 2 is (1-S)S (the first factor is the probability of not having Shakespeare first), and so on. So the total probability of finding Shakespeare is S + (1-S)S + (1-S)^2S + (1-S)^3S + \ldots = S/(1-(1-S))=1. So nearly all numbers are Shakespearian.
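The partial sums of that geometric series show how fast the probability approaches 1. This is a sketch using the toy value S=0.1 (the single-digit encoding), where treating successive positions as independent is exact; for the full [Shakespeare] that independence is the simplification made in the text. The function name `prob_within` is illustrative.

```python
def prob_within(S, n):
    """Partial sum S + (1-S)S + ... + (1-S)^(n-1) S of the series above."""
    total = 0.0
    miss = 1.0  # probability that no earlier position matched
    for _ in range(n):
        total += miss * S
        miss *= 1.0 - S
    return total


for n in (1, 10, 50, 500):
    # The closed form of the partial sum is 1 - (1-S)^n.
    print(n, prob_within(0.1, n), 1.0 - 0.9 ** n)
```

With the real S of about 10^{-10,600,000} the convergence is the same in kind but needs on the order of 1/S positions before the sum gets close to 1.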

This might seem strange, since any number you are likely to mention is very likely Shakespeare-free. But this is just like the case of transcendental, normal or uncomputable numbers: they are actually the generic case in the reals, but most everyday numbers belong to the algebraic, non-normal and computable numbers.

It is also worth remembering that while all normal numbers are (almost surely) Shakespearian, there are non-normal Shakespearian numbers. For example, the fractional number 0.[Shakespeare]000\ldots is non-normal but Shakespearian. So is 0.[Shakespeare][Shakespeare][Shakespeare]\ldots We can throw in arbitrary finite sequences of digits between the Shakespeares, biasing numbers as close or far as we want from normality. There is a number 0.[Shakespeare]3141592\ldots that has the digits of \pi plus Shakespeare. And there is a number that looks like \pi for the first Graham’s number of digits, then has a single Shakespeare and then continues. Shakespeare can hide anywhere.

In things of great receipt with ease we prove,
Among a number one is reckoned none.
Then in the number let me pass untold,
Though in thy store’s account I one must be
-Sonnet 136

My Newtonmass Fractal


I like the hyperbolic tangent function. It is useful for making sigmoid curves for neurons and fitting growth rates, and it enables a cute minimal surface. So of course it should be iterated to make fractals! And there is no better way to celebrate Newtonmass than to make fractals!

As iteration formula I choose z_{n+1} = f(z_n) = \tanh(cz_n) , where c is a multiplicative constant. Iterating some number like 1 and plotting its fate produces the following “Mandelbrot set” in the c-plane – the colours here do not denote the time until escape to infinity but rather where in the complex plane the point ended up, as a function of c. In a normal Mandelbrot set infinity is an attractive fixed point; here it is just one place in the (extended) complex plane like any other.

"Mandelbrot set" for the hyperbolic tanh function tanh(cz).

The pinkish surroundings of the pattern represent points attracted to the positive solution of z=\tanh(cz). There is of course a corresponding negative solution since tanh is antisymmetric: if z is an attractive fixed point or cycle, so is -z. So the dynamics is always bistable.
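A minimal sketch of the iteration, assuming a starting value of 1 and a fixed iteration count (both free choices here, as is the function name `iterate_tanh`). It returns where the iterate ended up rather than an escape time, and shows the bistability: starting from -1 lands on the mirror-image fixed point.

```python
import cmath
import math


def iterate_tanh(c, z0=1.0, n_iter=200):
    """Iterate z -> tanh(c*z) n_iter times and return the final point."""
    z = complex(z0)
    for _ in range(n_iter):
        try:
            z = cmath.tanh(c * z)
        except (OverflowError, ValueError):
            # Assumption of this sketch: treat a numerical blow-up
            # as having landed on the point at infinity.
            return complex(math.inf, 0.0)
    return z


print(iterate_tanh(0.5))        # attracted to zero for small c
print(iterate_tanh(2.0))        # positive solution of z = tanh(2z)
print(iterate_tanh(2.0, -1.0))  # the mirror-image negative solution
```

Evaluating this over a grid of complex c values and colouring each pixel by the returned point gives a picture of the kind described above.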

Incidentally, the color scheme is achieved by doing a stereographic projection of the complex plane onto a sphere, which is then fitted into the RGB cube. Infinity corresponds to (0.5,0.5,1) and zero to (0.5,0.5,0) – the brownish middle of the Mandelbrot set, where points are attracted towards zero for small c.
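One way to realise such a mapping is sketched below. It uses the standard inverse stereographic projection onto the unit sphere and then rescales the sphere to fit the unit cube. Which sphere axis goes to which colour channel is an assumption of this sketch; only the two anchor colours for zero and infinity come from the description above.

```python
import math


def complex_to_rgb(z):
    """Map a complex number to an (r, g, b) triple in the unit cube."""
    z = complex(z)
    r2 = z.real * z.real + z.imag * z.imag
    if math.isnan(r2) or math.isinf(r2):
        return (0.5, 0.5, 1.0)  # the point at infinity
    # Inverse stereographic projection onto the unit sphere.
    x = 2.0 * z.real / (1.0 + r2)
    y = 2.0 * z.imag / (1.0 + r2)
    h = (r2 - 1.0) / (r2 + 1.0)
    # Shrink the sphere to radius 0.5 and centre it in the cube.
    return (0.5 + 0.5 * x, 0.5 + 0.5 * y, 0.5 + 0.5 * h)


print(complex_to_rgb(0))   # (0.5, 0.5, 0.0)
print(complex_to_rgb(1))   # (1.0, 0.5, 0.5)
```

Points on the unit circle land on the equator of the sphere, so their blue channel is 0.5 and their hue encodes the argument of z.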

 

Sphere used to stereographically map complex numbers to colors.

Another property of tanh is that the function has singularities wherever z=\pm \pi n i / 2 c for odd integer n>0. By Great Picard’s Theorem, that means that in the vicinity of those points the iterated map takes on nearly all other values in the complex plane. So whatever the pattern of the corresponding Julia set is, it will repeat itself near there (including images of the image, and so on). This means that despite most z points being attracted towards zero for c-values inside the unit circle, there will be a complex stitching of undefined points since they will be mapped to infinity, or are preimages of points that get mapped there.

Zoom into the tanh Mandelbrot set, showing chaotic regions with interspersed periodic regions.

Zooming into the messy regions shows that they are full of circle-cusp areas where there is a periodic attractor cycle. Between them are regions where most of the z-plane (where the Julia sets live) is just pure chaos. Thanks to various classic theorems in the theory of complex iteration we know that if the Julia set has non-empty interior it is the entire complex plane.

Walking around the outside edge of the boring brown circle gives a fun sequence of patterns. At c=1 there are two real fixed points and a straight line border along the imaginary axis. This line of course contains the singularity points where things get sent to infinity, and near them the preimages of all the other singularities on the line: dramatic, but visually uninteresting.

Tanh 'Julia set' for c=1.

As we move along the circle towards more imaginary c, there is a twisting of the border since each multiplication by c corresponds to a twist: it is now a fractal spiral covered by little spirals. As the twisting gets stronger, the spirals get bigger and wilder (especially when we are very close to the unit circle, where the dynamics has a lot of intermittency: the iterates almost but not quite get stuck close to certain points, speed away, and then return to make rather elliptic spirals).

Tanh 'Julia set' for c=1.1*exp(0.23*i).
Tanh 'Julia set' for c=1.1*exp(0.5*i).
Tanh 'Julia set' for c=1.1*exp(0.55*i).

When we advance towards a cuspy border in the c-plane we see the spirals unfold into long twisty tentacles just before touching, turning into borders between chains of periodic domains.

Tanh 'Julia set' for c=1.1*exp(0.6*i).

But then the periodic domains start to snake out, filling the plane wildly.

Tanh 'Julia set' for c=1.1*exp(0.6594*i).

Eventually we get a plane-filling, ergodic Julia set with no discernible structure. For some c-values there are complex tessellations of basins of attraction, and quite often some places are close enough to weakly repelling fixed points to produce small circular false basins of attraction where divergence is slow.

 

Tanh 'Julia set' for c=1.1*exp(0.66*i).

One way of visualizing this is to make a bifurcation diagram like we do for real iteration. Following a curve r e^{i\theta} we plot where iterates end up projected along some line (for example their real or imaginary part, or some combination). To make structure stand out a bit more I decided to color points by where in the whole plane they are, producing a colorful diagram for r=1.1:

Bifurcation diagram of the tanh iteration for r=1.1.

(I have some others on Flickr for the imaginary axis, r=1.25 and r=1.5).
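The data behind such a bifurcation diagram can be generated roughly as follows. This is a sketch: the starting point, the number of discarded transient iterations and the sampling of \theta are all assumptions, and `bifurcation_points` is a made-up name. Plotting \theta against the real part (or any other projection) of the returned points, coloured by their position in the plane, gives the diagram.

```python
import cmath
import math


def bifurcation_points(r, n_theta=200, n_transient=300, n_keep=50, z0=1.0):
    """Return (theta, z) pairs of late iterates of z -> tanh(c*z), c = r*exp(i*theta)."""
    points = []
    for k in range(n_theta):
        theta = 2.0 * math.pi * k / n_theta
        c = cmath.rect(r, theta)
        z = complex(z0)
        for i in range(n_transient + n_keep):
            try:
                z = cmath.tanh(c * z)
            except (OverflowError, ValueError):
                break  # numerical blow-up near a singularity: drop this orbit
            if not (math.isfinite(z.real) and math.isfinite(z.imag)):
                break
            if i >= n_transient:
                points.append((theta, z))
    return points


pts = bifurcation_points(1.1, n_theta=8, n_transient=50, n_keep=5)
print(len(pts), pts[0])
```

For periodic c-values the kept iterates collapse onto a few points per \theta, while in the chaotic regions they smear out across the projection.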

Another, more fun way is to turn them into animated gifs. Since Flickr doesn’t handle them well, I have stored them locally instead:

  • Growth of the Mandelbrot set – shows the behaviour of test iterates in the c-plane near the edge. Note the intermittent spirals.
  • Unit circle – following the unit circle.
  • Tanh 1.0 – the same as above, but inverted coordinates: z=\infty is at the center, zero outside the borders.
  • Tanh 1.1 – r=1.1.
  • Tanh 1.5 – r=1.5.
  • Tanh 2.5 – r=2.5.
  • Tanh 5.0 – r=5.0. Rather sedate except for a brief window near \theta=\pi/2.

Note how spirals unfold until they touch each other, forming periodic domains or exploding across the entire plane, making a chaotic full-plane attractor… which often blinks into complex patterns of periodic domains only to return to chaos.


 

A sustainable orbital death ray

I have for many years been a fan of the webcomic Schlock Mercenary: hardish, humorous military sf with some nice, long-term plotting.

In the current plotline (some spoilers ahead) there is an enormous Chekhov’s gun: Earth is surrounded by an equatorial ring of microsatellites that can reflect sunlight. It was intended for climate control, but as the main character immediately points out, it also makes an awesome weapon. You can guess what happens. That leads to an interesting question: just how effective would such a weapon actually be?

From any point on Earth’s surface only part of the ring is visible above the horizon. In fact, at sufficiently high latitudes it is entirely invisible – there you would be safe no matter what. Also, Earth likely casts a shadow across the ring that lowers the efficiency on the nightside.

I guessed, based on the appearance in some strips, that the radius is about two Earth radii (12,000 km), and the thickness about 2000 km. I did a Monte Carlo integration where I generated random ring microsatellites, checking whether they were visible above the horizon for different Earth locations (by looking at the dot product of the local normal and the satellite-location vector; for anything above the horizon this product must be positive) and were in sunlight (by checking that the distance to the Earth-Sun axis was more than 6000 km). The result is the following diagram of how much of the ring can be seen from any given location:

Visibility fraction of an equatorial ring 12,000-14,000 km out from Earth for different latitudes and longitudes.

At most, 35% of the ring is visible. Even on the nightside where the shadow cuts through the ring about 25% is visible. In practice, there would be a notch cut along the equator where the ring cannot fire through itself; just how wide it would be depends on the microsatellite size and properties.
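The Monte Carlo estimate described above can be sketched as follows. Several details are assumptions of this sketch rather than the original calculation: satellites are sampled uniformly over the ring area, the horizon test uses the vector from the observer to the satellite, the Sun sits at longitude 0 with a 23 degree tilt, and the shadow test is only applied to satellites on the far side of the Earth from the Sun.

```python
import math
import random

R_EARTH = 6000.0                     # km, the rounded value used in the text
R_INNER, R_OUTER = 12000.0, 14000.0  # km, ring extent
TILT = math.radians(23.0)            # solstice tilt of the ring plane


def visible_fraction(lat_deg, lon_deg, n_samples=20000, seed=0):
    """Estimate the fraction of the ring that is above the horizon and sunlit."""
    rng = random.Random(seed)
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Unit normal at the observer; the observer sits at R_EARTH * normal.
    nx = math.cos(lat) * math.cos(lon)
    ny = math.cos(lat) * math.sin(lon)
    nz = math.sin(lat)
    # Unit vector towards the Sun (assumed at longitude 0).
    sx, sy, sz = math.cos(TILT), 0.0, math.sin(TILT)
    hits = 0
    for _ in range(n_samples):
        # Uniform sampling over the area of the annulus in the equatorial plane.
        r = math.sqrt(rng.uniform(R_INNER ** 2, R_OUTER ** 2))
        phi = rng.uniform(0.0, 2.0 * math.pi)
        x, y, z = r * math.cos(phi), r * math.sin(phi), 0.0
        # Above the horizon: normal . (satellite - observer) > 0.
        above = nx * x + ny * y + nz * z - R_EARTH > 0.0
        # Sunlit: on the sunward side, or outside the shadow cylinder.
        along = x * sx + y * sy + z * sz
        dist_axis = math.sqrt(max(r * r - along * along, 0.0))
        lit = along > 0.0 or dist_axis > R_EARTH
        if above and lit:
            hits += 1
    return hits / n_samples


print(visible_fraction(0, 0))    # equator, dayside
print(visible_fraction(0, 180))  # equator, nightside
print(visible_fraction(90, 0))   # pole: the ring is below the horizon
```

Looping this over a latitude and longitude grid gives a visibility map of the kind shown above; the per-point noise with 20,000 samples is a fraction of a percent.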

Overlaying the data on a world map gives the following footprint:

Visibility fraction of 12,000-14,000 ring from different locations on Earth.

The ring is strongly visible up to 40 degrees of latitude, where it starts to disappear below the southern or northern horizon. Antarctica, northern Canada, Scandinavia and Siberia are totally safe.

This corresponds to the summer solstice, where the ring is maximally tilted relative to the Earth-Sun axis. This is when it has maximal power: at the equinoxes it is largely parallel to the sunlight and cannot reflect much at all.

The total power the ring receives is E_0 = \pi (r_o^2-r_i^2)|\sin(\theta)|S where r_o is the outer radius, r_i the inner radius, \theta the tilt (between 23 degrees for the summer/winter solstice and 0 for equinoxes) and S is the solar constant, 1.361 kW per square meter. This ignores the Earth shadow. So putting in \theta=20^{\circ} for a New Year’s Eve firing, I get E_0 \approx 7.6\cdot 10^{16} W.

If we then multiply by 0.3 for visibility, we get 23 petawatts – which is nothing to sneeze at! Of course, there will be losses, both in reflection (likely a few percent at most) and more importantly through light scattering (about 25%, assuming it behaves like normal sunlight). Now, a 17 PW beam is still pretty decent. And if you are on the nightside the shadowed ring surface can still give about 8 PW. That is about six times the energy flow in the Gulf Stream.
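The arithmetic above is easy to reproduce. The sketch below uses the ring radii, tilt, visibility factor and scattering loss quoted in the text; the function and variable names are illustrative, and reflection losses are left out.

```python
import math


def ring_power(r_inner_m, r_outer_m, tilt_deg, solar_constant=1361.0):
    """Sunlight intercepted by a flat ring tilted by tilt_deg to the sunlight, in watts."""
    area = math.pi * (r_outer_m ** 2 - r_inner_m ** 2)
    return area * abs(math.sin(math.radians(tilt_deg))) * solar_constant


E0 = ring_power(12.0e6, 14.0e6, 20.0)  # whole ring, Earth shadow ignored
visible = 0.3 * E0                     # part of the ring above the horizon
delivered = 0.75 * visible             # after roughly 25% scattering loss

print(f"E0        = {E0:.2e} W")
print(f"visible   = {visible / 1e15:.1f} PW")
print(f"delivered = {delivered / 1e15:.1f} PW")
```

Changing `tilt_deg` to 23 gives the solstice maximum, and to 0 the equinox case where the ring intercepts essentially nothing.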


How destructive would such a beam be? A megaton of TNT is 4.18 PJ. So in about a second the beam could produce a comparable amount of heat.  It would be far redder than a nuclear fireball (since it is essentially 6000K blackbody radiation) and the IR energy would presumably bounce around and be re-radiated, spreading far in the transparent IR bands. I suspect the fireball would quickly affect the absorption in a complicated manner and there would be defocusing effects due to thermal blooming: keeping it on target might be very hard, since energy would both scatter and reflect. Unlike a nuclear weapon there would not be much of a shockwave (I suspect there would still be one, but less of the energy would go into it).

The awesome thing about the ring is that it can just keep on firing. It is a sustainable weapon powered by renewable energy. The only drawback is that it would not have an ommminous hummmm….

Addendum 14 December: I just realized an important limitation. Sunlight comes from an extended source, so if you reflect it using plane mirrors you will get a divergent beam – which means that the spot it hits on the ground will be broad. The sun has diameter 1,391,684 km and is 149,597,871 km away, so the light spot 8000 km below the reflector will be 74 km across. This is independent of the reflector size (down to the diffraction limit and up to a mirror that is as large as the sun in the sky).
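The spot size follows from the Sun's angular diameter. A small-angle sketch using the numbers in the addendum (the function name is illustrative, and mirror size and diffraction are ignored):

```python
SUN_DIAMETER_KM = 1_391_684.0
SUN_DISTANCE_KM = 149_597_871.0


def spot_diameter_km(mirror_height_km):
    """Diameter of the reflected sunlight spot a given distance below a flat mirror."""
    angular_diameter = SUN_DIAMETER_KM / SUN_DISTANCE_KM  # radians, small-angle
    return mirror_height_km * angular_diameter


print(round(spot_diameter_km(8000.0), 1))  # about 74 km
```

The spot scales linearly with distance, so mirrors on the inner edge of the ring, about 6000 km up, would produce a somewhat tighter footprint.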

Intensity with three overlapping beams.

At first this sounds like it kills the ring beam. But one can achieve a better focus by clever alignment. Consider three circular footprints arranged like a standard Venn diagram. The center area gets three times the solar input of the parts covered by a single circle. By using more mirrors one can make a peak intensity that is much higher than the side intensity. The vicinity will still be lit up very brightly, but you can focus your devastation better than with individual mirrors – and you can afford to waste sunlight anyway. Still, it looks like this is more of a wide footprint weapon of devastation rather than a surgical knife.
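The overlap argument can be illustrated by counting how many footprints cover a given ground point. This sketch assumes each footprint is a uniformly lit disc of radius 37 km (half the 74 km spot) and uses a made-up 15 km offset for the Venn-diagram arrangement.

```python
import math


def venn_centers(offset_km):
    """Three footprint centres spaced 120 degrees apart around the origin."""
    return [
        (offset_km * math.cos(2.0 * math.pi * k / 3.0),
         offset_km * math.sin(2.0 * math.pi * k / 3.0))
        for k in range(3)
    ]


def beam_count(x, y, centers, radius_km):
    """Number of disc footprints covering the point (x, y), in units of one beam."""
    return sum(1 for cx, cy in centers if math.hypot(x - cx, y - cy) <= radius_km)


centers = venn_centers(15.0)
print(beam_count(0.0, 0.0, centers, 37.0))    # 3: all beams overlap at the centre
print(beam_count(50.0, 0.0, centers, 37.0))   # 1: only one footprint reaches here
print(beam_count(100.0, 0.0, centers, 37.0))  # 0: outside every footprint
```

With many more centres spread over a small region the central count grows with the number of mirrors while the outskirts stay at a few beams, which is the peaked profile described above.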

Intensity with 200 beams overlapping slightly.