Funny identity matrix
Given any category C with finite products and coproducts such that the products distribute over the coproducts, you can make a compact closed bicategory Mat(C) of (natural numbers, matrices of elements of Ob(C), matrices of elements of Mor(C)) and define matrix composition by
Take the partial order whose objects are nonnegative integers and whose morphisms mean ; the product is gcd and the coproduct is lcm. In this category, the terminal object is 0 and the initial object is 1, so the identity matrix looks like
and matrix multiplication is
Reflective calculi
This is mostly for my own benefit.
We start with some termgenerating functor parametric in a set of ground terms and a set of names .
Now specialize to a single ground term:
Now mod out by structural equivalence:
Let be a set of names and let M be the set of nameequivalence classes (where is yet to be defined—it’s part of the theory of names in which we’re still parametric).
Prequoting and predereference are an algebra and coalgebra of , respectively:
such that
Real quoting and dereference have to use , defined below.
Define . Does ? I think so; assuming it does, define
so name equivalence is structural equivalence; equivalence of prequoted predereferences is automatic by the definition above.
The fixed point gives us an isomorphism
We can define , because undecidability doesn’t come into play until we add operational semantics. It’s decidable whether two terms are structurally equivalent. Thus
is the identity, satisfying the condition, and
is the identity, which we get for free.
When we mod out by operational semantics (following the traditional approach rather than the 2categorical one needed for pi calculus):
we have the quotient map
and a map
that picks a representative from the equivalence class.
It’s undecidable whether two terms are in the same operational equivalence class, so may not halt. However, it’s true by construction that
is the identity.
We can extend prequoting and predereference to quoting and dereference on by
and then
which is what we want for quoting and dereference. The other way around involves undecidability.
Monads without category theory, redux
I intended the last version of this post to be a simple oneoff, but people cared too much. A firebreathing Haskellite said that jQuery must never be used as an example of a monad because it’s (gasp) not even functional! And someone else didn’t understand JavaScript well enough to read my code snippets. So here’s another attempt that acknowledges the cultural divide between functional and imperative programmers and tries to explain everything in English as well as in source code.
Monads are a design pattern that is most useful to functional programmers because their languages prefer to implement features as libraries rather than as syntax. In Haskell there are monads for input / output, sideeffects and exceptions, with special syntax for using these to do imperativestyle programming. Imperative programmers look at that and laugh: “Why go through all that effort? Just use an imperative language to start with.” Functional programmers also tend to write programs as—surprise!—applying the composite of a bunch of functions to some data. The basic operation for a monad is really just composition in a different guise, so functional programmers find this comforting. Functional programmers are also more likely to use “continuations”, which are something like extrapowerful exceptions that only work well when there is no global state; there’s a monad for making them easier to work with.
There are, however, some uses for monads that even imperative programmers find useful. Collections like sets and lists (with or without parentheses), parsers, promises, and membranes are a few examples, which I’ll explain below.
Collections
Many collection types support the following three operations:
 Map a function over the elements of the collection.
 Flatten a collection of collections into a collection.
 Create a collection from a single element.
A monad is a class that provides a generalized version of these three operations. When I say “generalized”, I mean that they satisfy some of the same rules that the collections’ operations satisfy in much the same way that multiplication of real numbers is associative and commutative just like addition of real numbers.
The way monads are usually used is by mapping a function and then flattening. If we have a function f
that takes an element and returns a list, then we can say myList.map(f).flatten()
and get a new list.
Parsers
A parser is an object with a list of tokens that have already been parsed and the remainder of the object (usually a string) to be parsed.
var Parser = function (obj, tokens) { this.obj = obj; // If tokens are not provided, use the empty list. this.tokens = tokens  []; };
It has three operations like the collections above.
 Mapping a function over a parser applies the function to the contained obj.
Parser.prototype.map = function (f) { return new Parser(f(this.obj), this.tokens); };
 Flattening a parser of parsers concatenates the list of tokens.
Parser.prototype.flatten = function () { return new Parser(this.obj.obj, this.obj.tokens.concat(this.tokens)); };
The definition above means that
new Parser(new Parser(x, tokens1), tokens2).flatten()
is equivalent tonew Parser(x, tokens1.concat(tokens2))
.  We can create a parser from an element
x
:new Parser(x)
.
If we have a function f
that takes a string, either parses out some tokens or throws an exception, and returns a parser with the tokens and the remainder of the string, then we can say
myParser.map(f).flatten()
and get a new parser. In what follows, I create a parser with the string “Hi there” and then expect a word, then some whitespace, then another word.
var makeMatcher = function (re) { return function (s) { var m = s.match(re); if (!m) { throw new Error('Expected to match ' + re); } return new Parser(m[2], [m[1]]); }; }; var alnum = makeMatcher(/^([azAZ09]+)(.*)/); var whitespace = makeMatcher(/^(s+)(.*)/); new Parser('Hi there') .map(alnum).flatten() .map(whitespace).flatten() .map(alnum).flatten(); // is equivalent to new Parser('', ['Hi', ' ', 'there']);
Promises
A promise is an object that represents the result of a computation that hasn’t finished yet; for example, if you send off a request over the network for a webpage, the promise would represent the text of the page. When the network transaction completes, the promise “resolves” and code that was waiting on the result gets executed.
 Mapping a function
f
over a promise forx
results in a promise forf(x)
.  When a promise represents remote data, a promise for a promise is still just remote data, so the two layers can be combined; see promise pipelining.
 We can create a resolved promise for any object that we already have.
If we have a function f
that takes a value and returns a promise, then we can say
myPromise.map(f).flatten()
and get a new promise. By stringing together actions like this, we can set up a computation that will execute properly as various network actions complete.
Membranes
An objectcapability language is an objectoriented programming language where you can’t get a reference to an object unless you create it or someone calls one of your methods and passes a reference to it. A “membrane” is a design pattern that implements access control.
Say you have a folder object with a bunch of file objects. You want to grant someone temporary access to the folder; if you give them a reference to the folder directly, you can’t force them to forget it, so that won’t work for revokable access. Instead, suppose you create a “proxy” object with a switch that only you control; if the switch is on, the object forwards all of its method calls to the folder and returns the results. If it’s off, it does nothing. Now you can give the person the object and turn it off when their time is up.
The problem with this is that the folder object may return a direct reference to the file objects it contains; the person could lose access to the folder but could retain access to some of the files in it. They would not be able to have access to any new files placed in the folder, but would see updates to the files they retained access to. If that is not your intent, then the proxy object should hide any file references it returns behind similar new proxy objects and wire all the switches together. That way, when you turn off the switch for the folder, all the switches turn off for the files as well.
This design pattern of wrapping object references that come out of a proxy in their own proxies is a membrane.
 We can map a function
f
over a membrane forx
and get a membrane forf(x)
.  A membrane for a membrane for
x
can be collapsed into a single membrane that checks both switches.  Given any object, we can wrap it in a membrane.
If we have a function f
that takes a value and returns a membrane, then we can say
myMembrane.map(f).flatten()
and get a new membrane. By stringing together actions like this, we can set up arbitrary reference graphs, while still preserving the membrane creator’s right to turn off access to his objects.
Conclusion
Monads implement the abstract operations map
and flatten
, and have an operation for creating a new monad out of any object. If you start with an instance m
of a monad and you have a function f
that takes an object and returns a monad, then you can say
m.map(f).flatten();
and get a new instance of a monad. You’ll often find scenarios where you repeat that process over and over.
Overloading JavaScript’s dot operator with direct proxies
With the new ECMAScript 6 Proxy object that Firefox has implemented, you can make dot do pretty much anything you want. I made the dot operator in JavaScript behave like Haskell’s bind:
// I'll give a link to the code for lift() later, // but one thing it does is wrap its input in brackets. lift(6); // [6] lift(6)[0]; // 6 lift(6).length; // 1 // lift(6) has no "upto" property lift(6).upto; // undefined // But when I define this global function, ... // Takes an int n, returns an array of ints [0, ..., n1]. var upto = function (x) { var r = [], i; for (i = 0; i < x; ++i) { r.push(i); } return r; }; // ... now the object lift(6) suddenly has this property lift(6).upto; // [0,1,2,3,4,5] // and it automagically maps and flattens! lift(6).upto.upto; // [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4] lift(6).upto.upto.length; // 15
To be clear, ECMAScript 6 has changed the API for Proxy since Firefox adopted it, but you can implement the new one on top of the old one. Tom van Cutsem has code for that.
I figured this out while working on a contracts library for JavaScript. Using the standard monadic style (e.g. jQuery), I wrote an implementation that doesn’t use proxies; it looked like this:
lift(6)._(upto)._(upto).value; // [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]
The lift
function takes an input, wraps it in brackets, and stores it in the value
property of an object. The other property of the object, the underscore method, takes a function as input, maps that over the current value and flattens it, then returns a new object of the same kind with that flattened array as the new value.
The direct proxy API lets us create a “handler” for a target object. The handler contains optional functions to call for all the different things you can do with an object: get or set properties, enumerate keys, freeze the object, and more. If the target is a function, we can also trap when it’s used as a constructor (i.e. new F()
) or when it’s invoked.
In the proxybased implementation, rather than create a wrapper object and set the value
property to the target, I created a handler that intercepted only get requests for the target’s properties. If the target has the property already, it returns that; you can see in the example that the length property still works and you can look up individual elements of the array. If the target lacks the property, the handler looks up that property on the window object and does the appropriate mapandflatten logic.
I’ve explained this in terms of the list monad, but it’s completely general. In the code below, mon
is a monad object defined in the category theorist’s style, a monoid object in an endofunctor category, with multiplication and unit. On line 2, it asks for a type to specialize to. On line 9, it maps the named function over the current state, then applies the monad multiplication. On line 15, it applies the monad unit.
var kleisliProxy = function (mon) { return function (t) { var mont = mon(t); function M(mx) { return Proxy(mx, { get: function (target, name, receiver) { if (!(name in mx)) { if (!(name in window)) { return undefined; } return M(mont['*'](mon(window[name]).t(mx))); } return mx[name]; } }); } return function (x) { return M(mont[1](x)); }; }; }; var lift = kleisliProxy(listMon)(int32); lift(6).upto.upto; // === [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]
Cantor’s diagonal proof
There’s a riddle about a library where the index had two parts, each in its own book. The first index book listed every book whose title appeared between its own covers; nearly every book in the library was listed in this index, since the title is typically on the title page. The other index book, a mere pamphlet really, was for those rare cases where the title was not between the covers of the book. Neither index had a title page, just a list of book titles and what shelf each book was on.
In which book was the first book listed? It could have been listed in itself, which is consistent with the claim that it lists all the books that contain their own titles. Or it could have been listed in the second book and not in itself, which would also be consistent.
The riddle is this: in which index was the second index book listed? If we suppose that the second book did not list itself, then it would be incomplete, since it is a book in the library that does not contain its title between the covers. If we suppose that it did list itself, then it would be inconsistent, since it is supposed to list only those books that do not contain their own titles. There is no way for the second index book to be both consistent and complete.
Georg Cantor was a mathematician who used this idea to show that there are different sizes of infinity! But before I go there, consider the Munduruku tribe of the Amazon, which has no words for specific numbers larger than five; how would a man with eight children tell if he has enough fruit to give one to each child? He would probably name each child and set aside a piece of fruit for them—that is, he would set up an isomorphism between the children and the fruit. Even though he cannot count either set, he knows that they are the same size.
We can do the same thing to compare infinite sets: even though we can’t count them, if we can show that there is an isomorphism between two infinite sets, we know they are the same size. Likewise, if we can prove that there is no possible isomorphism between the sets, then they must be different sizes.
So now suppose that we take the set of natural numbers {0, 1, 2, 3, …} and the set of positive even numbers {0, 2, 4, 6, …}. These sets both have infinitely many elements. Are they the same size? We can double every natural number and get an even one, and halve every positive even number and get a natural number. That’s an isomorphism, so they’re the same size.
How about natural numbers and pairs of natural numbers? Given two numbers like 32768 and 137, we can pad them to the same length and then interleave them: 3020716387; they come apart just as easily.
Since we can take the first number to be the denominator of a nonnegative rational number and the second number to be the numerator, we also find that the size of the set of rationals is the same as the size of the set of natural numbers.
Cantor came up with isomorphisms for each of these sets and thought that perhaps all infinite sets were the same size. But then he tried to come up with a way to match up the natural numbers and infinite sequences of bits, like the sequence of all zeros:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
or the sequence that’s all zeros except for the third one:
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...
or the sequence whose nth bit is 1 if n is prime:
0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ...
or sequences with no pattern at all:
0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ...
He tried and tried, but couldn’t figure out a way to make them match up. Then he thought of an ingenious proof that it’s impossible! It uses the same insight as the library riddle above.
Suppose we try to match them up. We write in the left column a natural number and in the right column a binary sequence:
1: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 2: 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ... 3: 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ... 4: 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ... 5: 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, ... 6: 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, ... ...
Consider the diagonal of the table above: 0, 0, 1, 1, 0, 1, … This sequence is number 6 above, and is like the first index book: the sixth bit in the sequence could be either 0 or 1, and it would be consistent.
Now consider the opposite of the diagonal, its “photographic negative” sequence: 1, 1, 0, 0, 1, 0, … This sequence is like the second index book: it cannot consistently appear anywhere in the table. To see that, imagine if we had already assigned this sequence to number 7. (The number 7 is arbitrary; any other number will work just as well.) Then the table would look like this:
1: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 2: 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ... 3: 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ... 4: 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ... 5: 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, ... 6: 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, ... 7: 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, ...
But look at the seventh bit of sequence 6! It’s supposed to be the seventh bit along the diagonal, and it’s wrong. If we correct it, then the 7th bit of sequence 7 is wrong, since it’s supposed to be the opposite of the 7th bit of sequence 6.
This proof shows that there are too many sequences of bits to match up to natural numbers. No matter how we try match them up, we can always find a contradiction by looking at the sequence defined by the opposite of the diagonal sequence. And as if two different sizes of infinity weren’t enough, Cantor was able to repeat this process and show that there are infinitely many sizes of infinity, each bigger than the last!
Many influential mathematicians didn’t like this idea, and attacked Cantor personally rather than trying to find fault with his proof. Religious philosophers misinterpreted his results and equated the idea of multiple infinities with pantheism! Confronted with so much antagonism, Cantor often fought with depression.
He persevered, however, and later in life was greatly praised for his work. The Royal Society gave him their highest honor, the Sylvester Medal; and no less a mathematician than David Hilbert exclaimed, “No one shall expel us from the paradise that Cantor has created.”
Semantics for the blue calculus
Blue calculus  MonCat  2Hilb  Set (as a one object bicategory)  
Types  monoidal categories  manifolds  2 Hilbert spaces  * 
Terms with one free variable  monoidal functors  manifolds with boundary  linear functors  sets 
Rewrite rules  monoidal natural transformation  manifolds with corners  linear natural transformations  functions 
Tensor product  juxtaposition (disjoint union)  tensor product  cartesian product  
where is free  formal sum of cobordisms with boundary from to  sum of linear functors  disjoint union 
In the MonCat column, is the categorified version of the tensor product of monoids.
Axiom of fun choice
A fun choice function is a function defined on a collection of jobs that must be done such that for every job is an element of fun. The axiom of fun choice states,
For any set of jobs that must be done, there exists a fun choice function defined on
This axiom asserts that one can always find the fun in any job that must be done; a theorem of Poppins deduces from this that all such jobs are games.
Escher “Print gallery” metapixel
This one uses a conformal transformation like Escher did in his “Print Gallery“:
A picture that contains a scaleddown copy of itself is periodic both in theta and in log (r). Taking the complex log of such an image gives a rectangular tiling of the plane; by scaling and rotating, I got a new tiling, then exponentiated it. Now going around 2 pi radians takes you diagonally across the original rectangle, so you end up one level away from where you started.
[Edit] See also these better versions.
Droste OTCA metapixel
John Conway and some of his friends invented a cellular automaton called “Life”. It’s Turing complete, so you can make a Life pattern that runs Life. The OTCA metapixel has all its logic around the edges of a vast field that gets filled with lightweight spaceships when the cell is on. (See here, here, and here for more details.)
This image uses a logarithmic spiral to zoom through a factor of 2048 after one turn. There’s a little bit of misalignment between the largest scale and the smallest one, but I think it still turned out pretty well (click for a larger view):
Astounding protein folding paper
By assuming that to get from point A to point B you don’t have to hit a sequence of points in between—i.e. the way quantum particles work—these guys accurately predict protein folding rates in 15 different real proteins. This is huge.
In case it wasn’t already abundantly clear…
Regular tilings of threedimensional spaces
If you start at the north pole and make an equilateral triangle 6000 miles on a side, the bottom will lie on the equator, each of the angles will be 90 degrees, and only four of them will fit around the pole.
In a similar way, large enough tetrahedra would tile the surface of a hypersphere. This paper identifies the eleven regular tilings of threedimensional spaces and whether they’re spherical, Euclidean, or hyperbolic tilings, and then looks at the geometry of spacetime to see how it might be tiled.
The “cubic” tilings (where eight polyhedra meet around a vertex like cubes do in Euclidean space) are amenable to taking crosssections; this tiling of hyperbolic space with dodecahedra
has a cross section with a tiling of the hyperbolic plane with pentagons:
The Word of God
From desert cliff and mountaintop we trace the wide design,
Strikeslip fault and overthrust and syn and anticline…
We gaze upon creation where erosion makes it known,
And count the countless aeons in the banding of the stone.
Odd, longvanished creatures and their tracks & shells are found;
Where truth has left its sketches on the slate below the ground.
The patient stone can speak, if we but listen when it talks.
Humans wrote the Bible; God wrote the rocks.
There are those who name the stars, who watch the sky by night,
Seeking out the darkest place, to better see the light.
Long ago, when torture broke the remnant of his will,
Galileo recanted, but the Earth is moving still.
High above the mountaintops, where only distance bars,
The truth has left its footprints in the dust between the stars.
We may watch and study or may shudder and deny,
Humans wrote the Bible; God wrote the sky.
By stem and root and branch we trace, by feather, fang and fur,
How the living things that are descend from things that were.
The moss, the kelp, the zebrafish, the very mice and flies,
These tiny, humble, wordless things–how shall they tell us lies?
We are kin to beasts; no other answer can we bring.
The truth has left its fingerprints on every living thing.
Remember, should you have to choose between them in the strife,
Humans wrote the Bible; God wrote life.
And we who listen to the stars, or walk the dusty grade,
Or break the very atoms down to see how they are made,
Or study cells, or living things, seek truth with open hand.
The profoundest act of worship is to try to understand.
Deep in flower and in flesh, in star and soil and seed,
The truth has left its living word for anyone to read.
So turn and look where best you think the story is unfurled.
Humans wrote the Bible; God wrote the world.
Catherine Faber, The Word of God
The Sum of Forking Paths
Paracelsus threw the rose into the fire; it bent on impact, recoiled, and fell deeper among the logs. In the goldorange light, the stem was already black. The husk began to shrivel and split as the sap boiled away on its surface. The petals blistered and blackened and fell. The fivefingered star beneath them danced subtly, swaying in the brittle heat. For nearly an hour it lay visibly unchanged save for a gradual loss of hue and a universal greyness, then fell into three large pieces as the log crumbled around it. The ashes glowed orange, then gradually dimmed; the last visible flash of light burst outward from the remains of the stem. Like all light, it carried within it a timepiece.
Once, when the clock read noon, it traveled without hesitation in a straight path to my retina. Once it took another course, only to bend around a molecule of nitrogen and reach the same destination.
Once it traced the signature of a man no one remembers.
Once, at half past three, it decayed into a tiny spark and its abominable opposite image; their mutual horrified fascination drew them together, each a moth in the other’s flame. The last visible flash of light from their fiery consummation was indistinguishable from the one that spawned them.
Once, when the clock ticked in the silence just before dawn, the light decayed into two mirrored worlds, somewhat better than ours due to the fact that I was never born there. Both worlds were consumed by mirrored dragons before collapsing back into the chaos from which they arose; all that remained was an orange flash of light.
Once it traveled to a far distant galaxy, reflected off the waters of a billion worlds and witnessed the death of a thousand stars before returning to the small room in which we sat.
Once it transcribed a short story of Borges in his own cramped scrawl, the six versions he discarded, the corrupt Russian translation by Nabokov, and a version in which the secondtolast word was illegible.
Once it traveled every path, each in its time; once it became and destroyed every possible world. All these summed to form what was: I saw an orange flash, and in that moment, I was enlightened.
Formal axiomatic divination
It’s well known that “in Xanadu did Kublai Khan a stately pleasure dome decree,” but his true legacy is the field of formal axiomatic divination. In 1279, Khan sought an auspicious date on which to begin construction of the palace. He consulted each of his twelve astrologers separately and without warning; unsurprisingly, he received twelve different answers. Khan flew into a rage and said that until the astrologer’s craft was as precise as that of his masons and carpenters, they were banished from his presence.
Kublai Khan died in 1294 and his successor Temur Khan was convinced to reinstate the astrologers. Despite this, the young mathematician Zhu Shijie took up the old Khan’s challenge in 1305. Zhu had already completed two enormously influential mathematical texts: Introduction to Mathematical Studies, published in 1299, and True reflections of the four unknowns, published in 1303. This latter work included a table of “the ancient method of powers”, now known as Pascal’s triangle, and Zhu used it extensively in his analysis of polynomials in up to four unknowns.
In turning to the analysis of divination, Zhu naturally focused his attention on the I Ching. The first step in performing an I Ching divination is casting coins or yarrow stalks to construct a series of hexagrams. In 1308, Zhu published his treatise on probability theory, Path of the falling stone. It included an analysis of the probability for generating each hexagram as well as betting strategies for several popular games of chance. Using his techniques, Zhu became quite wealthy and began to travel; it was during this period that he was exposed to the work of the mathematicians in northern China. In the preface to True reflections, Mo Ruo writes that “Zhu Shijie of Yanshan became famous as a mathematician. He travelled widely for more than twenty years and the number of those who came to be taught by him increased each day.”
Zhu worked for nearly a decade on the subsequent problem, that of interpreting a series of hexagrams. Hexagrams themselves are generated one bit at a time by looking at remainders modulo four of random handfuls of yarrow stalks; the four outcomes either specify the next bit directly or in terms of the previous bit. These latter rules give I Ching its subtitle, The Book of Changes. For mystical reasons, Zhu asserted that the proper interpretation of a series of hexagrams should also be given by a set of changes, but for years he could find no reason to prefer one set of changes to any other. However, in 1316, Zhu wrote to Yang Hui:
“I dreamed that I was summoned to the royal palace. As I stepped upon the threshold, the sun burst forth over the gilded tile; I was blinded and, overcome, I fell to my knees. I lifted my hand to shield my eyes from its brilliance, and the Emperor himself took it and raised me up. To my surprise, he changed his form as I watched; he became so much like me that I thought I was looking in a mirror.
“‘How can this be?’ I cried. He laughed and took the form of a phoenix; I fell back from the flames as he ascended to heaven, then sorrowed as he dove toward the Golden Water River, for the water would surely quench the bird. Yet before reaching the water, he took the form of an eel, dove into the river and swam to the bank; he wriggled ashore, then took the form of a seed, which sank into the earth and grew into a mighty tree. Finally he took his own form again and spoke to me: ‘I rule all things; things above the earth and in the earth and under the earth, land and sea and sky. I can rule all these because I rule myself.’
“I woke and wondered at the singularity of the vision; when my mind reeled in amazement and could stand no more, it retreated to the familiar problem of the tables of changes. It suddenly occurred to me that as the Emperor could take any form, there could be a table of changes that could take the form of any other. Once I had conceived the idea, the implementation was straightforward.”
The rest of the letter has been lost, but Yang Hui described the broad form of the changes in a letter to a friend; the Imperial Changes were a set of changes that we now recognize as a Turingcomplete programming language, nearly seven hundred years before Turing. It was a type of register machine similar to Melzak’s model, where seeds were ‘planted’ in pits; the lists of hexagrams generated by the yarrow straws were the programs, and the result of the computation was taken as the interpretation of the casting. Zhu recognized that some programs never stopped–some went into infinite loops, some grew without bound, and some behaved so erratically he couldn’t decide whether they would ever give an interpretation.
Given his fascination with probabilities, it was natural that Zhu would consider the probability that a string of hexagrams had an interpretation. We do not have Zhu’s reasoning, only an excerpt from his conclusion: “The probability that a list of hexagrams has an interpretation is a secret beyond the power of fate to reveal.” It may be that Zhu anticipated Chaitin’s proof of the algorithmic randomness of this probability as well.
All of Zhu’s works were lost soon after they were published; True reflections survived in a corrupted form through Korean (1433 AD) and Japanese (1658 AD) translations and was reintroduced to China only in the nineteenth century. One wonders what the world might have been like had the Imperial Changes been understood and exploited. We suppose it is a secret beyond the power of fate to reveal.
Imaginary Time 2
Another part of the analogy I started here, but this time using inverse temperature instead of imaginary time. It describes a thermometer where a mass changes position with temperature. I’m guessing this stuff only applies when the temperature is changing adiabatically.
Thermometer (unitless temperature):  
[1]  inverse temperature (unitless)  
[m]  y coordinate  
[kg/s^2 K]  spring constant * temp unit conversion  
[m]  how position changes with (inverse) temperature  
[kg m/s^2 K]  force per Kelvin  
[kg m^2/s^2 K]  stretching energy per Kelvin  
[kg m^2/s^2 K]  potential energy per Kelvin  
[kg m^2/s^2 K]  entropy  
Thermometer:  
[1/K]  inverse temperature  
[m]  y coordinate  
[kg/s^2 K^2 = bits/m^2 K]  how information density changes with temp  
[m K]  how position changes with (inverse) temperature  
[kg m/s^2 K = bits/m]  force per Kelvin  
[kg m^2/s^2 = bits K]  stretching energy = change in stretching information with invtemp  
[kg m^2/s^2 = bits K]  potential energy = change in potential information with invtemp  
[bits]  entropy 
I assume that the dynamics of such a system would follow a path where is that a minimumentropy path or a maximum?
An analogy
Stat mech  Quant mech  
column vector  distribution 
amplitude distribution (wave function) 
row vector  population , where are the coefficients that, when normalized, maximize the inner product with 
, where is the complex conjugate of . 
normalization  
transitions  stochastic  unitary 
harmonic oscillator  many HOs at temperature  one QHO evolving for time 
uniform distribution over states  
special distribution  Gibbs distribution  Free evolution 
partition function = inner product with  
path integrals  = maximum entropy?  = least action? 
Entropic gravity
Erik Verlinde has been in the news recently for revisiting Ted Jacobson’s suggestion that gravity is an entropic force rather than a fundamental one. The core of the argument is as follows:
Say we have two boxes, one inside the other:
++    ++               ++  ++
Say the inner box has room for ten bits on its surface and the outer one room for twenty. Each box can use as many “1”s as there are particles inside it:
++  X   ++        X        ++  ++
In this case, the inner box has only one particle inside, so there are 10 choose 1 = 10 ways to choose a labeling of the inner box; the outer box has two particles inside, so there are 20 choose 2 = 190 ways. Thus there are 1900 ways to label the system in all.
If both particles are in the inner box, though, the number of ways increases:
++    ++        X X        ++  ++
The inner box now has 10 choose 2 ways = 45, while the outer box still has 190. So using the standard assumption that all labelings are equally likely, it’s 4.5 times as likely to find both particles in the inner box, and we get an entropic force drawing them together.
The best explanation of Verlinde’s paper I’ve seen is Sabine Hossenfelder’s Comments on and Comments on Comments on Verlinde’s paper “On the Origin of Gravity and the Laws of Newton”.
Simple activity for teaching about radiometric dating
This works best with small groups of about 510 students and at least thirty dice. Divide the dice evenly among the students.
 Count the number of dice held by the students and write it on the board.
 Have everyone roll each die once.
 Collect all the dice that show a ‘one’, count them, write the sum on the board, then set them aside.
 Go back to step 1.
A run with 30 dice will look something like this:
dice  number of ones 
30  5 
25  4 
21  4 
17  3 
14  1 
13  3 
10  2 
8  1 
7  1 
6  0 
6  1 
5  0 
5  1 
4  1 
3  0 
3  0 
3  0 
3  1 
2  1 
1  0 
1  0 
1  0 
1  0 
1  1 
Point out how the number of dice rolling a one on each turn is about one sixth of the dice that hadn’t yet rolled a one on the previous turn. Also, that you lose about half of the remaining dice after about four turns.
Send someone out of the room; do either four or eight turns, then bring them back and ask them to guess how many turns the group took. The student should be able to see that if half the dice are left, there were only four turns, but if a quarter of the dice are left, there were eight turns.
If the students are advanced enough to use logarithms, try the above with some number other than four or eight and have the student use logarithms to calculate the number of turns:
turns = log(number remaining/total) / log(5/6),
or, equivalently, in terms of the halflife (which is really closer to 3.8 than 4):
turns = 3.8 * log(number remaining/total) / log(1/2).
When Zircon crystals form, they strongly reject lead atoms: new zircon crystals have no lead in them. They easily accept uranium atoms. Each die represents a uranium atom, and rolling a one represents decaying into a lead atom: because uranium atoms are radioactive, they can lose bits of their nucleus and turn into lead–but only randomly, like rolling a die. Instead of four turns, the halflife of U238 is 4.5 billion years.
Zircon forms in almost all rocks and is hard to break down. So to judge the age of a rock, you get the zircon out, throw it in a mass spectrometer, look at the proportion of uranium to (lead plus uranium) and calculate
years = 4.5 billion * log(mass of uranium/mass of (lead+uranium)) / log(1/2).
Problem: given a zircon crystal where there’s one lead atom for every ninetynine uranium atoms, how long ago was it formed?
4.5 billion * log(99/100) / log(1/2) = 65 million years ago.
In reality, it’s slightly more complicated: there are two isotopes of uranium and several of lead. But this is a good thing, since we know the halflives of both isotopes and can use them to crosscheck each other; it’s as though each student had both six and twentysided dice, and the student guessing the number of turns could use information from both groups to refine her guess.
Escher and Mandelbrot
If you take a complex number z with argument θ and square it, you double θ. The Mandelbrot/Julia iteration
z ↦ z^{2} + c
does pretty much the same thing, but adds wiggles to the curve. Since the iterations stop when z > 2, the boundary at the zeroth iteration is a circle; after the first it’s a pear shape, and so on. We can map any point in the region between bands to a point in a rectangular tile that’s periodic once along the outside edge and twice along the inside edge. Here’s a site with a few different examples.
The transformation Escher used in “Print Gallery” takes concentric circles at r=1/r^{n} to a logarithmic spiral. The concentric boundaries between iterations for the Julia set at c = 0 are circles. There ought to be a transformation for Mandelbrot / Julia sets similar to the Droste effect but spiraling inward so that the frequency is smoothly increasing just as the distance can be made to smoothly increase.
Lazulinos
Lazulinos are quasiparticles in a naturally occurring BoseEinstein condensate first described in 1977 by the Scottish physicist Alexander Craigie while at the University of Lahore [3]. The quasiparticles are weakly bound by an interaction for which neither the position nor number operator commutes with the Hamiltonian. A measurement of a lazulino’s position will cause the condensate to go into a superposition of number states, and a subsequent measurement of the population will return a random number; also, counting the lazulinos at two different times will likely give different results.
Their name derives from the stone lapis lazuli and means, roughly, “little blue stone”. Lazulinos are so named because even though the crystals in which they arise absorb visible light, and would otherwise be jet black, they lose energy through surface plasmons in the form of nearultraviolet photons, with visible peaks at 380, 402, and 417nm. Optical interference imparts a “laser speckle” quality to the emitted light; Craigie described the effect in a famously poetic way: “Their colour is the blue that we are permitted to see only in our dreams”. What makes lazulinos particularly interesting is that they are massive and macroscopic. Since the number operator does not commute with the Hamiltonian, lazulinos themselves do not have a welldefined mass; if the population is N, then the mass of any particular lazulino is m/N, where m is the total mass of the condensate.
In a recent followup to the “quantum mirage” experiment [2], Don Eigler’s group at IBM used a scanning tunneling microscope to implement “quantum mancala”—picking up the lazulino ‘stones’ in a particular location usually changes the number of stones, so the strategy for winning becomes much more complicated. In order to pick up a fixed number of stones, you must choose a superposition of locations [1].
 C.P. Lutz and D.M. Eigler, “Quantum Mancala: Manipulating Lazulino Condensates,” Nature 465, 132 (2010).
 H.C. Manoharan, C.P. Lutz and D.M. Eigler, “Quantum Mirages: The Coherent Projection of Electronic Structure,” Nature 403, 512 (2000). Images available at http://www.almaden.ibm.com/almaden/media/image_mirage.html
 A. Craigie, “Surface plasmons in cobaltdoped Y_{3}Al_{5}O_{12},” Phys. Rev. D 15 (1977). Also available at http://tinyurl.com/35oyrnd.
Coends
Coends are a categorified version of “summing over repeated indices”. We do that when we’re computing the trace of a matrix and when we’re multiplying two matrices. It’s categorified because we’re summing over a bunch of sets instead of a bunch of numbers.
Let be a small category. The functor assigns
 to each pair of objects the set of morphisms between them, and
 to each pair of morphisms a function that takes a morphism and returns the composite morphism , where
It turns out that given any functor we can make a new category where and are subcategories and is actually the hom functor; some keywords for more information on this are “collages” and “Artin glueing”. So we can also think of as assigning
 to each pair of objects a set of morphisms between them, and
 to each pair of morphisms a function that takes a morphism and returns the composite morphism , where and
We can think of these functors as adjacency matrices, where the two parameters are the row and column, except that instead of counting the number of paths, we’re taking the set of paths. So is kind of like a matrix whose elements are sets, and we want to do something like sum the diagonals.
The coend of is the coequalizer of the diagram
The top set consists of all the pairs where
 the first element is a morphism and
 the second element is a morphism
The bottom set is the set of all the endomorphisms in
The coequalizer of the diagram, the coend of is the bottom set modulo a relation. Starting at the top with a pair the two arrows give the relation
where I’m using the lollipop to mean a morphism from
So this says take all the endomorphisms that can be chopped up into a morphism from going one way and a from going the other, and then set For this to make any sense, it has to identify any two objects related by such a pair. So it’s summing over all the endomorphisms of these equivalence classes.
To get the trace of the hom functor, use in the analysis above and replace the lollipop with a real arrow. If that category is just a group, this is the set of conjugacy classes. If that category is a preorder, then we’re computing the set of isomorphism classes.
The coend is also used when “multiplying matrices”. Let Then the top set consists of triples the bottom set of pairs and the coend is the bottom set modulo
That is, it doesn’t matter if you think of as connected to or to ; the connection is associative, so you can go all the way from to
Notice here how a morphism can turn “inside out”: when and the identities surround a morphism in , it’s the same as being surrounded by morphisms in and ; this is the difference between a trace, where we’re repeating indices on the same matrix, and matrix multiplication, where we’re repeating the column of the first matrix in the row of the second matrix.
Theories and models
The simplest kind of theory is just a set thought of as a set of concepts or Platonic ideals. We typically have some other set thought of as the set of real things that are described by concepts. Then a model is a function For example, we could let = {0, 1}; this is the theory of the number “two”, since it has two elements. Whatever set we choose for , the models of the theory are going to be pairs of elements of . So if = the set of people, models of in are going to be pairs of people (where choosing the same person twice is allowed).
Concepts, however, are usually related to each other, whereas in a set, you can only ask if elements are the same or not. So the way a theory is usually presented is as a category For example let be the category with two objects and two parallel nontrivial morphisms and let be the category Set of sets and functions. A model is a structurepreserving map from the category to Set, i.e. a functor. Each object of gets mapped to a set; here we think of the image of as a set of vertices and the image of as a set of edges. Each morphism of gets mapped to a function; and take an edge and produce the source vertex or target vertex, respectively. The category = Th(Graph) is the theory of a graph, and its models are all graphs.
Usually, our theories have extra structure. Consider the first example of a model, a function between sets. We can add structure to the theory; for example, we can take the set to be the ring of integers Then a model is a structurepreserving function, a homomorphism between and Of course, this means that has to have at least as much structure as We could, for instance, take to be the real numbers under multiplication. Since this ring homomorphism is entirely determined by where we map 1, and we can choose any real number for its image, there would be one model for each real number; each integer would map to for some Another option is to take the integers modulo 4. There are three nonisomorphic models of in . If we map 1 to 0, we get the trivial ring; if we map 1 to 1 or 3, we get integers modulo 4; and if we map 1 to 2, we get integers modulo 2.
Similarly, we can add structure to a category. If we take monoidal categories then we can tensor objects together to get new ones. A model of such a theory is a tensorproductpreserving functor from to See my paper “Physics, Topology, Computation, and Logic: a Rosetta Stone” with John Baez for a thorough exploration of theories that are braided monoidal closed categories and models of these.
An element of the set is a number, while an object of the category Th(Graph) is a set. A theory is a mathematical gadget in which we can talk about theories of one dimension lower. In Java, we say “interface” instead of “theory” and “class” instead of “model”. With Java interfaces we can describe sets of values and functions between them; it is a cartesian closed category whose objects are datatypes and whose morphisms are (roughly) programs. Models of an interface are different classes that implement that interface.
And there’s no reason to stop at categories; we can consider bicategories with structure and structurepreserving functors between these; these higher theories should let us talk about different models of computation. One model would be Turing machines, another lambda calculus, a third would be the Java Virtual Machine, a fourth Pi calculus.
Aleph and Omega
I shut my eyes — I opened them. Then I saw the Aleph.
I arrive now at the ineffable core of my story. And here begins my despair as a writer. All language is a set of symbols whose use among its speakers assumes a shared past. How, then, can I translate into words the limitless Aleph, which my floundering mind can scarcely encompass? Mystics, faced with the same problem, fall back on symbols: to signify the godhead, one Persian speaks of a bird that somehow is all birds; Alanus de Insulis, of a sphere whose center is everywhere and circumference is nowhere; Ezekiel, of a fourfaced angel who at one and the same time moves east and west, north and south. (Not in vain do I recall these inconceivable analogies; they bear some relation to the Aleph.) Perhaps the gods might grant me a similar metaphor, but then this account would become contaminated by literature, by fiction. Really, what I want to do is impossible, for any listing of an endless series is doomed to be infinitesimal. In that single gigantic instant I saw millions of acts both delightful and awful; not one of them occupied the same point in space, without overlapping or transparency. What my eyes beheld was simultaneous, but what I shall now write down will be successive, because language is successive. Nonetheless, I’ll try to recollect what I can.
On the back part of the step, toward the right, I saw a small iridescent sphere of almost unbearable brilliance. At first I thought it was revolving; then I realised that this movement was an illusion created by the dizzying world it bounded. The Aleph’s diameter was probably little more than an inch, but all space was there, actual and undiminished. Each thing (a mirror’s face, let us say) was infinite things, since I distinctly saw it from every angle of the universe. I saw the teeming sea; I saw daybreak and nightfall; I saw the multitudes of America; I saw a silvery cobweb in the center of a black pyramid; I saw a splintered labyrinth (it was London); I saw, close up, unending eyes watching themselves in me as in a mirror; I saw all the mirrors on earth and none of them reflected me; I saw in a backyard of Soler Street the same tiles that thirty years before I’d seen in the entrance of a house in Fray Bentos; I saw bunches of grapes, snow, tobacco, lodes of metal, steam; I saw convex equatorial deserts and each one of their grains of sand; I saw a woman in Inverness whom I shall never forget; I saw her tangled hair, her tall figure, I saw the cancer in her breast; I saw a ring of baked mud in a sidewalk, where before there had been a tree; I saw a summer house in Adrogué and a copy of the first English translation of Pliny — Philemon Holland’s — and all at the same time saw each letter on each page (as a boy, I used to marvel that the letters in a closed book did not get scrambled and lost overnight); I saw a sunset in Querétaro that seemed to reflect the colour of a rose in Bengal; I saw my empty bedroom; I saw in a closet in Alkmaar a terrestrial globe between two mirrors that multiplied it endlessly; I saw horses with flowing manes on a shore of the Caspian Sea at dawn; I saw the delicate bone structure of a hand; I saw the survivors of a battle sending out picture postcards; I saw in a showcase in Mirzapur a pack of Spanish playing cards; I saw the slanting shadows of ferns on a greenhouse floor; I saw tigers, pistons, bison, tides, and armies; I saw all the ants on the planet; I saw a Persian astrolabe; I saw in the drawer of a writing table (and the handwriting made me tremble) unbelievable, obscene, detailed letters, which Beatriz had written to Carlos Argentino; I saw a monument I worshipped in the Chacarita cemetery; I saw the rotted dust and bones that had once deliciously been Beatriz Viterbo; I saw the circulation of my own dark blood; I saw the coupling of love and the modification of death; I saw the Aleph from every point and angle, and in the Aleph I saw the earth and in the earth the Aleph and in the Aleph the earth; I saw my own face and my own bowels; I saw your face; and I felt dizzy and wept, for my eyes had seen that secret and conjectured object whose name is common to all men but which no man has looked upon — the unimaginable universe.
I felt infinite wonder, infinite pity…
(Jorge Luis Borges, The Aleph)
A finitelyrefutable question is one of the form, “Does property X holds for all natural numbers?” Any mathematical question admitting a proof or disproof is in this category. If you believe the ideas of digital physics, then any question about the behavior of some portion of the universe is in this category. We can encode any finitely refutable question as a program that iterates through the natural numbers and checks to see if it’s a counterexample. If so, it halts; if not, it goes to the next number.
The halting probability of a universal Turing machine is a number between zero and one. Given the first bits of this number, there is a program that will compute which bit programs halt and which don’t. Assuming digital physics, all those things Borges wrote about in the Aleph are in the Omega. There’s a trivial way–the Omega is a normal number, so every sequence of digits appears infinitely often–but there’s a more refined way: ask any finitelyrefutable question using an bit program and the first bits of Omega contain the proper information to compute the answer.
The bits of Omega are pure information; they can’t be computed from a fixedsize program, like the bits of can.
Renormalization and Computation 4
This is the fourth in a series of posts on Yuri Manin’s pair of papers. In the previous posts, I laid out the background; this time I’ll actually get around to his result.
A homomorphism from the Hopf algebra into a target algebra is called a character. The functor that assigns an action to a path, whether classical or quantum, is a character. In the classical case, it’s into the rig and we take an infimum over paths; in the quantum it’s into the rig and we take an integral over paths. Moving from the quantum to the classical case is called Maslov dequantization.
Manin mentions that the runtime of a parallel program is a character akin to the classical action, with the runtime of the composition of two programs being the sum of the respective runtimes, while the runtime of two parallel programs is the maximum of the two. A similar result holds for nearly any cost function. He also points out that computably enumerable reals form a rig He examines RotaBaxter operators as a way to generalize what “polar part” means and extend the theorems on Hopf algebra renormalization to such rigs.
In the second paper, he looks at my work with Calude as an example of a character. He uses our same argument to show that lots of measures of program behavior have the property that if the measure hasn’t stopped growing after reaching a certain large amount with respect to the program size, then the density of finite values the measure could take decreases like Surprisingly, though he referred to these results as cutoffs, he didn’t actually use them anywhere for doing regularization.
Reading between the lines, he might be suggesting something like approximating the Kolmogorov complexity that he uses later by using a time cutoff, motivated by results from our paper: there’s a constant depending only on the programming language such that if you run the th program steps and it hasn’t stopped, then the density of times near at which it could stop is roughly
Levin suggested using a computable complexity that’s the sum of the program length and the log of the number of time steps; I suppose you could “regularize” the Kolmogorov complexity by adding to the length of the program, renormalize, and then let go to zero, but that’s not something Manin does.
Instead, he proposed two other constructions suitable for renormalization; here’s the simplest. Given a partial computable function define the computably enumerable by if is defined, and 0 otherwise. Now define
When is undefined, which has a pole at When is defined, converges everywhere except Birkhoff decomposition would separate these two cases, though I’m not sure what value would take or what it would mean.
The other construction involves turning into a permutation and inventing a function that has poles when the permutation has fixpoints.
So Manin’s idea of renormalizing the halting problem is to do some uncomputable stuff to get an easytorenormalize function and then throw the Birkhoff decomposition at it; since we know the halting problem is undecidable, perhaps the fact that he didn’t come up with a new technique for extracting information about the problem is unsurprising, but after putting in so much effort to understand it, I was left rather disappointed: if you’re going to allow yourself to do uncomputable things, why not just solve the halting problem directly?
I must suppose that his intent was not to tackle this hard problem, but simply to play with the analogy he’d noticed; it’s what I’ve done in other papers. And being forced to learn renormalization was exhilarating! I have a bunch of ideas to follow up; I’ll write them up as I get a chance.
Renormalization and Computation 2
This is the second in a series of posts covering Yuri Manin’s ideas involving Hopf algebra renormalization of the Halting problem. Last time I showed how perturbing a quantum harmonic oscillator gave a sum over integrals involving interactions with the perturbation; we can keep track of the integrals using Feynman diagrams, though in the case of a single QHO they weren’t very interesting.
One point about the QHO needs emphasis at this point. Given a wavefunction describing the state of the QHO, it must be the case that we get some value when we measure the energy; so if we sum up the norms of the probability amplitudes, we should get unity:
This is called the normalization condition.
When we perturb the QHO, the states are no longer the energy eigenvectors of the new Hamiltonian. We can express the new eigenvectors in terms of the old ones:
where is the strength of the perturbation, and we reexpress our wavefunction in this new basis:
Since we’re working with a new set of coefficients, we have to make sure they sum up to unity, too:
This is the renormalization condition. So renormalization is about making sure things sum up right once you perturb the system.
I want to talk about renormalization in quantum field theory; the trouble is, I don’t actually know quantum field theory, so I’ll just be writing up what little I’ve gathered from reading various things and conversations with Dr. Baez. I’ve likely got some things wrong, so please let me know and I’ll fix them.
A field is a function defined on spacetime. Scalar fields are functions with a single output, whereas vector fields are functions with several outputs. The electromagnetic field assigns a single number, called the electric field, and a vector, called the magnetic field, to every point in spacetime. When you have two electrons and move one of them, it feels a reaction force and loses momentum; the other electron doesn’t move until the influence, traveling at the speed of light, reaches it. Conservation of momentum says that the momentum has to be somewhere; it’s useful to consider it to be in the electromagnetic field.
When you take the Fourier transform of the field, you get a function that assigns values to harmonics of the field; in the case of electromagnetism, the transformed field assigns a value to each color of light. Quantizing this transformed field amounts to making into a creation operator, just like in the QHO example from last time. So we have a continuum of QHOs, each indexed by a color (By the way—the zerodimensional Fourier transform is the identity function, so the QHO example from last time can be thought of both as the field at the unique point in spacetime and the field at the unique frequency.)
When we move to positivedimensional fields, we get more interesting pictures, like these from quantum electrodynamics:
Here, our coupling constant is the fine structure constant where is the charge of the electron. For each vertex, we write down our coupling constant times times a delta function saying that the incoming momentum minus the outgoing momentum equals zero. For each internal line, we write down a propagator—a function representing the transfer of momentum from one point to another; it’s a function of the fourmomentum —and multiply all this stuff together. Then we integrate over all fourmomenta and get something that looks like
The trouble is, this integral usually gives infinity for an answer. We try to work around this in two steps: first, we regularize the integral by introducing a length scale This represents the point at which gravity starts being important and we need to move to a more fundamental theory. In the quantum field theory of magnetic domains in iron crystals, the length scale is the interatom distance in the lattice. Regularization makes the integral finite for away from some singularity.
There are a few different ways of regularizing; one is to use as a momentum cutoff:
This obviously converges, and solutions to this are always a sum of three parts:
 The first part diverges as either logarithmically or as a power of
 The second part is finite and independent of
 The third part vanishes as
Renormalization in this case amounts to getting rid of the first part.
These three parts represent three different length scales: at lengths larger than all quantum or statistical fluctuations are negligible, and we can use the mean field approximation and do classical physics. At lengths between and we use QFT to calculate what’s going on. Finally, at lengths smaller than we need a new theory to describe what’s going on. In the case of QED, the new theory is quantum gravity; string theory and loop quantum gravity are the serious contenders for the correct theory.
The problem with this regularization scheme is that it doesn’t preserve gauge invariance, so usually physicists use another regularization scheme, called “dimensional regularization”. Here, we compute
which gives us an expression involving gamma functions of , where gamma is the continuous factorial function, and then we set The solutions to this are also a sum of three terms—a divergent part, a finite part, and a vanishing part—and then renormalization gets rid of the divergent part.
Assume we have some theory with a single free parameter . We’d like to calculate a function perturbatively in terms of , where represents some physical quantity, and we know . We assume takes the form
and assume that this definition gives us divergent integrals for the The first step is regularization: instead of we have a new function
Now we get to the business of renormalization! We solve this problem at each order; if the theory is renormalizable, knowing the solution at the previous order will give us a constraint for the next order, and we can subtract off all the divergent terms in a consistent way:
 Order
Here, Since it’s a constant, it has to match so In this approximation, the coupling constant takes the classical value.
 Order
Let where Plugging this into the definition of we get
Using we get which diverges as In the case of QED, this says that the charge on the electron is infinite. While the preferred interpretation these days is that quantum gravity is a more fundamental theory that takes precedence on very small scales (a Planck length is to a proton as a proton is to a meter), when the theory was first introduced, there was no reason to think that we’d need another theory. So the interpretation was that with an infinite charge, an electron would be able to extract an infinite amount of energy from the electromagnetic field. Then the uncertainty principle would create virtual particles of all energies, which would exist for a time inversely proportional to the energy. The particles can be charged, so they line up with the field and dampen the strength just like dielectrics. In this interpretation, the charge on the electron depends on the energy of the particles you’re probing it with.
So to second order,
A theory is therefore only renormalizable if the divergent part of is independent of In QED it is. We can now define as the limit
 Higher orders.
In a renormalizable theory, the process continues, with the counterterms entirely specified by knowing
Renormalization and Computation 1
Yuri Manin recently put two papers on the arxiv applying the methods of renormalization to computation and the Halting problem. Grigori Mints invited me to speak on Manin’s results at the weekly Stanford logic seminar because in his second paper, he expands on some of my work.
In these next few posts, I’m going to cover the idea of Feynman diagrams (mostly taken from the lecture notes for the spring 2004 session of John Baez’s Quantum Gravity seminar); next I’ll talk about renormalization (mostly taken from Andrew Blechman’s overview and B. Delamotte’s “hint”); third, I’ll look at the Hopf algebra approach to renormalization (mostly taken from this post by Urs Schreiber on the nCategory Café); and finally I’ll explain how Manin applies this to computation by exploiting the fact that Feynman diagrams and lambda calculus are both examples of symmetric monoidal closed categories (which John Baez and I tried to make easy to understand in our Rosetta stone paper), together with some results on the density of halting times from my paper “Most programs stop quickly or never halt” with Cris Calude. I doubt all of this will make it into the talk, but writing it up will make it clearer for me.
Renormalization is a technique for dealing with the divergent integrals that arise in quantum field theory. The quantum harmonic oscillator is quantum field theory in 0+1 dimensions—it describes what quantum field theory would be like if space consisted of a single point. It doesn’t need renormalization, but I’m going to talk about it first because it introduces the notion of a Feynman diagram.
“Harmonic oscillator” is a fancy name for a rock on a spring. The force exerted by a spring is proportional to how far you stretch it:
The potential energy stored in a stretched spring is the integral of that:
and to make things work out nicely, we’re going to choose The total energy is the sum of the potential and the kinetic energy:
By choosing units so that we get
where is momentum.
Next we quantize, getting a quantum harmonic oscillator, or QHO. We set taking units where Now
If we define a new observable then
We can think of as and write the energy eigenvectors as polynomials in
The creation operator adds a photon to the mix; there’s only one way to do that, so The annihilation operator destroys one of the photons; in the state , there are photons to choose from, so
Schrödinger’s equation says so
This way of representing the state of a QHO is known as the “Fock basis”.
Now suppose that we don’t have the ideal system, that the quadratic potential is only a good local approximation to the real potential . Then we can write the total as where is a function of position and momentum, or equivalently of and and is small.
Now we solve Schrödinger’s equation perturbatively. We know that
and we assume that
so that it makes sense to solve it perturbatively. Define
and
After a little work, we find that
and integrating, we get
We feed this equation back into itself recursively to get
So here we have a sum of a bunch of terms; the th term involves interactions with the potential interspersed with evolving freely between the interactions, and we integrate over all possible times at which those interactions could occur.
Here’s an example Feynman diagram for this simple system, representing the fourth term in the sum above:
The lines represent evolving under the free Hamiltonian , while the dots are interactions with the potential .
As an example, let’s consider and choose so that When acts on a state we get So at each interaction, the system either gains a photon or changes phase and loses a photon.
A particle moving in a quadratic potential in dimensional space gives the tensor product of QHOs, which is QFT in a space where there are possible harmonics. Quantum electrodynamics (QED) amounts to considering infinitely many QHOs, one for each possible energymomentum, which forms a continuum. The diagrams for QED start to look more familiar:
The vertices are interactions with the electromagnetic field. The straight lines are electrons and the wiggly ones are photons; between interactions, they propagate under the free Hamiltonian.
Monad for weakly monoidal categories
We’ve got free and forgetful functors Define Given a category the category has
 binary trees with labeled leaves as objects and
 binary trees with labeled leaves together with the natural isomorphisms from the definition of a weakly monoidal category as its morphisms.
The multiplication collapses two layers of trees down to one. The unit gives a oneleaf tree.
An algebra of the monad is a category together with a functor such that and Define
Then the associator should be a morphism
However, it isn’t immediately evident that the associator that comes from does the job, since just applying to gives
for the source instead of
,
which we get by replacing with its definition above. We need an isomorphism
so we can define Now we use the equations an algebra has to satisfy to derive this isomorphism. Since the following two objects are equal:
Therefore, the isomorphism we wanted is simply equality and It also means that satisfies the pentagon equation.
A similar derivation works for the unitors and the triangle equation.
A morphism of algebras is a functor such that Now
and
so we have the coherence laws for a strict monoidal functor.
Also,
so it preserves the associator as well. The unitors follow in the same way, so morphisms of these algebras are strict monoidal functors that preserve the associator and unitors.
Functors and monads
In many languages you have type constructors; given a type A
and a type constructor Lift
, you get a new type Lift<A>
. A functor is a type constructor together with a function
lift: (A > B) > (Lift<A> > Lift<B>)
that preserves composition and identities. If h
is the composition of two other functions g
and f
h (a) = g (f (a)),
then lift (h)
is the composition of lift (g)
and lift (f)
lift (h) (la) = lift (g) (lift (f) (la)),
where the variable la
has the type Lift<A>
. Similarly, if h is the identity function on variables of type A
h (a: A) = a,
then lift (h)
will be the identity on variables of type Lift<A>
lift (h) (la) = la.
Examples:
 Multiplication
Lift<>
adjoins an extra integer to any type:Lift<A> = Pair<A, int>
The function
lift()
pairs upf
with the identity function on integers:lift (f) = (f, id)
 Concatenation
Lift<>
adjoins an extra string to any type:Lift<A> = Pair<A, string>
The function
lift()
pairs upf
with the identity function on strings:lift (f) = (f, id)
 Composition
LetEnv
be a type representing the possible states of the environment andEffect = Env > Env
Also, we’ll be explicit in the type of the identity function
id<A>: A > A id<A> (a) = a,
so one possible
Effect
isid<Env>
, the “trivial sideeffect”.Then
Lift<>
adjoins an extra sideeffect to any type:Lift<A> = Pair<A, Effect>
The function
lift()
pairs upf
with the identity on sideeffects:lift (f) = (f, id<Effect>)
 Lists
The previous three examples used the Pair type constructor to adjoin an extra value. This functor is slightly different. Here,Lift<>
takes any typeA
to a list ofA
‘s:Lift<A> = List<A>
The function
lift()
is the function map():lift = map
 Double negation, or the continuation passing transform
In a nice type system, there’s theUnit
type, with a single value, and there’s also theEmpty
type, with no values (it’s “uninhabited”). The only function of typeX > Empty
is the identity functionid<Empty>
. This means that we can think of types as propositions, where a proposition is true if it’s possible to construct a value of that type. We interpret the arrow as implication, and negation can be defined as “arrowing intoEmpty
“: letF = Empty
andT = F > F
. ThenT > F = F
(sinceT > F
is uninhabited) andT
is inhabited since we can construct the identity function of type F > F. Functions correspond to constructive proofs. “Negation” of a proof is changing it into its contrapositive form:If A then B => If NOT B then NOT A.
Double negation is doing the contrapositive twice:
IF A then B => If NOT NOT A then NOT NOT B.
Here,
Lift<>
is double negation:Lift<A> = (A > F) > F.
The function lift takes a proof to its double contrapositive:
lift: (A > B) > ((A > F) > F) > ((B > F) > F) lift (f) (k1) (k2) = k1 (lambda (a) { k2 (f (a)) })
Monads
A monad is a functor together with two functions
m: Lift<Lift<A>> > Lift<A> e: A > Lift<A>
satisfying some conditions I’ll get to in a minute.
Examples:
 Multiplication
If you adjoin two integers,m()
multiplies them to get a single integer:m: Pair<Pair<A, int>, int> > Pair<A, int> m (a, i, j) = (a, i * j).
The function
e()
adjoins the multiplicative identity, or “unit”:e: A > Pair<A, int> e (a) = (a, 1)
 Concatenation
If you adjoin two strings,m()
concatenates them to get a single string:m: Pair<Pair<A, string>, string> > Pair<A, string> m (a, s, t) = (a, s + t).
The function
e()
adjoins the identity for concatenation, the empty string:e: A > Pair<A, string> e (a) = (a, "")
 Composition
If you adjoin two sideeffects,m()
composes them to get a single effect:m: Pair<Pair<A, Effect>, Effect> > Pair<A, Effect> m (a, s, t) = (a, t o s),
where
(t o s) (x) = t (s (x)).
The function
e()
adjoins the identity for composition, the identity function onEnv
:e: A > Pair<A, Effect> e (a) = (a, id<Env>)
 Lists
If you have two layers of lists,m()
flattens them to get a single layer:m: List<List<A>> > List<A> m = flatten
The function
e()
makes any element of A into a singleton list:e: A > List<A> e (a) = [a]
 Double negation, or the continuation passing transform
If you have a quadruple negation,m()
reduces it to a double negation:m: ((((A > F) > F) > F) > F) > ((A > F) > F) m (k1) (k2) = k1 (lambda (k3) { k3 (k2) })
The function
e()
is just reverse application:e: A > (A > F) > F e (a) (k) = k (a)
The conditions that e
and m
have to satisfy are that m
is associative and e
is a left and right unit for m
. In other words, assume we have
llla: Lift<Lift<Lift<A>>> la: Lift<A>
Then
m (lift (m) (llla)) = m (m (llla))
and
m (e (la)) = m (lift (e) (la)) = la
Examples:
 Multiplication:
There are two different ways we can use lifting with these two extra functionse()
andm()
. The first is applyinglift()
to them. When we applylift
tom()
, it acts on three integers instead of two; but becauselift (m) = (m, id),
it ignores the third integer:
lift (m) (a, i, j, k) = (a, i * j, k).
Similarly, lifting
e()
will adjoin the multiplicative unit, but will leave the last integer alone:lift (e) = (e, id) lift (e) (a, i) = (a, 1, i)
The other way to use lifting with
m()
ande()
is to apply
to their input types. This specifiesLift<>
A
asPair<A', int>
, so the first integer gets ignored:m (a, i, j, k) = (a, i, j * k) e (a, i) = (a, i, 1)
Now when we apply
m()
to all of these, we get the associativity and unit laws. For associativity we getm (lift (m) (a, i, j, k)) = m(a, i * j, k) = (a, i * j * k) m (m (a, i, j, k)) = m(a, i, j * k) = (a, i * j * k)
and for unit, we get
m (lift (e) (a, i)) = m (a, 1, i) = (a, 1 * i) = (a, i) m (e (a, i)) = m (a, i, 1) = (a, i * 1) = (a, i)
 Concatenation
There are two different ways we can use lifting with these two extra functionse()
andm()
. The first is applyinglift()
to them. When we apply lift tom()
, it acts on three strings instead of two; but becauselift (m) = (m, id),
it ignores the third string:
lift (m) (a, s, t, u) = (a, s + t, u).
Similarly, lifting
e()
will adjoin the empty string, but will leave the last string alone:lift (e) = (e, id) lift (e) (a, s) = (a, "", s)
The other way to use lifting with
m()
ande()
is to applyLift<>
to their input types. This specifiesA
asPair<A', string>
, so the first string gets ignored:m (a, s, t, u) = (a, s, t + u) e (a, s) = (a, s, 1)
Now when we apply
m()
to all of these, we get the associativity and unit laws. For associativity we getm (lift (m) (a, s, t, u)) = m(a, s + t, u) = (a, s + t + u) m (m (a, s, t, u)) = m(a, s, t + u) = (a, s + t + u)
and for unit, you get
m (lift (e) (a, s)) = m (a, "", s) = (a, "" + s) = (a, s) m (e (a, s)) = m (a, s, "") = (a, s + "") = (a, s)
 Composition
There are two different ways we can use lifting with these two extra functionse()
andm()
. The first is applyinglift()
to them. When we apply lift tom()
, it acts on three effects instead of two; but becauselift (m) = (m, id<Effect>),
it ignores the third effect:
lift (m) (a, s, t, u) = (a, t o s, u).
Similarly, lifting
e()
will adjoin the identity function, but will leave the last string alone:lift (e) = (e, id<Effect>) lift (e) (a, s) = (a, id<Env>, s)
The other way to use lifting with
m()
ande()
is to applyLift<>
to their input types. This specifiesA
asPair<A', Effect>
, so the first effect gets ignored:m (a, s, t, u) = (a, s, u o t) e (a, s) = (a, s, id<Env>)
Now when we apply
m()
to all of these, we get the associativity and unit laws. For associativity we getm (lift (m) (a, s, t, u)) = m(a, t o s, u) = (a, u o t o s) m (m (a, s, t, u)) = m(a, s, u o t) = (a, u o t o s)
and for unit, you get
m (lift (e) (a, s)) = m (a, id<Env>, s) = (a, s o id<Env>) = (a, s) m (e (a, s)) = m (a, s, id<Env>) = (a, id<Env> o s) = (a, s)
 Lists
There are two different ways we can use lifting with these two extra functionse()
andm()
. The first is applyinglift()
to them. When we apply lift tom()
, it acts on three layers instead of two; but becauselift (m) = map (m),
it ignores the third (outermost) layer:
lift (m) ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]]) = [[a, b, c, d, e], [], [x, y, z]]
Similarly, lifting
e()
will make singletons, but will leave the outermost layer alone:lift (e) ([a, b, c]) = [[a], [b], [c]]
The other way to use lifting with
m()
ande()
is to applyLift<>
to their input types. This specifiesA
asList<A'>
, so the *innermost* layer gets ignored:m ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]]) = [[a, b, c], [], [d, e], [], [x], [y, z]] e ([a, b, c]) = [[a, b, c]]
Now when we apply
m()
to all of these, we get the associativity and unit laws. For associativity we getm (lift (m) ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]])) = m([[a, b, c, d, e], [], [x, y, z]]) = [a, b, c, d, e, x, y, z] m (m ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]])) = m([[a, b, c], [], [d, e], [], [x], [y, z]]) = [a, b, c, d, e, x, y, z]
and for unit, we get
m (lift (e) ([a, b, c])) = m ([[a], [b], [c]]) = [a, b, c] m (e ([a, b, c])) = m ([[a], [b], [c]]) = [a, b, c]
Monads in Haskell style, or “Kleisli arrows”
Given a monad (Lift, lift, m, e)
, a Kleisli arrow is a function
f: A > Lift<B>,
so the e()
function in a monad is already a Kleisli arrow. Given
g: B > Lift<C>
we can form a new Kleisli arrow
(g >>= f): A > Lift<C> (g >>= f) (a) = m (lift (g) (f (a))).
The operation >>= is called “bind” by the Haskell crowd. You can think of it as composition for Kleisli arrows; it’s associative, and e()
is the identity for bind. e()
is called “return” in that context. Sometimes code is less complicated with bind and return instead of m
and e
.
If we have a function f: A > B
, we can turn it into a Kleisli arrow by precomposing with e()
:
(e o f): A > Lift<B> (e o f) (a) = e (f (a)) = return (f (a)).
Example:
 Double negation, or the continuation passing style transform
We’re going to (1) show that the CPS transform of a function takes a continuation and applies that to the result of the function. We’ll also (2) show that for two functionsr, s
,CPS (s o r) = CPS (s) >>= CPS (r),
(1) To change a function
f: A > B
into a Kleisli arrow (i.e. continuized function)CPS (f): A > (B > X) > X
, we just compose withe
—or in the language of Haskell, wereturn
the result:CPS (f) (a) (k) = return (f (a)) (k) = (e o f) (a) (k) = e (f (a)) (k) = k (f (a))
(2) Given two Kleisli arrows
f: A > (B > F) > F
and
g: B > (C > F) > F,
we can bind them:
(g >>= f) (a) (k) = m (lift (g) (f (a))) (k) // defn of bind = lift (g) (f (a)) (lambda (k3) { k3 (k) }) // defn of m = f (a) (lambda (b) { (lambda (k3) { k3 (k) }) (g (b)) }) // defn of lift = f (a) (lambda (b) { g (b) (k) }), // application
which is just what we wanted.
In particular, if
f
andg
are really just continuized functionsf = (e o r) g = (e o s)
then
(g >>= f) (a) (k) = f (a) (lambda (b) { g (b) (k) }) // by above = (e o r) (a) (lambda (b) { (e o s) (b) (k) }) // defn of f and g = (e o r) (a) (lambda (b) { k (s (b)) }) // defn of e = (e o r) (a) (k o s) // defn of composition = (k o s) (r (a)) // defn of e = k (s (r (a))) // defn of composition = (e o (s o r)) (a) (k) // defn of e = CPS (s o r) (a) (k) // defn of CPS
so
CPS (s) >>= CPS (r) = CPS (s o r).
My talk at Perimeter Institute
I spent last week at the Perimeter Institute, a Canadian institute founded by Mike Lazaridis (CEO of RIM, maker of the BlackBerry) that sponsors research in cosmology, particle physics, quantum foundations, quantum gravity, quantum information theory, and superstring theory. The conference, Categories, Quanta, Concepts, was organized by Bob Coecke and Andreas Döring. There were lots of great talks, all of which can be found online, and lots of good discussion and presentations, which unfortunately can’t. (But see Jeff Morton’s comments.) My talk was on the Rosetta Stone paper I coauthored with Dr. Baez.
Syntactic string diagrams
I hit on the idea of making lambda a node in a string diagram, where its inputs are an antivariable and a term in which the variable is free, and its output is the same term, but in which the variable is bound. This allows a string diagram notation for lambda calculus that is much closer to the syntactical description than the stuff in our Rosetta Stone paper. Doing it this way makes it easy to also do pi calculus and blue calculus.
There are two types, V (for variable) and T (for term). I’ve done untyped lambda calculus, but it’s straightforward to add subscripts to the types V and T to do typed lambda calculus.
There are six function symbols:
 Lambda takes an antivariable and a term that may use the corresponding variable.
 This turns an antivariable “x” introduced by lambda into the term “x”.
 (Application) This takes and and produces .
 These two mean we can duplicate and delete terms.
The rule is the real meat of the computation. The “P” is an arbitrary subdiagram. The effect is replacing the “A” application node with the “P” subdiagram, modulo some wiring.
I label the upwards arrows out of lambdas with a variable name in parentheses; this is just to assist in matching up the syntactical representation with the string diagram.
In the example, I surround part of the diagram with a dashed line; this is the part to which the rule applies. Within that, I surround part with a dashdot line; this is the subdiagram P in the rule.
When I do blue calculus this way, there are a few more function symbols and the relations aren’t confluent, but the flavor is very much the same.
Imaginary time
Statics (geometric = no time):  
[x]  x coordinate  
[y]  y coordinate  
[k]  proportionality constant  
[y/x]  slope  
[k y/x]  proportional to slope  
[k y^2/x^2]  distortion  
[k y^2/x^2]  original shape  
[k y^2/x]  least S at equilibrium  
Statics (with energy):  
[x]  parameterization of curve  
[y]  y coordinate  
[kg x/s^2]  spring constant at x  
[y/x]  slope  
[kg y/s^2]  force due to stretching  
[kg y^2/s^2 x = J/x]  stretching energy density  
[kg y^2/s^2 x= J/x]  gravitational energy density  
[kg y^2/s^2 = J]  energy (least energy at equilibrium)  
Statics (unitless distance):  
[1]  parameterization of curve  
[m]  y coordinate  
[kg/s^2]  spring constant  
[m]  relative displacement  
[kg m/s^2 = N]  force at x due to stretching  
[kg m^2 / s^2 = J]  stretching energy at x  
[kg m^2 / s^2 = J]  gravitational energy at x  
[kg m^2 / s^2 = J]  energy (least energy at equilibrium)  
Dynamics ():  
[s]  time  
[m]  y coordinate  
[kg]  mass  
[m/s]  velocity  
[kg m/s]  momentum  
[kg m^2/s^2 = J]  kinetic energy  
[kg m^2 / s^2 = J]  potential energy  
[kg m^2/s]  i * action 
See also Toby Bartels‘ sci.physics post.
I finally understand the state transformer monad!
It’s the monad arising from the currying adjunction between and attaches an environment to :
This is the type of a function that takes a state of type and outputs a result of type and a new state of type . So is the ability to depend on state and is the ability to change the state.
The natural transformation is evaluation:
takes a function and an input point and evaluates the function at that point. So we get
by evaluating the in the middle, while the unit
is just the curried identity on pairs:
Passive polarization clock
Here’s a design for a passive polarization clock.
The sky is polarized in concentric circles around the sun. The polarization of the southern sky moves through around 180 degrees during daylight hours. It is polarized vertically in the morning, horizontally at noon, and vertically again in the evening.
Align slices of polarized film such that they are parallel to the contours. Any given ray from the center of the sundial outward always hits the contours at the same angle; the angle changes by 360 degrees as the ray passes through 180 degrees. In other words, the clock goes from 6am to 6pm as the sun moves through the sky.
Yoneda embedding as contrapositive and callcc as double negation
Consider the problem of how to represent negation of a proposition when we only have implication and falsehood:
Since is true and is false, this does what we want.
The contrapositive says has the same truth value as Using only implication, the contrapositive of is
What if we don’t even have falsehood? Well, we can pick any proposition to represent falsehood and form
The Yoneda embedding takes a category and produces a category :
This embedding is better known among computer scientists as the continuation passing style transformation.
In a symmetric monoidal closed category, like lambda calculus, we can move everything “inside:” every morphism has an internal version The internal Yoneda embedding of is
Here is the “unit” type; notice how the target type is isomorphic under the CurryHoward isomorphism to the contrapositive. This is a term that maps a continuation and a value to
To get double negation, first do the Yoneda embedding on the identity to get
then uncurry, braid, and recurry to get
or, internally,
This takes a value to a function value
Callwithcurrentcontinuation expects a term that has been converted to CPS style as above, and then hands it the remainder of the computation in
The category GraphUp
The category GraphUp of graphs and Granovetter update rules has
 directed finite graphs as objects
 morphisms generated by Granovetter rules, i.e. the following five operations:
 add a new node. (creation,
refcount=0
)  given a node add a new node and an edge (creation,
refcount=1
)  given edges and add an edge (introduction,
++refcount
)  remove an edge. (deletion,
refcount
)  remove a node and all its outgoing edges if it has no incoming edges. (garbage collection)
 add a new node. (creation,
It’s a monoidal category where the tensor product is disjoint union. Also, since two disjoint graphs can never become connected, they remain disjoint.
Programs in a capabilitysecure language get interpreted in GraphUp. A program’s state graph consists of nodes, representing the states of the system, and directed edges, representing the system’s transitions between states upon receiving input. A functor from a program state graph to GraphUp assigns an object reference graph as state to each node and an update generated by Granovetter rules to each edge.
Haskell monads for the category theorist
A monad in Haskell is about composing almostcompatible morphisms. Typically you’ve got some morphism and then you think of a reason that you’d rather have where is a functor. In Haskell, though, you don’t define functors right out: you define type constructors, which are like the map of objects in a functor. You have to define the map of morphisms separately.
To define the map of morphisms, we have to define (aka ) and . We define to be the categorytheorist’s unit composed with the input. Applying to gives
To compose “halffunctored” morphisms like we define like this: given an input and a morphism like bind produces
So a “monad” in Haskell is always the monad for categories, except the lists are of bindable halffunctored morphisms like rather than lists of composable morphisms.
A sideeffect monad has where is the data type for the environment on which the morphism is to act. The IO monad is a specific instance of a sideeffect monad with the environment being the keyboard queue, disk drive, network, etc.
Another piece of the stone
A few days ago, I thought that I had understood pi calculus in terms of category theory, and I did, in a way.


To make lambda calculus into a category, we mod out by the rewrite rules and consider equivalence classes of terms instead of actual terms. A model of this category (i.e. a functor from the category to Set) picks out a set of values for each datatype and a function for each program. Given a value from the input set, we get a single value from the output set.
Similarly, a model of the pi calculus assigns to each process a set of states and to each reduction rule a function that updates the state. A morphism in this way of thinking is a certain number of reductions to perform. The reductions are deterministic in the sense that we can model “A or B” as Given an object’s state at a certain point, we can tell what set of states it can be in after the system reduces n messages.
However, what we really want is to know the set of states it can be in after all the messages have been processed: what is its normal form? This is far more like the rewrite rules in lambda calculus. It suggests that we should be treating the reduction rules like 2morphisms instead of 1morphisms. There’s one important difference from lambda calculus, however: the 2morphisms of pi calculus are not confluent! It matters very much in which order they are evaluated. Thus processes can’t map to anything but trivial functions.
It looks like a better fit for models of the pi calculus is Rel, the category of sets and relations. A morphism in Rel can relate a single input state to multiple output states. This suggests we have a single object * in the pi calculus that gets mapped to a set of possible states for the system, while each process gets mapped to a relation that encodes all its possible reductions.
I’m rather embarrassed that it took me so long to notice this, since my advisor has been talking about replacing Set by Rel for years.
category  lambda calculus  pi calculus 
objects  types  only one type *, a system of processes 
a morphism  an equivalence class of terms  a structural congruence class of processes 
dinatural transformation from the constant functor (mapping to the terminal set and its identity) to a functor generated by hom, products, projections, and exponentials (if they’re there)  combinator: template for programs mapping between isomorphic types (usually)  since there’s only one type, this is trivial 
Model of the category in Rel  (usually taken in Set, a subcategory of Rel) a set of values for each data type and a function for each morphism between them  * maps to a set S of states for the system, and each process gets mapped to a relation that relates each element of S to its possible reductions in S 
Continuation passing as a reflection
We can write any expression like as a full binary tree where the nodes denote evaluation of the left child at the right, and the leaves are values:
Figure 1:
[In the caption of figure 1, the expression is slightly different; when using trees, it’s more convenient to curry all the functions—that is, replace every comma “,” by backtoback parens: “)(” .]
The continuation passing transform (Yoneda embedding) first reflects the tree across the vertical axis and then replaces the root and all the left children with their continuized form—a value gets replaced with a function value
Figure 2:
What does this evaluate to? Well,
As we hoped, it’s the continuization (Yoneda embedding) of the original expression. Iterating, we get
Figure 3:
At this point, we get an enormous simplification: we can get rid of overlines whenever the left and right branch both have them. For instance,
Actually working out the whole expression would mean lots of epicycular reductions like this one, but taking the shortcut, we just get rid of all the lines except at the root. That means that we end up with
for our final expression.
However, if this expression is just part of a larger one—if what we’re calling the “root” is really the child of some other node—then the cancellation of lines on siblings applies to our “root” and its sibling, and we really do get back to where we started!
A piece of the Rosetta stone
category  lambda calculus  pi calculus  Turing machine 
objects  types  structural congruence classes of processes  where is the natural numbers and is all binary sequences with finitely many ones. 
a morphism  an equivalence class of terms  a specific reduction from one process state to the next  a specific transition from one state and position of the machine to the next 
dinatural transformation from the constant functor (mapping to the terminal set and its identity) to a functor generated by hom, products, projections, and exponentials (if they’re there)  combinator  reduction rule (covers all reductions of a particular form)  tapehead update rule (covers all transitions with the current cell and state in common) 
products  product types  parallel processes  multiple tapes 
internal hom  exponential types  all types are exponentials?  ? 
Model of the category in Set  A set of values for each data type and a function for each morphism between them  A set of states for each process and a single evolution function out of each set.  ? 
This won’t be appearing in our Rosetta stone paper, but I wanted to write it down. What flavor of logic corresponds to the pi calculus? To the Turing machine?
The continuation passing transform and the Yoneda embedding
They’re the same thing! Why doesn’t anyone ever say so?
Assume A
and B
are types; the continuation passing transform takes a function (here I’m using C++ notation)
B f(A a);
and produces a function
<X> CPT_f(<X> (*k)(B), A a) { return k(f(a)); }
where X
is any type. In CPT_f
, instead of returning the value f(a)
directly, it reads in a continuation function k
and “passes” the result to it. Many compilers use this transform to optimize the memory usage of recursive functions; continuations are also used for exception handling, backtracking, coroutines, and even show up in English.
The Yoneda embedding takes a category and produces a category :
We get the transformation above by uncurrying to get
In Java, a (cartesian closed) category is an interface C
with a bunch of internal interfaces and methods mapping between them. A functor is written
class F implements C
.
Then each internal interface C.A
gets instantiated as a set F.A
of values and each method C.f()
becomes instantiated as a function F.f()
between the sets.
The continuation passing transform can be seen as a parameterized functor . We’d write
class CPT<X> implements C
.
Then each internal interface C.A
gets instantiated as a set CPT
<X>
.A
of methods mapping from C.A
to X
—i.e. continuations that accept an input of type C.A
—and each method C.f
maps to the continuized function CPT<X>.f
described above.
Then the Yoneda lemma says that for every model of —that is, for every
class F
implementing the interface C
—there’s a natural isomorphism between the set and the set of natural transformations
A natural transformation between and is a way to cast the class F
to the class CPT<X>
such that for any method of C
, you can either
 invoke its implementation directly (as a method of
F
) and then continuize the result (using the type cast), or  continuize first (using the type cast) and then invoke the continuized function (as a method of
CPT<X>
) on the result
and you’ll get the same answer. Because it’s a natural isomorphism, the cast has an inverse.
The power of the Yoneda lemma is taking a continuized form (which apparently turns up in lots of places) and replacing it with the direct form. The trick to using it is recognizing a continuation when you see one.
leave a comment