Funny identity matrix

Posted in Math by Mike Stay on 2013 June 11

Given any category C with finite products and coproducts such that the products distribute over the coproducts, you can make a compact closed bicategory Mat(C) of (natural numbers, matrices of elements of Ob(C), matrices of elements of Mor(C)) and define matrix composition by \displaystyle (M \circ N)(i,k) = \sum_j M(i,j) \times N(j,k).

Take the partial order whose objects are nonnegative integers and whose morphisms m \to n mean m | n; the product is gcd and the coproduct is lcm. In this category, the terminal object is 0 and the initial object is 1, so the identity matrix looks like

\left( \begin{array}{cccc}0 & 1 & \ldots & 1 \\ 1 & 0 & \ldots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \ldots & 0 \end{array} \right)

and matrix multiplication is \displaystyle (M \circ N)(i,k) = \mathop{\mbox{lcm}}_j \mbox{gcd}(M(i,j), N(j,k)).

Reflective calculi

Posted in Category theory, Programming by Mike Stay on 2013 May 15

This is mostly for my own benefit.

We start with some term-generating functor T(G, X) parametric in a set of ground terms G and a set of names X.

Now specialize to a single ground term:

T_2(X) = T(1, X)

Now mod out by structural equivalence:

T_3(X) = T_2(X) / \equiv

Let N be a set of names and let M be the set of name-equivalence classes N / \equiv_N (where \equiv_N is yet to be defined—it’s part of the theory of names in which we’re still parametric).

Prequoting and predereference are an algebra and coalgebra of T_3, respectively:

PQ: T_3(M) \to M

PD: M \to T_3(M)

such that

PD \circ PQ: T_3(M) \to T_3(M) = 1_{T_3(M)}.

Real quoting and dereference have to use T_4(X), defined below.

Define N = \mu X.T_2(X). Does N / \equiv = \mu X.T_3(X)? I think so; assuming it does, define

M = \mu X.T_3(X)

so name equivalence is structural equivalence; equivalence of prequoted predereferences is automatic by the definition above.

The fixed point gives us an isomorphism

F: T_3(M) \to M.

We can define PQ = F, PD = F^{-1}, because undecidability doesn’t come into play until we add operational semantics. It’s decidable whether two terms are structurally equivalent. Thus

PD \circ PQ:T_3(M) \to T_3(M)

is the identity, satisfying the condition, and

PQ \circ PD: M \to M

is the identity, which we get for free.

When we mod out by operational semantics (following the traditional approach rather than the 2-categorical one needed for pi calculus):

T_4(X) = T_3(X) / \beta,

we have the quotient map

q_X: T_3(X) \to T_4(X)

and a map

r_X: T_4(X) \to T_3(X)

that picks a representative from the equivalence class.

It’s undecidable whether two terms are in the same operational equivalence class, so q_X may not halt. However, it’s true by construction that

q_X \circ r_X: T_4(X) \to T_4(X)

is the identity.

We can extend prequoting and predereference to quoting and dereference on T_4 by

Q: T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{PQ}{\to} M

D: M \stackrel{PD}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

and then

D \circ Q

= T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{PQ}{\to} M \stackrel{PD}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

= T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

= T_4(M) \stackrel{id}{\to} T_4(M)

which is what we want for quoting and dereference. The other way around involves undecidability.

Monads without category theory, redux

Posted in Category theory, Programming by Mike Stay on 2012 September 26

I intended the last version of this post to be a simple one-off, but people cared too much. A fire-breathing Haskellite said that jQuery must never be used as an example of a monad because it’s (gasp) not even functional! And someone else didn’t understand JavaScript well enough to read my code snippets. So here’s another attempt that acknowledges the cultural divide between functional and imperative programmers and tries to explain everything in English as well as in source code.

Monads are a design pattern that is most useful to functional programmers because their languages prefer to implement features as libraries rather than as syntax. In Haskell there are monads for input / output, side-effects and exceptions, with special syntax for using these to do imperative-style programming. Imperative programmers look at that and laugh: “Why go through all that effort? Just use an imperative language to start with.” Functional programmers also tend to write programs as—surprise!—applying the composite of a bunch of functions to some data. The basic operation for a monad is really just composition in a different guise, so functional programmers find this comforting. Functional programmers are also more likely to use “continuations”, which are something like extra-powerful exceptions that only work well when there is no global state; there’s a monad for making them easier to work with.

There are, however, some uses for monads that even imperative programmers find useful. Collections like sets and lists (with or without parentheses), parsers, promises, and membranes are a few examples, which I’ll explain below.


Many collection types support the following three operations:

  1. Map a function over the elements of the collection.
  2. Flatten a collection of collections into a collection.
  3. Create a collection from a single element.

A monad is a class that provides a generalized version of these three operations. When I say “generalized”, I mean that they satisfy some of the same rules that the collections’ operations satisfy in much the same way that multiplication of real numbers is associative and commutative just like addition of real numbers.

The way monads are usually used is by mapping a function and then flattening. If we have a function f that takes an element and returns a list, then we can say and get a new list.


A parser is an object with a list of tokens that have already been parsed and the remainder of the object (usually a string) to be parsed.

var Parser = function (obj, tokens) {
  this.obj = obj;
  // If tokens are not provided, use the empty list.
  this.tokens = tokens || [];

It has three operations like the collections above.

  1. Mapping a function over a parser applies the function to the contained obj. = function (f) {
      return new Parser(f(this.obj), this.tokens);
  2. Flattening a parser of parsers concatenates the list of tokens.
    Parser.prototype.flatten = function () {
      return new Parser(this.obj.obj, this.obj.tokens.concat(this.tokens));

    The definition above means that

    new Parser(new Parser(x, tokens1), tokens2).flatten()
    is equivalent to
    new Parser(x, tokens1.concat(tokens2)).

  3. We can create a parser from an element x: new Parser(x).

If we have a function f that takes a string, either parses out some tokens or throws an exception, and returns a parser with the tokens and the remainder of the string, then we can say

and get a new parser. In what follows, I create a parser with the string “Hi there” and then expect a word, then some whitespace, then another word.

var makeMatcher = function (re) {
  return function (s) {
    var m = s.match(re);
    if (!m) { throw new Error('Expected to match ' + re); }
    return new Parser(m[2], [m[1]]);

var alnum = makeMatcher(/^([a-zA-Z0-9]+)(.*)/);
var whitespace = makeMatcher(/^(s+)(.*)/);

new Parser('Hi there')
// is equivalent to new Parser('', ['Hi', ' ', 'there']);


A promise is an object that represents the result of a computation that hasn’t finished yet; for example, if you send off a request over the network for a webpage, the promise would represent the text of the page. When the network transaction completes, the promise “resolves” and code that was waiting on the result gets executed.

  1. Mapping a function f over a promise for x results in a promise for f(x).
  2. When a promise represents remote data, a promise for a promise is still just remote data, so the two layers can be combined; see promise pipelining.
  3. We can create a resolved promise for any object that we already have.

If we have a function f that takes a value and returns a promise, then we can say

and get a new promise. By stringing together actions like this, we can set up a computation that will execute properly as various network actions complete.


An object-capability language is an object-oriented programming language where you can’t get a reference to an object unless you create it or someone calls one of your methods and passes a reference to it. A “membrane” is a design pattern that implements access control.

Say you have a folder object with a bunch of file objects. You want to grant someone temporary access to the folder; if you give them a reference to the folder directly, you can’t force them to forget it, so that won’t work for revokable access. Instead, suppose you create a “proxy” object with a switch that only you control; if the switch is on, the object forwards all of its method calls to the folder and returns the results. If it’s off, it does nothing. Now you can give the person the object and turn it off when their time is up.

The problem with this is that the folder object may return a direct reference to the file objects it contains; the person could lose access to the folder but could retain access to some of the files in it. They would not be able to have access to any new files placed in the folder, but would see updates to the files they retained access to. If that is not your intent, then the proxy object should hide any file references it returns behind similar new proxy objects and wire all the switches together. That way, when you turn off the switch for the folder, all the switches turn off for the files as well.

This design pattern of wrapping object references that come out of a proxy in their own proxies is a membrane.

  1. We can map a function f over a membrane for x and get a membrane for f(x).
  2. A membrane for a membrane for x can be collapsed into a single membrane that checks both switches.
  3. Given any object, we can wrap it in a membrane.

If we have a function f that takes a value and returns a membrane, then we can say

and get a new membrane. By stringing together actions like this, we can set up arbitrary reference graphs, while still preserving the membrane creator’s right to turn off access to his objects.


Monads implement the abstract operations map and flatten, and have an operation for creating a new monad out of any object. If you start with an instance m of a monad and you have a function f that takes an object and returns a monad, then you can say;

and get a new instance of a monad. You’ll often find scenarios where you repeat that process over and over.

Overloading JavaScript’s dot operator with direct proxies

Posted in Category theory, Math, Programming by Mike Stay on 2012 August 28

With the new ECMAScript 6 Proxy object that Firefox has implemented, you can make dot do pretty much anything you want. I made the dot operator in JavaScript behave like Haskell’s bind:

// I'll give a link to the code for lift() later,
// but one thing it does is wrap its input in brackets.
lift(6); // [6]
lift(6)[0]; // 6
lift(6).length; // 1

// lift(6) has no "upto" property
lift(6).upto; // undefined

// But when I define this global function, ...

// Takes an int n, returns an array of ints [0, ..., n-1].
var upto = function (x) {
  var r = [], i;
  for (i = 0; i < x; ++i) {
  return r;

// ... now the object lift(6) suddenly has this property
lift(6).upto; // [0,1,2,3,4,5]
// and it automagically maps and flattens!
lift(6).upto.upto; // [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]
lift(6).upto.upto.length; // 15

To be clear, ECMAScript 6 has changed the API for Proxy since Firefox adopted it, but you can implement the new one on top of the old one. Tom van Cutsem has code for that.

I figured this out while working on a contracts library for JavaScript. Using the standard monadic style (e.g. jQuery), I wrote an implementation that doesn’t use proxies; it looked like this:

lift(6)._(upto)._(upto).value; // [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]

The lift function takes an input, wraps it in brackets, and stores it in the value property of an object. The other property of the object, the underscore method, takes a function as input, maps that over the current value and flattens it, then returns a new object of the same kind with that flattened array as the new value.

The direct proxy API lets us create a “handler” for a target object. The handler contains optional functions to call for all the different things you can do with an object: get or set properties, enumerate keys, freeze the object, and more. If the target is a function, we can also trap when it’s used as a constructor (i.e. new F()) or when it’s invoked.

In the proxy-based implementation, rather than create a wrapper object and set the value property to the target, I created a handler that intercepted only get requests for the target’s properties. If the target has the property already, it returns that; you can see in the example that the length property still works and you can look up individual elements of the array. If the target lacks the property, the handler looks up that property on the window object and does the appropriate map-and-flatten logic.

I’ve explained this in terms of the list monad, but it’s completely general. In the code below, mon is a monad object defined in the category theorist’s style, a monoid object in an endofunctor category, with multiplication and unit. On line 2, it asks for a type to specialize to. On line 9, it maps the named function over the current state, then applies the monad multiplication. On line 15, it applies the monad unit.

var kleisliProxy = function (mon) {
  return function (t) {
    var mont = mon(t);
    function M(mx) {
      return Proxy(mx, {
        get: function (target, name, receiver) {
          if (!(name in mx)) {
            if (!(name in window)) { return undefined; }
            return M(mont['*'](mon(window[name]).t(mx)));
          return mx[name];
    return function (x) { return M(mont[1](x)); };

var lift = kleisliProxy(listMon)(int32);
lift(6).upto.upto; // === [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]

Cantor’s diagonal proof

Posted in Math by Mike Stay on 2012 August 20

There’s a riddle about a library where the index had two parts, each in its own book. The first index book listed every book whose title appeared between its own covers; nearly every book in the library was listed in this index, since the title is typically on the title page. The other index book, a mere pamphlet really, was for those rare cases where the title was not between the covers of the book. Neither index had a title page, just a list of book titles and what shelf each book was on.

In which book was the first book listed? It could have been listed in itself, which is consistent with the claim that it lists all the books that contain their own titles. Or it could have been listed in the second book and not in itself, which would also be consistent.

The riddle is this: in which index was the second index book listed? If we suppose that the second book did not list itself, then it would be incomplete, since it is a book in the library that does not contain its title between the covers. If we suppose that it did list itself, then it would be inconsistent, since it is supposed to list only those books that do not contain their own titles. There is no way for the second index book to be both consistent and complete.

Georg Cantor was a mathematician who used this idea to show that there are different sizes of infinity! But before I go there, consider the Munduruku tribe of the Amazon, which has no words for specific numbers larger than five; how would a man with eight children tell if he has enough fruit to give one to each child? He would probably name each child and set aside a piece of fruit for them—that is, he would set up an isomorphism between the children and the fruit. Even though he cannot count either set, he knows that they are the same size.

We can do the same thing to compare infinite sets: even though we can’t count them, if we can show that there is an isomorphism between two infinite sets, we know they are the same size. Likewise, if we can prove that there is no possible isomorphism between the sets, then they must be different sizes.

So now suppose that we take the set of natural numbers {0, 1, 2, 3, …} and the set of positive even numbers {0, 2, 4, 6, …}. These sets both have infinitely many elements. Are they the same size? We can double every natural number and get an even one, and halve every positive even number and get a natural number. That’s an isomorphism, so they’re the same size.

How about natural numbers and pairs of natural numbers? Given two numbers like 32768 and 137, we can pad them to the same length and then interleave them: 3020716387; they come apart just as easily.

Since we can take the first number to be the denominator of a nonnegative rational number and the second number to be the numerator, we also find that the size of the set of rationals is the same as the size of the set of natural numbers.

Cantor came up with isomorphisms for each of these sets and thought that perhaps all infinite sets were the same size. But then he tried to come up with a way to match up the natural numbers and infinite sequences of bits, like the sequence of all zeros:

    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...

or the sequence that’s all zeros except for the third one:

    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...

or the sequence whose nth bit is 1 if n is prime:

    0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ...

or sequences with no pattern at all:

    0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ...

He tried and tried, but couldn’t figure out a way to make them match up. Then he thought of an ingenious proof that it’s impossible! It uses the same insight as the library riddle above.

Suppose we try to match them up. We write in the left column a natural number and in the right column a binary sequence:

    1: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
    2: 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...
    3: 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ...
    4: 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ...
    5: 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, ...
    6: 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, ...

Consider the diagonal of the table above: 0, 0, 1, 1, 0, 1, … This sequence is number 6 above, and is like the first index book: the sixth bit in the sequence could be either 0 or 1, and it would be consistent.

Now consider the opposite of the diagonal, its “photographic negative” sequence: 1, 1, 0, 0, 1, 0, … This sequence is like the second index book: it cannot consistently appear anywhere in the table. To see that, imagine if we had already assigned this sequence to number 7. (The number 7 is arbitrary; any other number will work just as well.)  Then the table would look like this:

    1: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
    2: 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...
    3: 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ...
    4: 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ...
    5: 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, ...
    6: 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, ...
    7: 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, ...

But look at the seventh bit of sequence 6! It’s supposed to be the seventh bit along the diagonal, and it’s wrong. If we correct it, then the 7th bit of sequence 7 is wrong, since it’s supposed to be the opposite of the 7th bit of sequence 6.

This proof shows that there are too many sequences of bits to match up to natural numbers. No matter how we try match them up, we can always find a contradiction by looking at the sequence defined by the opposite of the diagonal sequence.  And as if two different sizes of infinity weren’t enough, Cantor was able to repeat this process and show that there are infinitely many sizes of infinity, each bigger than the last!

Many influential mathematicians didn’t like this idea, and attacked Cantor personally rather than trying to find fault with his proof.  Religious philosophers misinterpreted his results and equated the idea of multiple infinities with pantheism!  Confronted with so much antagonism, Cantor often fought with depression.

He persevered, however, and later in life was greatly praised for his work. The Royal Society gave him their highest honor, the Sylvester Medal; and no less a mathematician than David Hilbert exclaimed, “No one shall expel us from the paradise that Cantor has created.”

Semantics for the blue calculus

Posted in Category theory, Math, Quantum by Mike Stay on 2011 April 29
Blue calculus MonCat \frac{n \mbox{Cob}_2}{} 2Hilb Set (as a one object bicategory)
Types monoidal categories manifolds 2 Hilbert spaces *
Terms with one free variable monoidal functors manifolds with boundary linear functors sets
Rewrite rules monoidal natural transformation manifolds with corners linear natural transformations functions
Tensor product \boxtimes juxtaposition (disjoint union) tensor product cartesian product
(T | T'):Y where x:X is free ([[T]] \otimes_Y [[T']]):X \to Y formal sum of cobordisms with boundary from X to Y sum of linear functors disjoint union

In the MonCat column, \boxtimes is the categorified version of the tensor product of monoids.

Axiom of fun choice

Posted in Fun links, Math by Mike Stay on 2011 April 14

A fun choice function is a function f defined on a collection J of jobs that must be done such that for every job j \in J, f(j) is an element of fun. The axiom of fun choice states,

For any set \displaystyle J of jobs that must be done, there exists a fun choice function defined on \displaystyle J.

This axiom asserts that one can always find the fun in any job that must be done; a theorem of Poppins deduces from this that all such jobs are games.

Escher “Print gallery” metapixel

Posted in Math by Mike Stay on 2011 March 10

This one uses a conformal transformation like Escher did in his “Print Gallery“:

A picture that contains a scaled-down copy of itself is periodic both in theta and in log (r). Taking the complex log of such an image gives a rectangular tiling of the plane; by scaling and rotating, I got a new tiling, then exponentiated it. Now going around 2 pi radians takes you diagonally across the original rectangle, so you end up one level away from where you started.

[Edit] See also these better versions.

Droste OTCA metapixel

Posted in Math by Mike Stay on 2011 March 8

John Conway and some of his friends invented a cellular automaton called “Life”. It’s Turing complete, so you can make a Life pattern that runs Life. The OTCA metapixel has all its logic around the edges of a vast field that gets filled with light-weight spaceships when the cell is on. (See here, here, and here for more details.)

This image uses a logarithmic spiral to zoom through a factor of 2048 after one turn. There’s a little bit of misalignment between the largest scale and the smallest one, but I think it still turned out pretty well (click for a larger view):

Astounding protein folding paper

Posted in Chemistry, General physics by Mike Stay on 2011 February 22

By assuming that to get from point A to point B you don’t have to hit a sequence of points in between—i.e. the way quantum particles work—these guys accurately predict protein folding rates in 15 different real proteins. This is huge.

In case it wasn’t already abundantly clear…

Posted in Math by Mike Stay on 2011 February 15

You know a guy’s a nerd if he pulls over to take a picture of his odometer because it’s showing the Fibonacci sequence.

But it takes a special kind of nerd to have reset his trip odometer 132.1 miles ahead of time in anticipation of taking the picture.

Regular tilings of three-dimensional spaces

Posted in Astronomy, Fun links, General physics, Math by Mike Stay on 2011 January 31

If you start at the north pole and make an equilateral triangle 6000 miles on a side, the bottom will lie on the equator, each of the angles will be 90 degrees, and only four of them will fit around the pole.

In a similar way, large enough tetrahedra would tile the surface of a hypersphere. This paper identifies the eleven regular tilings of three-dimensional spaces and whether they’re spherical, Euclidean, or hyperbolic tilings, and then looks at the geometry of spacetime to see how it might be tiled.

The “cubic” tilings (where eight polyhedra meet around a vertex like cubes do in Euclidean space) are amenable to taking cross-sections; this tiling of hyperbolic space with dodecahedra

has a cross section with a tiling of the hyperbolic plane with pentagons:

Buckyball Bacteriophage

Posted in Design, Evolution, Math by Mike Stay on 2011 January 11

The t4 bacteriophage infects e. coli.

Dodecahedral Buckyballs

Posted in Design, Math by Mike Stay on 2011 January 6

Hyperbolic Buckyballs

Posted in Design, Math by Mike Stay on 2011 January 6

Icosahedral Buckyballs

Posted in Design, Math by Mike Stay on 2011 January 6

The Word of God

Posted in Astronomy, Chemistry, Evolution, General physics, History, Poetry, Theocosmology by Mike Stay on 2010 November 3

From desert cliff and mountaintop we trace the wide design,
Strike-slip fault and overthrust and syn and anticline…
We gaze upon creation where erosion makes it known,
And count the countless aeons in the banding of the stone.
Odd, long-vanished creatures and their tracks & shells are found;
Where truth has left its sketches on the slate below the ground.
The patient stone can speak, if we but listen when it talks.
Humans wrote the Bible; God wrote the rocks.

There are those who name the stars, who watch the sky by night,
Seeking out the darkest place, to better see the light.
Long ago, when torture broke the remnant of his will,
Galileo recanted, but the Earth is moving still.
High above the mountaintops, where only distance bars,
The truth has left its footprints in the dust between the stars.
We may watch and study or may shudder and deny,
Humans wrote the Bible; God wrote the sky.

By stem and root and branch we trace, by feather, fang and fur,
How the living things that are descend from things that were.
The moss, the kelp, the zebrafish, the very mice and flies,
These tiny, humble, wordless things–how shall they tell us lies?
We are kin to beasts; no other answer can we bring.
The truth has left its fingerprints on every living thing.
Remember, should you have to choose between them in the strife,
Humans wrote the Bible; God wrote life.

And we who listen to the stars, or walk the dusty grade,
Or break the very atoms down to see how they are made,
Or study cells, or living things, seek truth with open hand.
The profoundest act of worship is to try to understand.
Deep in flower and in flesh, in star and soil and seed,
The truth has left its living word for anyone to read.
So turn and look where best you think the story is unfurled.
Humans wrote the Bible; God wrote the world.

-Catherine Faber, The Word of God

The Sum of Forking Paths

Posted in Borges, Perception, Quantum by Mike Stay on 2010 September 7

Paracelsus threw the rose into the fire; it bent on impact, recoiled, and fell deeper among the logs. In the gold-orange light, the stem was already black. The husk began to shrivel and split as the sap boiled away on its surface. The petals blistered and blackened and fell. The five-fingered star beneath them danced subtly, swaying in the brittle heat. For nearly an hour it lay visibly unchanged save for a gradual loss of hue and a universal greyness, then fell into three large pieces as the log crumbled around it. The ashes glowed orange, then gradually dimmed; the last visible flash of light burst outward from the remains of the stem. Like all light, it carried within it a timepiece.

Once, when the clock read noon, it traveled without hesitation in a straight path to my retina. Once it took another course, only to bend around a molecule of nitrogen and reach the same destination.

Once it traced the signature of a man no one remembers.

Once, at half past three, it decayed into a tiny spark and its abominable opposite image; their mutual horrified fascination drew them together, each a moth in the other’s flame. The last visible flash of light from their fiery consummation was indistinguishable from the one that spawned them.

Once, when the clock ticked in the silence just before dawn, the light decayed into two mirrored worlds, somewhat better than ours due to the fact that I was never born there. Both worlds were consumed by mirrored dragons before collapsing back into the chaos from which they arose; all that remained was an orange flash of light.

Once it traveled to a far distant galaxy, reflected off the waters of a billion worlds and witnessed the death of a thousand stars before returning to the small room in which we sat.

Once it transcribed a short story of Borges in his own cramped scrawl, the six versions he discarded, the corrupt Russian translation by Nabokov, and a version in which the second-to-last word was illegible.

Once it traveled every path, each in its time; once it became and destroyed every possible world. All these summed to form what was: I saw an orange flash, and in that moment, I was enlightened.

Formal axiomatic divination

Posted in Borges, Math, Programming, Time by Mike Stay on 2010 September 1

It’s well known that “in Xanadu did Kublai Khan a stately pleasure dome decree,” but his true legacy is the field of formal axiomatic divination. In 1279, Khan sought an auspicious date on which to begin construction of the palace. He consulted each of his twelve astrologers separately and without warning; unsurprisingly, he received twelve different answers. Khan flew into a rage and said that until the astrologer’s craft was as precise as that of his masons and carpenters, they were banished from his presence.

Kublai Khan died in 1294 and his successor Temur Khan was convinced to reinstate the astrologers. Despite this, the young mathematician Zhu Shijie took up the old Khan’s challenge in 1305. Zhu had already completed two enormously influential mathematical texts: Introduction to Mathematical Studies, published in 1299, and True reflections of the four unknowns, published in 1303. This latter work included a table of “the ancient method of powers”, now known as Pascal’s triangle, and Zhu used it extensively in his analysis of polynomials in up to four unknowns.

In turning to the analysis of divination, Zhu naturally focused his attention on the I Ching. The first step in performing an I Ching divination is casting coins or yarrow stalks to construct a series of hexagrams. In 1308, Zhu published his treatise on probability theory, Path of the falling stone. It included an analysis of the probability for generating each hexagram as well as betting strategies for several popular games of chance. Using his techniques, Zhu became quite wealthy and began to travel; it was during this period that he was exposed to the work of the mathematicians in northern China. In the preface to True reflections, Mo Ruo writes that “Zhu Shijie of Yan-shan became famous as a mathematician. He travelled widely for more than twenty years and the number of those who came to be taught by him increased each day.”

Zhu worked for nearly a decade on the subsequent problem, that of interpreting a series of hexagrams. Hexagrams themselves are generated one bit at a time by looking at remainders modulo four of random handfuls of yarrow stalks; the four outcomes either specify the next bit directly or in terms of the previous bit. These latter rules give I Ching its subtitle, The Book of Changes. For mystical reasons, Zhu asserted that the proper interpretation of a series of hexagrams should also be given by a set of changes, but for years he could find no reason to prefer one set of changes to any other. However, in 1316, Zhu wrote to Yang Hui:

“I dreamed that I was summoned to the royal palace. As I stepped upon the threshold, the sun burst forth over the gilded tile; I was blinded and, overcome, I fell to my knees. I lifted my hand to shield my eyes from its brilliance, and the Emperor himself took it and raised me up. To my surprise, he changed his form as I watched; he became so much like me that I thought I was looking in a mirror.

“‘How can this be?’ I cried. He laughed and took the form of a phoenix; I fell back from the flames as he ascended to heaven, then sorrowed as he dove toward the Golden Water River, for the water would surely quench the bird. Yet before reaching the water, he took the form of an eel, dove into the river and swam to the bank; he wriggled ashore, then took the form of a seed, which sank into the earth and grew into a mighty tree. Finally he took his own form again and spoke to me: ‘I rule all things; things above the earth and in the earth and under the earth, land and sea and sky. I can rule all these because I rule myself.’

“I woke and wondered at the singularity of the vision; when my mind reeled in amazement and could stand no more, it retreated to the familiar problem of the tables of changes. It suddenly occurred to me that as the Emperor could take any form, there could be a table of changes that could take the form of any other. Once I had conceived the idea, the implementation was straightforward.”

The rest of the letter has been lost, but Yang Hui described the broad form of the changes in a letter to a friend; the Imperial Changes were a set of changes that we now recognize as a Turing-complete programming language, nearly seven hundred years before Turing. It was a type of register machine similar to Melzak’s model, where seeds were ‘planted’ in pits; the lists of hexagrams generated by the yarrow straws were the programs, and the result of the computation was taken as the interpretation of the casting. Zhu recognized that some programs never stopped–some went into infinite loops, some grew without bound, and some behaved so erratically he couldn’t decide whether they would ever give an interpretation.

Given his fascination with probabilities, it was natural that Zhu would consider the probability that a string of hexagrams had an interpretation. We do not have Zhu’s reasoning, only an excerpt from his conclusion: “The probability that a list of hexagrams has an interpretation is a secret beyond the power of fate to reveal.” It may be that Zhu anticipated Chaitin’s proof of the algorithmic randomness of this probability as well.

All of Zhu’s works were lost soon after they were published; True reflections survived in a corrupted form through Korean (1433 AD) and Japanese (1658 AD) translations and was reintroduced to China only in the nineteenth century. One wonders what the world might have been like had the Imperial Changes been understood and exploited. We suppose it is a secret beyond the power of fate to reveal.

Imaginary Time 2

Posted in Chemistry, General physics, Math by Mike Stay on 2010 July 26

Another part of the analogy I started here, but this time using inverse temperature instead of imaginary time. It describes a thermometer where a mass changes position with temperature. I’m guessing this stuff only applies when the temperature is changing adiabatically.

Thermometer (unitless temperature):
\displaystyle \beta [1] inverse temperature (unitless)
\displaystyle y(\beta) [m] y coordinate
\displaystyle r [kg/s^2 K] spring constant * temp unit conversion
\displaystyle v(\beta) = \frac{dy(\beta)}{d\beta} [m] how position changes with (inverse) temperature
\displaystyle F(\beta) = r \; v(\beta) [kg m/s^2 K] force per Kelvin
\displaystyle T(\beta) = \frac{r}{2}v(\beta)^2 [kg m^2/s^2 K] stretching energy per Kelvin
\displaystyle V(\beta) [kg m^2/s^2 K] potential energy per Kelvin
\displaystyle S = \int (T + V)(\beta) \; d\beta

[kg m^2/s^2 K] entropy
\displaystyle \beta [1/K] inverse temperature
\displaystyle y(\beta) [m] y coordinate
\displaystyle r [kg/s^2 K^2 = bits/m^2 K] how information density changes with temp
\displaystyle v(\beta) = \frac{dy(\beta)}{d\beta} [m K] how position changes with (inverse) temperature
\displaystyle F(\beta) = r \; v(\beta) [kg m/s^2 K = bits/m] force per Kelvin
\displaystyle T(\beta) = \frac{r}{2}v(\beta)^2 [kg m^2/s^2 = bits K] stretching energy = change in stretching information with invtemp
\displaystyle V(\beta) [kg m^2/s^2 = bits K] potential energy = change in potential information with invtemp
\displaystyle S = \int (T + V)(\beta) \; d\beta

[bits] entropy

I assume that the dynamics of such a system would follow a path where \displaystyle \delta S=0; is that a minimum-entropy path or a maximum?

An analogy

Posted in Math, Quantum by Mike Stay on 2010 July 24
Stat mech Quant mech
column vector distribution
\displaystyle |p\rangle = \sum_j p_j|j\rangle
amplitude distribution (wave function)
\displaystyle |\psi\rangle = \sum_j \psi_j |j\rangle
row vector population
\displaystyle \langle p| = \sum_j p^*_j \langle j|,
where p_j^* are the coefficients that, when normalized, maximize the inner product with |p\rangle
\displaystyle \langle \psi| = \sum_j \psi_j^* \langle j|,
where \psi_j^* is the complex conjugate of \psi_j.
normalization \displaystyle \sum_j p_j = 1 \displaystyle \sum_j |\psi_j|^2 = 1
transitions stochastic unitary
harmonic oscillator many HOs at temperature T one QHO evolving for time t
uniform distribution over n states \displaystyle |U\rangle = \frac{1}{n}\sum_j |j\rangle \displaystyle |U\rangle = \frac{1}{\sqrt{n}}\sum_j |j\rangle
special distribution Gibbs distribution \displaystyle p(T) = \sum_j e^{-E_j / k_B T} |j\rangle Free evolution \displaystyle \psi(t) = H|U\rangle = \sum_j e^{-E_j \; it/\hbar} |j\rangle
partition function = inner product with \langle U| \displaystyle Z(T) = \langle U|p(T)\rangle = \frac{1}{n}\sum_j e^{-E_j / k_B T} \displaystyle Z(t) = \langle U|\psi(t)\rangle = \frac{1}{\sqrt{n}}\sum_j e^{-E_j \; it/\hbar}
\displaystyle \langle Q\rangle \displaystyle = \frac{1}{Z(T)}\langle U| \; |Qp(T)\rangle \displaystyle = \frac{1}{Z(t)}\langle U| \; |Q\psi(t)\rangle
path integrals E/T = maximum entropy? Et = least action?

Entropic gravity

Posted in Astronomy, General physics, Math by Mike Stay on 2010 July 19

Erik Verlinde has been in the news recently for revisiting Ted Jacobson’s suggestion that gravity is an entropic force rather than a fundamental one. The core of the argument is as follows:

Say we have two boxes, one inside the other:

|               |
| +----------+  |
| |          |  |
| |          |  |
| |          |  |
| +----------+  |

Say the inner box has room for ten bits on its surface and the outer one room for twenty. Each box can use as many “1”s as there are particles inside it:

|      X        |
| +----------+  |
| |          |  |
| |  X       |  |
| |          |  |
| +----------+  |

In this case, the inner box has only one particle inside, so there are 10 choose 1 = 10 ways to choose a labeling of the inner box; the outer box has two particles inside, so there are 20 choose 2 = 190 ways. Thus there are 1900 ways to label the system in all.

If both particles are in the inner box, though, the number of ways increases:

|               |
| +----------+  |
| |          |  |
| |  X  X    |  |
| |          |  |
| +----------+  |

The inner box now has 10 choose 2 ways = 45, while the outer box still has 190. So using the standard assumption that all labelings are equally likely, it’s 4.5 times as likely to find both particles in the inner box, and we get an entropic force drawing them together.

The best explanation of Verlinde’s paper I’ve seen is Sabine Hossenfelder’s Comments on and Comments on Comments on Verlinde’s paper “On the Origin of Gravity and the Laws of Newton”.

Simple activity for teaching about radiometric dating

Posted in General physics, Quantum by Mike Stay on 2010 May 3

This works best with small groups of about 5-10 students and at least thirty dice. Divide the dice evenly among the students.

  1. Count the number of dice held by the students and write it on the board.
  2. Have everyone roll each die once.
  3. Collect all the dice that show a ‘one’, count them, write the sum on the board, then set them aside.
  4. Go back to step 1.

A run with 30 dice will look something like this:

dice number of ones
30 5
25 4
21 4
17 3
14 1
13 3
10 2
8 1
7 1
6 0
6 1
5 0
5 1
4 1
3 0
3 0
3 0
3 1
2 1
1 0
1 0
1 0
1 0
1 1

Point out how the number of dice rolling a one on each turn is about one sixth of the dice that hadn’t yet rolled a one on the previous turn. Also, that you lose about half of the remaining dice after about four turns.

Send someone out of the room; do either four or eight turns, then bring them back and ask them to guess how many turns the group took. The student should be able to see that if half the dice are left, there were only four turns, but if a quarter of the dice are left, there were eight turns.

If the students are advanced enough to use logarithms, try the above with some number other than four or eight and have the student use logarithms to calculate the number of turns:

turns = log(number remaining/total) / log(5/6),

or, equivalently, in terms of the half-life (which is really closer to 3.8 than 4):

turns = 3.8 * log(number remaining/total) / log(1/2).

When Zircon crystals form, they strongly reject lead atoms: new zircon crystals have no lead in them. They easily accept uranium atoms. Each die represents a uranium atom, and rolling a one represents decaying into a lead atom: because uranium atoms are radioactive, they can lose bits of their nucleus and turn into lead–but only randomly, like rolling a die. Instead of four turns, the half-life of U238 is 4.5 billion years.

Zircon forms in almost all rocks and is hard to break down. So to judge the age of a rock, you get the zircon out, throw it in a mass spectrometer, look at the proportion of uranium to (lead plus uranium) and calculate

years = 4.5 billion * log(mass of uranium/mass of (lead+uranium)) / log(1/2).

Problem: given a zircon crystal where there’s one lead atom for every ninety-nine uranium atoms, how long ago was it formed?

4.5 billion * log(99/100) / log(1/2) = 65 million years ago.

In reality, it’s slightly more complicated: there are two isotopes of uranium and several of lead. But this is a good thing, since we know the half-lives of both isotopes and can use them to cross-check each other; it’s as though each student had both six- and twenty-sided dice, and the student guessing the number of turns could use information from both groups to refine her guess.

Escher and Mandelbrot

Posted in Math by Mike Stay on 2010 May 3

If you take a complex number z with argument θ and square it, you double θ. The Mandelbrot/Julia iteration

z ↦ z2 + c

does pretty much the same thing, but adds wiggles to the curve. Since the iterations stop when |z| > 2, the boundary at the zeroth iteration is a circle; after the first it’s a pear shape, and so on. We can map any point in the region between bands to a point in a rectangular tile that’s periodic once along the outside edge and twice along the inside edge. Here’s a site with a few different examples.

The transformation Escher used in “Print Gallery” takes concentric circles at r=1/rn to a logarithmic spiral. The concentric boundaries between iterations for the Julia set at c = 0 are circles. There ought to be a transformation for Mandelbrot / Julia sets similar to the Droste effect but spiraling inward so that the frequency is smoothly increasing just as the distance can be made to smoothly increase.


Posted in Borges, Fun links, General physics, Perception, Quantum by Mike Stay on 2010 April 27

Lazulinos are quasiparticles in a naturally occurring Bose-Einstein condensate first described in 1977 by the Scottish physicist Alexander Craigie while at the University of Lahore [3]. The quasiparticles are weakly bound by an interaction for which neither the position nor number operator commutes with the Hamiltonian. A measurement of a lazulino’s position will cause the condensate to go into a superposition of number states, and a subsequent measurement of the population will return a random number; also, counting the lazulinos at two different times will likely give different results.

Their name derives from the stone lapis lazuli and means, roughly, “little blue stone”. Lazulinos are so named because even though the crystals in which they arise absorb visible light, and would otherwise be jet black, they lose energy through surface plasmons in the form of near-ultraviolet photons, with visible peaks at 380, 402, and 417nm. Optical interference imparts a “laser speckle” quality to the emitted light; Craigie described the effect in a famously poetic way: “Their colour is the blue that we are permitted to see only in our dreams”. What makes lazulinos particularly interesting is that they are massive and macroscopic. Since the number operator does not commute with the Hamiltonian, lazulinos themselves do not have a well-defined mass; if the population is N, then the mass of any particular lazulino is m/N, where m is the total mass of the condensate.

In a recent follow-up to the “quantum mirage” experiment [2], Don Eigler’s group at IBM used a scanning tunneling microscope to implement “quantum mancala”—picking up the lazulino ‘stones’ in a particular location usually changes the number of stones, so the strategy for winning becomes much more complicated. In order to pick up a fixed number of stones, you must choose a superposition of locations [1].

  1. C.P. Lutz and D.M. Eigler, “Quantum Mancala: Manipulating Lazulino Condensates,” Nature 465, 132 (2010).
  2. H.C. Manoharan, C.P. Lutz and D.M. Eigler, “Quantum Mirages: The Coherent Projection of Electronic Structure,” Nature 403, 512 (2000). Images available at
  3. A. Craigie, “Surface plasmons in cobalt-doped Y3Al5O12,” Phys. Rev. D 15 (1977). Also available at


Posted in Category theory, Math, Quantum by Mike Stay on 2010 April 11

Coends are a categorified version of “summing over repeated indices”. We do that when we’re computing the trace of a matrix and when we’re multiplying two matrices. It’s categorified because we’re summing over a bunch of sets instead of a bunch of numbers.

Let C be a small category. The functor \mbox{hom}:C^{\mbox{op}} \times C \to Set assigns

  • to each pair of objects the set of morphisms between them, and
  • to each pair of morphisms (f:c \to c', h:d\to d') a function that takes a morphism g \in \mbox{hom}(c', d) and returns the composite morphism h \circ g \circ f \in \mbox{hom}(c, d'), where c, c', d, d' \in \mbox{Ob}(C).

It turns out that given any functor S:C^{\mbox{op}} \times D \to \mbox{Set}, we can make a new category where C and D are subcategories and S is actually the hom functor; some keywords for more information on this are “collages” and “Artin glueing”. So we can also think of S as assigning

  • to each pair of objects a set of morphisms between them, and
  • to each pair of morphisms (f:c \to c', h:d\to d') a function that takes a morphism g \in S(c', d) and returns the composite morphism h \circ g \circ f \in S(c, d'), where c,c' \in \mbox{Ob}(C) and d,d' \in \mbox{Ob}(D).

We can think of these functors as adjacency matrices, where the two parameters are the row and column, except that instead of counting the number of paths, we’re taking the set of paths. So S is kind of like a matrix whose elements are sets, and we want to do something like sum the diagonals.

The coend of S is the coequalizer of the diagram

\begin{array}{c}\displaystyle \coprod_{f:c \to c'} S(c', c) \\ \\ \displaystyle S(f, c) \downarrow \quad \quad \downarrow S(c', f) \\ \\ \displaystyle \coprod_c S(c, c) \end{array}

The top set consists of all the pairs where

  • the first element is a morphism f \in \mbox{hom}(c, c') and
  • the second element is a morphism g \in S(c', c).

The bottom set is the set of all the endomorphisms in S.

The coequalizer of the diagram, the coend of S, is the bottom set modulo a relation. Starting at the top with a pair (f, g), the two arrows give the relation

\displaystyle c \stackrel{f}{\to} c' \stackrel{g}{\multimap} c \stackrel{c}{\to} c \quad \sim \quad c' \stackrel{c'}{\to} c' \stackrel{g}{\multimap} c \stackrel{f}{\to} c',

where I’m using the lollipop to mean a morphism from S.

So this says take all the endomorphisms that can be chopped up into a morphism f from \mbox{hom} going one way and a g from S going the other, and then set fg \sim gf. For this to make any sense, it has to identify any two objects related by such a pair. So it’s summing over all the endomorphisms of these equivalence classes.

To get the trace of the hom functor, use S = \mbox{hom} in the analysis above and replace the lollipop with a real arrow. If that category is just a group, this is the set of conjugacy classes. If that category is a preorder, then we’re computing the set of isomorphism classes.

The coend is also used when “multiplying matrices”. Let S(c', c) = T(b, c) \times U(c', d). Then the top set consists of triples (f: c\to c',\quad g:b \multimap c,\quad h:c' \multimap d), the bottom set of pairs (g:b \multimap c, \quad h:c \multimap d), and the coend is the bottom set modulo

(\displaystyle b \stackrel{g}{\multimap} c \stackrel{c}{\to} c, \quad c \stackrel{f}{\to} c' \stackrel{h}{\multimap} d) \quad \sim \quad  (\displaystyle b \stackrel{g}{\multimap} c \stackrel{f}{\to} c', \quad c' \stackrel{c'}{\to} c' \stackrel{h}{\multimap} d)

That is, it doesn’t matter if you think of f as connected to g or to h; the connection is associative, so you can go all the way from b to d.

Notice here how a morphism can turn “inside out”: when f and the identities surround a morphism in S, it’s the same as being surrounded by morphisms in T and U; this is the difference between a trace, where we’re repeating indices on the same matrix, and matrix multiplication, where we’re repeating the column of the first matrix in the row of the second matrix.

Theories and models

Posted in Category theory, Math, Programming by Mike Stay on 2010 March 4

The simplest kind of theory is just a set T, thought of as a set of concepts or Platonic ideals. We typically have some other set S, thought of as the set of real things that are described by concepts. Then a model is a function f:T \to S. For example, we could let T = {0, 1}; this is the theory of the number “two”, since it has two elements. Whatever set we choose for S, the models of the theory are going to be pairs of elements of S. So if S = the set of people, models of T in S are going to be pairs of people (where choosing the same person twice is allowed).

Concepts, however, are usually related to each other, whereas in a set, you can only ask if elements are the same or not. So the way a theory is usually presented is as a category T. For example let T be the category with two objects E, V and two parallel nontrivial morphisms \sigma, \tau:E\to V, and let S be the category Set of sets and functions. A model is a structure-preserving map from the category T to Set, i.e. a functor. Each object of T gets mapped to a set; here we think of the image of V as a set of vertices and the image of E as a set of edges. Each morphism of T gets mapped to a function; \sigma and \tau take an edge and produce the source vertex or target vertex, respectively. The category T = Th(Graph) is the theory of a graph, and its models are all graphs.

Usually, our theories have extra structure. Consider the first example of a model, a function between sets. We can add structure to the theory; for example, we can take the set T to be the ring of integers \mathbb{Z}. Then a model is a structure-preserving function, a homomorphism between T and S. Of course, this means that S has to have at least as much structure as T. We could, for instance, take S to be the real numbers under multiplication. Since this ring homomorphism is entirely determined by where we map 1, and we can choose any real number for its image, there would be one model for each real number; each integer x would map to a^x for some a. Another option is to take S = \mathbb{Z}_4, the integers modulo 4. There are three nonisomorphic models of \mathbb{Z} in \mathbb{Z}_4. If we map 1 to 0, we get the trivial ring; if we map 1 to 1 or 3, we get integers modulo 4; and if we map 1 to 2, we get integers modulo 2.

Similarly, we can add structure to a category. If we take monoidal categories T, S, then we can tensor objects together to get new ones. A model of such a theory is a tensor-product-preserving functor from T to S. See my paper “Physics, Topology, Computation, and Logic: a Rosetta Stone” with John Baez for a thorough exploration of theories that are braided monoidal closed categories and models of these.

An element of the set \mathbb{Z}_4 is a number, while an object of the category Th(Graph) is a set. A theory is a mathematical gadget in which we can talk about theories of one dimension lower. In Java, we say “interface” instead of “theory” and “class” instead of “model”. With Java interfaces we can describe sets of values and functions between them; it is a cartesian closed category whose objects are datatypes and whose morphisms are (roughly) programs. Models of an interface are different classes that implement that interface.

And there’s no reason to stop at categories; we can consider bicategories with structure and structure-preserving functors between these; these higher theories should let us talk about different models of computation. One model would be Turing machines, another lambda calculus, a third would be the Java Virtual Machine, a fourth Pi calculus.

Aleph and Omega

Posted in Borges, Math, Perception, Theocosmology, Time by Mike Stay on 2010 January 14

I shut my eyes — I opened them. Then I saw the Aleph.

I arrive now at the ineffable core of my story. And here begins my despair as a writer. All language is a set of symbols whose use among its speakers assumes a shared past. How, then, can I translate into words the limitless Aleph, which my floundering mind can scarcely encompass? Mystics, faced with the same problem, fall back on symbols: to signify the godhead, one Persian speaks of a bird that somehow is all birds; Alanus de Insulis, of a sphere whose center is everywhere and circumference is nowhere; Ezekiel, of a four-faced angel who at one and the same time moves east and west, north and south. (Not in vain do I recall these inconceivable analogies; they bear some relation to the Aleph.) Perhaps the gods might grant me a similar metaphor, but then this account would become contaminated by literature, by fiction. Really, what I want to do is impossible, for any listing of an endless series is doomed to be infinitesimal. In that single gigantic instant I saw millions of acts both delightful and awful; not one of them occupied the same point in space, without overlapping or transparency. What my eyes beheld was simultaneous, but what I shall now write down will be successive, because language is successive. Nonetheless, I’ll try to recollect what I can.

On the back part of the step, toward the right, I saw a small iridescent sphere of almost unbearable brilliance. At first I thought it was revolving; then I realised that this movement was an illusion created by the dizzying world it bounded. The Aleph’s diameter was probably little more than an inch, but all space was there, actual and undiminished. Each thing (a mirror’s face, let us say) was infinite things, since I distinctly saw it from every angle of the universe. I saw the teeming sea; I saw daybreak and nightfall; I saw the multitudes of America; I saw a silvery cobweb in the center of a black pyramid; I saw a splintered labyrinth (it was London); I saw, close up, unending eyes watching themselves in me as in a mirror; I saw all the mirrors on earth and none of them reflected me; I saw in a backyard of Soler Street the same tiles that thirty years before I’d seen in the entrance of a house in Fray Bentos; I saw bunches of grapes, snow, tobacco, lodes of metal, steam; I saw convex equatorial deserts and each one of their grains of sand; I saw a woman in Inverness whom I shall never forget; I saw her tangled hair, her tall figure, I saw the cancer in her breast; I saw a ring of baked mud in a sidewalk, where before there had been a tree; I saw a summer house in Adrogué and a copy of the first English translation of Pliny — Philemon Holland’s — and all at the same time saw each letter on each page (as a boy, I used to marvel that the letters in a closed book did not get scrambled and lost overnight); I saw a sunset in Querétaro that seemed to reflect the colour of a rose in Bengal; I saw my empty bedroom; I saw in a closet in Alkmaar a terrestrial globe between two mirrors that multiplied it endlessly; I saw horses with flowing manes on a shore of the Caspian Sea at dawn; I saw the delicate bone structure of a hand; I saw the survivors of a battle sending out picture postcards; I saw in a showcase in Mirzapur a pack of Spanish playing cards; I saw the slanting shadows of ferns on a greenhouse floor; I saw tigers, pistons, bison, tides, and armies; I saw all the ants on the planet; I saw a Persian astrolabe; I saw in the drawer of a writing table (and the handwriting made me tremble) unbelievable, obscene, detailed letters, which Beatriz had written to Carlos Argentino; I saw a monument I worshipped in the Chacarita cemetery; I saw the rotted dust and bones that had once deliciously been Beatriz Viterbo; I saw the circulation of my own dark blood; I saw the coupling of love and the modification of death; I saw the Aleph from every point and angle, and in the Aleph I saw the earth and in the earth the Aleph and in the Aleph the earth; I saw my own face and my own bowels; I saw your face; and I felt dizzy and wept, for my eyes had seen that secret and conjectured object whose name is common to all men but which no man has looked upon — the unimaginable universe.

I felt infinite wonder, infinite pity…

(Jorge Luis Borges, The Aleph)

A finitely-refutable question is one of the form, “Does property X holds for all natural numbers?” Any mathematical question admitting a proof or disproof is in this category. If you believe the ideas of digital physics, then any question about the behavior of some portion of the universe is in this category. We can encode any finitely refutable question as a program that iterates through the natural numbers and checks to see if it’s a counterexample. If so, it halts; if not, it goes to the next number.

The halting probability of a universal Turing machine is a number between zero and one. Given the first n bits of this number, there is a program that will compute which n-bit programs halt and which don’t. Assuming digital physics, all those things Borges wrote about in the Aleph are in the Omega. There’s a trivial way–the Omega is a normal number, so every sequence of digits appears infinitely often–but there’s a more refined way: ask any finitely-refutable question using an n-bit program and the first n bits of Omega contain the proper information to compute the answer.

The bits of Omega are pure information; they can’t be computed from a fixed-size program, like the bits of \pi can.

Renormalization and Computation 4

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2009 October 15

This is the fourth in a series of posts on Yuri Manin’s pair of papers. In the previous posts, I laid out the background; this time I’ll actually get around to his result.

A homomorphism from the Hopf algebra into a target algebra is called a character. The functor that assigns an action to a path, whether classical or quantum, is a character. In the classical case, it’s into the rig (\mathbb{R} \cup \{\infty\}, \min, \infty, +, 0), and we take an infimum over paths; in the quantum it’s into the rig (\mathbb{C}, +, 0, \cdot, 1), and we take an integral over paths. Moving from the quantum to the classical case is called Maslov dequantization.

Manin mentions that the runtime of a parallel program is a character akin to the classical action, with the runtime of the composition of two programs being the sum of the respective runtimes, while the runtime of two parallel programs is the maximum of the two. A similar result holds for nearly any cost function. He also points out that computably enumerable reals \mathbb{R}^{c.e.} form a rig (\mathbb{R}^{c.e.}\cup \{\infty\}, \max, \infty, +, 0). He examines Rota-Baxter operators as a way to generalize what “polar part” means and extend the theorems on Hopf algebra renormalization to such rigs.

In the second paper, he looks at my work with Calude as an example of a character. He uses our same argument to show that lots of measures of program behavior have the property that if the measure hasn’t stopped growing after reaching a certain large amount with respect to the program size, then the density of finite values the measure could take decreases like 1/\log(t). Surprisingly, though he referred to these results as cutoffs, he didn’t actually use them anywhere for doing regularization.

Reading between the lines, he might be suggesting something like approximating the Kolmogorov complexity that he uses later by using a time cutoff, motivated by results from our paper: there’s a constant c depending only on the programming language such that if you run the nth program cn^2 steps and it hasn’t stopped, then the density of times near t > cn^2 at which it could stop is roughly 1/\log(t).

Levin suggested using a computable complexity that’s the sum of the program length and the log of the number of time steps; I suppose you could “regularize” the Kolmogorov complexity by adding \Lambda \log(t) to the length of the program, renormalize, and then let \Lambda go to zero, but that’s not something Manin does.

Instead, he proposed two other constructions suitable for renormalization; here’s the simplest. Given a partial computable function f:\mathbb{Z}^+\to \mathbb{Z}^+, define the computably enumerable \bar{f}:\mathbb{N}\to\mathbb{N} by \bar{f}(k) = f(k) if f(k) is defined, and 0 otherwise. Now define

\displaystyle \Psi(k,f;z) = \sum_{n=0}^{\infty} \frac{z^n}{\left(1+n\bar{f}(k)\right)^2}.

When f(k) is undefined, \Psi(k,f;z) = 1/(1-z), which has a pole at z=1. When f(k) is defined, \Psi(k,f;z) converges everywhere except z=\infty. Birkhoff decomposition would separate these two cases, though I’m not sure what value \Psi_+(f,k;1) would take or what it would mean.

The other construction involves turning \bar{f} into a permutation (x,y) \mapsto (x+\bar{f}(y),y), and inventing a function that has poles when the permutation has fixpoints.

So Manin’s idea of renormalizing the halting problem is to do some uncomputable stuff to get an easy-to-renormalize function and then throw the Birkhoff decomposition at it; since we know the halting problem is undecidable, perhaps the fact that he didn’t come up with a new technique for extracting information about the problem is unsurprising, but after putting in so much effort to understand it, I was left rather disappointed: if you’re going to allow yourself to do uncomputable things, why not just solve the halting problem directly?

I must suppose that his intent was not to tackle this hard problem, but simply to play with the analogy he’d noticed; it’s what I’ve done in other papers. And being forced to learn renormalization was exhilarating! I have a bunch of ideas to follow up; I’ll write them up as I get a chance.

Renormalization and Computation 2

Posted in General physics, Math, Quantum by Mike Stay on 2009 October 10

This is the second in a series of posts covering Yuri Manin’s ideas involving Hopf algebra renormalization of the Halting problem. Last time I showed how perturbing a quantum harmonic oscillator gave a sum over integrals involving n interactions with the perturbation; we can keep track of the integrals using Feynman diagrams, though in the case of a single QHO they weren’t very interesting.

One point about the QHO needs emphasis at this point. Given a wavefunction \psi = \sum_{n=0}^{\infty} \psi_n |n\rangle describing the state of the QHO, it must be the case that we get some value when we measure the energy; so if we sum up the norms of the probability amplitudes, we should get unity:

\displaystyle \langle \psi|\psi \rangle = \sum_{n=0}^{\infty} \langle n | \psi_n^* \psi_n | n \rangle = \sum_{n=0}^{\infty} |\psi_n|^2 = 1.

This is called the normalization condition.

When we perturb the QHO, the states |n\rangle are no longer the energy eigenvectors of the new Hamiltonian. We can express the new eigenvectors |m\rangle in terms of the old ones:

\displaystyle |m\rangle = \sum_{n=0}^{\infty}\lambda^n m_n|n\rangle,

where \lambda is the strength of the perturbation, and we reexpress our wavefunction in this new basis:

\displaystyle \psi = \sum_{m=0}^{\infty} \psi'_m |m\rangle

Since we’re working with a new set of coefficients, we have to make sure they sum up to unity, too:

\displaystyle \sum_{m=0}^{\infty} |\psi'_m|^2 = 1.

This is the renormalization condition. So renormalization is about making sure things sum up right once you perturb the system.

I want to talk about renormalization in quantum field theory; the trouble is, I don’t actually know quantum field theory, so I’ll just be writing up what little I’ve gathered from reading various things and conversations with Dr. Baez. I’ve likely got some things wrong, so please let me know and I’ll fix them.

A field is a function defined on spacetime. Scalar fields are functions with a single output, whereas vector fields are functions with several outputs. The electromagnetic field assigns a single number, called the electric field, and a vector, called the magnetic field, to every point in spacetime. When you have two electrons and move one of them, it feels a reaction force and loses momentum; the other electron doesn’t move until the influence, traveling at the speed of light, reaches it. Conservation of momentum says that the momentum has to be somewhere; it’s useful to consider it to be in the electromagnetic field.

When you take the Fourier transform of the field, you get a function that assigns values to harmonics of the field; in the case of electromagnetism, the transformed field \phi assigns a value to each color k of light. Quantizing this transformed field amounts to making \phi(k) into a creation operator, just like z in the QHO example from last time. So we have a continuum of QHOs, each indexed by a color k. (By the way—the zero-dimensional Fourier transform is the identity function, so the QHO example from last time can be thought of both as the field at the unique point in spacetime and the field at the unique frequency.)

When we move to positive-dimensional fields, we get more interesting pictures, like these from quantum electrodynamics:
Feynman diagrams
Here, our coupling constant is the fine structure constant \alpha = e^2/\hbar c, where e is the charge of the electron. For each vertex, we write down our coupling constant times -i times a delta function saying that the incoming momentum minus the outgoing momentum equals zero. For each internal line, we write down a propagator—a function representing the transfer of momentum from one point to another; it’s a function of the four-momentum k—and multiply all this stuff together. Then we integrate over all four-momenta and get something that looks like

\displaystyle \int_0^\infty f(k) d^4 k.

The trouble is, this integral usually gives infinity for an answer. We try to work around this in two steps: first, we regularize the integral by introducing a length scale \Lambda. This represents the point at which gravity starts being important and we need to move to a more fundamental theory. In the quantum field theory of magnetic domains in iron crystals, the length scale is the inter-atom distance in the lattice. Regularization makes the integral finite for \Lambda away from some singularity.

There are a few different ways of regularizing; one is to use \Lambda as a momentum cutoff:

\displaystyle \int_0^\Lambda f(k) d^4 k.

This obviously converges, and solutions to this are always a sum of three parts:

  • The first part diverges as \Lambda \to \infty, either logarithmically or as a power of \Lambda.
  • The second part is finite and independent of \Lambda.
  • The third part vanishes as \Lambda \to \infty.

Renormalization in this case amounts to getting rid of the first part.

These three parts represent three different length scales: at lengths larger than \Lambda, all quantum or statistical fluctuations are negligible, and we can use the mean field approximation and do classical physics. At lengths between \Lambda and 1/\Lambda, we use QFT to calculate what’s going on. Finally, at lengths smaller than 1/\Lambda, we need a new theory to describe what’s going on. In the case of QED, the new theory is quantum gravity; string theory and loop quantum gravity are the serious contenders for the correct theory.

The problem with this regularization scheme is that it doesn’t preserve gauge invariance, so usually physicists use another regularization scheme, called “dimensional regularization”. Here, we compute

\displaystyle \int_0^\infty f(k) d^\Lambda k

which gives us an expression involving gamma functions of \Lambda, where gamma is the continuous factorial function, and then we set \Lambda = 4 - \epsilon. The solutions to this are also a sum of three terms—a divergent part, a finite part, and a vanishing part—and then renormalization gets rid of the divergent part.

Assume we have some theory with a single free parameter g. We’d like to calculate a function F(x) perturbatively in terms of g, where F represents some physical quantity, and we know F(x_0) = g'. We assume F takes the form

\displaystyle F(x) = g + g^2 F_2(x) + g^3 F_3(x) + \cdots

and assume that this definition gives us divergent integrals for the F_n. The first step is regularization: instead of F we have a new function

\displaystyle F_\Lambda(x) = g + g^2 F_{2,\Lambda}(x) + g^3 F_{3,\Lambda}(x) + \cdots

Now we get to the business of renormalization! We solve this problem at each order; if the theory is renormalizable, knowing the solution at the previous order will give us a constraint for the next order, and we can subtract off all the divergent terms in a consistent way:

  1. Order g.

    Here, F_\Lambda(x) = g + O(g^2). Since it’s a constant, it has to match F(x_0) = g', so g = g' + O(g'^2). In this approximation, the coupling constant takes the classical value.

  2. Order g^2.

    Let g = g' + G_2(g') + G_3(g') + \cdots, where G_n(g') \sim O(g'^n). Plugging this into the definition of F_\Lambda, we get

    \displaystyle F_\Lambda(x) = g' + G_2(g') + g'^2 F_{2,\Lambda}(x) + O(g'^3).

    Using F(x_0) = g', we get G_2(g') = -g'^2F_{2,\Lambda}(x_0), which diverges as \Lambda \to \infty. In the case of QED, this says that the charge on the electron is infinite. While the preferred interpretation these days is that quantum gravity is a more fundamental theory that takes precedence on very small scales (a Planck length is to a proton as a proton is to a meter), when the theory was first introduced, there was no reason to think that we’d need another theory. So the interpretation was that with an infinite charge, an electron would be able to extract an infinite amount of energy from the electromagnetic field. Then the uncertainty principle would create virtual particles of all energies, which would exist for a time inversely proportional to the energy. The particles can be charged, so they line up with the field and dampen the strength just like dielectrics. In this interpretation, the charge on the electron depends on the energy of the particles you’re probing it with.

    So to second order,

    \displaystyle F_\Lambda(x) = g' + g'^2\left(F_{2,\Lambda}(x) - F_{2,\Lambda}(x_0)\right) + O(g'^3).

    A theory is therefore only renormalizable if the divergent part of F_{2,\Lambda}(x) is independent of x. In QED it is. We can now define F(x) as the limit

    \displaystyle F(x) = \lim_{\Lambda \to \infty} F_\Lambda(x).

  3. Higher orders.

    In a renormalizable theory, the process continues, with the counterterms entirely specified by knowing F(x_0).

Renormalization and Computation 1

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2009 October 7

Yuri Manin recently put two papers on the arxiv applying the methods of renormalization to computation and the Halting problem. Grigori Mints invited me to speak on Manin’s results at the weekly Stanford logic seminar because in his second paper, he expands on some of my work.

In these next few posts, I’m going to cover the idea of Feynman diagrams (mostly taken from the lecture notes for the spring 2004 session of John Baez’s Quantum Gravity seminar); next I’ll talk about renormalization (mostly taken from Andrew Blechman’s overview and B. Delamotte’s “hint”); third, I’ll look at the Hopf algebra approach to renormalization (mostly taken from this post by Urs Schreiber on the n-Category Café); and finally I’ll explain how Manin applies this to computation by exploiting the fact that Feynman diagrams and lambda calculus are both examples of symmetric monoidal closed categories (which John Baez and I tried to make easy to understand in our Rosetta stone paper), together with some results on the density of halting times from my paper “Most programs stop quickly or never halt” with Cris Calude. I doubt all of this will make it into the talk, but writing it up will make it clearer for me.

Renormalization is a technique for dealing with the divergent integrals that arise in quantum field theory. The quantum harmonic oscillator is quantum field theory in 0+1 dimensions—it describes what quantum field theory would be like if space consisted of a single point. It doesn’t need renormalization, but I’m going to talk about it first because it introduces the notion of a Feynman diagram.

“Harmonic oscillator” is a fancy name for a rock on a spring. The force exerted by a spring is proportional to how far you stretch it:

\displaystyle F = kx.

The potential energy stored in a stretched spring is the integral of that:

\displaystyle V_0 = \frac{1}{2}kx^2 + C,

and to make things work out nicely, we’re going to choose C = -1/2. The total energy H_0 is the sum of the potential and the kinetic energy:

\displaystyle H_0 = V_0 + T = \frac{1}{2}kx^2 + \frac{1}{2}mv^2 - \frac{1}{2}.

By choosing units so that k = m = 1, we get

\displaystyle H_0 = \frac{x^2}{2} + \frac{p^2}{2} - \frac{1}{2},

where p is momentum.

Next we quantize, getting a quantum harmonic oscillator, or QHO. We set p = -i \frac{\partial}{\partial x}, taking units where \hbar = 1. Now

\begin{array}{rl}\displaystyle [x, p]x^n & \displaystyle = xp - px \\ & = (- x i \frac{\partial}{\partial x} + i \frac{\partial}{\partial x} x)x^n \\\ & \displaystyle = -i(nx^n - (n+1)x^n) \\ & \displaystyle = ix^n.\end{array}

If we define a new observable z = \frac{p + ix}{\sqrt{2}}, then

\begin{array}{rl} \displaystyle z z^* & \displaystyle = \frac{(p + ix)}{\sqrt{2}} \frac{(p - ix)}{\sqrt{2}} \\ & = \frac{1}{2}(p^2 + i(xp - px) + x^2) \\ & = \frac{1}{2}(p^2 -1 + x^2) \\ & = H_0.\end{array}

We can think of z^* as \frac{d}{dz} and write the energy eigenvectors as polynomials in z:

\displaystyle H_0 z^n = z \frac{d}{dz} z^n = n z^n.

The creation operator z adds a photon to the mix; there’s only one way to do that, so z\cdot z^n = 1 z^{n+1}. The annihilation operator \frac{d}{dz} destroys one of the photons; in the state z^n, there are n photons to choose from, so \frac{d}{dz} z^n = n z^{n-1}.

Schrödinger’s equation says i \frac{d}{dt} \psi = H_0 \psi, so

\displaystyle \psi(t) = \sum_{n=0}^{\infty} e^{-itn} a_n z^n.

This way of representing the state of a QHO is known as the “Fock basis”.

Now suppose that we don’t have the ideal system, that the quadratic potential V_0 = \frac{1}{2}kx^2 - \frac{1}{2} is only a good local approximation to the real potential V_0 + \lambda V. Then we can write the total as H = H_0 + \lambda V, where V is a function of position and momentum, or equivalently of z and \frac{d}{dz}, and \lambda is small.

Now we solve Schrödinger’s equation perturbatively. We know that

\displaystyle \psi(t) = e^{-itH} \psi(0),

and we assume that

\displaystyle e^{-itH}\psi(t) \approx e^{-itH_0} \psi(t)

so that it makes sense to solve it perturbatively. Define

\displaystyle \psi_1(t) = e^{itH_0} e^{-itH}\psi(t)


\displaystyle V_1(t) = e^{itH_0} \lambda V e^{-itH_0}.

After a little work, we find that

\displaystyle \frac{d}{dt}\psi_1(t) = -i V_1(t) \psi_1(t),

and integrating, we get

\displaystyle \psi_1(t) = -i\int_0^t V_1(t_0) \psi_1(t_0) dt_0 + \psi(0).

We feed this equation back into itself recursively to get

\begin{array}{rl}\displaystyle \psi_1(t) & \displaystyle = -i \int_0^t V_1(t_0) \left[-i\int_0^{t_0} V_1(t_1) \psi_1(t_1) dt_1 + \psi(0) \right] dt_0 + \psi(0) \\ & \displaystyle = \left[\psi(0)\right] + \left[\int_0^t i^{-1} V_1(t_0)\psi(0) dt_0\right] + \left[\int_0^t\int_0^{t_0} i^{-2} V_1(t_0)V_1(t_1) \psi_1(t_1) dt_1 dt_0\right] \\ & \displaystyle = \sum_{n=0}^{\infty} \int_{t \ge t_0 \ge \ldots \ge t_{n-1} \ge 0} i^{-n} V_1(t_0)\cdots V_1(t_{n-1}) \psi(0) dt_{n-1}\cdots dt_0 \\ & \displaystyle = \sum_{n=0}^{\infty} (-\lambda i)^n \int_{t \ge t_0 \ge \ldots \ge t_{n-1} \ge 0} e^{-i(t-t_0)H_0} V e^{-i(t_0-t_1)H_0} V \cdots V e^{-i(t_{n-1}-0)H_0} \psi(0) dt_{n-1}\cdots dt_0.\end{array}

So here we have a sum of a bunch of terms; the nth term involves n interactions with the potential interspersed with evolving freely between the interactions, and we integrate over all possible times at which those interactions could occur.

Here’s an example Feynman diagram for this simple system, representing the fourth term in the sum above:

Three interactions with the perturbation.

The lines represent evolving under the free Hamiltonian H_0, while the dots are interactions with the potential V.

As an example, let’s consider V = (z + \frac{d}{dz}) and choose \lambda = \frac{1}{\sqrt{2}} so that \lambda V = p. When V acts on a state \psi = z^n, we get V \psi = z^{n+1} + nz^{n-1}. So at each interaction, the system either gains a photon or changes phase and loses a photon.

A particle moving in a quadratic potential in n-dimensional space gives the tensor product of n QHOs, which is QFT in a space where there are n possible harmonics. Quantum electrodynamics (QED) amounts to considering infinitely many QHOs, one for each possible energy-momentum, which forms a continuum. The diagrams for QED start to look more familiar:
Feynman diagrams
The vertices are interactions with the electromagnetic field. The straight lines are electrons and the wiggly ones are photons; between interactions, they propagate under the free Hamiltonian.

Lots of math blogs

Posted in Math by Mike Stay on 2009 September 30

I had no idea there were so many.

Fun article on information theory and flu genetics

Posted in Evolution, Math by Mike Stay on 2009 September 12

Monad for weakly monoidal categories

Posted in Category theory, Math by Mike Stay on 2009 August 19

We’ve got free and forgetful functors L:\mbox{Cat} \to \mbox{WeakMonCat}, R:\mbox{WeakMonCat} \to \mbox{Cat}. Define T = RL:\mbox{Cat} \to \mbox{Cat}. Given a category X, the category TX has

  • binary trees with \mbox{Ob}(X)-labeled leaves as objects and
  • binary trees with \mbox{Mor}(X)-labeled leaves together with the natural isomorphisms from the definition of a weakly monoidal category as its morphisms.

The multiplication \mu_X:TTX\to TX collapses two layers of trees down to one. The unit \eta_X:X \to TX gives a one-leaf tree.

An algebra of the monad is a category X together with a functor h:TX \to X such that h \circ Th = h \circ \mu_X and h \circ \eta_X = X. Define

x \otimes_X x' = h(\eta_X(x) \otimes_{TX} \eta_X(x')).

Then the associator should be a morphism

a_X:(x \otimes_X x') \otimes_X x'' \to x \otimes_X (x' \otimes_X x'').

However, it isn’t immediately evident that the associator that comes from \otimes_{TX} does the job, since just applying h to a_{TX} gives

h((\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TX} \eta_X(x''))

for the source instead of

h(\eta_X(h(\eta_X(x) \otimes_{TX} \eta_X(x'))) \otimes_{TX} \eta_X(x'')),

which we get by replacing \otimes_X with its definition above. We need an isomorphism

m:(x \otimes_X x') \otimes_X x'' \stackrel{\sim}{\to} h((\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TX} \eta_X(x''))

so we can define a_x = m^{-1} \circ h(a_{TX}) \circ m. Now we use the equations an algebra has to satisfy to derive this isomorphism. Since h \circ Th = h \circ \mu_X, the following two objects are equal:

\begin{array}{rl} & h(\eta_X(h(\eta_X(x) \otimes_{TX} \eta_X(x'))) \otimes_{TX} \eta_X(x''))\\ = & h(\eta_X(h(\eta_X(x) \otimes_{TX} \eta_X(x'))) \otimes_{TX} \eta_X(h(\eta_X(x'')))) \\ = & (h \circ Th) ( \eta_{TX}(\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TTX} \eta_{TX}(\eta_X(x'')) \\ = & (h \circ \mu_X) ( \eta_{TX}(\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TTX} \eta_{TX}(\eta_X(x'')) \\ = & h((\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TX} \eta_X(x'')). \end{array}

Therefore, the isomorphism m we wanted is simply equality and a_X = h(a_{TX}). It also means that a_X satisfies the pentagon equation.

A similar derivation works for the unitors and the triangle equation.

A morphism of algebras is a functor f:X \to Y such that f \circ h_X = h_Y \circ Tf. Now

\begin{array}{rll} & f(x \otimes_X x') &  \\ = & (f \circ h_X) (\eta_X(x) \otimes_{TX} \eta_X(x')) & \mbox{by de}\mbox{fn of }\otimes_X  \\ = & (h_Y \circ Tf) (\eta_X(x) \otimes_{TX} \eta_X(x')) & \\ = & h_Y(\eta_X(f(x)) \otimes_{TX} \eta_X(f(x'))) & \mbox{by } T \\ = & f(x) \otimes_Y f(x')& \mbox{by de}\mbox{fn of }\otimes_Y\end{array}


\begin{array}{rll} & f(I_X) \\ = & (f \circ h_X) (I_{TX}) & \mbox{by de}\mbox{fn of }I_X \\ = & (h_Y \circ Tf) (I_{TX}) & \\ = & h_Y(I_{TY}) & \mbox{since }T\mbox{ preserves the empty tree} \\ = & I_Y & \mbox{by de}\mbox{fn of }I_Y \end{array}

so we have the coherence laws for a strict monoidal functor.


\begin{array}{rll} & f(a_X) & \\ = & (f \circ h_X) (a_{TX}) & \mbox{by the derivation above} \\ = & (h_Y \circ Tf) (a_{TX}) & \\ = & h_Y(a_{TY}) & \mbox{since }T\mbox{ preserves the associator} \\ = & a_Y & \mbox{again by the derivation above},\end{array}

so it preserves the associator as well. The unitors follow in the same way, so morphisms of these algebras are strict monoidal functors that preserve the associator and unitors.

Functors and monads

Posted in Category theory, Math, Programming by Mike Stay on 2009 June 23

In many languages you have type constructors; given a type A and a type constructor Lift, you get a new type Lift<A>. A functor is a type constructor together with a function

   lift: (A -> B) -> (Lift<A> -> Lift<B>)

that preserves composition and identities. If h is the composition of two other functions g and f

   h (a) = g (f (a)),

then lift (h) is the composition of lift (g) and lift (f)

   lift (h) (la) = lift (g) (lift (f) (la)),

where the variable la has the type Lift<A>. Similarly, if h is the identity function on variables of type A

   h (a: A) = a,

then lift (h) will be the identity on variables of type Lift<A>

   lift (h) (la) = la.


  • Multiplication
    Lift<> adjoins an extra integer to any type:

       Lift<A> = Pair<A, int>

    The function lift() pairs up f with the identity function on integers:

       lift (f) = (f, id)
  • Concatenation
    Lift<> adjoins an extra string to any type:

       Lift<A> = Pair<A, string>

    The function lift() pairs up f with the identity function on strings:

       lift (f) = (f, id)
  • Composition
    Let Env be a type representing the possible states of the environment and

       Effect = Env -> Env

    Also, we’ll be explicit in the type of the identity function

       id<A>: A -> A
       id<A> (a) = a,

    so one possible Effect is id<Env>, the “trivial side-effect”.

    Then Lift<> adjoins an extra side-effect to any type:

       Lift<A> = Pair<A, Effect>

    The function lift() pairs up f with the identity on side-effects:

       lift (f) = (f, id<Effect>)
  • Lists
    The previous three examples used the Pair type constructor to adjoin an extra value. This functor is slightly different. Here, Lift<> takes any type A to a list of A‘s:

       Lift<A> = List<A>

    The function lift() is the function map():

       lift = map
  • Double negation, or the continuation passing transform
    In a nice type system, there’s the Unit type, with a single value, and there’s also the Empty type, with no values (it’s “uninhabited”). The only function of type X -> Empty is the identity function id<Empty>. This means that we can think of types as propositions, where a proposition is true if it’s possible to construct a value of that type. We interpret the arrow as implication, and negation can be defined as “arrowing into Empty“: let F = Empty and T = F -> F. Then T -> F = F (since T -> F is uninhabited) and T is inhabited since we can construct the identity function of type F -> F. Functions correspond to constructive proofs. “Negation” of a proof is changing it into its contrapositive form:

      If A then B => If NOT B then NOT A.

    Double negation is doing the contrapositive twice:

      IF A then B => If NOT NOT A then NOT NOT B.

    Here, Lift<> is double negation:

       Lift<A> = (A -> F) -> F.

    The function lift takes a proof to its double contrapositive:

       lift: (A -> B)   ->   ((A -> F) -> F) -> ((B -> F) -> F)
       lift  (f) (k1) (k2) = k1 (lambda (a) { k2 (f (a)) })


A monad is a functor together with two functions

   m: Lift<Lift<A>> -> Lift<A>
   e: A -> Lift<A>

satisfying some conditions I’ll get to in a minute.


  • Multiplication
    If you adjoin two integers, m() multiplies them to get a single integer:

       m: Pair<Pair<A, int>, int> -> Pair<A, int>
       m  (a, i, j)               =  (a, i * j).

    The function e() adjoins the multiplicative identity, or “unit”:

       e: A   -> Pair<A, int>
       e  (a) = (a, 1)
  • Concatenation
    If you adjoin two strings, m() concatenates them to get a single string:

       m: Pair<Pair<A, string>, string> -> Pair<A, string>
       m  (a, s, t)                     =  (a, s + t).

    The function e() adjoins the identity for concatenation, the empty string:

       e: A   -> Pair<A, string>
       e  (a) = (a, "")
  • Composition
    If you adjoin two side-effects, m() composes them to get a single effect:

       m: Pair<Pair<A, Effect>, Effect> -> Pair<A, Effect>
       m  (a, s, t)                     =  (a, t o s),


       (t o s) (x) = t (s (x)).

    The function e() adjoins the identity for composition, the identity function on Env:

       e: A   -> Pair<A, Effect>
       e  (a) = (a, id<Env>)
  • Lists
    If you have two layers of lists, m() flattens them to get a single layer:

       m: List<List<A>> -> List<A>
       m = flatten

    The function e() makes any element of A into a singleton list:

       e: A -> List<A>
       e (a) = [a]
  • Double negation, or the continuation passing transform
    If you have a quadruple negation, m() reduces it to a double negation:

       m: ((((A -> F) -> F) -> F) -> F)   ->  ((A -> F) -> F)
       m (k1) (k2) = k1 (lambda (k3) { k3 (k2) })

    The function e() is just reverse application:

       e: A -> (A -> F) -> F
       e (a) (k) = k (a)

The conditions that e and m have to satisfy are that m is associative and e is a left and right unit for m. In other words, assume we have

   llla: Lift<Lift<Lift<A>>>
   la: Lift<A>


   m (lift (m) (llla))  =  m (m (llla))


   m (e (la))  =  m (lift (e) (la))  =  la


  • Multiplication:
    There are two different ways we can use lifting with these two extra functions e() and m(). The first is applying lift() to them. When we apply lift to m(), it acts on three integers instead of two; but because

       lift (m) = (m, id),

    it ignores the third integer:

       lift (m) (a, i, j, k) = (a, i * j, k).

    Similarly, lifting e() will adjoin the multiplicative unit, but will leave the last integer alone:

       lift (e) = (e, id)
       lift (e) (a, i) = (a, 1, i)

    The other way to use lifting with m() and e() is to apply Lift<> to their input types. This specifies A as Pair<A', int>, so the first integer gets ignored:

       m (a, i, j, k) = (a, i, j * k)
       e (a, i) = (a, i, 1)

    Now when we apply m() to all of these, we get the associativity and unit laws. For associativity we get

       m (lift (m) (a, i, j, k)) = m(a, i * j, k) = (a, i * j * k)
       m (m (a, i, j, k)) = m(a, i, j * k) = (a, i * j * k)

    and for unit, we get

       m (lift (e) (a, i)) = m (a, 1, i) = (a, 1 * i) = (a, i)
       m (e (a, i)) = m (a, i, 1) = (a, i * 1) = (a, i)
  • Concatenation
    There are two different ways we can use lifting with these two extra functions e() and m(). The first is applying lift() to them. When we apply lift to m(), it acts on three strings instead of two; but because

       lift (m) = (m, id),

    it ignores the third string:

       lift (m) (a, s, t, u) = (a, s + t, u).

    Similarly, lifting e() will adjoin the empty string, but will leave the last string alone:

       lift (e) = (e, id)
       lift (e) (a, s) = (a, "", s)

    The other way to use lifting with m() and e() is to apply Lift<> to their input types. This specifies A as Pair<A', string>, so the first string gets ignored:

       m (a, s, t, u) = (a, s, t + u)
       e (a, s) = (a, s, 1)

    Now when we apply m() to all of these, we get the associativity and unit laws. For associativity we get

       m (lift (m) (a, s, t, u)) = m(a, s + t, u) = (a, s + t + u)
       m (m (a, s, t, u)) = m(a, s, t + u) = (a, s + t + u)

    and for unit, you get

       m (lift (e) (a, s)) = m (a, "", s) = (a, "" + s) = (a, s)
       m (e (a, s)) = m (a, s, "") = (a, s + "") = (a, s)
  • Composition
    There are two different ways we can use lifting with these two extra functions e() and m(). The first is applying lift() to them. When we apply lift to m(), it acts on three effects instead of two; but because

       lift (m) = (m, id<Effect>),

    it ignores the third effect:

       lift (m) (a, s, t, u) = (a, t o s, u).

    Similarly, lifting e() will adjoin the identity function, but will leave the last string alone:

       lift (e) = (e, id<Effect>)
       lift (e) (a, s) = (a, id<Env>, s)

    The other way to use lifting with m() and e() is to apply Lift<> to their input types. This specifies A as Pair<A', Effect>, so the first effect gets ignored:

       m (a, s, t, u) = (a, s, u o t)
       e (a, s) = (a, s, id<Env>)

    Now when we apply m() to all of these, we get the associativity and unit laws. For associativity we get

       m (lift (m) (a, s, t, u)) = m(a, t o s, u) = (a, u o t o s)
       m (m (a, s, t, u)) = m(a, s, u o t) = (a, u o t o s)

    and for unit, you get

       m (lift (e) (a, s)) = m (a, id<Env>, s) = (a, s o id<Env>) = (a, s)
       m (e (a, s)) = m (a, s, id<Env>) = (a, id<Env> o s) = (a, s)
  • Lists
    There are two different ways we can use lifting with these two extra functions e() and m(). The first is applying lift() to them. When we apply lift to m(), it acts on three layers instead of two; but because

       lift (m) = map (m),

    it ignores the third (outermost) layer:

         lift (m) ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]])
       = [[a, b, c, d, e], [], [x, y, z]]

    Similarly, lifting e() will make singletons, but will leave the outermost layer alone:

       lift (e) ([a, b, c]) = [[a], [b], [c]]

    The other way to use lifting with m() and e() is to apply Lift<> to their input types. This specifies A as List<A'>, so the *innermost* layer gets ignored:

         m ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]])
       = [[a, b, c], [], [d, e], [], [x], [y, z]]
       e ([a, b, c]) = [[a, b, c]]

    Now when we apply m() to all of these, we get the associativity and unit laws. For associativity we get

         m (lift (m) ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]]))
       = m([[a, b, c, d, e], [], [x, y, z]])
       = [a, b, c, d, e, x, y, z]
         m (m ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]]))
       = m([[a, b, c], [], [d, e], [], [x], [y, z]])
       = [a, b, c, d, e, x, y, z]

    and for unit, we get

       m (lift (e) ([a, b, c])) = m ([[a], [b], [c]]) = [a, b, c]
       m (e ([a, b, c])) = m ([[a], [b], [c]]) = [a, b, c]

Monads in Haskell style, or “Kleisli arrows”

Given a monad (Lift, lift, m, e), a Kleisli arrow is a function

   f: A -> Lift<B>,

so the e() function in a monad is already a Kleisli arrow. Given

   g: B -> Lift<C>

we can form a new Kleisli arrow

   (g >>= f): A -> Lift<C>
   (g >>= f) (a) = m (lift (g) (f (a))).

The operation >>= is called “bind” by the Haskell crowd. You can think of it as composition for Kleisli arrows; it’s associative, and e() is the identity for bind. e() is called “return” in that context. Sometimes code is less complicated with bind and return instead of m and e.

If we have a function f: A -> B, we can turn it into a Kleisli arrow by precomposing with e():

   (e o f): A -> Lift<B>
   (e o f) (a) = e (f (a)) = return (f (a)).


  • Double negation, or the continuation passing style transform
    We’re going to (1) show that the CPS transform of a function takes a continuation and applies that to the result of the function.  We’ll also (2) show that for two functions r, s,

         CPS (s o r)    =    CPS (s) >>= CPS (r),

    (1) To change a function f: A -> B into a Kleisli arrow (i.e. continuized function) CPS (f): A -> (B -> X) -> X, we just compose with e—or in the language of Haskell, we return the result:

         CPS (f) (a) (k)
       = return (f (a)) (k)
       = (e o f) (a) (k)
       = e (f (a)) (k)
       = k (f (a))

    (2) Given two Kleisli arrows

       f: A -> (B -> F) -> F


       g: B -> (C -> F) -> F,

    we can bind them:

         (g >>= f) (a) (k)
       = m (lift (g) (f (a))) (k)                                // defn of bind
       = lift (g) (f (a)) (lambda (k3) { k3 (k) })               // defn of m
       = f (a) (lambda (b) { (lambda (k3) { k3 (k) }) (g (b)) }) // defn of lift
       = f (a) (lambda (b) { g (b) (k) }),                       // application

    which is just what we wanted.

    In particular, if f and g are really just continuized functions

       f = (e o r)
       g = (e o s)


         (g >>= f) (a) (k)
       = f (a) (lambda (b) { g (b) (k) })             // by above
       = (e o r) (a) (lambda (b) { (e o s) (b) (k) }) // defn of f and g
       = (e o r) (a) (lambda (b) { k (s (b)) })       // defn of e
       = (e o r) (a) (k o s)                          // defn of composition
       = (k o s) (r (a))                              // defn of e
       = k (s (r (a)))                                // defn of composition
       = (e o (s o r)) (a) (k)                        // defn of e
       = CPS (s o r) (a) (k)                          // defn of CPS


       CPS (s) >>= CPS (r)    =    CPS (s o r).

My talk at Perimeter Institute

Posted in Category theory, General physics, Math, Quantum by Mike Stay on 2009 June 11

I spent last week at the Perimeter Institute, a Canadian institute founded by Mike Lazaridis (CEO of RIM, maker of the BlackBerry) that sponsors research in cosmology, particle physics, quantum foundations, quantum gravity, quantum information theory, and superstring theory. The conference, Categories, Quanta, Concepts, was organized by Bob Coecke and Andreas Döring. There were lots of great talks, all of which can be found online, and lots of good discussion and presentations, which unfortunately can’t. (But see Jeff Morton’s comments.) My talk was on the Rosetta Stone paper I co-authored with Dr. Baez.

Syntactic string diagrams

Posted in Category theory, Math, Programming by Mike Stay on 2009 January 24

I hit on the idea of making lambda a node in a string diagram, where its inputs are an antivariable and a term in which the variable is free, and its output is the same term, but in which the variable is bound.  This allows a string diagram notation for lambda calculus that is much closer to the syntactical description than the stuff in our Rosetta Stone paper. Doing it this way makes it easy to also do pi calculus and blue calculus.

There are two types, V (for variable) and T (for term). I’ve done untyped lambda calculus, but it’s straightforward to add subscripts to the types V and T to do typed lambda calculus.

There are six function symbols:

  • \lambda:V^* \times T \to T. Lambda takes an antivariable and a term that may use the corresponding variable.
  • \cap:1 \to V^* \times T. This turns an antivariable “x” introduced by lambda into the term “x”.
  • A:T \times T \to T. (Application) This takes f and x and produces f(x).
  • {\rm swap}:T \times T \to T \times T
  • !:T \to 1
  • \Delta:T \to T \times T. These two mean we can duplicate and delete terms.

The \beta-\eta rule is the real meat of the computation. The “P” is an arbitrary subdiagram. The effect is replacing the “A” application node with the “P” subdiagram, modulo some wiring.

I label the upwards arrows out of lambdas with a variable name in parentheses; this is just to assist in matching up the syntactical representation with the string diagram.

In the example, I surround part of the diagram with a dashed line; this is the part to which the \beta-\eta rule applies. Within that, I surround part with a dash-dot line; this is the subdiagram P in the rule.

When I do blue calculus this way, there are a few more function symbols and the relations aren’t confluent, but the flavor is very much the same.

String diagrams for untyped lambda calculus

String diagrams for untyped lambda calculus

An example calculation

An example calculation

Imaginary time

Posted in General physics by Mike Stay on 2009 January 9
Statics (geometric = no time):
\displaystyle x [x] x coordinate
\displaystyle y(x) [y] y coordinate
\displaystyle k [k] proportionality constant
\displaystyle y'(x) = \frac{dy(x)}{dx} [y/x] slope
\displaystyle s(x) = ky'(x) [k y/x] proportional to slope
\displaystyle T(x) = \frac{1}{2} ky'(x)^2 [k y^2/x^2] distortion
\displaystyle V(y(x)) [k y^2/x^2] original shape
\displaystyle S(y) = \int (T + V\circ y)(x) dx
\displaystyle = \int \left[ \frac{k}{2} \left(\frac{dy(x)}{dx}\right)^2 + V(y(x)) \right] dx
[k y^2/x] least S at equilibrium
Statics (with energy):
\displaystyle x [x] parameterization of curve
\displaystyle y(x) [y] y coordinate
\displaystyle k [kg x/s^2] spring constant at x
\displaystyle v(x) = \frac{dy(x)}{dx} [y/x] slope
\displaystyle F(x) = kv(x) [kg y/s^2] force due to stretching
\displaystyle T(x) = \frac{1}{2} kv(x)^2 [kg y^2/s^2 x = J/x] stretching energy density
\displaystyle V(y(x)) [kg y^2/s^2 x= J/x] gravitational energy density
\displaystyle E(y) = \int (T + V\circ y)(x) dx
\displaystyle = \int \left[ \frac{k}{2} \left(\frac{dy(x)}{dx}\right)^2 + V(y(x)) \right] dx
[kg y^2/s^2 = J] energy (least energy at equilibrium)
Statics (unitless distance):
\displaystyle x [1] parameterization of curve
\displaystyle y(x) [m] y coordinate
\displaystyle k [kg/s^2] spring constant
\displaystyle v(x) = \frac{dy(x)}{dx} [m] relative displacement
\displaystyle F(x) = kv(x) [kg m/s^2 = N] force at x due to stretching
\displaystyle T(x) = \frac{1}{2} kv(x)^2 [kg m^2 / s^2 = J] stretching energy at x
\displaystyle V(y(x)) [kg m^2 / s^2 = J] gravitational energy at x
\displaystyle E(y) = \int (T + V\circ y)(x) dx
\displaystyle = \int \left[ \frac{k}{2} \left(\frac{dy(x)}{dx}\right)^2 + V(y(x)) \right] dx
[kg m^2 / s^2 = J] energy (least energy at equilibrium)
Dynamics (\displaystyle \underline{\lambda x.y(x) \mapsto \lambda t.y(it)}):
\displaystyle t [s] time
\displaystyle y(it) [m] y coordinate
\displaystyle m [kg] mass
\displaystyle v(t) = \frac{dy(it)}{dt} = i\frac{dy(it)}{dit} [m/s] velocity
\displaystyle p(t) = m v(t) [kg m/s] momentum
\displaystyle T(t) = \frac{1}{2}m\;v(t)^2 = -\frac{m}{2}\left(\frac{dy(it)}{dit}\right)^2 [-kg m^2/s^2 = -J] -kinetic energy
\displaystyle V(y(it)) [kg m^2 / s^2 = J] potential energy
\displaystyle iS(y) = \int (T + V\circ y \circ i)(t) dt
\displaystyle = \int \left[ -\frac{m}{2}\left(\frac{dy(it)}{dit}\right)^2 + V(y(it)) \right] dt
\displaystyle = i \int \left[ \frac{m}{2}\left(\frac{dy(it)}{dit}\right)^2 - V(y(it)) \right] d it

[kg m^2/s] i * action

See also Toby Bartels‘ sci.physics post.

I finally understand the state transformer monad!

Posted in Category theory by Mike Stay on 2008 October 29

It’s the monad arising from the currying adjunction between FX = X \times Y and GZ = Y \multimap Z.  GF attaches an environment Y to X:

GFX = Y \multimap (X \times Y)

This is the type of a function that takes a state of type Y and outputs a result of type X and a new state of type Y. So G is the ability to depend on state and F is the ability to change the state.

The natural transformation \epsilon:FG \Rightarrow 1, is evaluation:

\epsilon_X:(Y \multimap X) \times Y \to X

takes a function and an input point and evaluates the function at that point.  So we get

\mu:GFGF \Rightarrow GF

by evaluating the FG in the middle, while the unit

\eta:1 \Rightarrow GF

is just the curried identity on pairs:

\begin{array}{rl}\eta_X:X & \to Y \multimap (X \times Y) \\ x & \mapsto \lambda y.(x,y)\end{array}

Reading list

Posted in Category theory, Math, Programming by Mike Stay on 2008 April 28

Computer-aided Origami

Posted in Design, Fun links, Math by Mike Stay on 2008 February 26

With really cool bugs!

Passive polarization clock

Posted in Astronomy, Math, Time by Mike Stay on 2008 February 22

Here’s a design for a passive polarization clock.

The sky is polarized in concentric circles around the sun. The polarization of the southern sky moves through around 180 degrees during daylight hours. It is polarized vertically in the morning, horizontally at noon, and vertically again in the evening.

Align slices of polarized film such that they are parallel to the contours. Any given ray from the center of the sundial outward always hits the contours at the same angle; the angle changes by 360 degrees as the ray passes through 180 degrees. In other words, the clock goes from 6am to 6pm as the sun moves through the sky.


Yoneda embedding as contrapositive and call-cc as double negation

Posted in Category theory, Math, Programming by Mike Stay on 2008 January 26

Consider the problem of how to represent negation of a proposition P when we only have implication and falsehood:

P \to {\rm F}.

Since {\rm F} \to {\rm F} is true and {\rm T} \to {\rm F} is false, this does what we want.

The contrapositive says \neg Q \to \neg P has the same truth value as P \to Q. Using only implication, the contrapositive of P \to Q is

(Q \to {\rm F}) \to (P \to {\rm F}).

What if we don’t even have falsehood? Well, we can pick any proposition X to represent falsehood and form

(Q \to X) \to (P \to X).

The Yoneda embedding takes a category C^{\rm op} and produces a category \mbox{HOM}(C, \mbox{Set}):

\begin{array}{ccccccc}\mbox{Yoneda}_X: & C^{\rm op} & \to & \mbox{HOM}(C, \mbox{Set}) \\ & P & \mapsto & \mbox{hom}(P, X) \\ & f:P \to Q & \mapsto & \mbox{Yoneda}_X(f): & \mbox{hom}(Q, X) & \to & \mbox{hom}(P, X) \\ & & & & k & \mapsto & k \circ f \end{array}

This embedding is better known among computer scientists as the continuation passing style transformation.

In a symmetric monoidal closed category, like lambda calculus, we can move everything “inside:” every morphism f:P \to Q has an internal version f:I\to P \multimap Q. The internal Yoneda embedding of f is

\mbox{Yoneda}_X(f):I \to (Q \multimap X) \multimap (P \multimap X).

Here I is the “unit” type; notice how the target type is isomorphic under the Curry-Howard isomorphism to the contrapositive. This is a term that maps a continuation k:Q\multimap X and a value p:P to k(f(p)).

To get double negation, first do the Yoneda embedding on the identity to get

\mbox{Yoneda}_X(1_P):hom(P,X) \to hom(P,X),

then uncurry, braid, and recurry to get

\mbox{Yoneda}_X(1_P):P \to hom(hom(P,X),X),

or, internally,

\mbox{Yoneda}_X(1_P):I\to P \multimap ((P\multimap X)\multimap X).

This takes a value p to a function value k\mapsto k(p).

Call-with-current-continuation expects a term that has been converted to CPS style as above, and then hands it the remainder of the computation in k.

The category GraphUp

Posted in Category theory, Programming by Mike Stay on 2008 January 17

The category GraphUp of graphs and Granovetter update rules has

  • directed finite graphs as objects
  • morphisms generated by Granovetter rules, i.e. the following five operations:
    • add a new node. (creation, refcount=0)
    • given a node x, add a new node y and an edge x\to y. (creation, refcount=1)
    • given edges x\to y and x \to z, add an edge y\to z. (introduction, ++refcount)
    • remove an edge. (deletion, --refcount)
    • remove a node and all its outgoing edges if it has no incoming edges. (garbage collection)

It’s a monoidal category where the tensor product is disjoint union. Also, since two disjoint graphs can never become connected, they remain disjoint.

Programs in a capability-secure language get interpreted in GraphUp. A program’s state graph consists of nodes, representing the states of the system, and directed edges, representing the system’s transitions between states upon receiving input. A functor from a program state graph to GraphUp assigns an object reference graph as state to each node and an update generated by Granovetter rules to each edge.

Haskell monads for the category theorist

Posted in Category theory, Math, Programming by Mike Stay on 2008 January 1

A monad in Haskell is about composing almost-compatible morphisms. Typically you’ve got some morphism f:A\to B and then you think of a reason that you’d rather have f':A \to TB, where T:{\rm Data} \to {\rm Data} is a functor. In Haskell, though, you don’t define functors right out: you define type constructors, which are like the map of objects in a functor. You have to define the map of morphisms separately.

To define the map of morphisms, we have to define unit (aka return) and bind. We define {\rm unit}:(A \multimap B) \to (A \multimap TB) to be the category-theorist’s unit \eta composed with the input. Applying {\rm unit} to f gives

A\xrightarrow{f} B \xrightarrow{\eta} TB.

To compose “half-functored” morphisms like f', we define {\rm bind}:TA \times (A\multimap TB)\to TB like this: given an input a:TA and a morphism like f', bind produces

TA \xrightarrow{Tf'} TTB \xrightarrow{\mu_B} TB.

So a “monad” in Haskell is always the monad for categories, except the lists are of bindable half-functored morphisms like f' rather than lists of composable morphisms.

A side-effect monad has T(A) = A\times E, where E is the data type for the environment on which the morphism is to act. The IO monad is a specific instance of a side-effect monad with the environment being the keyboard queue, disk drive, network, etc.

Another piece of the stone

Posted in Category theory, Math, Programming by Mike Stay on 2007 December 28

A few days ago, I thought that I had understood pi calculus in terms of category theory, and I did, in a way.

\lambda-calculus model in Set
type set of values
term function
(deterministic outcome)
rewrite rule identity functional
\pi-calculus model in Set
process set of states
reduction rule state update rule
(deterministic outcome)

To make lambda calculus into a category, we mod out by the rewrite rules and consider equivalence classes of terms instead of actual terms. A model of this category (i.e. a functor from the category to Set) picks out a set of values for each datatype and a function for each program. Given a value from the input set, we get a single value from the output set.

Similarly, a model of the pi calculus assigns to each process a set of states and to each reduction rule a function that updates the state. A morphism in this way of thinking is a certain number of reductions to perform. The reductions are deterministic in the sense that we can model “A or B” as A \sqcup B. Given an object’s state at a certain point, we can tell what set of states it can be in after the system reduces n messages.

However, what we really want is to know the set of states it can be in after all the messages have been processed: what is its normal form? This is far more like the rewrite rules in lambda calculus. It suggests that we should be treating the reduction rules like 2-morphisms instead of 1-morphisms. There’s one important difference from lambda calculus, however: the 2-morphisms of pi calculus are not confluent! It matters very much in which order they are evaluated. Thus processes can’t map to anything but trivial functions.

It looks like a better fit for models of the pi calculus is Rel, the category of sets and relations. A morphism in Rel can relate a single input state to multiple output states. This suggests we have a single object * in the pi calculus that gets mapped to a set of possible states for the system, while each process gets mapped to a relation that encodes all its possible reductions.

I’m rather embarrassed that it took me so long to notice this, since my advisor has been talking about replacing Set by Rel for years.

category lambda calculus pi calculus
objects types only one type *, a system of processes
a morphism an equivalence class of terms a structural congruence class of processes
dinatural transformation from the constant functor (mapping to the terminal set and its identity) to a functor generated by hom, products, projections, and exponentials (if they’re there) combinator: template for programs mapping between isomorphic types (usually) since there’s only one type, this is trivial
Model of the category in Rel (usually taken in Set, a subcategory of Rel) a set of values for each data type and a function for each morphism between them * maps to a set S of states for the system, and each process gets mapped to a relation that relates each element of S to its possible reductions in S

The Yoda embedding

Posted in Category theory by Mike Stay on 2007 December 21

\overline{\mbox{Contravariant}}\,(\mbox{it is}).

Continuation passing as a reflection

Posted in Category theory, Math, Programming by Mike Stay on 2007 December 21

We can write any expression like f(g(x,y)) as a full binary tree where the nodes denote evaluation of the left child at the right, and the leaves are values:


Figure 1: f(g(x)(y))

[In the caption of figure 1, the expression is slightly different; when using trees, it’s more convenient to curry all the functions—that is, replace every comma “,” by back-to-back parens: “)(” .]

The continuation passing transform (Yoneda embedding) first reflects the tree across the vertical axis and then replaces the root and all the left children with their continuized form—a value x gets replaced with a function value \overline{x} = (f \mapsto f(x)):


Figure 2: \overline{\overline{ \overline{y}(\overline{x}(g))  } (f)}

What does this evaluate to? Well,

\begin{array}{rl} & \overline{\overline{ \overline{y}(\overline{x}(g))  } (f) }\\ = & \overline{f(\overline{y}(\overline{x}(g)))} \\ = & \overline{f(\overline{x}(g)(y))} \\ = & \overline{f(g(x)(y))} \end{array}

As we hoped, it’s the continuization (Yoneda embedding) of the original expression. Iterating, we get


Figure 3: \overline{\overline{\overline{f}\left(\overline{\overline{\overline{g}\left(\overline{x}\right)}\left(\overline{y}\right)}\right)}}

At this point, we get an enormous simplification: we can get rid of overlines whenever the left and right branch both have them. For instance,

\overline{g}(\overline{x}) = \overline{x}(g) = g(x).

Actually working out the whole expression would mean lots of epicycular reductions like this one, but taking the shortcut, we just get rid of all the lines except at the root. That means that we end up with


for our final expression.

However, if this expression is just part of a larger one—if what we’re calling the “root” is really the child of some other node—then the cancellation of lines on siblings applies to our “root” and its sibling, and we really do get back to where we started!

A piece of the Rosetta stone

Posted in Category theory, Math, Programming by Mike Stay on 2007 December 20
category lambda calculus pi calculus Turing machine
objects types structural congruence classes of processes \mathbb{N}\times S^*, where \mathbb{N} is the natural numbers and S^* is all binary sequences with finitely many ones.
a morphism an equivalence class of terms a specific reduction from one process state to the next a specific transition from one state and position of the machine to the next
dinatural transformation from the constant functor (mapping to the terminal set and its identity) to a functor generated by hom, products, projections, and exponentials (if they’re there) combinator reduction rule (covers all reductions of a particular form) tape-head update rule (covers all transitions with the current cell and state in common)
products product types parallel processes multiple tapes
internal hom exponential types all types are exponentials? ?
Model of the category in Set A set of values for each data type and a function for each morphism between them A set of states for each process and a single evolution function out of each set. ?

This won’t be appearing in our Rosetta stone paper, but I wanted to write it down. What flavor of logic corresponds to the pi calculus? To the Turing machine?

The continuation passing transform and the Yoneda embedding

Posted in Category theory, Math, Programming by Mike Stay on 2007 December 19

They’re the same thing! Why doesn’t anyone ever say so?

Assume A and B are types; the continuation passing transform takes a function (here I’m using C++ notation)

B f(A a);

and produces a function

<X> CPT_f(<X> (*k)(B), A a) {
  return k(f(a));

where X is any type. In CPT_f, instead of returning the value f(a) directly, it reads in a continuation function k and “passes” the result to it. Many compilers use this transform to optimize the memory usage of recursive functions; continuations are also used for exception handling, backtracking, coroutines, and even show up in English.

The Yoneda embedding takes a category C^{\rm op} and produces a category \mbox{HOM}(C, \mbox{Set}):

\begin{array}{ccccccc}\mbox{CPT}: & C^{\rm op} & \to & \mbox{HOM}(C, \mbox{Set}) \\ & A & \mapsto & \mbox{hom}(A, -) \\ & f:B\leftarrow A & \mapsto & \mbox{CPT}(f): & \mbox{hom}(B, -) & \to & \mbox{hom}(A, -) \\ & & & & k & \mapsto & k \circ f \end{array}

We get the transformation above by uncurrying to get \mbox{CPT}(f):\mbox{hom}(B, -) \times A \to -.

In Java, a (cartesian closed) category C is an interface C with a bunch of internal interfaces and methods mapping between them. A functor F:C \to {\rm Set} is written

class F implements C.

Then each internal interface C.A gets instantiated as a set F.A of values and each method C.f() becomes instantiated as a function F.f() between the sets.

The continuation passing transform can be seen as a parameterized functor {\rm CPT\langle X\rangle}:C \to {\rm Set}. We’d write

class CPT<X> implements C.

Then each internal interface C.A gets instantiated as a set CPT<X>.A of methods mapping from C.A to X—i.e. continuations that accept an input of type C.A—and each method C.f maps to the continuized function CPT<X>.f described above.

Then the Yoneda lemma says that for every model of C—that is, for every class F implementing the interface C—there’s a natural isomorphism between the set F(A) and the set of natural transformations {\rm hom}({\rm hom}(A, -), F).

A natural transformation \alpha between F:C \to {\rm Set} and {\rm CPT}:C \to {\rm Set} is a way to cast the class F to the class CPT<X> such that for any method of C, you can either

  • invoke its implementation directly (as a method of F) and then continuize the result (using the type cast), or
  • continuize first (using the type cast) and then invoke the continuized function (as a method of CPT<X>) on the result

and you’ll get the same answer. Because it’s a natural isomorphism, the cast has an inverse.

The power of the Yoneda lemma is taking a continuized form (which apparently turns up in lots of places) and replacing it with the direct form. The trick to using it is recognizing a continuation when you see one.