HDRA part 1a

Posted in Uncategorized by Mike Stay on 2015 May 30

Again, comments are appreciated.

2-categories and lambda calculus

This is a follow up post, but not the promised sequel, from A 2-Categorical Approach to the Pi Calculus where I’ll try to correct various errors I made and try to clarify some things.

First, lambda calculus was invented by Church to solve Hilbert’s decision problem, also known as the Entscheidungsproblem. The decision problem was third on a list of problems he presented at a conference in 1928, but was not, as I wrote last time, “Hilbert’s third problem”, which was third on a list of problems he laid out in 1900.

Second, the problem Greg Meredith and I were trying to address was how to prevent reduction under a “receive” prefix in the pi calculus, which is intimately connected with preventing reduction under a lambda in lambda calculus. All programming languages that I know of, whether eager or lazy, do not reduce under a lambda.

There are two approaches in literature to the semantics of lambda calculus. The first is denotational semantics, which was originally concerned with what function a term computes. This is where computability and type theory live. Denotational semantics treats alpha-beta-eta equivalence classes of lambda terms as morphisms in a category. The objects in the category are called “domains”, and are usually “\omegaCPOs“, a special kind of poset. Lambek and Scott used this approach to show that alpha-beta-eta equivalence classes of lambda terms with one free variable form a cartesian closed category where composition is given by substitution.

The type of a term describes the structure of its normal form. Values are terms that have no beta reduction; they’re either constants or lambda abstractions. The types in lambda calculus are either base types or lambda abstractions, respectively. (While Lambek and Scott introduced new term constructors for products, they’re not strictly necessary, because the lambda term \lambda z.ztu can be used for the pair \langle t,u\rangle with projections \lambda p.p(\lambda xy.x) and \lambda p.p(\lambda xy.y).)

The second is operational semantics, which is more concerned with how a function is computed than what it computes. All of computational complexity theory and algorithmic analysis lives here, since we have to count the number of steps it takes to complete a computation. Operational semantics treats lambda terms as objects in a category and rewrite rules as morphisms. It is a very syntactical approach; the set of terms is an algebra of some endofunctor, usually presented in Backus-Naur Form. This set of terms is then equipped with some equivalence relations and reduction relations. For lambda calculus, we mod out terms by alpha, but not beta. The reduction relations often involve specifying a reduction context, which means that the rewrites can’t occur just anywhere in a term.

In pi calculus, terms don’t have a normal form, so we can’t define an equivalence on pi calculus terms by comparing their normal forms. Instead, we say two terms are equivalent if they behave the same in all contexts, i.e. they’re bisimilar. Typing becomes rather more complicated; pi calculus types describe the structure of the current term and rewrites should do something weaker than preserve types on the nose; it suggests using a double category, but that’s for next time.

Seely suggested modeling rewrites with 2-morphisms in a 2-category and showed how beta and eta were lax adjoints.  We’re suggesting a different, more operational way of modeling rewrites with 2-morphisms.  Define a 2-category L with

  • V and T as generating objects,
  • -, -(=), and \lambda as generating 1-morphisms,
  • alpha equivalence as an identity 2-morphism, and
  • beta reduction as a generating 2-morphism.

A closed term is one with no free variables, and the hom category L(I, T) consisting of closed lambda terms and beta reductions between them is a typical category you’d get from looking for the operational semantics of lambda calculus.

The 2-category L is a fine semantics for the variant of lambda calculus where beta can apply anywhere within a term, but there are too many morphisms if you want to model the lambda calculus without reduction under a prefix. Given closed terms t, t'\colon I \to T such that t beta reduces to t', we can whisker beta by \lambda x to get

\displaystyle \lambda x.\beta\colon \lambda x.t \Rightarrow \lambda x.t'.

To cut down the extra 2-morphisms, we need to model reduction contexts. The difference between the precontexts above and the contexts we need is the notion of the “topmost” context, where we can see enough of the surrounding term to determine that reduction is possible. To model a reduction context, we make a new 2-category L' by introducing a new generating 1-morphism

\displaystyle [-]\colon T \to T

and say that a 1-morphism with signature T^{\otimes n} \to T that factors as a precontext followed by [-] is an n-hole term context. We also interpret beta with a 2-morphism between reduction contexts rather than between precontexts. The hom category L'(I, T) is the operational semantics of lambda calculus without reduction under lambda.

In the next post, I’ll relate Seely’s 2-categorical approach and our 2-categorical by extending to double categories and using Melliès and Zeilberger’s notion of type refinement.


Posted in Uncategorized by Mike Stay on 2015 May 18

Greg Meredith and I have a short paper that’s been accepted to Higher-Dimensional Rewriting and Applications
(HDRA) 2015
on modeling the asynchronous polyadic pi calculus with 2-categories. We avoid domain theory entirely and model the operational semantics directly; full abstraction is almost trivial. As a nice side-effect, we get a new tool for reasoning about consumption of resources during a computation.

It’s a small piece of a much larger project, which I’d like to describe here in a series of posts. This post will talk about lambda calculus for a few reasons. First, lambda calculus is simpler, but complex enough to illustrate one of our fundamental insights. Lambda calculus is to serial computation what pi calculus is to concurrent computation; lambda calculus talks about a single machine doing a computation, while pi calculus talks about a network of machines communicating over a network with potentially random delays. There is at most one possible outcome for a computation in the lambda calculus, while there are many possible outcomes in a computation in the pi calculus. Both the lazy lambda calculus and the pi calculus, however, have as an integral part of their semantics the notion of waiting for a sub-computation to complete before moving onto another one. Second, the denotational semantics of lambda calculus in Set is well understood, as is its generalization to cartesian closed categories; this semantics is far simpler than the denotational semantics of pi calculus and serves as a good introduction. The operational semantics of lambda calculus is also simpler than that of pi calculus and there is previous work on modeling it using higher categories.


Alonzo Church invented the lambda calculus as part of his attack on Hilbert’s third problem, also known as the Entscheidungsproblem, which asked for an algorithm to solve any mathematical problem. Church published his proof that no such algorithm exists in 1936. Turing invented his eponymous machines, also to solve the Entscheidungsproblem, and published his independent proof a few months after Church. When he discovered that Church had beaten him to it, Turing proved in 1937 that the two approaches were equivalent in power. Since Turing machines were much more “mechanical” than the lambda calculus, the development of computing machines relied far more on Turing’s approach, and it was only decades later that people started writing compilers for more friendly programming languages. I’ve heard it quipped that “the history of programming languages is the piecemeal rediscovery of the lambda calculus by computer scientists.”

The lambda calculus consists of a set of “terms” together with some relations on the terms that tell how to “run the program”. Terms are built up out of “term constructors”; in the lambda calculus there are three: one for variables, one for defining functions (Church denoted this operation with the Greek letter lambda, hence the name of the calculus), and one for applying those functions to inputs. I’ll talk about these constructors and the relations more below.

Church introduced the notion of “types” to avoid programs that never stop. Modern programming languages also use types to avoid programmer mistakes and encode properties about the program, like proving that secret data is inaccessible outside certain parts of the program. The “simply-typed” lambda calculus starts with a set of base types and takes the closure under the binary operation \to to get a set of types. Each term is assigned a type; from this one can deduce the types of the variables used in the term. An assignment of types to variables is called a typing context.

The search for a semantics for variants of the lambda calculus has typically been concerned with finding sets or “domains” such that the interpretation of each lambda term is a function between domains. Scott worked out a domain D such that the continuous functions from D to itself are precisely the computable ones. Lambek and Scott generalized the category where we look for semantics from Set to arbitrary cartesian closed categories (CCCs).

Lambek and Scott constructed a CCC out of lambda terms; we call this category the syntactical category. Then a structure-preserving functor from the syntactical category to Set or some other CCC would provide the semantics. The syntactical category has types as objects and equivalence classes of certain terms as morphisms. A morphism in the syntactical category goes from a typing context to the type of the term.

John Baez has a set of lecture notes from Fall 2006 through Spring 2007 describing Lambek and Scott’s approach to the category theory of lambda calculus and generalizing it from cartesian closed categories to symmetric monoidal closed categories so it can apply to quantum computation as well: rather than taking a functor from the syntactical category into Set, we can take a functor into Hilb instead. He and I also have a “Rosetta stone” paper summarizing the ideas and connecting them with the corresponding generalization of the Curry-Howard isomorphism.

The Curry-Howard isomorphism says that types are to propositions as programs are to proofs. In practice, types are used in two different ways: one as propositions about data and the other as propositions about code. Programming languages like C, Java, Haskell, and even dynamically typed languages like JavaScript and Python use types to talk about propositions that data satisfies: is it a date or a name? In these languages, equivalence classes of programs constitute constructive proofs. Concurrent calculi are far more concerned about propositions that the code satisfies: can it reach a deadlocked state? In these languages, it is the rewrite rules taking one term to another that behave like proofs. Melliès and Zeilberger’s excellent paper “Functors are Type Refinement Systems” relates these two approaches to typing to each other.

Note that Lambek and Scott’s approach does not have the sets of terms or variables as objects! The algebra that defines the set of terms plays only a minor role in the category; there’s no morphism in the CCC, for instance, that takes a term t and a variable x to produce the term \lambda x.t. This failure to capture the structure of the term in the morphism wasn’t a big deal for lambda calculus because of “confluence” (see below), but it turns out to matter a lot more in calculi like Milner’s pi calculus that describe communicating over a network, where messages can be delayed and arrival times matter for the end result (consider, for instance, two people trying to buy online the last ticket to a concert).

The last few decades have seen domains becoming more and more complicated in order to try to “unerase” the information about the structure of terms that gets lost in the domain theory approach and recover the operational semantics. Fiore, Moggi, and Sangiorgi, Stark and Cattani, Stark, and Winskel all present domain models of the pi calculus that recursively involve the powerset in order to talk about all the possible futures for a term. Industry has never cared much about denotational semantics: the Java Virtual Machine is an operational semantics for the Java language.

What we did

Greg Meredith and I set out to model the operational semantics of the pi calculus directly in a higher category rather than using domain theory. An obvious first question is, “What about types?” I was particularly worried about how to relate this approach to the kind of thing Scott and Lambek did. Though it didn’t make it into the HDRA paper and the details won’t make it into this post, we found that we’re able to use the “type-refinement-as-a-functor” idea of Melliès and Zeilberger to show how the algebraic term-constructor functions relate to the morphisms in the syntactical category.

We’re hoping that this categorical approach to modeling process calculi will help with reasoning about practical situations where we want to compose calculi; for instance, we’d like to put a hundred pi calculus engines around the edges of a chip and some ambient calculus engines, which have nice features for managing the location of data, in the middle to distribute work among them.

Lambda calculus

The lambda calculus consists of a set of “terms” together with some relations on the terms. The set T of terms is defined recursively, parametric in a countably infinite set V of variables. The base terms are the variables: if x is an element of V, then x is a term in T. Next, given any two terms t, t' \in T, we can apply one to the other to get t(t'). We say that t is in the head position of the application and t' in the tail position. (When the associativity of application is unclear, we’ll also use parentheses around subterms.) Finally, we can abstract out a variable from a term: given a variable x and a term t, we get a term \lambda x.t.

The term constructors define an algebra, a functor LC from Set to Set that takes any set of variables V to the set of terms T = LC(V). The term constructors themselves become functions:

\displaystyle   \begin{array}{rll}  -\colon & V \to T &\mbox{variable}\\  -(-)\colon & T \times T \to T &\mbox{application}\\  \lambda\colon & V \times T \to T &\mbox{abstraction}  \end{array}

Church described three relations on terms. The first relation, alpha, relates any two lambda abstractions that differ only in the variable name. This is exactly the same as when we consider the function f(x) = x^2 to be identical to the function f(y) = y^2. The third relation, eta, says that there’s no difference between a function f and a “middle-man” function that gets an input x and applies the function f to it: \lambda x.f(x) = f. Both alpha and eta are equivalences.

The really important relation is the second one, “beta reduction”. In order to define beta reduction, we have to define the free variables of a term: a variable occurring by itself is free; the set of free variables in an application is the union of the free variables in its subterms; and the free variables in a lambda abstraction are the free variables of the subterm except for the abstracted variable.

\displaystyle   \begin{array}{rl}  \mathrm{FV}(x) = & \{x\} \\  \mathrm{FV}(t(t')) = & \mathrm{FV}(t) \cup \mathrm{FV}(t') \\  \mathrm{FV}(\lambda x.t) = & \mathrm{FV}(t) / \{x\} \\  \end{array}

Beta reduction says that when we have a lambda abstraction \lambda x.t applied to a term t', then we replace every free occurrence of x in t by t':

\displaystyle  (\lambda x.t)(t') \downarrow_\beta t\{t' / x\},

where we read the right hand side as “t with t' replacing x.” We see a similar replacement of y in action when we compose the following functions:

\displaystyle   \begin{array}{rl}  f(x) = & x + 1 \\  g(y) = & y^2 \\  g(f(x)) = & (x + 1)^2 \\  \end{array}

We say a term has a normal form if there’s some sequence of beta reductions that leads to a term where no beta reduction is possible. When the beta rule applies in more than one place in a term, it doesn’t matter which one you choose to do first: any sequence of betas that leads to a normal form will lead to the same normal form. This property of beta reduction is called confluence. Confluence means that the order of performing various subcomputations doesn’t matter so long as they all finish: in the expression (2 + 5) * (3 + 6) it doesn’t matter which addition you do first or whether you distribute the expressions over each other; the answer is the same.

“Running” a program in the lambda calculus is the process of computing the normal form by repeated application of beta reduction, and the normal form itself is the result of the computation. Confluence, however, does not mean that when there is more than one place we could apply beta reduction, we can choose any beta reduction and be guaranteed to reach a normal form. The following lambda term, customarily denoted \omega, takes an input and applies it to itself:

\displaystyle \omega = \lambda x.x(x)

If we apply \omega to itself, then beta reduction produces the same term, customarily called \Omega:

\displaystyle \Omega = \omega(\omega)

\displaystyle \Omega \downarrow_\beta \Omega.

It’s an infinite loop! Now consider this lambda term that has \Omega as a subterm:

\displaystyle (\lambda x.\lambda y.x)(\lambda x.x)(\Omega)

It says, “Return the first element of the pair (identity function, \Omega)”. If it has an answer at all, the answer should be “the identity function”. The question of whether it has an answer becomes, “Do we try to calculate the elements of the pair before applying the projection to it?”

Lazy lambda calculus

Many programming languages, like Java, C, JavaScript, Perl, Python, and Lisp are “eager”: they calculate the normal form of inputs to a function before calculating the result of the function on the inputs; the expression above, implemented in any of these languages, would be an infinite loop. Other languages, like Miranda, Lispkit, Lazy ML, and Haskell and its predecessor Orwell are “lazy” and only apply beta reduction to inputs when they are needed to complete the computation; in these languages, the result is the identity function. Abramsky wrote a 48-page paper about constructing a domain that captures the operational semantics of lazy lambda calculus.

The idea of representing operational semantics directly with higher categories originated with R. A. G. Seely, who suggested that beta reduction should be a 2-morphism; Barney Hilken and Tom Hirschowitz have also contributed to looking at lambda calculus from this perspective. In the “Rosetta stone” paper that John Baez and I wrote, we made an analogy between programs and Feynman diagrams. The analogy is precise as far as it goes, but it’s unsatisfactory in the sense that Feynman diagrams describe processes happening over time, while Lambek and Scott mod out by the process of computation that occurs over time. If we use 2-categories that explicitly model rewrites between terms, we get something that could potentially be interpreted with concepts from physics: types would become analogous to strings, terms would become analogous to space, and rewrites would happen over time.

The idea from the “algebra of terms” perspective is that we have objects V and T for variables and terms, term constructors as 1-morphisms, and the nontrivial 2-morphisms generated by beta reduction. Seely showed that this approach works fine when you’re unconcerned with the context in which reduction can occur.

This approach, however, doesn’t work for lazy lambda calculus! Horizontal composition in a 2-category is a functor, so if a term t reduces to a term t', then by functoriality, \lambda x.t must reduce to \lambda x.t'—but this is forbidden in the lazy lambda calculus! Functoriality of horizontal composition is a “relativity principle” in the sense that reductions in one context are the same as reductions in any other context. In lazy programming languages, on the other hand, the “head” context is privileged: reductions only happen here. It’s somewhat like believing that measuring differences in temperature is like measuring differences in space, that only the difference is meaningful—and then discovering absolute zero. When beta reduction can happen anywhere in a term, there are too many 2-morphisms to model lazy lambda calculus.

In order to model this special context, we reify it: we add a special unary term constructor [-]\colon T \to T that marks contexts where reduction is allowed, then redefine beta reduction so that the term constructor [-] behaves like a catalyst that enables the beta reduction to occur. This lets us cut down the set of 2-morphisms to exactly those that are allowed in the lazy lambda calculus; Greg and I did essentially the same thing in the pi calculus.

More concretely, we have two generating rewrite rules. The first propagates the reduction context to the head position in an application; the second is beta reduction restricted to a reduction context.

\displaystyle  [t(t')]\, \downarrow_{ctx}\, [[t](t')]

\displaystyle  [[\lambda x.t](t')]\, \downarrow_\beta\, [t]\, \{t'/x\}

When we surround the example term from the previous section with a reduction context marker, we get the following sequence of reductions:

\displaystyle   \begin{array}{rl}  & [(\lambda x.\lambda y.x)(\lambda x.x)(\Omega)] \\    \downarrow_{ctx}& [[(\lambda x.\lambda y.x)(\lambda x.x)](\Omega)] \\    \downarrow_{ctx}& [[[\lambda x.\lambda y.x](\lambda x.x)](\Omega)] \\    \downarrow_{\beta}& [[\lambda y.(\lambda x.x)](\Omega)]\\    \downarrow_{\beta}& [\lambda x.x] \\  \end{array}

At the start, none of the subterms were of the right shape for beta reduction to apply. The first two reductions propagated the reduction context down to the projection in head position. At that point, the only reduction that could occur was at the application of the projection to the first element of the pair, and after that to the second element. At no point was \Omega ever in a reduction context.

Compute resources

In order to run a program that does anything practical, you need a processor, time, memory, and perhaps disk space or a network connection or a display. All of these resources have a cost, and it would be nice to keep track of them. One side-effect of reifying the context is that we can use it as a resource.

The rewrite rule \downarrow_{ctx} increases the number of occurrences of [-] in a term while \downarrow_\beta decreases the number. If we replace \downarrow_{ctx} by the rule

\displaystyle   [t(t')]\, \downarrow_{ctx'}\, [t](t')

then the number of occurences of [-] can never increase. By forming the term [[\cdots[t]\cdots]], we can bound the number of beta reductions that can occur in the computation of t.

If we have a nullary constructor c\colon 1 \to T, then we can define [t] = c(t) and let the program dynamically decide whether to evaluate an expression eagerly or lazily.

In the pi calculus, we have the ability to run multiple processes at the same time; each [-] in that situation represents a core in a processor or computer in a network.

These are just the first things that come to mind; we’re experimenting with variations.


We figured out how to model the operational semantics of a term calculus directly in a 2-category by requiring a catalyst to carry out a rewrite, which gave us full abstraction without needing a domain based on representing all the possible futures of a term. As a side-effect, it also gave us a new tool for modeling resource consumption in the process of computation. Though I haven’t explained how yet, there’s a nice connection between the “algebra-of-terms” approach that uses V and T as objects and Lambek and Scott’s approach that uses types as objects that uses Melliès and Zeilberger’s ideas about type refinement. Next time, I’ll talk about the pi calculus and types.

Q, Jokers, and Clowns

Posted in Uncategorized by Mike Stay on 2014 August 5

Conor McBride has a beautiful functional pearl, “Clowns to the Left of me, Jokers to the Right”, in which he discusses the idea of suspending the computation of a catamorphism. I just wanted to point out the relation of his work to the q-derivative. Since Star Trek’s character Q is a trickter god like Loki, Anansi, or Coyote, q-derivatives also fit nicely with McBride’s theme.

The usual definition of the derivative is

\displaystyle\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{(x + h) - x}.

Instead of translating x by an infinitesimal amount h, we can scale x by an infinitesimal amount q; these two definitions coincide in the limit:

\displaystyle\frac{df(x)}{dx} = \lim_{q \to 1} \frac{f(qx) - f(x)}{qx - x}.

However, when people talk about the q-derivative, they usually mean the operator we get when we don’t take the limit and q \ne 1. It should probably be called the “q-difference”, but we’ll see that the form of the difference is so special that it deserves the exceptional name.

The q-derivative, as one would hope, behaves linearly:

\begin{array}{rl} \displaystyle \frac{d_qf(x)}{d_qx} & \displaystyle = \frac{f(qx) - f(x)}{qx - x} \\ \\    \displaystyle \frac{d_q(f(x) + g(x))}{d_qx} & \displaystyle = \frac{(f(qx) + g(qx)) - (f(x) + g(x))}{qx - x} \\    & \displaystyle = \frac{(f(qx) - f(x)) + (g(qx)- g(x))}{qx - x} \\    & \displaystyle = \frac{d_qf(x)}{d_qx} + \frac{d_qg(x)}{d_qx}\end{array}

Even better, the q-derivative of a power of x is separable into a coefficient that depends only on q and a single smaller power of x:

\begin{array}{rl}\displaystyle \frac{d_q x^n}{d_qx} & \displaystyle = \frac{(qx)^n - x^n}{qx - x} \\    &\displaystyle = \frac{q^n - 1}{q - 1}\frac{x^n}{x} \\    &\displaystyle = [n]_q x^{n-1},\end{array}


\displaystyle [n]_q = \frac{q^n - 1}{q - 1} = 1 + q + \cdots + q^{n-1}.

Clearly, as q \to 1, [n]_q \to n. Whereas n counts the number of ways to insert a point into an ordered list of n-1 items, [n]_q counts the number of ways to insert a linearly independent ray passing through the origin into an (n-1)-dimensional vector space over a field with q elements with a given ordered basis.

The q-derivative even works when q is an operator rather than a number. Polynomials work in any rig, so if x is, say, a function instead of a number, q could be the Fourier transform.

Let’s lift this to types. The derivative of a datatype is the type of its one-holed contexts, so we expect the q-derivative to have a similar interpretation. When we take Q and X to be types, the Q-derivative of a tuple X^n is

\displaystyle [n]_Q X^{n-1} = X^{n-1} + (QX)X^{n-2} + (QX)^2X^{n-3} + \cdots + (QX)^{n-1}.

Each option factors into two parts: the ‘clowns’ are a power of QX to the left of the hole, followed by the ‘jokers’, a power of X after the hole. This type is the one we expect for the intermediate state when mapping a function f\colon X \to Q\times X over the tuple; any function g\colon X \to Q can be lifted to such a function f(x) = (g(x), x).

Similarly, the Q-derivative of the list datatype 1/(1-X) is 1/(1-QX)(1-X); that is, the ‘clowns’ form the list of the outputs of type Q \times X for the elements that have already been processed and the ‘jokers’ form the list of the elements yet to process.

When we abstract from thinking of q as a number to thinking of it as an operator on ring elements, the corresponding action on types is to think of Q as a functor. In this case, QX is not a pair type, but rather a parametric type Q[X]. One might, for example, consider mapping the function f(x) = 1/x over a list of real numbers. The resulting outputs will be real except when the input is zero, so we’ll want to adjoin an undefined value. Taking Q = Option, the Q-derivative of List[Int] is List[Option[Int]]\times List[Int], consisting of the list of values that have already been processed, followed by the list of values yet to process.

Funny identity matrix

Posted in Math by Mike Stay on 2013 June 11

Given any category C with finite products and coproducts such that the products distribute over the coproducts, you can make a compact closed bicategory Mat(C) of (natural numbers, matrices of elements of Ob(C), matrices of elements of Mor(C)) and define matrix composition by \displaystyle (M \circ N)(i,k) = \sum_j M(i,j) \times N(j,k).

Take the partial order whose objects are nonnegative integers and whose morphisms m \to n mean m | n; the product is gcd and the coproduct is lcm. In this category, the terminal object is 0 and the initial object is 1, so the identity matrix looks like

\left( \begin{array}{cccc}0 & 1 & \ldots & 1 \\ 1 & 0 & \ldots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \ldots & 0 \end{array} \right)

and matrix multiplication is \displaystyle (M \circ N)(i,k) = \mathop{\mbox{lcm}}_j \mbox{gcd}(M(i,j), N(j,k)).

Reflective calculi

Posted in Category theory, Programming by Mike Stay on 2013 May 15

This is mostly for my own benefit.

We start with some term-generating functor T(G, X) parametric in a set of ground terms G and a set of names X.

Now specialize to a single ground term:

T_2(X) = T(1, X)

Now mod out by structural equivalence:

T_3(X) = T_2(X) / \equiv

Let N be a set of names and let M be the set of name-equivalence classes N / \equiv_N (where \equiv_N is yet to be defined—it’s part of the theory of names in which we’re still parametric).

Prequoting and predereference are an algebra and coalgebra of T_3, respectively:

PQ: T_3(M) \to M

PD: M \to T_3(M)

such that

PD \circ PQ: T_3(M) \to T_3(M) = 1_{T_3(M)}.

Real quoting and dereference have to use T_4(X), defined below.

Define N = \mu X.T_2(X). Does N / \equiv = \mu X.T_3(X)? I think so; assuming it does, define

M = \mu X.T_3(X)

so name equivalence is structural equivalence; equivalence of prequoted predereferences is automatic by the definition above.

The fixed point gives us an isomorphism

F: T_3(M) \to M.

We can define PQ = F, PD = F^{-1}, because undecidability doesn’t come into play until we add operational semantics. It’s decidable whether two terms are structurally equivalent. Thus

PD \circ PQ:T_3(M) \to T_3(M)

is the identity, satisfying the condition, and

PQ \circ PD: M \to M

is the identity, which we get for free.

When we mod out by operational semantics (following the traditional approach rather than the 2-categorical one needed for pi calculus):

T_4(X) = T_3(X) / \beta,

we have the quotient map

q_X: T_3(X) \to T_4(X)

and a map

r_X: T_4(X) \to T_3(X)

that picks a representative from the equivalence class.

It’s undecidable whether two terms are in the same operational equivalence class, so q_X may not halt. However, it’s true by construction that

q_X \circ r_X: T_4(X) \to T_4(X)

is the identity.

We can extend prequoting and predereference to quoting and dereference on T_4 by

Q: T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{PQ}{\to} M

D: M \stackrel{PD}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

and then

D \circ Q

= T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{PQ}{\to} M \stackrel{PD}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

= T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

= T_4(M) \stackrel{id}{\to} T_4(M)

which is what we want for quoting and dereference. The other way around involves undecidability.

Monads without category theory, redux

Posted in Category theory, Programming by Mike Stay on 2012 September 26

I intended the last version of this post to be a simple one-off, but people cared too much. A fire-breathing Haskellite said that jQuery must never be used as an example of a monad because it’s (gasp) not even functional! And someone else didn’t understand JavaScript well enough to read my code snippets. So here’s another attempt that acknowledges the cultural divide between functional and imperative programmers and tries to explain everything in English as well as in source code.

Monads are a design pattern that is most useful to functional programmers because their languages prefer to implement features as libraries rather than as syntax. In Haskell there are monads for input / output, side-effects and exceptions, with special syntax for using these to do imperative-style programming. Imperative programmers look at that and laugh: “Why go through all that effort? Just use an imperative language to start with.” Functional programmers also tend to write programs as—surprise!—applying the composite of a bunch of functions to some data. The basic operation for a monad is really just composition in a different guise, so functional programmers find this comforting. Functional programmers are also more likely to use “continuations”, which are something like extra-powerful exceptions that only work well when there is no global state; there’s a monad for making them easier to work with.

There are, however, some uses for monads that even imperative programmers find useful. Collections like sets and lists (with or without parentheses), parsers, promises, and membranes are a few examples, which I’ll explain below.


Many collection types support the following three operations:

  1. Map a function over the elements of the collection.
  2. Flatten a collection of collections into a collection.
  3. Create a collection from a single element.

A monad is a class that provides a generalized version of these three operations. When I say “generalized”, I mean that they satisfy some of the same rules that the collections’ operations satisfy in much the same way that multiplication of real numbers is associative and commutative just like addition of real numbers.

The way monads are usually used is by mapping a function and then flattening. If we have a function f that takes an element and returns a list, then we can say and get a new list.


A parser is an object with a list of tokens that have already been parsed and the remainder of the object (usually a string) to be parsed.

var Parser = function (obj, tokens) {
  this.obj = obj;
  // If tokens are not provided, use the empty list.
  this.tokens = tokens || [];

It has three operations like the collections above.

  1. Mapping a function over a parser applies the function to the contained obj. = function (f) {
      return new Parser(f(this.obj), this.tokens);
  2. Flattening a parser of parsers concatenates the list of tokens.
    Parser.prototype.flatten = function () {
      return new Parser(this.obj.obj, this.obj.tokens.concat(this.tokens));

    The definition above means that

    new Parser(new Parser(x, tokens1), tokens2).flatten()
    is equivalent to
    new Parser(x, tokens1.concat(tokens2)).

  3. We can create a parser from an element x: new Parser(x).

If we have a function f that takes a string, either parses out some tokens or throws an exception, and returns a parser with the tokens and the remainder of the string, then we can say

and get a new parser. In what follows, I create a parser with the string “Hi there” and then expect a word, then some whitespace, then another word.

var makeMatcher = function (re) {
  return function (s) {
    var m = s.match(re);
    if (!m) { throw new Error('Expected to match ' + re); }
    return new Parser(m[2], [m[1]]);

var alnum = makeMatcher(/^([a-zA-Z0-9]+)(.*)/);
var whitespace = makeMatcher(/^(s+)(.*)/);

new Parser('Hi there')
// is equivalent to new Parser('', ['Hi', ' ', 'there']);


A promise is an object that represents the result of a computation that hasn’t finished yet; for example, if you send off a request over the network for a webpage, the promise would represent the text of the page. When the network transaction completes, the promise “resolves” and code that was waiting on the result gets executed.

  1. Mapping a function f over a promise for x results in a promise for f(x).
  2. When a promise represents remote data, a promise for a promise is still just remote data, so the two layers can be combined; see promise pipelining.
  3. We can create a resolved promise for any object that we already have.

If we have a function f that takes a value and returns a promise, then we can say

and get a new promise. By stringing together actions like this, we can set up a computation that will execute properly as various network actions complete.


An object-capability language is an object-oriented programming language where you can’t get a reference to an object unless you create it or someone calls one of your methods and passes a reference to it. A “membrane” is a design pattern that implements access control.

Say you have a folder object with a bunch of file objects. You want to grant someone temporary access to the folder; if you give them a reference to the folder directly, you can’t force them to forget it, so that won’t work for revokable access. Instead, suppose you create a “proxy” object with a switch that only you control; if the switch is on, the object forwards all of its method calls to the folder and returns the results. If it’s off, it does nothing. Now you can give the person the object and turn it off when their time is up.

The problem with this is that the folder object may return a direct reference to the file objects it contains; the person could lose access to the folder but could retain access to some of the files in it. They would not be able to have access to any new files placed in the folder, but would see updates to the files they retained access to. If that is not your intent, then the proxy object should hide any file references it returns behind similar new proxy objects and wire all the switches together. That way, when you turn off the switch for the folder, all the switches turn off for the files as well.

This design pattern of wrapping object references that come out of a proxy in their own proxies is a membrane.

  1. We can map a function f over a membrane for x and get a membrane for f(x).
  2. A membrane for a membrane for x can be collapsed into a single membrane that checks both switches.
  3. Given any object, we can wrap it in a membrane.

If we have a function f that takes a value and returns a membrane, then we can say

and get a new membrane. By stringing together actions like this, we can set up arbitrary reference graphs, while still preserving the membrane creator’s right to turn off access to his objects.


Monads implement the abstract operations map and flatten, and have an operation for creating a new monad out of any object. If you start with an instance m of a monad and you have a function f that takes an object and returns a monad, then you can say;

and get a new instance of a monad. You’ll often find scenarios where you repeat that process over and over.

Overloading JavaScript’s dot operator with direct proxies

Posted in Category theory, Math, Programming by Mike Stay on 2012 August 28

With the new ECMAScript 6 Proxy object that Firefox has implemented, you can make dot do pretty much anything you want. I made the dot operator in JavaScript behave like Haskell’s bind:

// I'll give a link to the code for lift() later,
// but one thing it does is wrap its input in brackets.
lift(6); // [6]
lift(6)[0]; // 6
lift(6).length; // 1

// lift(6) has no "upto" property
lift(6).upto; // undefined

// But when I define this global function, ...

// Takes an int n, returns an array of ints [0, ..., n-1].
var upto = function (x) {
  var r = [], i;
  for (i = 0; i < x; ++i) {
  return r;

// ... now the object lift(6) suddenly has this property
lift(6).upto; // [0,1,2,3,4,5]
// and it automagically maps and flattens!
lift(6).upto.upto; // [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]
lift(6).upto.upto.length; // 15

To be clear, ECMAScript 6 has changed the API for Proxy since Firefox adopted it, but you can implement the new one on top of the old one. Tom van Cutsem has code for that.

I figured this out while working on a contracts library for JavaScript. Using the standard monadic style (e.g. jQuery), I wrote an implementation that doesn’t use proxies; it looked like this:

lift(6)._(undo)._(undo).value; // [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]

The lift function takes an input, wraps it in brackets, and stores it in the value property of an object. The other property of the object, the underscore method, takes a function as input, maps that over the current value and flattens it, then returns a new object of the same kind with that flattened array as the new value.

The direct proxy API lets us create a “handler” for a target object. The handler contains optional functions to call for all the different things you can do with an object: get or set properties, enumerate keys, freeze the object, and more. If the target is a function, we can also trap when it’s used as a constructor (i.e. new F()) or when it’s invoked.

In the proxy-based implementation, rather than create a wrapper object and set the value property to the target, I created a handler that intercepted only get requests for the target’s properties. If the target has the property already, it returns that; you can see in the example that the length property still works and you can look up individual elements of the array. If the target lacks the property, the handler looks up that property on the window object and does the appropriate map-and-flatten logic.

I’ve explained this in terms of the list monad, but it’s completely general. In the code below, mon is a monad object defined in the category theorist’s style, a monoid object in an endofunctor category, with multiplication and unit. On line 2, it asks for a type to specialize to. On line 9, it maps the named function over the current state, then applies the monad multiplication. On line 15, it applies the monad unit.

var kleisliProxy = function (mon) {
  return function (t) {
    var mont = mon(t);
    function M(mx) {
      return Proxy(mx, {
        get: function (target, name, receiver) {
          if (!(name in mx)) {
            if (!(name in window)) { return undefined; }
            return M(mont['*'](mon(window[name]).t(mx)));
          return mx[name];
    return function (x) { return M(mont[1](x)); };

var lift = kleisliProxy(listMon)(int32); 
lift(6).upto.upto; // === [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]

Cantor’s diagonal proof

Posted in Math by Mike Stay on 2012 August 20

There’s a riddle about a library where the index had two parts, each in its own book. The first index book listed every book whose title appeared between its own covers; nearly every book in the library was listed in this index, since the title is typically on the title page. The other index book, a mere pamphlet really, was for those rare cases where the title was not between the covers of the book. Neither index had a title page, just a list of book titles and what shelf each book was on.

In which book was the first book listed? It could have been listed in itself, which is consistent with the claim that it lists all the books that contain their own titles. Or it could have been listed in the second book and not in itself, which would also be consistent.

The riddle is this: in which index was the second index book listed? If we suppose that the second book did not list itself, then it would be incomplete, since it is a book in the library that does not contain its title between the covers. If we suppose that it did list itself, then it would be inconsistent, since it is supposed to list only those books that do not contain their own titles. There is no way for the second index book to be both consistent and complete.

Georg Cantor was a mathematician who used this idea to show that there are different sizes of infinity! But before I go there, consider the Munduruku tribe of the Amazon, which has no words for specific numbers larger than five; how would a man with eight children tell if he has enough fruit to give one to each child? He would probably name each child and set aside a piece of fruit for them—that is, he would set up an isomorphism between the children and the fruit. Even though he cannot count either set, he knows that they are the same size.

We can do the same thing to compare infinite sets: even though we can’t count them, if we can show that there is an isomorphism between two infinite sets, we know they are the same size. Likewise, if we can prove that there is no possible isomorphism between the sets, then they must be different sizes.

So now suppose that we take the set of natural numbers {0, 1, 2, 3, …} and the set of positive even numbers {0, 2, 4, 6, …}. These sets both have infinitely many elements. Are they the same size? We can double every natural number and get an even one, and halve every positive even number and get a natural number. That’s an isomorphism, so they’re the same size.

How about natural numbers and pairs of natural numbers? Given two numbers like 32768 and 137, we can pad them to the same length and then interleave them: 3020716387; they come apart just as easily.

Since we can take the first number to be the denominator of a nonnegative rational number and the second number to be the numerator, we also find that the size of the set of rationals is the same as the size of the set of natural numbers.

Cantor came up with isomorphisms for each of these sets and thought that perhaps all infinite sets were the same size. But then he tried to come up with a way to match up the natural numbers and infinite sequences of bits, like the sequence of all zeros:

    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...

or the sequence that’s all zeros except for the third one:

    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...

or the sequence whose nth bit is 1 if n is prime:

    0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ...

or sequences with no pattern at all:

    0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ...

He tried and tried, but couldn’t figure out a way to make them match up. Then he thought of an ingenious proof that it’s impossible! It uses the same insight as the library riddle above.

Suppose we try to match them up. We write in the left column a natural number and in the right column a binary sequence:

    1: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
    2: 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...
    3: 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ...
    4: 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ...
    5: 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, ...
    6: 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, ...

Consider the diagonal of the table above: 0, 0, 1, 1, 0, 1, … This sequence is number 6 above, and is like the first index book: the sixth bit in the sequence could be either 0 or 1, and it would be consistent.

Now consider the opposite of the diagonal, its “photographic negative” sequence: 1, 1, 0, 0, 1, 0, … This sequence is like the second index book: it cannot consistently appear anywhere in the table. To see that, imagine if we had already assigned this sequence to number 7. (The number 7 is arbitrary; any other number will work just as well.)  Then the table would look like this:

    1: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
    2: 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...
    3: 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, ...
    4: 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, ...
    5: 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, ...
    6: 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, ...
    7: 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, ...

But look at the seventh bit of sequence 6! It’s supposed to be the seventh bit along the diagonal, and it’s wrong. If we correct it, then the 7th bit of sequence 7 is wrong, since it’s supposed to be the opposite of the 7th bit of sequence 6.

This proof shows that there are too many sequences of bits to match up to natural numbers. No matter how we try match them up, we can always find a contradiction by looking at the sequence defined by the opposite of the diagonal sequence.  And as if two different sizes of infinity weren’t enough, Cantor was able to repeat this process and show that there are infinitely many sizes of infinity, each bigger than the last!

Many influential mathematicians didn’t like this idea, and attacked Cantor personally rather than trying to find fault with his proof.  Religious philosophers misinterpreted his results and equated the idea of multiple infinities with pantheism!  Confronted with so much antagonism, Cantor often fought with depression.

He persevered, however, and later in life was greatly praised for his work. The Royal Society gave him their highest honor, the Sylvester Medal; and no less a mathematician than David Hilbert exclaimed, “No one shall expel us from the paradise that Cantor has created.”

Alan Hom, 1924-1951

Posted in Borges by Mike Stay on 2012 May 22

The Hom set was named for the Chinese-American mathematician Alan Hom, who worked with MacLane and Eilenberg and did pioneering work on profunctors.  Hom was killed in the Korean war.

Symmetric monoidal closed objects

Posted in Uncategorized by Mike Stay on 2012 April 12

I conjecture that there’s a compact closed bicategory Th(SMCC) such that the 2-category hom(Th(SMCC), Prof) of

  • sylleptic monoidal functors (of bicategories),
  • braided monoidal transformations and
  • monoidal modifications

is equivalent as a 2-category to the 2-category SMCC of

  • symmetric monoidal closed categories,
  • braided monoidal closed functors, and
  • monoidal closed natural isomorphisms.

If we model Th(SMCC) in the compact closed bicategory 3Cob2, then we get

where I didn’t draw

  • the right unitor
  • the pentagon equation for the associator
  • the triangle equations for the unitors
  • the hexagon equations for the braiding
  • the yanking for internal hom
  • a,b,c,l,r composed with their formal inverses equals the identity

but I think they’re pretty obvious given this other stuff.


Posted in Uncategorized by Mike Stay on 2011 October 31

I used this Halloween as motivation to start working out a few months ago, because I wanted to be a passable Wolverine; I made a lot of progress, but not so much that I feel comfortable posting pictures of muscles. Fortunately, my mom is a seamstress and made me a great jacket. Here’s the result:

Making of Wolverine’s claws

Posted in Fun links by Mike Stay on 2011 July 11


Posted in Borges by Mike Stay on 2011 June 2

(4/5 stars) The Euphraneid is a collection of Book of Mormon pseudepigrapha. They were ostensibly written by Moroni, the son of Mormon, during and after his work on the Nephite record. The title is Greek: euphrano means “to shout for joy”; it seems to be asserting that the name Moroni derives from the Hebrew ranan רנן, “to shout or rejoice”.

The stories are short; they contradict one another and the Book of Mormon account itself—but taken together, they make a satisfying whole. If you don’t want spoilers, stop reading here; I summarize my favorites below.

Some stories don’t seem to have a point. In one, Moroni has barely survived a plague—perhaps diphtheria—and returns to his home late in the spring. He’s told in a dream that he has to get gold to make plates, but knows he needs to plow and plant his fields. Trusting in a miracle, he goes up into the mountains to gather gold from alluvial streambeds using fleece to trap the flakes. He spends several weeks setting up traps; when he fears he can’t wait any longer to plant, he returns to find his fields plowed and planted, but also finds his thatch hut occupied by a Lamanite scouting party. That night, he creeps up to the hut, shouts a Lamanite insult and stabs a man through one of the gaps in the wall. In the confusion, the Lamanites kill each other. After burying the men, he returns to the mountain and hangs out the gold-laden fleece to dry. It’s never made clear who plowed the field; I like to think it was the Three Nephites.

There’s an amusing chapter in the Book of Mormon where Mormon repeats his own name twelve times in a single chapter, and five times in a single verse, all without mentioning himself once. Nephi, son of Helaman, writes about how his father named him after the titan-hero Nephi that came out of Jerusalem. In the Euphraneid, Moroni keeps up the tradition, playing the role of a bard singing war stories about Captain Moroni and his men.

My favorite was the retelling of the king-men plot. In this version, Amalickiah is almost vampiric: a landed lord of old blood, meting out extraordinarily harsh punishments in his jurisdiction. Both amputation and impaling were not uncommon, though any punishment could be avoided by paying a large enough fine; by paying a regular tribute, a wealthy man could be immune to the law. Amalickiah’s oath to drink Moroni’s blood was not unusual; he’d done the same to many other enemies, though where he did not have power he sought to gain it by intrigue rather than by force. Like Rasputin, he miraculously survived multiple assassination attempts. I thought it particularly fitting that Teancum played Van Helsing to Amalickiah’s Dracula, killing him with a stake through his heart.

The last story details how, after sixteen years of wandering, Moroni returns to Cumorah. He describes the lay of the land he spent so many months scouting, the remnants of the great battle he fought. He sees the bowl-shaped depression where Cumenihah’s battalion allowed themselves to be drawn out by the Lamanites using a feinted retreat. He comes to the stones of Cumorah’s fortifications and sights along them to the hidden mouth of the cave where he has stored the records; when he arrives, he finds the door open, the room empty and dark, the treasures gone. Moroni feels as though he is drowning, unable to breathe. Then a calm comes over him as he sees that this is not his Cumorah, but another; he is on a different continent, among the ruins of a different civilization that collapsed in a different great battle, their historian’s treasures plundered. His book yet lies hidden and safe, somewhere among the hills.

I said it was the last story, but here even the pseudepigrapha has dubious appendices. The first is told in third person, with no indication of who the narrator is. As it begins, Moroni is carrying the plates, struggling to reach the cave. Six men are in pursuit; they slip on scree. Moroni throws aside the vegetation masking the entrance and makes his way down long narrow passage into a room. He strides past implements, vestments, precious metals and stones, drops the plates on a table, grabs Laban’s sword from the wall. He meets the pursuers halfway down the hall; they measure swords, then attack. Moroni smites him and he falls dead; another advances and contends with him. This one also falls by his sword; a third then steps forth and meets the same fate; a fourth afterwards contends with him, but in the struggle with the fourth, Moroni, being exhausted, is killed. The remaining two tread on Moroni as they enter the cave; they slip on blood bathing the smooth rock. The cave is empty; they curse and rage. When they turn to leave, even Moroni’s sword is gone.

The final appendix is an alternate account of Joseph Smith getting plates from Moroni. It’s all wrong, but very familiar at the same time.

Joseph has recently acquired a seerstone made of polished quartz; to get some time alone, he takes a deer-trail through the woods. He reaches a widening of the trail where the canopy does not block the sunlight and he takes the stone from his pocket; it is late afternoon, and he starts a small fire by focusing the light with the lens. He’s alarmed to find that he’s unable to extinguish it and fears that he’ll start the forest on fire, but it doesn’t spread. He notices that the wood is not consumed, so immediately bends down to remove his shoes.

At this point, he’s seized from behind; despite Joseph’s fame as a grappler, his assailant was better, or at least good enough to keep Joseph from escaping. “Put out the fire!” the man commands. Joseph replies “I have tried; I cannot.” The assailant mutters some words in a language Joseph does not speak but recognizes, and he calls Moroni by name and tells him to release him. Moroni refuses, citing his fear of seers. Joseph tells him he carries the second sign and to give it to him. At that, Moroni releases him, though Joseph doesn’t turn.

“I have the first sign already,” says Joseph, and tells Moroni to check his pocket. Moroni puts his hand in to touch the stone but quickly pulls his hand back as though burned. Moroni says of the plates, “They are so heavy.”

Joseph turns and looks at Moroni. He’s about 5 feet 8 or 9 inches tall and heavy set; his face is as large and hawk-like. He is dressed in a suit of brown woolen clothes, his hair and beard white, and has on his back a sort of knapsack with something in it, shaped like a book. He is miserable and worn. Joseph summons himself and commands Moroni to give it to him immediately, or “this very moment either you or I shall die.”

At this, Moroni hands over the plates; he is ecstatic to be rid of them. As Joseph receives them, the fire goes out, and then a woman steps out of the darkness and calls Joseph by name.

It is Sallie Chase, sister to Willard, and a seer. She casts a hex on them with her green glass, then takes the plates. Mocking Joseph, she takes the seerstone from his pocket and places it into silver frames next to her green one.

At this, a column of light appears from heaven and a personage appears; walls of fire spring up on either side of the trail and force her off it. She is forced to leave the plates and stones behind. Joseph is scolded, but allowed to keep the items. Moroni fades from view and the personage ascends to heaven.

I didn’t give the book five stars because there were parts that tended to drag on a bit, but there were plenty of fun tales, too. Overall well worth the read.

Semantics for the blue calculus

Posted in Category theory, Math, Quantum by Mike Stay on 2011 April 29
Blue calculus MonCat \frac{n \mbox{Cob}_2}{} 2Hilb Set (as a one object bicategory)
Types monoidal categories manifolds 2 Hilbert spaces *
Terms with one free variable monoidal functors manifolds with boundary linear functors sets
Rewrite rules monoidal natural transformation manifolds with corners linear natural transformations functions
Tensor product \boxtimes juxtaposition (disjoint union) tensor product cartesian product
(T | T'):Y where x:X is free ([[T]] \otimes_Y [[T']]):X \to Y formal sum of cobordisms with boundary from X to Y sum of linear functors disjoint union

In the MonCat column, \boxtimes is the categorified version of the tensor product of monoids.

Axiom of fun choice

Posted in Fun links, Math by Mike Stay on 2011 April 14

A fun choice function is a function f defined on a collection J of jobs that must be done such that for every job j \in J, f(j) is an element of fun. The axiom of fun choice states,

For any set \displaystyle J of jobs that must be done, there exists a fun choice function defined on \displaystyle J.

This axiom asserts that one can always find the fun in any job that must be done; a theorem of Poppins deduces from this that all such jobs are games.

Monster manual

Posted in Uncategorized by Mike Stay on 2011 April 13

What magical creature lives deep in the earth’s crust and has a horrible lisp? A BATHOLITHK.

Better than anything 2

Posted in Uncategorized by Mike Stay on 2011 April 1

Better than This Week from Baez,
Better than Lou Kauffman’s knots,
Better than all of the string theory
Witten’s forgot,

Better than Feynman’s graffiti,
Better than Calabi-Yau,
Better than Pauli’s neutrino,
Better than mu and than tau,

Better than Curie, Poincaré,
And Niels and Albert at Solvay
Better than anything except being in love.

Better than anything 1

Posted in Poetry by Mike Stay on 2011 March 31

Better than memories wholesale,
Better than lasers on mars,
Better than alien obelisks
Chock full of stars,

Better than blade runners dreaming,
Better than spice that must flow,
Better than River and reavers,
Better than lightsaber glow,

Better than “Dammit, I’m a doc!”
Or simply “fascinating” Spock,
Better than anything except being in love.

Powers of 10

Posted in Astronomy by Mike Stay on 2011 March 10

This is an image my brother Doug and I made. A blanket is at the far left; then grass, streets, city, shoreline, clouds, the earth, orbits of the moon, Earth, Mars, Jupiter, Saturn, Neptune, nearby stars, the Milky Way, and distant galaxies. (Click for a much bigger view.)

Can you guess how we made it?

More views of the metapixel

Posted in Uncategorized by Mike Stay on 2011 March 10

Double spiral:

Single spiral the other way:

Double spiral the other way:

Escher “Print gallery” metapixel

Posted in Math by Mike Stay on 2011 March 10

This one uses a conformal transformation like Escher did in his “Print Gallery“:

A picture that contains a scaled-down copy of itself is periodic both in theta and in log (r). Taking the complex log of such an image gives a rectangular tiling of the plane; by scaling and rotating, I got a new tiling, then exponentiated it. Now going around 2 pi radians takes you diagonally across the original rectangle, so you end up one level away from where you started.

[Edit] See also these better versions.

Droste OTCA metapixel

Posted in Math by Mike Stay on 2011 March 8

John Conway and some of his friends invented a cellular automaton called “Life”. It’s Turing complete, so you can make a Life pattern that runs Life. The OTCA metapixel has all its logic around the edges of a vast field that gets filled with light-weight spaceships when the cell is on. (See here, here, and here for more details.)

This image uses a logarithmic spiral to zoom through a factor of 2048 after one turn. There’s a little bit of misalignment between the largest scale and the smallest one, but I think it still turned out pretty well (click for a larger view):

Hooped-up 24-cell

Posted in Uncategorized by Mike Stay on 2011 March 1

The 24-cell is really cool. It’s the only self-dual regular polytope that’s not a simplex or a polygon. Its vertices form the multiplicative group of units in the Hurwitz quaternion ring. It tiles Euclidean 4-space. Spheres inscribed in the octahedra give the closest packing of spheres in 4 dimensions. And, well, there are 24 of them.

We ran out of tape while making the inner cuboctahedron–one of the triangular sides is missing. But we came darn close! Compare:

Astounding protein folding paper

Posted in Chemistry, General physics by Mike Stay on 2011 February 22

By assuming that to get from point A to point B you don’t have to hit a sequence of points in between—i.e. the way quantum particles work—these guys accurately predict protein folding rates in 15 different real proteins. This is huge.

The Best is Lost

Posted in Poetry by Mike Stay on 2011 February 19

I am not resigned to the shutting away of loving hearts in the hard ground
So it is, and so it will be, for so it has been, time out of mind:
Into the darkness they go, the wise and the lovely. Crowned
With lilies and laurel they go: but I am not resigned.

Lovers and thinkers, into the earth with you.
Be one with the dull, the indiscriminate dust.
A fragment of what you felt, of what you knew,
A formula, a phrase remains – but the best is lost.

The answers quick and keen, the honest look, the laughter, the love –
They are gone. They have gone to feed the roses. Elegant and curled
Is the blossom. Fragrant is the blossom. I know. But I do not approve.
More precious was the light in your eyes than all the roses in the world.

Down, down, down into the darkness of the grave
Gently they go, the beautiful, the tender, the kind:
Quietly they go, the intelligent, the witty, the brave.
I know. But I do not approve. And I am not resigned.

– Edna St Vincent Millay

Yudkowski interview

Posted in Uncategorized by Mike Stay on 2011 February 16

In case it wasn’t already abundantly clear…

Posted in Math by Mike Stay on 2011 February 15

You know a guy’s a nerd if he pulls over to take a picture of his odometer because it’s showing the Fibonacci sequence.

But it takes a special kind of nerd to have reset his trip odometer 132.1 miles ahead of time in anticipation of taking the picture.

Queensland flooding

Posted in Uncategorized by Mike Stay on 2011 February 9

Polarization for breakfast

Posted in Uncategorized by Mike Stay on 2011 February 5

Legos this morning

Posted in Uncategorized by Mike Stay on 2011 February 5

This one ended up looking rather shark-like:

William chose–who else?–Yoda to pilot the green ship.


Posted in Evolution, Fun links by Mike Stay on 2011 January 31

Australia’s massive forest fires in 2006 were followed by 10cm of rain, which washed all the nutrient-rich ash into the lakes, which caused a bioluminescent algae bloom in 2008.

Regular tilings of three-dimensional spaces

Posted in Astronomy, Fun links, General physics, Math by Mike Stay on 2011 January 31

If you start at the north pole and make an equilateral triangle 6000 miles on a side, the bottom will lie on the equator, each of the angles will be 90 degrees, and only four of them will fit around the pole.

In a similar way, large enough tetrahedra would tile the surface of a hypersphere. This paper identifies the eleven regular tilings of three-dimensional spaces and whether they’re spherical, Euclidean, or hyperbolic tilings, and then looks at the geometry of spacetime to see how it might be tiled.

The “cubic” tilings (where eight polyhedra meet around a vertex like cubes do in Euclidean space) are amenable to taking cross-sections; this tiling of hyperbolic space with dodecahedra

has a cross section with a tiling of the hyperbolic plane with pentagons:

Willie’s fugue

Posted in Uncategorized by Mike Stay on 2011 January 23

Toccata in D minor

Posted in Uncategorized by Mike Stay on 2011 January 22

This is what about ten minutes every other day or so for a year gets you.


Posted in Uncategorized by Mike Stay on 2011 January 18

Apollo. adj 1. brave, without fear (from L. ab- not + pollo chicken).

Silicon carbide

Posted in Uncategorized by Mike Stay on 2011 January 17

I took this photo of my chunk of silicon carbide.

With some time I could probably find a crystal pure enough to emit light.

Doodling like Vi

Posted in Uncategorized by Mike Stay on 2011 January 15


Say you’re me and you just watched Vi Hart’s video on infinity elephants and you totally missed the joke about Mr. Tusks even though you read Dinosaur Comics all the time but you liked the bit about Apollonian gaskets, which don’t blow out in Battle Mountain like the one in your car did but rather on the way to the L1 point and then you need Richard Feynman to tell you why. You thought she was going to draw the tiniest camels going through the eye of a needle, but you suppose that would ruin the hyperbole in the parable, so the ellipsis was justified. Anyway, you decide to avoid circular reasoning and doodle rectangles instead, filling them up with squares.

Eventually you wonder which ones you can fill up with finitely many squares and which ones you need infinitely many for, and so you start with squares and build up some rectangles. One square can only make one rectangle, itself. Two squares can only make one rectangle, but it can be lying down or standing up so you decide to say they’re different. Three squares make four rectangles and four squares make eight rectangles, and then you start thinking about Vi Hart’s video on binary trees. So you put the numbers into a tree, but it looks kind of stern, so you add some fareys to cheer it up. Then you see that the height of a rectangle is just the sum of its neighbors’ heights, and similarly for the width. You see lots of nice patterns in the dimensions involving flipping things over or running them backwards, kind of like the Blues Brothers’ police car when it was being chased by the Neo Nazis or that V6 racecar that Johann sent to Frederick.

Now instead of breadth, you decide to go for depth. Making the rectangles very long or very tall is too boring, so you add one square each time, alternately making it longer and taller. 1 1 2 3 5 8 13 21… You get the Fibonacci numbers; the limiting ratio is the golden ratio, [1+sqrt(5)]/2 to 1. This rectangle is the worst at being approximated by repeated squares so it shows up in systems where repetition is bad, like the angle at which plant leaves grow so they overlap the least and gather the most sunlight or how sunflowers pack the most seeds into a flowerhead and Roger Penrose thinks the Fibonacci spirals in the microtubules in the neurons in your brain are doing quantum error correction.

You decide to look at other irrational numbers to see if they have any nice patterns. e does. pi doesn’t. square roots of positive integers give repeating palindromes! You wonder whether all palindromes occur and if not which of the lyrics to Weird Al’s song Bob are special that way. And maybe then you make up a palindrome with vi hart’s name in it and turn it into a square root.


Maybe you decide that you want to doodle some circles after all, so you start with this gasket and figure out where the circles touch the line. The numbers look very familiar. You wonder what the areas of the circles are and how the gasket relates to the modular group and Poincare’s half-plane model of the hyperbolic plane and wish you had time to just sit in math class and doodle…

Buckyball Bacteriophage

Posted in Design, Evolution, Math by Mike Stay on 2011 January 11

The t4 bacteriophage infects e. coli.

Dodecahedral Buckyballs

Posted in Design, Math by Mike Stay on 2011 January 6

Hyperbolic Buckyballs

Posted in Design, Math by Mike Stay on 2011 January 6

Icosahedral Buckyballs

Posted in Design, Math by Mike Stay on 2011 January 6

Math doodles

Posted in Uncategorized by Mike Stay on 2010 December 30

Vi Hart‘s got some great stuff.

The Word of God

Posted in Astronomy, Chemistry, Evolution, General physics, History, Poetry, Theocosmology by Mike Stay on 2010 November 3

From desert cliff and mountaintop we trace the wide design,
Strike-slip fault and overthrust and syn and anticline…
We gaze upon creation where erosion makes it known,
And count the countless aeons in the banding of the stone.
Odd, long-vanished creatures and their tracks & shells are found;
Where truth has left its sketches on the slate below the ground.
The patient stone can speak, if we but listen when it talks.
Humans wrote the Bible; God wrote the rocks.

There are those who name the stars, who watch the sky by night,
Seeking out the darkest place, to better see the light.
Long ago, when torture broke the remnant of his will,
Galileo recanted, but the Earth is moving still.
High above the mountaintops, where only distance bars,
The truth has left its footprints in the dust between the stars.
We may watch and study or may shudder and deny,
Humans wrote the Bible; God wrote the sky.

By stem and root and branch we trace, by feather, fang and fur,
How the living things that are descend from things that were.
The moss, the kelp, the zebrafish, the very mice and flies,
These tiny, humble, wordless things–how shall they tell us lies?
We are kin to beasts; no other answer can we bring.
The truth has left its fingerprints on every living thing.
Remember, should you have to choose between them in the strife,
Humans wrote the Bible; God wrote life.

And we who listen to the stars, or walk the dusty grade,
Or break the very atoms down to see how they are made,
Or study cells, or living things, seek truth with open hand.
The profoundest act of worship is to try to understand.
Deep in flower and in flesh, in star and soil and seed,
The truth has left its living word for anyone to read.
So turn and look where best you think the story is unfurled.
Humans wrote the Bible; God wrote the world.

-Catherine Faber, The Word of God

The Sum of Forking Paths

Posted in Borges, Perception, Quantum by Mike Stay on 2010 September 7

Paracelsus threw the rose into the fire; it bent on impact, recoiled, and fell deeper among the logs. In the gold-orange light, the stem was already black. The husk began to shrivel and split as the sap boiled away on its surface. The petals blistered and blackened and fell. The five-fingered star beneath them danced subtly, swaying in the brittle heat. For nearly an hour it lay visibly unchanged save for a gradual loss of hue and a universal greyness, then fell into three large pieces as the log crumbled around it. The ashes glowed orange, then gradually dimmed; the last visible flash of light burst outward from the remains of the stem. Like all light, it carried within it a timepiece.

Once, when the clock read noon, it traveled without hesitation in a straight path to my retina. Once it took another course, only to bend around a molecule of nitrogen and reach the same destination.

Once it traced the signature of a man no one remembers.

Once, at half past three, it decayed into a tiny spark and its abominable opposite image; their mutual horrified fascination drew them together, each a moth in the other’s flame. The last visible flash of light from their fiery consummation was indistinguishable from the one that spawned them.

Once, when the clock ticked in the silence just before dawn, the light decayed into two mirrored worlds, somewhat better than ours due to the fact that I was never born there. Both worlds were consumed by mirrored dragons before collapsing back into the chaos from which they arose; all that remained was an orange flash of light.

Once it traveled to a far distant galaxy, reflected off the waters of a billion worlds and witnessed the death of a thousand stars before returning to the small room in which we sat.

Once it transcribed a short story of Borges in his own cramped scrawl, the six versions he discarded, the corrupt Russian translation by Nabokov, and a version in which the second-to-last word was illegible.

Once it traveled every path, each in its time; once it became and destroyed every possible world. All these summed to form what was: I saw an orange flash, and in that moment, I was enlightened.

Formal axiomatic divination

Posted in Borges, Math, Programming, Time by Mike Stay on 2010 September 1

It’s well known that “in Xanadu did Kublai Khan a stately pleasure dome decree,” but his true legacy is the field of formal axiomatic divination. In 1279, Khan sought an auspicious date on which to begin construction of the palace. He consulted each of his twelve astrologers separately and without warning; unsurprisingly, he received twelve different answers. Khan flew into a rage and said that until the astrologer’s craft was as precise as that of his masons and carpenters, they were banished from his presence.

Kublai Khan died in 1294 and his successor Temur Khan was convinced to reinstate the astrologers. Despite this, the young mathematician Zhu Shijie took up the old Khan’s challenge in 1305. Zhu had already completed two enormously influential mathematical texts: Introduction to Mathematical Studies, published in 1299, and True reflections of the four unknowns, published in 1303. This latter work included a table of “the ancient method of powers”, now known as Pascal’s triangle, and Zhu used it extensively in his analysis of polynomials in up to four unknowns.

In turning to the analysis of divination, Zhu naturally focused his attention on the I Ching. The first step in performing an I Ching divination is casting coins or yarrow stalks to construct a series of hexagrams. In 1308, Zhu published his treatise on probability theory, Path of the falling stone. It included an analysis of the probability for generating each hexagram as well as betting strategies for several popular games of chance. Using his techniques, Zhu became quite wealthy and began to travel; it was during this period that he was exposed to the work of the mathematicians in northern China. In the preface to True reflections, Mo Ruo writes that “Zhu Shijie of Yan-shan became famous as a mathematician. He travelled widely for more than twenty years and the number of those who came to be taught by him increased each day.”

Zhu worked for nearly a decade on the subsequent problem, that of interpreting a series of hexagrams. Hexagrams themselves are generated one bit at a time by looking at remainders modulo four of random handfuls of yarrow stalks; the four outcomes either specify the next bit directly or in terms of the previous bit. These latter rules give I Ching its subtitle, The Book of Changes. For mystical reasons, Zhu asserted that the proper interpretation of a series of hexagrams should also be given by a set of changes, but for years he could find no reason to prefer one set of changes to any other. However, in 1316, Zhu wrote to Yang Hui:

“I dreamed that I was summoned to the royal palace. As I stepped upon the threshold, the sun burst forth over the gilded tile; I was blinded and, overcome, I fell to my knees. I lifted my hand to shield my eyes from its brilliance, and the Emperor himself took it and raised me up. To my surprise, he changed his form as I watched; he became so much like me that I thought I was looking in a mirror.

“‘How can this be?’ I cried. He laughed and took the form of a phoenix; I fell back from the flames as he ascended to heaven, then sorrowed as he dove toward the Golden Water River, for the water would surely quench the bird. Yet before reaching the water, he took the form of an eel, dove into the river and swam to the bank; he wriggled ashore, then took the form of a seed, which sank into the earth and grew into a mighty tree. Finally he took his own form again and spoke to me: ‘I rule all things; things above the earth and in the earth and under the earth, land and sea and sky. I can rule all these because I rule myself.’

“I woke and wondered at the singularity of the vision; when my mind reeled in amazement and could stand no more, it retreated to the familiar problem of the tables of changes. It suddenly occurred to me that as the Emperor could take any form, there could be a table of changes that could take the form of any other. Once I had conceived the idea, the implementation was straightforward.”

The rest of the letter has been lost, but Yang Hui described the broad form of the changes in a letter to a friend; the Imperial Changes were a set of changes that we now recognize as a Turing-complete programming language, nearly seven hundred years before Turing. It was a type of register machine similar to Melzak’s model, where seeds were ‘planted’ in pits; the lists of hexagrams generated by the yarrow straws were the programs, and the result of the computation was taken as the interpretation of the casting. Zhu recognized that some programs never stopped–some went into infinite loops, some grew without bound, and some behaved so erratically he couldn’t decide whether they would ever give an interpretation.

Given his fascination with probabilities, it was natural that Zhu would consider the probability that a string of hexagrams had an interpretation. We do not have Zhu’s reasoning, only an excerpt from his conclusion: “The probability that a list of hexagrams has an interpretation is a secret beyond the power of fate to reveal.” It may be that Zhu anticipated Chaitin’s proof of the algorithmic randomness of this probability as well.

All of Zhu’s works were lost soon after they were published; True reflections survived in a corrupted form through Korean (1433 AD) and Japanese (1658 AD) translations and was reintroduced to China only in the nineteenth century. One wonders what the world might have been like had the Imperial Changes been understood and exploited. We suppose it is a secret beyond the power of fate to reveal.

Imaginary Time 2

Posted in Chemistry, General physics, Math by Mike Stay on 2010 July 26

Another part of the analogy I started here, but this time using inverse temperature instead of imaginary time. It describes a thermometer where a mass changes position with temperature. I’m guessing this stuff only applies when the temperature is changing adiabatically.

Thermometer (unitless temperature):
\displaystyle \beta [1] inverse temperature (unitless)
\displaystyle y(\beta) [m] y coordinate
\displaystyle r [kg/s^2 K] spring constant * temp unit conversion
\displaystyle v(\beta) = \frac{dy(\beta)}{d\beta} [m] how position changes with (inverse) temperature
\displaystyle F(\beta) = r \; v(\beta) [kg m/s^2 K] force per Kelvin
\displaystyle T(\beta) = \frac{r}{2}v(\beta)^2 [kg m^2/s^2 K] stretching energy per Kelvin
\displaystyle V(\beta) [kg m^2/s^2 K] potential energy per Kelvin
\displaystyle S = \int (T + V)(\beta) \; d\beta

[kg m^2/s^2 K] entropy
\displaystyle \beta [1/K] inverse temperature
\displaystyle y(\beta) [m] y coordinate
\displaystyle r [kg/s^2 K^2 = bits/m^2 K] how information density changes with temp
\displaystyle v(\beta) = \frac{dy(\beta)}{d\beta} [m K] how position changes with (inverse) temperature
\displaystyle F(\beta) = r \; v(\beta) [kg m/s^2 K = bits/m] force per Kelvin
\displaystyle T(\beta) = \frac{r}{2}v(\beta)^2 [kg m^2/s^2 = bits K] stretching energy = change in stretching information with invtemp
\displaystyle V(\beta) [kg m^2/s^2 = bits K] potential energy = change in potential information with invtemp
\displaystyle S = \int (T + V)(\beta) \; d\beta

[bits] entropy

I assume that the dynamics of such a system would follow a path where \displaystyle \delta S=0; is that a minimum-entropy path or a maximum?

An analogy

Posted in Math, Quantum by Mike Stay on 2010 July 24
Stat mech Quant mech
column vector distribution
\displaystyle |p\rangle = \sum_j p_j|j\rangle
amplitude distribution (wave function)
\displaystyle |\psi\rangle = \sum_j \psi_j |j\rangle
row vector population
\displaystyle \langle p| = \sum_j p^*_j \langle j|,
where p_j^* are the coefficients that, when normalized, maximize the inner product with |p\rangle
\displaystyle \langle \psi| = \sum_j \psi_j^* \langle j|,
where \psi_j^* is the complex conjugate of \psi_j.
normalization \displaystyle \sum_j p_j = 1 \displaystyle \sum_j |\psi_j|^2 = 1
transitions stochastic unitary
harmonic oscillator many HOs at temperature T one QHO evolving for time t
uniform distribution over n states \displaystyle |U\rangle = \frac{1}{n}\sum_j |j\rangle \displaystyle |U\rangle = \frac{1}{\sqrt{n}}\sum_j |j\rangle
special distribution Gibbs distribution \displaystyle p(T) = \sum_j e^{-E_j / k_B T} |j\rangle Free evolution \displaystyle \psi(t) = H|U\rangle = \sum_j e^{-E_j \; it/\hbar} |j\rangle
partition function = inner product with \langle U| \displaystyle Z(T) = \langle U|p(T)\rangle = \frac{1}{n}\sum_j e^{-E_j / k_B T} \displaystyle Z(t) = \langle U|\psi(t)\rangle = \frac{1}{\sqrt{n}}\sum_j e^{-E_j \; it/\hbar}
\displaystyle \langle Q\rangle \displaystyle = \frac{1}{Z(T)}\langle U| \; |Qp(T)\rangle \displaystyle = \frac{1}{Z(t)}\langle U| \; |Q\psi(t)\rangle
path integrals E/T = maximum entropy? Et = least action?


Posted in Uncategorized by Mike Stay on 2010 July 19

Scimitry: the property that something remains the same after part has been cut off.  The Serpinski triangle is scimitric, since you can cut off the bottom two triangles and be left with something isomorphic to the whole.

Entropic gravity

Posted in Astronomy, General physics, Math by Mike Stay on 2010 July 19

Erik Verlinde has been in the news recently for revisiting Ted Jacobson’s suggestion that gravity is an entropic force rather than a fundamental one. The core of the argument is as follows:

Say we have two boxes, one inside the other:

|               |
| +----------+  |
| |          |  |
| |          |  |
| |          |  |
| +----------+  |

Say the inner box has room for ten bits on its surface and the outer one room for twenty. Each box can use as many “1”s as there are particles inside it:

|      X        |
| +----------+  |
| |          |  |
| |  X       |  |
| |          |  |
| +----------+  |

In this case, the inner box has only one particle inside, so there are 10 choose 1 = 10 ways to choose a labeling of the inner box; the outer box has two particles inside, so there are 20 choose 2 = 190 ways. Thus there are 1900 ways to label the system in all.

If both particles are in the inner box, though, the number of ways increases:

|               |
| +----------+  |
| |          |  |
| |  X  X    |  |
| |          |  |
| +----------+  |

The inner box now has 10 choose 2 ways = 45, while the outer box still has 190. So using the standard assumption that all labelings are equally likely, it’s 4.5 times as likely to find both particles in the inner box, and we get an entropic force drawing them together.

The best explanation of Verlinde’s paper I’ve seen is Sabine Hossenfelder’s Comments on and Comments on Comments on Verlinde’s paper “On the Origin of Gravity and the Laws of Newton”.

A first attempt at re-winding Escher’s “Ascending and Descending”

Posted in Uncategorized by Mike Stay on 2010 May 19

And he dreamed, and behold a ladder set up on the earth, and the top of it reached to heaven: and behold the angels of God ascending and descending on it.

Edit (May 20):

Even though it’s not a conformal transformation, this version looks better in a lot of ways.

Rather than cramming the whole picture into a single window frame, it presumes there’s a concentric set of these castles, each half as small as the previous, and built within its open internal patio. Doing it really well would involve extending the walls out to the edge of the outer wall that obscures them.


Get every new post delivered to your Inbox.

Join 27 other followers