Reflective calculi

Posted in Category theory, Programming by Mike Stay on 2013 May 15

This is mostly for my own benefit.

We start with some term-generating functor T(G, X) parametric in a set of ground terms G and a set of names X.

Now specialize to a single ground term:

T_2(X) = T(1, X)

Now mod out by structural equivalence:

T_3(X) = T_2(X) / \equiv

Let N be a set of names and let M be the set of name-equivalence classes N / \equiv_N (where \equiv_N is yet to be defined—it’s part of the theory of names in which we’re still parametric).

Prequoting and predereference are an algebra and coalgebra of T_3, respectively:

PQ: T_3(M) \to M

PD: M \to T_3(M)

such that

PD \circ PQ: T_3(M) \to T_3(M) = 1_{T_3(M)}.

Real quoting and dereference have to use T_4(X), defined below.

Define N = \mu X.T_2(X). Does N / \equiv = \mu X.T_3(X)? I think so; assuming it does, define

M = \mu X.T_3(X)

so name equivalence is structural equivalence; equivalence of prequoted predereferences is automatic by the definition above.

The fixed point gives us an isomorphism

F: T_3(M) \to M.

We can define PQ = F, PD = F^{-1}, because undecidability doesn’t come into play until we add operational semantics. It’s decidable whether two terms are structurally equivalent. Thus

PD \circ PQ:T_3(M) \to T_3(M)

is the identity, satisfying the condition, and

PQ \circ PD: M \to M

is the identity, which we get for free.

When we mod out by operational semantics (following the traditional approach rather than the 2-categorical one needed for pi calculus):

T_4(X) = T_3(X) / \beta,

we have the quotient map

q_X: T_3(X) \to T_4(X)

and a map

r_X: T_4(X) \to T_3(X)

that picks a representative from the equivalence class.

It’s undecidable whether two terms are in the same operational equivalence class, so q_X may not halt. However, it’s true by construction that

q_X \circ r_X: T_4(X) \to T_4(X)

is the identity.

We can extend prequoting and predereference to quoting and dereference on T_4 by

Q: T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{PQ}{\to} M

D: M \stackrel{PD}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

and then

D \circ Q

= T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{PQ}{\to} M \stackrel{PD}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

= T_4(M) \stackrel{r_M}{\to} T_3(M) \stackrel{q_M}{\to} T_4(M)

= T_4(M) \stackrel{id}{\to} T_4(M)

which is what we want for quoting and dereference. The other way around involves undecidability.

Monads without category theory, redux

Posted in Category theory, Programming by Mike Stay on 2012 September 26

I intended the last version of this post to be a simple one-off, but people cared too much. A fire-breathing Haskellite said that jQuery must never be used as an example of a monad because it’s (gasp) not even functional! And someone else didn’t understand JavaScript well enough to read my code snippets. So here’s another attempt that acknowledges the cultural divide between functional and imperative programmers and tries to explain everything in English as well as in source code.

Monads are a design pattern that is most useful to functional programmers because their languages prefer to implement features as libraries rather than as syntax. In Haskell there are monads for input / output, side-effects and exceptions, with special syntax for using these to do imperative-style programming. Imperative programmers look at that and laugh: “Why go through all that effort? Just use an imperative language to start with.” Functional programmers also tend to write programs as—surprise!—applying the composite of a bunch of functions to some data. The basic operation for a monad is really just composition in a different guise, so functional programmers find this comforting. Functional programmers are also more likely to use “continuations”, which are something like extra-powerful exceptions that only work well when there is no global state; there’s a monad for making them easier to work with.

There are, however, some uses for monads that even imperative programmers find useful. Collections like sets and lists (with or without parentheses), parsers, promises, and membranes are a few examples, which I’ll explain below.


Many collection types support the following three operations:

  1. Map a function over the elements of the collection.
  2. Flatten a collection of collections into a collection.
  3. Create a collection from a single element.

A monad is a class that provides a generalized version of these three operations. When I say “generalized”, I mean that they satisfy some of the same rules that the collections’ operations satisfy in much the same way that multiplication of real numbers is associative and commutative just like addition of real numbers.

The way monads are usually used is by mapping a function and then flattening. If we have a function f that takes an element and returns a list, then we can say and get a new list.


A parser is an object with a list of tokens that have already been parsed and the remainder of the object (usually a string) to be parsed.

var Parser = function (obj, tokens) {
  this.obj = obj;
  // If tokens are not provided, use the empty list.
  this.tokens = tokens || [];

It has three operations like the collections above.

  1. Mapping a function over a parser applies the function to the contained obj. = function (f) {
      return new Parser(f(this.obj), this.tokens);
  2. Flattening a parser of parsers concatenates the list of tokens.
    Parser.prototype.flatten = function () {
      return new Parser(this.obj.obj, this.obj.tokens.concat(this.tokens));

    The definition above means that

    new Parser(new Parser(x, tokens1), tokens2).flatten()
    is equivalent to
    new Parser(x, tokens1.concat(tokens2)).

  3. We can create a parser from an element x: new Parser(x).

If we have a function f that takes a string, either parses out some tokens or throws an exception, and returns a parser with the tokens and the remainder of the string, then we can say

and get a new parser. In what follows, I create a parser with the string “Hi there” and then expect a word, then some whitespace, then another word.

var makeMatcher = function (re) {
  return function (s) {
    var m = s.match(re);
    if (!m) { throw new Error('Expected to match ' + re); }
    return new Parser(m[2], [m[1]]);

var alnum = makeMatcher(/^([a-zA-Z0-9]+)(.*)/);
var whitespace = makeMatcher(/^(s+)(.*)/);

new Parser('Hi there')
// is equivalent to new Parser('', ['Hi', ' ', 'there']);


A promise is an object that represents the result of a computation that hasn’t finished yet; for example, if you send off a request over the network for a webpage, the promise would represent the text of the page. When the network transaction completes, the promise “resolves” and code that was waiting on the result gets executed.

  1. Mapping a function f over a promise for x results in a promise for f(x).
  2. When a promise represents remote data, a promise for a promise is still just remote data, so the two layers can be combined; see promise pipelining.
  3. We can create a resolved promise for any object that we already have.

If we have a function f that takes a value and returns a promise, then we can say

and get a new promise. By stringing together actions like this, we can set up a computation that will execute properly as various network actions complete.


An object-capability language is an object-oriented programming language where you can’t get a reference to an object unless you create it or someone calls one of your methods and passes a reference to it. A “membrane” is a design pattern that implements access control.

Say you have a folder object with a bunch of file objects. You want to grant someone temporary access to the folder; if you give them a reference to the folder directly, you can’t force them to forget it, so that won’t work for revokable access. Instead, suppose you create a “proxy” object with a switch that only you control; if the switch is on, the object forwards all of its method calls to the folder and returns the results. If it’s off, it does nothing. Now you can give the person the object and turn it off when their time is up.

The problem with this is that the folder object may return a direct reference to the file objects it contains; the person could lose access to the folder but could retain access to some of the files in it. They would not be able to have access to any new files placed in the folder, but would see updates to the files they retained access to. If that is not your intent, then the proxy object should hide any file references it returns behind similar new proxy objects and wire all the switches together. That way, when you turn off the switch for the folder, all the switches turn off for the files as well.

This design pattern of wrapping object references that come out of a proxy in their own proxies is a membrane.

  1. We can map a function f over a membrane for x and get a membrane for f(x).
  2. A membrane for a membrane for x can be collapsed into a single membrane that checks both switches.
  3. Given any object, we can wrap it in a membrane.

If we have a function f that takes a value and returns a membrane, then we can say

and get a new membrane. By stringing together actions like this, we can set up arbitrary reference graphs, while still preserving the membrane creator’s right to turn off access to his objects.


Monads implement the abstract operations map and flatten, and have an operation for creating a new monad out of any object. If you start with an instance m of a monad and you have a function f that takes an object and returns a monad, then you can say;

and get a new instance of a monad. You’ll often find scenarios where you repeat that process over and over.

Overloading JavaScript’s dot operator with direct proxies

Posted in Category theory, Math, Programming by Mike Stay on 2012 August 28

With the new ECMAScript 6 Proxy object that Firefox has implemented, you can make dot do pretty much anything you want. I made the dot operator in JavaScript behave like Haskell’s bind:

// I'll give a link to the code for lift() later,
// but one thing it does is wrap its input in brackets.
lift(6); // [6]
lift(6)[0]; // 6
lift(6).length; // 1

// lift(6) has no "upto" property
lift(6).upto; // undefined

// But when I define this global function, ...

// Takes an int n, returns an array of ints [0, ..., n-1].
var upto = function (x) {
  var r = [], i;
  for (i = 0; i < x; ++i) {
  return r;

// ... now the object lift(6) suddenly has this property
lift(6).upto; // [0,1,2,3,4,5]
// and it automagically maps and flattens!
lift(6).upto.upto; // [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]
lift(6).upto.upto.length; // 15

To be clear, ECMAScript 6 has changed the API for Proxy since Firefox adopted it, but you can implement the new one on top of the old one. Tom van Cutsem has code for that.

I figured this out while working on a contracts library for JavaScript. Using the standard monadic style (e.g. jQuery), I wrote an implementation that doesn’t use proxies; it looked like this:

lift(6)._(upto)._(upto).value; // [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]

The lift function takes an input, wraps it in brackets, and stores it in the value property of an object. The other property of the object, the underscore method, takes a function as input, maps that over the current value and flattens it, then returns a new object of the same kind with that flattened array as the new value.

The direct proxy API lets us create a “handler” for a target object. The handler contains optional functions to call for all the different things you can do with an object: get or set properties, enumerate keys, freeze the object, and more. If the target is a function, we can also trap when it’s used as a constructor (i.e. new F()) or when it’s invoked.

In the proxy-based implementation, rather than create a wrapper object and set the value property to the target, I created a handler that intercepted only get requests for the target’s properties. If the target has the property already, it returns that; you can see in the example that the length property still works and you can look up individual elements of the array. If the target lacks the property, the handler looks up that property on the window object and does the appropriate map-and-flatten logic.

I’ve explained this in terms of the list monad, but it’s completely general. In the code below, mon is a monad object defined in the category theorist’s style, a monoid object in an endofunctor category, with multiplication and unit. On line 2, it asks for a type to specialize to. On line 9, it maps the named function over the current state, then applies the monad multiplication. On line 15, it applies the monad unit.

var kleisliProxy = function (mon) {
  return function (t) {
    var mont = mon(t);
    function M(mx) {
      return Proxy(mx, {
        get: function (target, name, receiver) {
          if (!(name in mx)) {
            if (!(name in window)) { return undefined; }
            return M(mont['*'](mon(window[name]).t(mx)));
          return mx[name];
    return function (x) { return M(mont[1](x)); };

var lift = kleisliProxy(listMon)(int32);
lift(6).upto.upto; // === [0,0,1,0,1,2,0,1,2,3,0,1,2,3,4]

Semantics for the blue calculus

Posted in Category theory, Math, Quantum by Mike Stay on 2011 April 29
Blue calculus MonCat \frac{n \mbox{Cob}_2}{} 2Hilb Set (as a one object bicategory)
Types monoidal categories manifolds 2 Hilbert spaces *
Terms with one free variable monoidal functors manifolds with boundary linear functors sets
Rewrite rules monoidal natural transformation manifolds with corners linear natural transformations functions
Tensor product \boxtimes juxtaposition (disjoint union) tensor product cartesian product
(T | T'):Y where x:X is free ([[T]] \otimes_Y [[T']]):X \to Y formal sum of cobordisms with boundary from X to Y sum of linear functors disjoint union

In the MonCat column, \boxtimes is the categorified version of the tensor product of monoids.


Posted in Category theory, Math, Quantum by Mike Stay on 2010 April 11

Coends are a categorified version of “summing over repeated indices”. We do that when we’re computing the trace of a matrix and when we’re multiplying two matrices. It’s categorified because we’re summing over a bunch of sets instead of a bunch of numbers.

Let C be a small category. The functor \mbox{hom}:C^{\mbox{op}} \times C \to Set assigns

  • to each pair of objects the set of morphisms between them, and
  • to each pair of morphisms (f:c \to c', h:d\to d') a function that takes a morphism g \in \mbox{hom}(c', d) and returns the composite morphism h \circ g \circ f \in \mbox{hom}(c, d'), where c, c', d, d' \in \mbox{Ob}(C).

It turns out that given any functor S:C^{\mbox{op}} \times D \to \mbox{Set}, we can make a new category where C and D are subcategories and S is actually the hom functor; some keywords for more information on this are “collages” and “Artin glueing”. So we can also think of S as assigning

  • to each pair of objects a set of morphisms between them, and
  • to each pair of morphisms (f:c \to c', h:d\to d') a function that takes a morphism g \in S(c', d) and returns the composite morphism h \circ g \circ f \in S(c, d'), where c,c' \in \mbox{Ob}(C) and d,d' \in \mbox{Ob}(D).

We can think of these functors as adjacency matrices, where the two parameters are the row and column, except that instead of counting the number of paths, we’re taking the set of paths. So S is kind of like a matrix whose elements are sets, and we want to do something like sum the diagonals.

The coend of S is the coequalizer of the diagram

\begin{array}{c}\displaystyle \coprod_{f:c \to c'} S(c', c) \\ \\ \displaystyle S(f, c) \downarrow \quad \quad \downarrow S(c', f) \\ \\ \displaystyle \coprod_c S(c, c) \end{array}

The top set consists of all the pairs where

  • the first element is a morphism f \in \mbox{hom}(c, c') and
  • the second element is a morphism g \in S(c', c).

The bottom set is the set of all the endomorphisms in S.

The coequalizer of the diagram, the coend of S, is the bottom set modulo a relation. Starting at the top with a pair (f, g), the two arrows give the relation

\displaystyle c \stackrel{f}{\to} c' \stackrel{g}{\multimap} c \stackrel{c}{\to} c \quad \sim \quad c' \stackrel{c'}{\to} c' \stackrel{g}{\multimap} c \stackrel{f}{\to} c',

where I’m using the lollipop to mean a morphism from S.

So this says take all the endomorphisms that can be chopped up into a morphism f from \mbox{hom} going one way and a g from S going the other, and then set fg \sim gf. For this to make any sense, it has to identify any two objects related by such a pair. So it’s summing over all the endomorphisms of these equivalence classes.

To get the trace of the hom functor, use S = \mbox{hom} in the analysis above and replace the lollipop with a real arrow. If that category is just a group, this is the set of conjugacy classes. If that category is a preorder, then we’re computing the set of isomorphism classes.

The coend is also used when “multiplying matrices”. Let S(c', c) = T(b, c) \times U(c', d). Then the top set consists of triples (f: c\to c',\quad g:b \multimap c,\quad h:c' \multimap d), the bottom set of pairs (g:b \multimap c, \quad h:c \multimap d), and the coend is the bottom set modulo

(\displaystyle b \stackrel{g}{\multimap} c \stackrel{c}{\to} c, \quad c \stackrel{f}{\to} c' \stackrel{h}{\multimap} d) \quad \sim \quad  (\displaystyle b \stackrel{g}{\multimap} c \stackrel{f}{\to} c', \quad c' \stackrel{c'}{\to} c' \stackrel{h}{\multimap} d)

That is, it doesn’t matter if you think of f as connected to g or to h; the connection is associative, so you can go all the way from b to d.

Notice here how a morphism can turn “inside out”: when f and the identities surround a morphism in S, it’s the same as being surrounded by morphisms in T and U; this is the difference between a trace, where we’re repeating indices on the same matrix, and matrix multiplication, where we’re repeating the column of the first matrix in the row of the second matrix.

Theories and models

Posted in Category theory, Math, Programming by Mike Stay on 2010 March 4

The simplest kind of theory is just a set T, thought of as a set of concepts or Platonic ideals. We typically have some other set S, thought of as the set of real things that are described by concepts. Then a model is a function f:T \to S. For example, we could let T = {0, 1}; this is the theory of the number “two”, since it has two elements. Whatever set we choose for S, the models of the theory are going to be pairs of elements of S. So if S = the set of people, models of T in S are going to be pairs of people (where choosing the same person twice is allowed).

Concepts, however, are usually related to each other, whereas in a set, you can only ask if elements are the same or not. So the way a theory is usually presented is as a category T. For example let T be the category with two objects E, V and two parallel nontrivial morphisms \sigma, \tau:E\to V, and let S be the category Set of sets and functions. A model is a structure-preserving map from the category T to Set, i.e. a functor. Each object of T gets mapped to a set; here we think of the image of V as a set of vertices and the image of E as a set of edges. Each morphism of T gets mapped to a function; \sigma and \tau take an edge and produce the source vertex or target vertex, respectively. The category T = Th(Graph) is the theory of a graph, and its models are all graphs.

Usually, our theories have extra structure. Consider the first example of a model, a function between sets. We can add structure to the theory; for example, we can take the set T to be the ring of integers \mathbb{Z}. Then a model is a structure-preserving function, a homomorphism between T and S. Of course, this means that S has to have at least as much structure as T. We could, for instance, take S to be the real numbers under multiplication. Since this ring homomorphism is entirely determined by where we map 1, and we can choose any real number for its image, there would be one model for each real number; each integer x would map to a^x for some a. Another option is to take S = \mathbb{Z}_4, the integers modulo 4. There are three nonisomorphic models of \mathbb{Z} in \mathbb{Z}_4. If we map 1 to 0, we get the trivial ring; if we map 1 to 1 or 3, we get integers modulo 4; and if we map 1 to 2, we get integers modulo 2.

Similarly, we can add structure to a category. If we take monoidal categories T, S, then we can tensor objects together to get new ones. A model of such a theory is a tensor-product-preserving functor from T to S. See my paper “Physics, Topology, Computation, and Logic: a Rosetta Stone” with John Baez for a thorough exploration of theories that are braided monoidal closed categories and models of these.

An element of the set \mathbb{Z}_4 is a number, while an object of the category Th(Graph) is a set. A theory is a mathematical gadget in which we can talk about theories of one dimension lower. In Java, we say “interface” instead of “theory” and “class” instead of “model”. With Java interfaces we can describe sets of values and functions between them; it is a cartesian closed category whose objects are datatypes and whose morphisms are (roughly) programs. Models of an interface are different classes that implement that interface.

And there’s no reason to stop at categories; we can consider bicategories with structure and structure-preserving functors between these; these higher theories should let us talk about different models of computation. One model would be Turing machines, another lambda calculus, a third would be the Java Virtual Machine, a fourth Pi calculus.

Renormalization and Computation 4

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2009 October 15

This is the fourth in a series of posts on Yuri Manin’s pair of papers. In the previous posts, I laid out the background; this time I’ll actually get around to his result.

A homomorphism from the Hopf algebra into a target algebra is called a character. The functor that assigns an action to a path, whether classical or quantum, is a character. In the classical case, it’s into the rig (\mathbb{R} \cup \{\infty\}, \min, \infty, +, 0), and we take an infimum over paths; in the quantum it’s into the rig (\mathbb{C}, +, 0, \cdot, 1), and we take an integral over paths. Moving from the quantum to the classical case is called Maslov dequantization.

Manin mentions that the runtime of a parallel program is a character akin to the classical action, with the runtime of the composition of two programs being the sum of the respective runtimes, while the runtime of two parallel programs is the maximum of the two. A similar result holds for nearly any cost function. He also points out that computably enumerable reals \mathbb{R}^{c.e.} form a rig (\mathbb{R}^{c.e.}\cup \{\infty\}, \max, \infty, +, 0). He examines Rota-Baxter operators as a way to generalize what “polar part” means and extend the theorems on Hopf algebra renormalization to such rigs.

In the second paper, he looks at my work with Calude as an example of a character. He uses our same argument to show that lots of measures of program behavior have the property that if the measure hasn’t stopped growing after reaching a certain large amount with respect to the program size, then the density of finite values the measure could take decreases like 1/\log(t). Surprisingly, though he referred to these results as cutoffs, he didn’t actually use them anywhere for doing regularization.

Reading between the lines, he might be suggesting something like approximating the Kolmogorov complexity that he uses later by using a time cutoff, motivated by results from our paper: there’s a constant c depending only on the programming language such that if you run the nth program cn^2 steps and it hasn’t stopped, then the density of times near t > cn^2 at which it could stop is roughly 1/\log(t).

Levin suggested using a computable complexity that’s the sum of the program length and the log of the number of time steps; I suppose you could “regularize” the Kolmogorov complexity by adding \Lambda \log(t) to the length of the program, renormalize, and then let \Lambda go to zero, but that’s not something Manin does.

Instead, he proposed two other constructions suitable for renormalization; here’s the simplest. Given a partial computable function f:\mathbb{Z}^+\to \mathbb{Z}^+, define the computably enumerable \bar{f}:\mathbb{N}\to\mathbb{N} by \bar{f}(k) = f(k) if f(k) is defined, and 0 otherwise. Now define

\displaystyle \Psi(k,f;z) = \sum_{n=0}^{\infty} \frac{z^n}{\left(1+n\bar{f}(k)\right)^2}.

When f(k) is undefined, \Psi(k,f;z) = 1/(1-z), which has a pole at z=1. When f(k) is defined, \Psi(k,f;z) converges everywhere except z=\infty. Birkhoff decomposition would separate these two cases, though I’m not sure what value \Psi_+(f,k;1) would take or what it would mean.

The other construction involves turning \bar{f} into a permutation (x,y) \mapsto (x+\bar{f}(y),y), and inventing a function that has poles when the permutation has fixpoints.

So Manin’s idea of renormalizing the halting problem is to do some uncomputable stuff to get an easy-to-renormalize function and then throw the Birkhoff decomposition at it; since we know the halting problem is undecidable, perhaps the fact that he didn’t come up with a new technique for extracting information about the problem is unsurprising, but after putting in so much effort to understand it, I was left rather disappointed: if you’re going to allow yourself to do uncomputable things, why not just solve the halting problem directly?

I must suppose that his intent was not to tackle this hard problem, but simply to play with the analogy he’d noticed; it’s what I’ve done in other papers. And being forced to learn renormalization was exhilarating! I have a bunch of ideas to follow up; I’ll write them up as I get a chance.

Renormalization and Computation 1

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2009 October 7

Yuri Manin recently put two papers on the arxiv applying the methods of renormalization to computation and the Halting problem. Grigori Mints invited me to speak on Manin’s results at the weekly Stanford logic seminar because in his second paper, he expands on some of my work.

In these next few posts, I’m going to cover the idea of Feynman diagrams (mostly taken from the lecture notes for the spring 2004 session of John Baez’s Quantum Gravity seminar); next I’ll talk about renormalization (mostly taken from Andrew Blechman’s overview and B. Delamotte’s “hint”); third, I’ll look at the Hopf algebra approach to renormalization (mostly taken from this post by Urs Schreiber on the n-Category Café); and finally I’ll explain how Manin applies this to computation by exploiting the fact that Feynman diagrams and lambda calculus are both examples of symmetric monoidal closed categories (which John Baez and I tried to make easy to understand in our Rosetta stone paper), together with some results on the density of halting times from my paper “Most programs stop quickly or never halt” with Cris Calude. I doubt all of this will make it into the talk, but writing it up will make it clearer for me.

Renormalization is a technique for dealing with the divergent integrals that arise in quantum field theory. The quantum harmonic oscillator is quantum field theory in 0+1 dimensions—it describes what quantum field theory would be like if space consisted of a single point. It doesn’t need renormalization, but I’m going to talk about it first because it introduces the notion of a Feynman diagram.

“Harmonic oscillator” is a fancy name for a rock on a spring. The force exerted by a spring is proportional to how far you stretch it:

\displaystyle F = kx.

The potential energy stored in a stretched spring is the integral of that:

\displaystyle V_0 = \frac{1}{2}kx^2 + C,

and to make things work out nicely, we’re going to choose C = -1/2. The total energy H_0 is the sum of the potential and the kinetic energy:

\displaystyle H_0 = V_0 + T = \frac{1}{2}kx^2 + \frac{1}{2}mv^2 - \frac{1}{2}.

By choosing units so that k = m = 1, we get

\displaystyle H_0 = \frac{x^2}{2} + \frac{p^2}{2} - \frac{1}{2},

where p is momentum.

Next we quantize, getting a quantum harmonic oscillator, or QHO. We set p = -i \frac{\partial}{\partial x}, taking units where \hbar = 1. Now

\begin{array}{rl}\displaystyle [x, p]x^n & \displaystyle = xp - px \\ & = (- x i \frac{\partial}{\partial x} + i \frac{\partial}{\partial x} x)x^n \\\ & \displaystyle = -i(nx^n - (n+1)x^n) \\ & \displaystyle = ix^n.\end{array}

If we define a new observable z = \frac{p + ix}{\sqrt{2}}, then

\begin{array}{rl} \displaystyle z z^* & \displaystyle = \frac{(p + ix)}{\sqrt{2}} \frac{(p - ix)}{\sqrt{2}} \\ & = \frac{1}{2}(p^2 + i(xp - px) + x^2) \\ & = \frac{1}{2}(p^2 -1 + x^2) \\ & = H_0.\end{array}

We can think of z^* as \frac{d}{dz} and write the energy eigenvectors as polynomials in z:

\displaystyle H_0 z^n = z \frac{d}{dz} z^n = n z^n.

The creation operator z adds a photon to the mix; there’s only one way to do that, so z\cdot z^n = 1 z^{n+1}. The annihilation operator \frac{d}{dz} destroys one of the photons; in the state z^n, there are n photons to choose from, so \frac{d}{dz} z^n = n z^{n-1}.

Schrödinger’s equation says i \frac{d}{dt} \psi = H_0 \psi, so

\displaystyle \psi(t) = \sum_{n=0}^{\infty} e^{-itn} a_n z^n.

This way of representing the state of a QHO is known as the “Fock basis”.

Now suppose that we don’t have the ideal system, that the quadratic potential V_0 = \frac{1}{2}kx^2 - \frac{1}{2} is only a good local approximation to the real potential V_0 + \lambda V. Then we can write the total as H = H_0 + \lambda V, where V is a function of position and momentum, or equivalently of z and \frac{d}{dz}, and \lambda is small.

Now we solve Schrödinger’s equation perturbatively. We know that

\displaystyle \psi(t) = e^{-itH} \psi(0),

and we assume that

\displaystyle e^{-itH}\psi(t) \approx e^{-itH_0} \psi(t)

so that it makes sense to solve it perturbatively. Define

\displaystyle \psi_1(t) = e^{itH_0} e^{-itH}\psi(t)


\displaystyle V_1(t) = e^{itH_0} \lambda V e^{-itH_0}.

After a little work, we find that

\displaystyle \frac{d}{dt}\psi_1(t) = -i V_1(t) \psi_1(t),

and integrating, we get

\displaystyle \psi_1(t) = -i\int_0^t V_1(t_0) \psi_1(t_0) dt_0 + \psi(0).

We feed this equation back into itself recursively to get

\begin{array}{rl}\displaystyle \psi_1(t) & \displaystyle = -i \int_0^t V_1(t_0) \left[-i\int_0^{t_0} V_1(t_1) \psi_1(t_1) dt_1 + \psi(0) \right] dt_0 + \psi(0) \\ & \displaystyle = \left[\psi(0)\right] + \left[\int_0^t i^{-1} V_1(t_0)\psi(0) dt_0\right] + \left[\int_0^t\int_0^{t_0} i^{-2} V_1(t_0)V_1(t_1) \psi_1(t_1) dt_1 dt_0\right] \\ & \displaystyle = \sum_{n=0}^{\infty} \int_{t \ge t_0 \ge \ldots \ge t_{n-1} \ge 0} i^{-n} V_1(t_0)\cdots V_1(t_{n-1}) \psi(0) dt_{n-1}\cdots dt_0 \\ & \displaystyle = \sum_{n=0}^{\infty} (-\lambda i)^n \int_{t \ge t_0 \ge \ldots \ge t_{n-1} \ge 0} e^{-i(t-t_0)H_0} V e^{-i(t_0-t_1)H_0} V \cdots V e^{-i(t_{n-1}-0)H_0} \psi(0) dt_{n-1}\cdots dt_0.\end{array}

So here we have a sum of a bunch of terms; the nth term involves n interactions with the potential interspersed with evolving freely between the interactions, and we integrate over all possible times at which those interactions could occur.

Here’s an example Feynman diagram for this simple system, representing the fourth term in the sum above:

Three interactions with the perturbation.

The lines represent evolving under the free Hamiltonian H_0, while the dots are interactions with the potential V.

As an example, let’s consider V = (z + \frac{d}{dz}) and choose \lambda = \frac{1}{\sqrt{2}} so that \lambda V = p. When V acts on a state \psi = z^n, we get V \psi = z^{n+1} + nz^{n-1}. So at each interaction, the system either gains a photon or changes phase and loses a photon.

A particle moving in a quadratic potential in n-dimensional space gives the tensor product of n QHOs, which is QFT in a space where there are n possible harmonics. Quantum electrodynamics (QED) amounts to considering infinitely many QHOs, one for each possible energy-momentum, which forms a continuum. The diagrams for QED start to look more familiar:
Feynman diagrams
The vertices are interactions with the electromagnetic field. The straight lines are electrons and the wiggly ones are photons; between interactions, they propagate under the free Hamiltonian.

Monad for weakly monoidal categories

Posted in Category theory, Math by Mike Stay on 2009 August 19

We’ve got free and forgetful functors L:\mbox{Cat} \to \mbox{WeakMonCat}, R:\mbox{WeakMonCat} \to \mbox{Cat}. Define T = RL:\mbox{Cat} \to \mbox{Cat}. Given a category X, the category TX has

  • binary trees with \mbox{Ob}(X)-labeled leaves as objects and
  • binary trees with \mbox{Mor}(X)-labeled leaves together with the natural isomorphisms from the definition of a weakly monoidal category as its morphisms.

The multiplication \mu_X:TTX\to TX collapses two layers of trees down to one. The unit \eta_X:X \to TX gives a one-leaf tree.

An algebra of the monad is a category X together with a functor h:TX \to X such that h \circ Th = h \circ \mu_X and h \circ \eta_X = X. Define

x \otimes_X x' = h(\eta_X(x) \otimes_{TX} \eta_X(x')).

Then the associator should be a morphism

a_X:(x \otimes_X x') \otimes_X x'' \to x \otimes_X (x' \otimes_X x'').

However, it isn’t immediately evident that the associator that comes from \otimes_{TX} does the job, since just applying h to a_{TX} gives

h((\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TX} \eta_X(x''))

for the source instead of

h(\eta_X(h(\eta_X(x) \otimes_{TX} \eta_X(x'))) \otimes_{TX} \eta_X(x'')),

which we get by replacing \otimes_X with its definition above. We need an isomorphism

m:(x \otimes_X x') \otimes_X x'' \stackrel{\sim}{\to} h((\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TX} \eta_X(x''))

so we can define a_x = m^{-1} \circ h(a_{TX}) \circ m. Now we use the equations an algebra has to satisfy to derive this isomorphism. Since h \circ Th = h \circ \mu_X, the following two objects are equal:

\begin{array}{rl} & h(\eta_X(h(\eta_X(x) \otimes_{TX} \eta_X(x'))) \otimes_{TX} \eta_X(x''))\\ = & h(\eta_X(h(\eta_X(x) \otimes_{TX} \eta_X(x'))) \otimes_{TX} \eta_X(h(\eta_X(x'')))) \\ = & (h \circ Th) ( \eta_{TX}(\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TTX} \eta_{TX}(\eta_X(x'')) \\ = & (h \circ \mu_X) ( \eta_{TX}(\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TTX} \eta_{TX}(\eta_X(x'')) \\ = & h((\eta_X(x) \otimes_{TX} \eta_X(x')) \otimes_{TX} \eta_X(x'')). \end{array}

Therefore, the isomorphism m we wanted is simply equality and a_X = h(a_{TX}). It also means that a_X satisfies the pentagon equation.

A similar derivation works for the unitors and the triangle equation.

A morphism of algebras is a functor f:X \to Y such that f \circ h_X = h_Y \circ Tf. Now

\begin{array}{rll} & f(x \otimes_X x') &  \\ = & (f \circ h_X) (\eta_X(x) \otimes_{TX} \eta_X(x')) & \mbox{by de}\mbox{fn of }\otimes_X  \\ = & (h_Y \circ Tf) (\eta_X(x) \otimes_{TX} \eta_X(x')) & \\ = & h_Y(\eta_X(f(x)) \otimes_{TX} \eta_X(f(x'))) & \mbox{by } T \\ = & f(x) \otimes_Y f(x')& \mbox{by de}\mbox{fn of }\otimes_Y\end{array}


\begin{array}{rll} & f(I_X) \\ = & (f \circ h_X) (I_{TX}) & \mbox{by de}\mbox{fn of }I_X \\ = & (h_Y \circ Tf) (I_{TX}) & \\ = & h_Y(I_{TY}) & \mbox{since }T\mbox{ preserves the empty tree} \\ = & I_Y & \mbox{by de}\mbox{fn of }I_Y \end{array}

so we have the coherence laws for a strict monoidal functor.


\begin{array}{rll} & f(a_X) & \\ = & (f \circ h_X) (a_{TX}) & \mbox{by the derivation above} \\ = & (h_Y \circ Tf) (a_{TX}) & \\ = & h_Y(a_{TY}) & \mbox{since }T\mbox{ preserves the associator} \\ = & a_Y & \mbox{again by the derivation above},\end{array}

so it preserves the associator as well. The unitors follow in the same way, so morphisms of these algebras are strict monoidal functors that preserve the associator and unitors.

Functors and monads

Posted in Category theory, Math, Programming by Mike Stay on 2009 June 23

In many languages you have type constructors; given a type A and a type constructor Lift, you get a new type Lift<A>. A functor is a type constructor together with a function

   lift: (A -> B) -> (Lift<A> -> Lift<B>)

that preserves composition and identities. If h is the composition of two other functions g and f

   h (a) = g (f (a)),

then lift (h) is the composition of lift (g) and lift (f)

   lift (h) (la) = lift (g) (lift (f) (la)),

where the variable la has the type Lift<A>. Similarly, if h is the identity function on variables of type A

   h (a: A) = a,

then lift (h) will be the identity on variables of type Lift<A>

   lift (h) (la) = la.


  • Multiplication
    Lift<> adjoins an extra integer to any type:

       Lift<A> = Pair<A, int>

    The function lift() pairs up f with the identity function on integers:

       lift (f) = (f, id)
  • Concatenation
    Lift<> adjoins an extra string to any type:

       Lift<A> = Pair<A, string>

    The function lift() pairs up f with the identity function on strings:

       lift (f) = (f, id)
  • Composition
    Let Env be a type representing the possible states of the environment and

       Effect = Env -> Env

    Also, we’ll be explicit in the type of the identity function

       id<A>: A -> A
       id<A> (a) = a,

    so one possible Effect is id<Env>, the “trivial side-effect”.

    Then Lift<> adjoins an extra side-effect to any type:

       Lift<A> = Pair<A, Effect>

    The function lift() pairs up f with the identity on side-effects:

       lift (f) = (f, id<Effect>)
  • Lists
    The previous three examples used the Pair type constructor to adjoin an extra value. This functor is slightly different. Here, Lift<> takes any type A to a list of A‘s:

       Lift<A> = List<A>

    The function lift() is the function map():

       lift = map
  • Double negation, or the continuation passing transform
    In a nice type system, there’s the Unit type, with a single value, and there’s also the Empty type, with no values (it’s “uninhabited”). The only function of type X -> Empty is the identity function id<Empty>. This means that we can think of types as propositions, where a proposition is true if it’s possible to construct a value of that type. We interpret the arrow as implication, and negation can be defined as “arrowing into Empty“: let F = Empty and T = F -> F. Then T -> F = F (since T -> F is uninhabited) and T is inhabited since we can construct the identity function of type F -> F. Functions correspond to constructive proofs. “Negation” of a proof is changing it into its contrapositive form:

      If A then B => If NOT B then NOT A.

    Double negation is doing the contrapositive twice:

      IF A then B => If NOT NOT A then NOT NOT B.

    Here, Lift<> is double negation:

       Lift<A> = (A -> F) -> F.

    The function lift takes a proof to its double contrapositive:

       lift: (A -> B)   ->   ((A -> F) -> F) -> ((B -> F) -> F)
       lift  (f) (k1) (k2) = k1 (lambda (a) { k2 (f (a)) })


A monad is a functor together with two functions

   m: Lift<Lift<A>> -> Lift<A>
   e: A -> Lift<A>

satisfying some conditions I’ll get to in a minute.


  • Multiplication
    If you adjoin two integers, m() multiplies them to get a single integer:

       m: Pair<Pair<A, int>, int> -> Pair<A, int>
       m  (a, i, j)               =  (a, i * j).

    The function e() adjoins the multiplicative identity, or “unit”:

       e: A   -> Pair<A, int>
       e  (a) = (a, 1)
  • Concatenation
    If you adjoin two strings, m() concatenates them to get a single string:

       m: Pair<Pair<A, string>, string> -> Pair<A, string>
       m  (a, s, t)                     =  (a, s + t).

    The function e() adjoins the identity for concatenation, the empty string:

       e: A   -> Pair<A, string>
       e  (a) = (a, "")
  • Composition
    If you adjoin two side-effects, m() composes them to get a single effect:

       m: Pair<Pair<A, Effect>, Effect> -> Pair<A, Effect>
       m  (a, s, t)                     =  (a, t o s),


       (t o s) (x) = t (s (x)).

    The function e() adjoins the identity for composition, the identity function on Env:

       e: A   -> Pair<A, Effect>
       e  (a) = (a, id<Env>)
  • Lists
    If you have two layers of lists, m() flattens them to get a single layer:

       m: List<List<A>> -> List<A>
       m = flatten

    The function e() makes any element of A into a singleton list:

       e: A -> List<A>
       e (a) = [a]
  • Double negation, or the continuation passing transform
    If you have a quadruple negation, m() reduces it to a double negation:

       m: ((((A -> F) -> F) -> F) -> F)   ->  ((A -> F) -> F)
       m (k1) (k2) = k1 (lambda (k3) { k3 (k2) })

    The function e() is just reverse application:

       e: A -> (A -> F) -> F
       e (a) (k) = k (a)

The conditions that e and m have to satisfy are that m is associative and e is a left and right unit for m. In other words, assume we have

   llla: Lift<Lift<Lift<A>>>
   la: Lift<A>


   m (lift (m) (llla))  =  m (m (llla))


   m (e (la))  =  m (lift (e) (la))  =  la


  • Multiplication:
    There are two different ways we can use lifting with these two extra functions e() and m(). The first is applying lift() to them. When we apply lift to m(), it acts on three integers instead of two; but because

       lift (m) = (m, id),

    it ignores the third integer:

       lift (m) (a, i, j, k) = (a, i * j, k).

    Similarly, lifting e() will adjoin the multiplicative unit, but will leave the last integer alone:

       lift (e) = (e, id)
       lift (e) (a, i) = (a, 1, i)

    The other way to use lifting with m() and e() is to apply Lift<> to their input types. This specifies A as Pair<A', int>, so the first integer gets ignored:

       m (a, i, j, k) = (a, i, j * k)
       e (a, i) = (a, i, 1)

    Now when we apply m() to all of these, we get the associativity and unit laws. For associativity we get

       m (lift (m) (a, i, j, k)) = m(a, i * j, k) = (a, i * j * k)
       m (m (a, i, j, k)) = m(a, i, j * k) = (a, i * j * k)

    and for unit, we get

       m (lift (e) (a, i)) = m (a, 1, i) = (a, 1 * i) = (a, i)
       m (e (a, i)) = m (a, i, 1) = (a, i * 1) = (a, i)
  • Concatenation
    There are two different ways we can use lifting with these two extra functions e() and m(). The first is applying lift() to them. When we apply lift to m(), it acts on three strings instead of two; but because

       lift (m) = (m, id),

    it ignores the third string:

       lift (m) (a, s, t, u) = (a, s + t, u).

    Similarly, lifting e() will adjoin the empty string, but will leave the last string alone:

       lift (e) = (e, id)
       lift (e) (a, s) = (a, "", s)

    The other way to use lifting with m() and e() is to apply Lift<> to their input types. This specifies A as Pair<A', string>, so the first string gets ignored:

       m (a, s, t, u) = (a, s, t + u)
       e (a, s) = (a, s, 1)

    Now when we apply m() to all of these, we get the associativity and unit laws. For associativity we get

       m (lift (m) (a, s, t, u)) = m(a, s + t, u) = (a, s + t + u)
       m (m (a, s, t, u)) = m(a, s, t + u) = (a, s + t + u)

    and for unit, you get

       m (lift (e) (a, s)) = m (a, "", s) = (a, "" + s) = (a, s)
       m (e (a, s)) = m (a, s, "") = (a, s + "") = (a, s)
  • Composition
    There are two different ways we can use lifting with these two extra functions e() and m(). The first is applying lift() to them. When we apply lift to m(), it acts on three effects instead of two; but because

       lift (m) = (m, id<Effect>),

    it ignores the third effect:

       lift (m) (a, s, t, u) = (a, t o s, u).

    Similarly, lifting e() will adjoin the identity function, but will leave the last string alone:

       lift (e) = (e, id<Effect>)
       lift (e) (a, s) = (a, id<Env>, s)

    The other way to use lifting with m() and e() is to apply Lift<> to their input types. This specifies A as Pair<A', Effect>, so the first effect gets ignored:

       m (a, s, t, u) = (a, s, u o t)
       e (a, s) = (a, s, id<Env>)

    Now when we apply m() to all of these, we get the associativity and unit laws. For associativity we get

       m (lift (m) (a, s, t, u)) = m(a, t o s, u) = (a, u o t o s)
       m (m (a, s, t, u)) = m(a, s, u o t) = (a, u o t o s)

    and for unit, you get

       m (lift (e) (a, s)) = m (a, id<Env>, s) = (a, s o id<Env>) = (a, s)
       m (e (a, s)) = m (a, s, id<Env>) = (a, id<Env> o s) = (a, s)
  • Lists
    There are two different ways we can use lifting with these two extra functions e() and m(). The first is applying lift() to them. When we apply lift to m(), it acts on three layers instead of two; but because

       lift (m) = map (m),

    it ignores the third (outermost) layer:

         lift (m) ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]])
       = [[a, b, c, d, e], [], [x, y, z]]

    Similarly, lifting e() will make singletons, but will leave the outermost layer alone:

       lift (e) ([a, b, c]) = [[a], [b], [c]]

    The other way to use lifting with m() and e() is to apply Lift<> to their input types. This specifies A as List<A'>, so the *innermost* layer gets ignored:

         m ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]])
       = [[a, b, c], [], [d, e], [], [x], [y, z]]
       e ([a, b, c]) = [[a, b, c]]

    Now when we apply m() to all of these, we get the associativity and unit laws. For associativity we get

         m (lift (m) ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]]))
       = m([[a, b, c, d, e], [], [x, y, z]])
       = [a, b, c, d, e, x, y, z]
         m (m ([[[a, b, c], [], [d, e]], [[]], [[x], [y, z]]]))
       = m([[a, b, c], [], [d, e], [], [x], [y, z]])
       = [a, b, c, d, e, x, y, z]

    and for unit, we get

       m (lift (e) ([a, b, c])) = m ([[a], [b], [c]]) = [a, b, c]
       m (e ([a, b, c])) = m ([[a], [b], [c]]) = [a, b, c]

Monads in Haskell style, or “Kleisli arrows”

Given a monad (Lift, lift, m, e), a Kleisli arrow is a function

   f: A -> Lift<B>,

so the e() function in a monad is already a Kleisli arrow. Given

   g: B -> Lift<C>

we can form a new Kleisli arrow

   (g >>= f): A -> Lift<C>
   (g >>= f) (a) = m (lift (g) (f (a))).

The operation >>= is called “bind” by the Haskell crowd. You can think of it as composition for Kleisli arrows; it’s associative, and e() is the identity for bind. e() is called “return” in that context. Sometimes code is less complicated with bind and return instead of m and e.

If we have a function f: A -> B, we can turn it into a Kleisli arrow by precomposing with e():

   (e o f): A -> Lift<B>
   (e o f) (a) = e (f (a)) = return (f (a)).


  • Double negation, or the continuation passing style transform
    We’re going to (1) show that the CPS transform of a function takes a continuation and applies that to the result of the function.  We’ll also (2) show that for two functions r, s,

         CPS (s o r)    =    CPS (s) >>= CPS (r),

    (1) To change a function f: A -> B into a Kleisli arrow (i.e. continuized function) CPS (f): A -> (B -> X) -> X, we just compose with e—or in the language of Haskell, we return the result:

         CPS (f) (a) (k)
       = return (f (a)) (k)
       = (e o f) (a) (k)
       = e (f (a)) (k)
       = k (f (a))

    (2) Given two Kleisli arrows

       f: A -> (B -> F) -> F


       g: B -> (C -> F) -> F,

    we can bind them:

         (g >>= f) (a) (k)
       = m (lift (g) (f (a))) (k)                                // defn of bind
       = lift (g) (f (a)) (lambda (k3) { k3 (k) })               // defn of m
       = f (a) (lambda (b) { (lambda (k3) { k3 (k) }) (g (b)) }) // defn of lift
       = f (a) (lambda (b) { g (b) (k) }),                       // application

    which is just what we wanted.

    In particular, if f and g are really just continuized functions

       f = (e o r)
       g = (e o s)


         (g >>= f) (a) (k)
       = f (a) (lambda (b) { g (b) (k) })             // by above
       = (e o r) (a) (lambda (b) { (e o s) (b) (k) }) // defn of f and g
       = (e o r) (a) (lambda (b) { k (s (b)) })       // defn of e
       = (e o r) (a) (k o s)                          // defn of composition
       = (k o s) (r (a))                              // defn of e
       = k (s (r (a)))                                // defn of composition
       = (e o (s o r)) (a) (k)                        // defn of e
       = CPS (s o r) (a) (k)                          // defn of CPS


       CPS (s) >>= CPS (r)    =    CPS (s o r).

My talk at Perimeter Institute

Posted in Category theory, General physics, Math, Quantum by Mike Stay on 2009 June 11

I spent last week at the Perimeter Institute, a Canadian institute founded by Mike Lazaridis (CEO of RIM, maker of the BlackBerry) that sponsors research in cosmology, particle physics, quantum foundations, quantum gravity, quantum information theory, and superstring theory. The conference, Categories, Quanta, Concepts, was organized by Bob Coecke and Andreas Döring. There were lots of great talks, all of which can be found online, and lots of good discussion and presentations, which unfortunately can’t. (But see Jeff Morton’s comments.) My talk was on the Rosetta Stone paper I co-authored with Dr. Baez.

Syntactic string diagrams

Posted in Category theory, Math, Programming by Mike Stay on 2009 January 24

I hit on the idea of making lambda a node in a string diagram, where its inputs are an antivariable and a term in which the variable is free, and its output is the same term, but in which the variable is bound.  This allows a string diagram notation for lambda calculus that is much closer to the syntactical description than the stuff in our Rosetta Stone paper. Doing it this way makes it easy to also do pi calculus and blue calculus.

There are two types, V (for variable) and T (for term). I’ve done untyped lambda calculus, but it’s straightforward to add subscripts to the types V and T to do typed lambda calculus.

There are six function symbols:

  • \lambda:V^* \times T \to T. Lambda takes an antivariable and a term that may use the corresponding variable.
  • \cap:1 \to V^* \times T. This turns an antivariable “x” introduced by lambda into the term “x”.
  • A:T \times T \to T. (Application) This takes f and x and produces f(x).
  • {\rm swap}:T \times T \to T \times T
  • !:T \to 1
  • \Delta:T \to T \times T. These two mean we can duplicate and delete terms.

The \beta-\eta rule is the real meat of the computation. The “P” is an arbitrary subdiagram. The effect is replacing the “A” application node with the “P” subdiagram, modulo some wiring.

I label the upwards arrows out of lambdas with a variable name in parentheses; this is just to assist in matching up the syntactical representation with the string diagram.

In the example, I surround part of the diagram with a dashed line; this is the part to which the \beta-\eta rule applies. Within that, I surround part with a dash-dot line; this is the subdiagram P in the rule.

When I do blue calculus this way, there are a few more function symbols and the relations aren’t confluent, but the flavor is very much the same.

String diagrams for untyped lambda calculus

String diagrams for untyped lambda calculus

An example calculation

An example calculation

I finally understand the state transformer monad!

Posted in Category theory by Mike Stay on 2008 October 29

It’s the monad arising from the currying adjunction between FX = X \times Y and GZ = Y \multimap Z.  GF attaches an environment Y to X:

GFX = Y \multimap (X \times Y)

This is the type of a function that takes a state of type Y and outputs a result of type X and a new state of type Y. So G is the ability to depend on state and F is the ability to change the state.

The natural transformation \epsilon:FG \Rightarrow 1, is evaluation:

\epsilon_X:(Y \multimap X) \times Y \to X

takes a function and an input point and evaluates the function at that point.  So we get

\mu:GFGF \Rightarrow GF

by evaluating the FG in the middle, while the unit

\eta:1 \Rightarrow GF

is just the curried identity on pairs:

\begin{array}{rl}\eta_X:X & \to Y \multimap (X \times Y) \\ x & \mapsto \lambda y.(x,y)\end{array}

Reading list

Posted in Category theory, Math, Programming by Mike Stay on 2008 April 28

Yoneda embedding as contrapositive and call-cc as double negation

Posted in Category theory, Math, Programming by Mike Stay on 2008 January 26

Consider the problem of how to represent negation of a proposition P when we only have implication and falsehood:

P \to {\rm F}.

Since {\rm F} \to {\rm F} is true and {\rm T} \to {\rm F} is false, this does what we want.

The contrapositive says \neg Q \to \neg P has the same truth value as P \to Q. Using only implication, the contrapositive of P \to Q is

(Q \to {\rm F}) \to (P \to {\rm F}).

What if we don’t even have falsehood? Well, we can pick any proposition X to represent falsehood and form

(Q \to X) \to (P \to X).

The Yoneda embedding takes a category C^{\rm op} and produces a category \mbox{HOM}(C, \mbox{Set}):

\begin{array}{ccccccc}\mbox{Yoneda}_X: & C^{\rm op} & \to & \mbox{HOM}(C, \mbox{Set}) \\ & P & \mapsto & \mbox{hom}(P, X) \\ & f:P \to Q & \mapsto & \mbox{Yoneda}_X(f): & \mbox{hom}(Q, X) & \to & \mbox{hom}(P, X) \\ & & & & k & \mapsto & k \circ f \end{array}

This embedding is better known among computer scientists as the continuation passing style transformation.

In a symmetric monoidal closed category, like lambda calculus, we can move everything “inside:” every morphism f:P \to Q has an internal version f:I\to P \multimap Q. The internal Yoneda embedding of f is

\mbox{Yoneda}_X(f):I \to (Q \multimap X) \multimap (P \multimap X).

Here I is the “unit” type; notice how the target type is isomorphic under the Curry-Howard isomorphism to the contrapositive. This is a term that maps a continuation k:Q\multimap X and a value p:P to k(f(p)).

To get double negation, first do the Yoneda embedding on the identity to get

\mbox{Yoneda}_X(1_P):hom(P,X) \to hom(P,X),

then uncurry, braid, and recurry to get

\mbox{Yoneda}_X(1_P):P \to hom(hom(P,X),X),

or, internally,

\mbox{Yoneda}_X(1_P):I\to P \multimap ((P\multimap X)\multimap X).

This takes a value p to a function value k\mapsto k(p).

Call-with-current-continuation expects a term that has been converted to CPS style as above, and then hands it the remainder of the computation in k.

The category GraphUp

Posted in Category theory, Programming by Mike Stay on 2008 January 17

The category GraphUp of graphs and Granovetter update rules has

  • directed finite graphs as objects
  • morphisms generated by Granovetter rules, i.e. the following five operations:
    • add a new node. (creation, refcount=0)
    • given a node x, add a new node y and an edge x\to y. (creation, refcount=1)
    • given edges x\to y and x \to z, add an edge y\to z. (introduction, ++refcount)
    • remove an edge. (deletion, --refcount)
    • remove a node and all its outgoing edges if it has no incoming edges. (garbage collection)

It’s a monoidal category where the tensor product is disjoint union. Also, since two disjoint graphs can never become connected, they remain disjoint.

Programs in a capability-secure language get interpreted in GraphUp. A program’s state graph consists of nodes, representing the states of the system, and directed edges, representing the system’s transitions between states upon receiving input. A functor from a program state graph to GraphUp assigns an object reference graph as state to each node and an update generated by Granovetter rules to each edge.

Haskell monads for the category theorist

Posted in Category theory, Math, Programming by Mike Stay on 2008 January 1

A monad in Haskell is about composing almost-compatible morphisms. Typically you’ve got some morphism f:A\to B and then you think of a reason that you’d rather have f':A \to TB, where T:{\rm Data} \to {\rm Data} is a functor. In Haskell, though, you don’t define functors right out: you define type constructors, which are like the map of objects in a functor. You have to define the map of morphisms separately.

To define the map of morphisms, we have to define unit (aka return) and bind. We define {\rm unit}:(A \multimap B) \to (A \multimap TB) to be the category-theorist’s unit \eta composed with the input. Applying {\rm unit} to f gives

A\xrightarrow{f} B \xrightarrow{\eta} TB.

To compose “half-functored” morphisms like f', we define {\rm bind}:TA \times (A\multimap TB)\to TB like this: given an input a:TA and a morphism like f', bind produces

TA \xrightarrow{Tf'} TTB \xrightarrow{\mu_B} TB.

So a “monad” in Haskell is always the monad for categories, except the lists are of bindable half-functored morphisms like f' rather than lists of composable morphisms.

A side-effect monad has T(A) = A\times E, where E is the data type for the environment on which the morphism is to act. The IO monad is a specific instance of a side-effect monad with the environment being the keyboard queue, disk drive, network, etc.

Another piece of the stone

Posted in Category theory, Math, Programming by Mike Stay on 2007 December 28

A few days ago, I thought that I had understood pi calculus in terms of category theory, and I did, in a way.

\lambda-calculus model in Set
type set of values
term function
(deterministic outcome)
rewrite rule identity functional
\pi-calculus model in Set
process set of states
reduction rule state update rule
(deterministic outcome)

To make lambda calculus into a category, we mod out by the rewrite rules and consider equivalence classes of terms instead of actual terms. A model of this category (i.e. a functor from the category to Set) picks out a set of values for each datatype and a function for each program. Given a value from the input set, we get a single value from the output set.

Similarly, a model of the pi calculus assigns to each process a set of states and to each reduction rule a function that updates the state. A morphism in this way of thinking is a certain number of reductions to perform. The reductions are deterministic in the sense that we can model “A or B” as A \sqcup B. Given an object’s state at a certain point, we can tell what set of states it can be in after the system reduces n messages.

However, what we really want is to know the set of states it can be in after all the messages have been processed: what is its normal form? This is far more like the rewrite rules in lambda calculus. It suggests that we should be treating the reduction rules like 2-morphisms instead of 1-morphisms. There’s one important difference from lambda calculus, however: the 2-morphisms of pi calculus are not confluent! It matters very much in which order they are evaluated. Thus processes can’t map to anything but trivial functions.

It looks like a better fit for models of the pi calculus is Rel, the category of sets and relations. A morphism in Rel can relate a single input state to multiple output states. This suggests we have a single object * in the pi calculus that gets mapped to a set of possible states for the system, while each process gets mapped to a relation that encodes all its possible reductions.

I’m rather embarrassed that it took me so long to notice this, since my advisor has been talking about replacing Set by Rel for years.

category lambda calculus pi calculus
objects types only one type *, a system of processes
a morphism an equivalence class of terms a structural congruence class of processes
dinatural transformation from the constant functor (mapping to the terminal set and its identity) to a functor generated by hom, products, projections, and exponentials (if they’re there) combinator: template for programs mapping between isomorphic types (usually) since there’s only one type, this is trivial
Model of the category in Rel (usually taken in Set, a subcategory of Rel) a set of values for each data type and a function for each morphism between them * maps to a set S of states for the system, and each process gets mapped to a relation that relates each element of S to its possible reductions in S

The Yoda embedding

Posted in Category theory by Mike Stay on 2007 December 21

\overline{\mbox{Contravariant}}\,(\mbox{it is}).

Continuation passing as a reflection

Posted in Category theory, Math, Programming by Mike Stay on 2007 December 21

We can write any expression like f(g(x,y)) as a full binary tree where the nodes denote evaluation of the left child at the right, and the leaves are values:


Figure 1: f(g(x)(y))

[In the caption of figure 1, the expression is slightly different; when using trees, it’s more convenient to curry all the functions—that is, replace every comma “,” by back-to-back parens: “)(” .]

The continuation passing transform (Yoneda embedding) first reflects the tree across the vertical axis and then replaces the root and all the left children with their continuized form—a value x gets replaced with a function value \overline{x} = (f \mapsto f(x)):


Figure 2: \overline{\overline{ \overline{y}(\overline{x}(g))  } (f)}

What does this evaluate to? Well,

\begin{array}{rl} & \overline{\overline{ \overline{y}(\overline{x}(g))  } (f) }\\ = & \overline{f(\overline{y}(\overline{x}(g)))} \\ = & \overline{f(\overline{x}(g)(y))} \\ = & \overline{f(g(x)(y))} \end{array}

As we hoped, it’s the continuization (Yoneda embedding) of the original expression. Iterating, we get


Figure 3: \overline{\overline{\overline{f}\left(\overline{\overline{\overline{g}\left(\overline{x}\right)}\left(\overline{y}\right)}\right)}}

At this point, we get an enormous simplification: we can get rid of overlines whenever the left and right branch both have them. For instance,

\overline{g}(\overline{x}) = \overline{x}(g) = g(x).

Actually working out the whole expression would mean lots of epicycular reductions like this one, but taking the shortcut, we just get rid of all the lines except at the root. That means that we end up with


for our final expression.

However, if this expression is just part of a larger one—if what we’re calling the “root” is really the child of some other node—then the cancellation of lines on siblings applies to our “root” and its sibling, and we really do get back to where we started!

A piece of the Rosetta stone

Posted in Category theory, Math, Programming by Mike Stay on 2007 December 20
category lambda calculus pi calculus Turing machine
objects types structural congruence classes of processes \mathbb{N}\times S^*, where \mathbb{N} is the natural numbers and S^* is all binary sequences with finitely many ones.
a morphism an equivalence class of terms a specific reduction from one process state to the next a specific transition from one state and position of the machine to the next
dinatural transformation from the constant functor (mapping to the terminal set and its identity) to a functor generated by hom, products, projections, and exponentials (if they’re there) combinator reduction rule (covers all reductions of a particular form) tape-head update rule (covers all transitions with the current cell and state in common)
products product types parallel processes multiple tapes
internal hom exponential types all types are exponentials? ?
Model of the category in Set A set of values for each data type and a function for each morphism between them A set of states for each process and a single evolution function out of each set. ?

This won’t be appearing in our Rosetta stone paper, but I wanted to write it down. What flavor of logic corresponds to the pi calculus? To the Turing machine?

The continuation passing transform and the Yoneda embedding

Posted in Category theory, Math, Programming by Mike Stay on 2007 December 19

They’re the same thing! Why doesn’t anyone ever say so?

Assume A and B are types; the continuation passing transform takes a function (here I’m using C++ notation)

B f(A a);

and produces a function

 CPT_f( (*k)(B), A a) {
  return k(f(a));

where X is any type. In CPT_f, instead of returning the value f(a) directly, it reads in a continuation function k and “passes” the result to it. Many compilers use this transform to optimize the memory usage of recursive functions; continuations are also used for exception handling, backtracking, coroutines, and even show up in English.

The Yoneda embedding takes a category C^{\rm op} and produces a category \mbox{HOM}(C, \mbox{Set}):

\begin{array}{ccccccc}\mbox{CPT}: & C^{\rm op} & \to & \mbox{HOM}(C, \mbox{Set}) \\ & A & \mapsto & \mbox{hom}(A, -) \\ & f:B\leftarrow A & \mapsto & \mbox{CPT}(f): & \mbox{hom}(B, -) & \to & \mbox{hom}(A, -) \\ & & & & k & \mapsto & k \circ f \end{array}

We get the transformation above by uncurrying to get \mbox{CPT}(f):\mbox{hom}(B, -) \times A \to -.

In Java, a (cartesian closed) category C is an interface C with a bunch of internal interfaces and methods mapping between them. A functor F:C \to {\rm Set} is written

class F implements C.

Then each internal interface C.A gets instantiated as a set F.A of values and each method C.f() becomes instantiated as a function F.f() between the sets.

The continuation passing transform can be seen as a parameterized functor {\rm CPT\langle X\rangle}:C \to {\rm Set}. We’d write

class CPT<X> implements C.

Then each internal interface C.A gets instantiated as a set CPT.A of methods mapping from C.A to X—i.e. continuations that accept an input of type C.A—and each method C.f maps to the continuized function CPT.f described above.

Then the Yoneda lemma says that for every model of C—that is, for every class F implementing the interface C—there’s a natural isomorphism between the set F(A) and the set of natural transformations {\rm hom}({\rm hom}(A, -), F).

A natural transformation \alpha between F:C \to {\rm Set} and {\rm CPT\langle X \rangle}:C \to {\rm Set} is a way to cast the class F to the class CPT<X> such that for any method of C, you can either

  • invoke its implementation directly (as a method of F) and then continuize the result (using the type cast), or
  • continuize first (using the type cast) and then invoke the continuized function (as a method of CPT<X>) on the result

and you’ll get the same answer. Because it’s a natural isomorphism, the cast has an inverse.

The power of the Yoneda lemma is taking a continuized form (which apparently turns up in lots of places) and replacing it with the direct form. The trick to using it is recognizing a continuation when you see one.

Category Theory for the Java Programmer

Posted in Category theory, Math, Programming by Mike Stay on 2007 November 3

[Edit May 11, 2012: I’ve got a whole blog on Category Theory in JavaScript.]

There are several good introductions to category theory, each written for a different audience. However, I have never seen one aimed at someone trained as a programmer rather than as a computer scientist or as a mathematician. There are programming languages that have been designed with category theory in mind, such as Haskell, OCaml, and others; however, they are not typically taught in undergraduate programming courses. Java, on the other hand, is often used as an introductory language; while it was not designed with category theory in mind, there is a lot of category theory that passes over directly.

I’ll start with a sentence that says exactly what the relation is of category theory to Java programming; however, it’s loaded with category theory jargon, so I’ll need to explain each part.

A collection of Java interfaces is the free3 cartesian4 category2 with equalizers5 on the interface6 objects1 and the built-in7 objects.

1. Objects

Both in Java and in category theory, objects are members of a collection. In Java, it is the collection of values associated to a class. For example, the class Integer may take values from -2^{63} up to 2^{63}-1. The class

enum Seasons { Spring, Summer, Fall, Winter }

has four possible values.

In category theory, the collection of objects is “half” of a category. The other part of the category is a collection of processes, called “morphisms,” that go between values; morphisms are the possible ways to get from one value to another. The important thing about morphisms is that they can be composed: we can follow a process from an initial value to an intermediate value, and then follow a second process to a final value, and consider the two processes together as a single composite process.

2. Categories

Formally, a category is

  • a collection of objects, and
  • for each pair of objects A, B, a set {\rm hom}(A, B) of morphisms between them

such that

  • for each object X the set {\rm hom}(X, X) contains an identity morphism 1_X,
  • for each triple of objects X, Y, Z and pair of morphisms f \in {\rm hom}(X, Y) and g \in {\rm hom}(Y, Z), there is a composite morphism (g \circ f) \in {\rm hom}(X, Z),
  • for each pair of objects X, Y and morphism f\in {\rm hom}(X, Y), the identitiy morphisms are left and right units for composition: 1_Y \circ f = f = f \circ 1_X, and
  • for each 4-tuple of objects X, Y, Z, W and triple of morphisms f \in {\rm hom}(X, Y), g \in {\rm hom}(Y, Z), h \in {\rm hom}(Z, W) composition is associative: h\circ (g \circ f) = (h \circ g)\circ f.

If a morphism f \in {\rm hom}(A,B), category theorists write f:A\to B.

We can also define a category in Java. A category is any implementation of the following interface:

interface Category {
  interface Object {}
  interface Morphism {}
  class IllegalCompositionError extends Error;

  Object source(Morphism);
  Object target(Morphism);
  Morphism identity(Object);
  Morphism compose(Morphism, Morphism)
    throws IllegalCompositionError;

that passes the following tests on all inputs:

void testComposition(Morphism g, Morphism f) {
  if (target(f) != source(g)) {
    assertRaises(compose(g, f),
  } else {
    assertEqual(source(compose(g, f)), source(f));
    assertEqual(target(compose(g, f)), target(g));
void testAssociativity(Morphism h,
                       Morphism g,
                       Morphism f) {
  assertEqual(compose(h, compose(g, f)),
              compose(compose(h, g), f));
void testIdentity(Object x) {
  assertEqual(source(identity(x)), x);
  assertEqual(target(identity(x)), x);
void testUnits(Morphism f) {
  assertEqual(f, compose(identity(target(f)), f));
  assertEqual(f, compose(f, identity(source(f))));

One kind of category that programmers use every day is a monoid. Monoids are sets equipped with a multiplication table that’s associative and has left- and right units. Monoids include everything that’s typically called addition, multiplication, or concatenation. Adding integers, multiplying matrices, concatenating strings, and composing functions from a set to itself are all examples of monoids.

In the language of category theory, a monoid is a one-object category (monos is Greek for “one”). The set of elements you can “multiply” is the set of morphisms from that object to itself.

In Java, a monoid is any implementation of the Category interface that also passes this test on all inputs:

void testOneObject(Morphism f, Object x) {
  assertEqual(source(f), x);

The testOneObject test says that given any morphism f and any object x, the source of the morphism has to be that object: given another object y, the test says that source(f) == x and source(f) == y, so x == y. Therefore, if the category passes this test, it has only one object.

For example, consider the monoid of XORing bits together, also known as addition modulo 2. Adding zero is the identity morphism for the unique object; we need a different morphism to add 1:

interface Parity implements Category {
  Morphism PlusOne();

The existence of the PlusOne method says that there has to be a distinguished morphism, which could potentially be different from the identity morphism. The interface itself can’t say how that morphism should behave, however. We need tests for that. The testPlusOne test says that (1 + 1) % 2 == 0. The testOneIsNotZero test makes sure that we don’t just set 1 == 0: since (0 + 0) % 2 == 0, the first test isn’t enough to catch this case.

void testOnePlusOne(Object x) {
  assertEqual(compose(PlusOne(), PlusOne()),

void testOneIsNotZero(Object x) {
  assertNotEqual(PlusOne(), identity(x));

Here’s one implementation of the Parity interface that passes all of the tests for Category, testOneObject, and the two Parity tests:

class ParityImpl implements Parity {
  enum ObjectImpl implements Object { N };
  enum MorphismImpl implements Morphism {
  Object source(Morphism) { return N; }
  Object target(Morphism) { return N; }
  Morphism identity(Object) { return PlusZero; }
  Morphism compose(Morphism f, Morphism g) {
    if (f == PlusZero) return g;  // 0 + g = g
    if (g == PlusZero) return f;  // f + 0 = f
    return PlusZero;              // 1 + 1 = 0
  Morphism PlusOne() { return PlusOne; }

3. Free

Java tests are a kind of “blacklisting” approach to governing implementations. Had we not added testOneIsNotZero, we could have returned PlusZero and the compiler would have been happy. In a free category, relations are “whitelisted:” an implementation must not have relations (constraints) other than the ones that are implied by the definitions.

  • “The free category” has no objects and no morphisms, because there aren’t any specified.
  • “The free category on one object N” has one object and one morphism, the identity 1_N. The morphism has to be there, because the definition of category says that every object has to have an identity morphism, but we can’t put in any other morphisms.
  • “The free category on the directed graph G
                         A -f-> B
                         |    /
                     G = h   g
                         |  /
                         V L


    • three objects A, B, C and
    • seven morphisms:
      • 1_A:A \to A,
      • 1_B:B \to B,
      • 1_C:C \to C,
      • f:A \to B,
      • g:B \to C,
      • h:A \to C, and
      • (g \circ f):A \to C.

    The three vertices and the three edges become objects and morphisms, respectively. The three identity morphisms and (g \circ f) are required to be there because of the definition of a category. And because the category is free, we know that (g \circ f) \ne h.

    For abritrary directed graphs, the free category on the graph has the vertices of the graph as objects and the paths in the graph as morphisms.

  • The parity monoid is completely defined by “the free category on one object N, one morphism \mbox{PlusOne}:N\to N, and one relation \mbox{PlusOne} \circ \mbox{PlusOne} = 1_{N}.
  • If we leave out the relation and consider “the free category on one object N and one morphism \mbox{PlusOne}:N\to N,” then we get the natural numbers under addition: since we didn’t specify a point at which repeated applications of PlusOne“wrap around” back to zero, the free category has no such constraint. To add the number n, we form

    \underbrace{\mbox{PlusOne} \circ \mbox{PlusOne} \circ \cdots \circ \mbox{PlusOne}}_{n \mbox{ copies}}.

  • Exercise: what is the common name for “the free category on one object N and two morphisms \mbox{ConcatenateZero}:N\to N, \mbox{ConcatenateOne}:N\to N?” What should the identity morphism be named?

4. Cartesian

A cartesian category has lists as its objects. It has a way to put objects together into ordered pairs, a way to copy objects, and an object that’s “the empty” object.

It’s time to do the magic! Recall the interface Category:

interface Category {
  interface Object {}
  interface Morphism {}
  class IllegalCompositionError extends Error;

  Object source(Morphism);
  Object target(Morphism);
  Morphism identity(Object);
  Morphism compose(Morphism, Morphism)
    throws IllegalCompositionError;

Now let’s change some names:

interface Interface {
 interface InternalInterfaceList {}
 interface ComposableMethodList {}
 class IllegalCompositionError extends Error;

 InternalInterfaceList source(ComposableMethodList);
 InternalInterfaceList target(ComposableMethodList);
 ComposableMethodList identity(InternalInterfaceList);
 ComposableMethodList compose(ComposableMethodList,
   throws IllegalCompositionError;

Category theory uses cartesian categories to describe structure; Java uses interfaces. Whenever you see “cartesian category,” you can think “interface.” They’re pretty much the same thing. Practically, that means that a lot of the drudgery of implementing the Category interface is taken care of by the Java compiler.

For example, recall the directed graph G above. We can get effectively the same implementation of the graph by using this interface:

interface G {
  interface A;
  interface B;
  interface C;
  B f(A);
  C g(B);
  C h(A);

That’s it! We’re considering the free category on G, so there are no tests. We can compose lists of methods: g(f(a)). The compiler will give us an error if we try to compose methods whose source and target don’t match: h(g(b)) doesn’t work.

Because the objects of the cartesian category Interface are lists, we can define methods in our interfaces that have multiple inputs.

interface Monoid {
  interface M;
  M x(M, M);
  M e();
void testX(M a, M b, Mc) {
  assertEqual(x(a, x(b, c)),
              x(x(a, b), c));
void testE(M a) {
  assertEqual(a, x(a, e()));
  assertEqual(a, x(e(), a));

Here, the method x takes a two-element list as input, while e takes an empty list.

Exercise: figure out how this definition of a monoid relates to the one I gave earlier.

Implementation as a functor

Cartesian categories (interfaces) provide the syntax for defining data structures. The meaning, or semantics, of Java syntax comes from implementing an interface.

In category theory, functors give meaning to syntax. Functors go between categories like functions go between sets. A functor F:C \to D

  • maps objects of C to objects of D and
  • maps morphisms of C to morphisms of D such that
  • identities and composition are preserved.

There are several layers of functors involved in implementing a typical Java program. First there’s the implementation of the interface in Java as a class that defines everything in terms of the built-in classes and their methods; next, there’s the implementation of the built-in methods in the Java VM, then implementation of the bytecodes as native code on the machine, and finally, physical implementation of the machine in silicon and electrons. The composite of all these functors is supposed to behave like a single functor into {\rm Set}, the category whose objects are sets and whose morphisms are functions between sets. We end up approximating many of these sets: instead of integers, we use ints and hope that they don’t overflow. Instead of real numbers, we use doubles and are surprised when we lose precision. In category theory, we ignore this and say that in any practical application, we’ll use datatypes that are “big enough.”

The upshot of all this is that a Java class F implementing an interface X can be thought of as a functor F:X \to {\rm Set}. The class definition assigns to each inner interface a set of values and to each method a function mapping between those sets.

Here’s an example of three different classes that implement the Monoid interface from the last subsection. Recall that a monoid is a set of things that we can combine; we can add two integers to get an integer, multiply two doubles to get a double, or concatenate two strings to get a string. The combining operation is associative and there’s a special element that has no effect when you combine it with something else: adding zero, multiplying by 1.0, or concatenating the empty string all do nothing.

So, for example,

class AddBits implements Monoid {
  enum Bit implements M { Zero, One }
  M x(M f, M g) {
    if (f == e()) return g;  // 0+g=g
    if (g == e()) return f;  // f+0=f
    return Zero;             // 1+1=0
  M e() { return Zero; }

Here, the functor {\rm AddBits}:{\rm Monoid} \to {\rm Set} assigned the set \{0, 1\} to the object M, assigned the function XOR to the morphism x, and assigned the constant function returning {0} to the morphism e.

class MultiplyBits implements Monoid {
  enum Bit implements M { Zero, One }
  M x(M f, M g) {
    if (f == e()) return g;   // 1*g=g
    if (g == e()) return f;   // f*1=f
    return Zero;              // 0*0=0
  M e() { return One; }

Here, the functor {\rm MultiplyBits}:{\rm Monoid} \to {\rm Set} assigned to M the same set as before, \{0, 1\}, but assigned the function AND to the morphism x and the constant function returning 1 to the morphism e.

class Concatenate implements Monoid {
  class ConcatString implements M {
    ConcatString(String x) {
      this.x = x;
    String x;
  M x(M f, M g) {
    return new ConcatString(
        f.x + g.x);
  M e() {
    return new ConcatString("");

Here, the functor {\rm Concatenate}:{\rm Monoid} \to {\rm Set} assigned the set of strings to M, assigned the string concatenation function to the morphism x and the constant function returning the empty string to the morphism e.

5. Equalizers

An equalizer of two functions f:A \to C and g:B \to C picks out the set A {{}_f\times_g} B= \{\mbox{pairs of elements } (a,b) \mbox{ such that } f(a) = g(b)\}. This gets used in the definition of a category: we let A = B = the set of morphisms in our category, and let C be the set of objects. We let f, g be the source and target maps, repsectively. Then A {{}_f\times_g} B is the set of composable pairs of morphisms.

In Java, this means that we can throw an exception if the pair isn’t composable.

6, 7. Interface objects and built-in objects

The objects of the free category in question are java interfaces, whether defined by the programmer or built-in. Because it’s cartesian, we can combine interfaces into new interfaces:

interface XYPair { interface X; interface Y; }

The built-in objects have some relations between them–we can cast an integer into a double, for instance, or turn an integer into a string–so these relations exist between combinations of arbitrary interfaces with built-in ones. But there are no automatic cast relationships between user-defined interfaces, so the category is free.

Next time

This post was about how to translate between Java ideas and category-theory ideas. Next time, I’ll show what category theory has to offer to the Java programmer.

Cartesian categories and the problem of evil

Posted in Category theory, Math by Mike Stay on 2007 September 19

How many one-element sets are there? Well, given any set S, we can construct the one-element set \{S\}, so the collection of one-element sets has to be a proper class, a mindbogglingly enormous collection far larger than any mere set could be. However, they’re all the same from the point of view of functions coming out of them: the one element, no matter what it is, maps to a point in the range. The internal nature of a given one-element set is completely irrelevant to the way it behaves as a member of the category Set of all sets and functions.

For a category theorist, making a distinction between one-element sets is evil. Instead of looking inside an object to see how it’s made, we should only care about how it interacts with the world around it. There are certain kinds of objects that are naturally special because of the way they interact with everything else; we say they satisfy universal properties.

Just as it is evil to dwell on the differences between isomorphic one-element sets, it is evil to care about the inner workings of ordered pairs. Category theory elevates an ordered pair to a primitive concept by ignoring all details about the implementation of an ordered pair except how it interacts with the rest of the world.  Ordered pairs are called “products” in category theory.

A product of the objects G and H in the category C is

  • an object of C, which we’ll label G\times H, together with
  • two maps, called projections
    • \pi_G:G\times H\to G
    • \pi_H:G\times H\to H

that satisfy the following universal property: for any triple (X, f_G:X\to G, f_H:X \to H), there is a unique map from X to G\times H making the following diagram commute:

Xto Gtimes H making the whole diagram commute

In particular, given two different representations of ordered pairs, there’s a unique way to map between them, so they must be isomorphic.

A category will either have products or it won’t:


1. The category Set has the obvious cartesian product.

2. The trivial category has one object and one morphism, the identity, so there’s only one choice for a triple like the one in the definition:

(X, 1_X, 1_X),

and it’s clearly isomorphic to itself, so the trivial category has products.

3. A preorder is a set S equipped with a \le relation on pairs of elements of S that is transitive and reflexive. Given any preorder, we can construct a category whose

  • objects are the elements of S and
  • there is an arrow from x to y if x \le y.

So a product in a preorder is

  • an element z = x \times y of S together with maps
    • \pi_x:z\to x (that is, z \le x)
    • \pi_y:z \to y (that is. z \le y)

    such that for any other element w \in S, \, w\le x, \, w \le y, we have w \le z.

In other words, a product x \times y in a preorder is the greatest lower bound of x, y.  For example, in the preorder (\mathbb{R}, \le), the product of two numbers x, y is min(x, y).  In the preorder (\mbox{Set}, \subseteq), the product is x \cap y.  In the preorder (\mathbb{N}, |), where “|” is “divides”, the product is gcd(x, y).

Exercise: in what preorder over \mathbb{R} is the cartesian product the same as multiplication?


A cartesian category is a category with products. (Note the lowercase ‘c:’ you know someone’s famous if things named after them aren’t capitalized. C.f. ‘abelian.’) You can think of the cartesian coordinate system for the plane with its ordered pair of coordinates to remind you that you can make ordered pairs of objects in a cartesian category.

Functors as shadows

Posted in Category theory, Math by Mike Stay on 2007 September 15

The last example in the previous post said that the collection of all algebraic gadgets of a given kind and structure-preserving maps between them forms a category. The example given was the category of rings. It’s also true that a category itself is an algebraic gadget with structure (the ability to compose morphisms); a structure-preserving map between categories is called a functor. A functor F:C\to D between categories C and D maps

  • objects of C to objects of D
  • morphisms of C to morphisms of D

such that identities and composition are preserved.

One way of thinking about a functor F:C\to D from the category C to the category D is as a “copy” or a “shadow” of C in D. For example, recall that graphs and manifolds both give rise to categories. If we let C be the graph of a cube, where

  • objects are the eight vertices and
  • morphisms are paths along the edges of the cube,

    A labelled cube.

and D=\mathbb{R}^2 be the plane, where

  • objects are points of the plane and
  • morphisms are paths in the plane,

then a functor F:C\to D is a two dimensional picture of a cube, a shadow cast by the cube. The functor F takes

  • each vertex to a point in the plane, and
  • each path on the cube to a path in the plane

such that

  • composing paths is associative and
  • identity paths on the cube map to constant paths in the plane.

These last two requirements imply that the paths in the shadow of the cube are generated by the shadows of each edge.

Pictures of a) cube, b) square, c) point, d) triangle.  Each is an image of a functor from the cube to the plane.

Illustrated are the images of four functors from the cube C to the plane D. Figure (a) maps each vertex to a distinct point and each edge to a distinct path. Figure (b) maps the front face of the cube and the back face of the cube to the same square, and maps the edges running from the front to the back to the constant paths at the corners. This is a degenerate functor, i.e. a functor that maps multiple vertices to the same point in \mathbb{R}^2. Figure (c) maps all the points of the cube to the same point and all the edges to the constant path at that point. This functor is totally degenerate, since there’s no way to map to a smaller subset of points in \mathbb{R}^2.

Exercise: define a functor F:C\to D such that figure d) is the image of F.


Posted in Category theory, Math by Mike Stay on 2007 September 12

The last example in the previous post was the monoid consisting of all functions from a set T to itself under composition. We could multiply the elements (i.e. compose them) in any order because the source and the target were the same, the set T.

\displaystyle \stackrel{T}{\mathop{\circlearrowleft}_f}

For arbitrary sets, we still know how to compose, but we have some restrictions on what functions are composable: the target of the first has to match the source of the second. Given f:T\to U and g:U\to V, we can compose them:

\displaystyle T \stackrel{f}{\to} U \stackrel{g}{\to} V.

So where multiplication had to start and end at the same place, composition can wander around.

Just as the essence of a monoid is multiplication, the essence of a category is composition.

A category consists of

  • a collection of objects and
  • for each pair of objects x,y a set {\rm hom}(x,y) of morphisms or arrows from x to y (if f \in {\rm hom}(x,y), we write f:x\to y)

such that

  • for every object x there is a morphism 1_x:x\to x,
  • for every triple of objects x,y,z and pair of morphisms f:x\to y, g:y\to z there is a composite morphism

    g\circ f:x\to z,


  • composition is associative

    (h \circ g) \circ f = h \circ (g \circ f)

  • and obeys the left- and right-unit laws

    f \circ 1_x = f = 1_y \circ f.

One simple structure on which something can wander is a directed graph. It has vertices and directed edges between them. We can’t compose two edges and get a new edge, but we can compose two paths and get a new path. Every directed graph gives rise to a category whose objects are the vertices of the graph and whose morphisms are paths in the graph, including “identity paths” that start and end at the same point without going anywhere in between.

Whereas a graph is discrete, a manifold (like a circle or Euclidean space) is continuous. Any manifold gives rise to a category whose objects are the points of the space and whose morphisms are paths between them.

Here are more examples of categories:

1. The empty category:

  • no objects
  • no morphisms

2. The category “1”:

  • one object T
  • one trivial morphism 1_T:T\to T

3. The category “2”:

  • two objects S, T
  • the trivial morphisms 1_S, 1_T and one nontrivial morphism a:S\to T

4. The category Set of all sets:

  • objects are sets
  • morphisms are functions
  • For example, S = {1,2} and T = {purple, green, yellow} are two objects in Set, and the function f:S\to T, where f(1) ={\rm purple} and f(2)={\rm green}, is a morphism in {\rm hom}(S,T).

5. Any monoid “is” a one-object category:

  • one object T
  • the set of morphisms from T to itself is equipped with extra structure that makes it a monoid. We define the category in terms of the monoid as follows:
    • {\rm hom}(T,T):= S
    • 1_T:=  e(\bullet)
    • b \circ a := m(a,b)

For example, the real numbers under multiplication form a monoid, so we get a category where

  • there’s one object T, and
  • {\rm hom}(T,T) = \mathbb{R}.
  • Given r, s \in \mathbb{R}, we compose them by multiplying them:

    T \stackrel{r}{\to} T \stackrel{s}{\to} T = T \stackrel{sr}{\to} T

Note: T could be anything, as long as it makes sense to multiply it by a real number.

For example, T could be the set of scalings of a picture of a dog. Then a morphism from T to itself would be a function, since functions are the things that go between sets, and the function would be parameterized by a real number r \in \mathbb{R}. We define

f_r(t) = the picture t scaled by a factor r, and

(f_s \circ f_r)(t) = f_s(f_r(t)) = f_{rs}(t).

The identity morphism in this case is

1_T(t) = f_1(t) = t.

Or, T could be the lone dot in a one-dot graph with infinitely many self-loops, each labelled with a real number. We then say the weight of a path on the graph is the product of the loop labels. The identity morphism is the self-loop labelled with the number 1.

6. Any collection of algebraic gadgets and the structure-preserving maps between them. For example, a ring is an algebraic gadget with two monoids that work together:

  • there’s a set S
  • two different special points, or identities: the multiplicative identity e_{\cdot}(\bullet) = 1 and the additive identity e_+(\bullet) = 0
  • two different associative binary operations
    • multiplication:

      (ab)c = a(bc)

    • addition, which is also commutative:

      (a+b)+c = a+(b+c)

  • multiplication distributes over addition:

    a(b+c) = ab + ac

There is a category Ring consisting of all rings and maps between them that preserve the ring structure.

  • objects: rings
  • morphisms: ring homomorphisms
    • A ring homomorphism F:R_1 \to R_2 is a map between rings that preserves the ring structure: identities, multiplication and addition. That is, it doesn’t matter which way you go around the squares below; the answer will be the same either way:

      \begin{array}{ccc}R_1 \times R_1 & \stackrel{F \times F}{\to} & R_2 \times R_2\\ \cdot_1 \downarrow & & \downarrow \cdot_2 \\ R_1 & \stackrel{F}{\to} & R_2\end{array}


      \begin{array}{ccc}R_1 \times R_1 & \stackrel{F \times F}{\to} & R_2 \times R_2\\ +_1 \downarrow & & \downarrow +_2 \\ R_1 & \stackrel{F}{\to} & R_2\end{array}

    • For example, the ring of integers \mathbb{Z} is an object in Ring, as is the ring \mathbb{Z}_2 (where \mathbb{Z}_n denotes integers modulo n). The set {\hom}(\mathbb{Z}, \mathbb{Z}_2) = \{ 0, p \}, where 0 is the ring homomorphism that takes all integers to zero and p is the map that takes each integer to its remainder modulo 2.

      \begin{array}{ccc}(-4, 7) & \stackrel{p \times p}{\mapsto} & (0, 1) \\ \cdot \downarrow & & \downarrow \cdot_2 \\ -28 & \stackrel{p}{\mapsto} & 0\end{array}


      \begin{array}{ccc}(-4, 7) & \stackrel{p \times p}{\mapsto} & (0, 1)\\ + \downarrow & & \downarrow +_2 \\ 3 & \stackrel{p}{\mapsto} & 1\end{array}


Posted in Category theory, Math by Mike Stay on 2007 September 12

A set has no structure. It’s just a collection of things, all of them equally unimportant.

\{ \blacksquare, \blacktriangle, \bigstar \}

Figure 1. A set.


1 = \{ \bullet \}

Figure 2. Another set, the one-element set we’ll call “1.”

A function, or map, “goes between” sets. It has a source set (also called the domain) and a target set (also called the range). To each element of the source set, it assigns an element of the target set. We’ll use the usual shorthand for denoting functions, with the name of the function, a colon, the source set, an arrow, and the target set. Underneath, we’ll write a typical element of the source, an arrow with a small bar at the start, and the element of the target it goes to. For example, consider the function d mapping each integer to its double. The integers are denoted by the symbol \mathbb{Z} (German Zahlen, “numbers”), so we have

\begin{array}{rl} d:\mathbb{Z} & \to \mathbb{Z} \\ x & \mapsto 2x\end{array}

Perhaps the simplest structure we can add to a set is to pick an element of it and make that element important. A pointed set S is

  • a set S equipped with
  • a function e:1\to S, where 1 is the one-element set \{\bullet\}.

The special element of S is given by e(\bullet). The name e should bring to mind the word “element,” since there’s a different choice for what this map should be for each element of S. For example, given the set S=\{ \blacksquare, \blacktriangle, \bigstar \}, there are three possible choices for the function e, corresponding to the three different elements we could pick:

  1. e(\bullet)=\blacksquare
  2. e(\bullet)=\blacktriangle
  3. e(\bullet)=\bigstar

A monoid is a pointed set with a little more structure. Way back in kindergarten, they taught us how to count. We counted fingers and marbles and cookies. We used base 1, unary, tally marks:


Sometimes they had us add things that weren’t the same:

\blacksquare \blacktriangle \bigstar + \clubsuit \blacksquare = \blacksquare \blacktriangle \bigstar \clubsuit \blacksquare.

Note that when we add this way, the order matters:

\blacksquare \blacktriangle \blacksquare \blacktriangle \blacksquare = \blacksquare \blacktriangle \blacksquare + \blacktriangle \blacksquare \ne \blacktriangle \blacksquare + \blacksquare \blacktriangle \blacksquare = \blacktriangle \blacksquare \blacksquare \blacktriangle \blacksquare.

While the number of symbols is the same, the result is not!

It takes a long time to teach kids to abstract the concept of a number that’s independent of the order of the symbols in the result. Since that’s one of the earliest things we learn in math, it’s particularly hard to forget, so when anyone says “addition,” we automatically assume it’s commutative. When we get to sticking things like matrices together, there are a couple of natural ways to do it; the commutative way we call “addition,” and the noncommutative one we call “multiplication.”

The only thing a generic monoid knows how to do is stick things together like a kindergartener who hasn’t abstracted the concept of a number yet. Since (in the general case) that’s noncommutative, we call it “multiplication.” When we learned about multiplication in school, we had to memorize our “times table.” The table had numbers across the top and numbers down the sides. Given a row and a column of the table, we could look up the result of multiplying those two numbers.

A “times table” is a function m:S\times S\to S; the set S\times S cosists of ordered pairs of elements of a pointed set S, a row and a column. The output of the function is the value in that cell of the table. When we were memorizing the times table, the easiest row was multiplying by 1. You just got the same thing back! So one of the elements of S should behave like 1. And the function e tells us which one we should pick!

So that’s one constraint on the table: multiplying by e(\bullet) shouldn’t do anything. We call this “obeying the left-and right-unit laws.” When we use real numbers, we write it like this:


so in our multiplication table, we have

\displaystyle m(e(\bullet), a) = a = m(a, e(\bullet)).

The other constraint on the table is that if you’re multiplying three things, it shouldn’t matter in what order you multiply the pairs. We call this “being associative.” For real numbers, it looks like this:

a(bc) = (ab)c,

so in our multiplication table, we have

\displaystyle m(a, m(b,c)) = m(m(a,b), c).

Now we’re ready for the definition: a monoid consists of a pointed set S, where the special point e(\bullet) is called the “multiplicative identity,” and a times table m that’s associative and obeys the left-and right-unit laws above. Here are several more examples of monoids:

1. The trivial monoid:

  • S=\{1\}
  • e(\bullet)=1
  • m(1,1)=1

2. The free monoid on one element (tally marks under concatenation)

  • S=\{\mbox{(blank)},1,11,111,1111,\ldots\}
  • e(\bullet)= (blank)
  • m(\underbrace{1\ldots 1}_{a},\underbrace{1\ldots 1}_{b}) = \underbrace{1\ldots 1}_{a+b}

3. The whole numbers under addition:

  • S=\{0,1,2,3,\ldots\}
  • e(\bullet)=0
  • m(a,b)=a+b

4. The free monoid on two elements (binary strings under concatenation):

  • S=\{\mbox{(blank)},0,1,00,01,10,11,000,001,\ldots\}
  • e(\bullet)= (blank)
  • m(a,b)=ab, where ab is the string a followed by the string b

5. The reals under multiplication:

  • S=\mathbb{R}
  • e(\bullet)=1
  • m(a,b)=ab
  • Note: the multiplicative group of real numbers excludes 0, because a group has to have inverses. A monoid does not have to have inverses, so we can include it here.

6. Functions from a set T to itself under composition

  • S= all functions of the form a:T\to T
  • e(\bullet) = the identity function
  • m(a,b) = b\circ a, where \circ is composition of functions:

    (b \circ a)(x) = b(a(x)).

So, for example, given the set T=\{0,1\}, we have

  • S contains the four possible functions from T back to itself:
    00) The constant function mapping everything to zero:

    \begin{array}{rl}k_0:T & \to T\\ t & \mapsto 0\end{array}

    01) The identity function:

    \begin{array}{rl}1_T:T & \to T\\ t & \mapsto t\end{array}

    10) The function that toggles the input:

    \begin{array}{rl}NOT:T & \to T\\ t & \mapsto (1-t)\end{array}

    11) The constant function mapping everything to one:

    \begin{array}{rl}k_1:T & \to T\\ t & \mapsto 1\end{array}

  • e(\bullet) = 1_T, the identity function
  • m(a,b) = b \circ a

The identity function is the unit for composition:

(a \circ 1_T)(t) =  a(1_T(t)) = a(t)

(1_T \circ a)(t) = 1_T(a(t)) = a(t)

Exercise: verify that composition is associative.

Negative dimensional objects and groupoid cardinality

Posted in Category theory, Math by Mike Stay on 2007 July 26

I was thinking about some stuff involving fractals and non-positive-real dimension. It’s still a very rough idea, though.

There’s the concept of topological dimension, which is necessarily an integer. It looks like it’s typically the floor of the Minkowski dimension.

One way of talking about iterated function systems is to consider patterns of digits in n-ary strings. For instance, the typical Cantor set consists of those points in the unit interval whose ternary expansion contains only 0’s and 2’s. It’s a single coordinate, but restricted in the values it can take.

We can add the dimension of two vector spaces by taking the categorical product = direct sum \oplus. We can multiply the dimension by taking the tensor product \otimes. We can add and multiply the dimension of two unit hypercubes with the same construction.

It’s clear how to tensor integer-dimension objects; I was looking at how one might tensor fractional-dimension objects. I have an example with the Cantor set that probably generalizes.

Consider the set of points C in the unit interval whose 4-ary expansion consists exclusively of the digits 0 and 3:

----        ----
-  -        -  -


C has dimension \log_4(2)=1/2. C \otimes C is isomorphic to the set of points where pairs of digits in base 4 are either 00 or 33—which is the same as saying that in base 16, each digit is either either 0 or F=15. The dimension of this set is \log_{16}(2)=1/4. So as before, tensoring two objects multiplies their dimension.

Now consider \displaystyle D=\prod_{i=0}^{\infty}C^{\otimes i}

  • C^{\otimes 0} is the unit interval.
  • C^{\otimes 0}\times C^{\otimes 1}= 1\times C is a subset of the unit square, a set of parallel stripes. The dimension of 1\times C is 1+|C|+|C|=1.5
  • C^{\otimes 0}\times C^{\otimes 1}\times C^{\otimes 2}= 1\times C\times C\otimes C is a subset of the unit cube, consisting of parallel copies of the previous set. The dimension of 1\times C\times C\otimes C is 1+|C|+|C|^2=1.75

and so on until

  • \displaystyle D=\prod_{i=0}^{\infty}C^{\otimes i} is a subset of an infinite-dimensional hypercube with dimension \displaystyle \sum_{i=0}^{\infty} |C|^i = 1/(1-|C|) = 2

John Baez introduced something called groupoid cardinality. We can add and multiply finite sets using the disjoint sum and cartesian product. We can “divide” sets by using the weak quotient of a set by a group:

\displaystyle |S//G| = \sum_{[x]} \frac{1}{|\mbox{Aut}(x)|},

where [x] is an equivalence class and x is equivalent to y if they’re in the same orbit.

So consider the set 3=\{a,b,c\} under the action of \mathbb{Z}_2 via reflection:

\leftarrow | \rightarrow

a\, b\, c

That is, 0 in \mathbb{Z}_2 maps each point to itself, while 1 in \mathbb{Z}_2 maps b to itself and swaps a and c. So a and c are in the same orbit and form one equivalence class. The automorphism group of a is only the identity. The orbit of b includes only itself, so |\rm{Aut}(b)| = |\mathbb{Z}_2| = 2. This forms a groupoid: a category where all morphisms are isomorphisms.

Summing over equivalence classes, we get

\displaystyle |3//Z_2| = 1/|\mbox{Aut}(a)| + 1/|\mbox{Aut}(b)| = 1/1 + 1/2 = 3/2.

Is there some way to view the dimension of the Cantor set as arising from some kind of weak quotient?

He also talked about something he calls “structure types”, which is very much like a generating function, but acts on groupoids (with sets as a special case) instead of on numbers. So in the structure type

\displaystyle \frac{a_0}{0!}x^0 + \frac{a_1}{1!} x^1 + \frac{a_2}{2!} x^2 + \ldots,

the term a_n/n! really means a_n//S_n, where a_n is the set of structures you can put on an n-element finite set (like, say, the set of binary trees with n nodes) and S_n is the group of permutations of n objects. The term x^n is the cartesian product of n copies of the groupoid x.

It apparently makes sense to talk about groupoids with negative cardinality as well as fractional cardinality. For example, taking x to be the groupoid 3//Z_2 described above, the cardinality of the infinte groupoid

\displaystyle 1 + x + x^2 + \ldots=\frac{1}{1-x}=\frac{1}{1-3/2}

is -2!

This suggested to me that we could do something similar and construct a negative-dimensional object. If we stick in a square, we get a set

line \times square \times 4-cube \times 8-cube \times 16-cube \times \ldots

This doesn’t make much sense as an isolated set: all we can say is that it’s an infinite-dimensional hypercube. But if we had some way of knowing that the product we were taking was over 2^n-cubes, then we could use the formula above.

Here’s the idea: if restricting the values a digit can take in an n-ary expansion reduces the dimension, then expanding the values ought to increase it. Since this doesn’t make much sense with points in the real number line, let’s move to \mathbb{N}[[x]] instead: let C be the set of Taylor series in x with coefficients in \{0,2\}. We could say C has dimension \log_n(2) when x=1/n. (I still have to work out how the topology changes with x.)

Then we can define a set T of series with, say, dimension 2, and construct an infinite product

\displaystyle \prod_{i=0}^{\infty} T^{\otimes i}.

This object ought to have dimension -1; at least, it will satisfy the property that multiplying the dimension by 2 (by tensoring with a square) and adding one to the dimension (by multiplying by a line) will give the original object: if d is the dimension, then d satisfies 2d+1=d.

The structure type for binary trees evaluated at the trivial one-element groupoid gives an infinite groupoid with cardinality \frac{1+i \sqrt{3}}{2}, so complex cardinalities (and therefore complex-dimensional objects) can also arise.

Can we make this rigorous? Is there some obvious connection to the complex dimensions of fractal strings?

MILL, BMCCs, and dinatural transformations

Posted in Category theory, Math, Quantum by Mike Stay on 2007 February 3

I’m looking at multiplicative intuitionistic linear logic (MILL) right now and figuring out how it’s related to braided monoidal closed categories (BMCCs).

The top and bottom lines in the inference rules of MILL are functors that you get from the definition of BMCCs, so they exist in every BMCC. Given a braided monoidal category C, the turnstile symbol

\displaystyle \vdash:C^{\rm{op}} \times C \to \mathbf{Set}

is the external hom functor (that is, x \vdash y is the set of morphisms from x to y). Inference rules are natural or dinatural transformations.

Let D be a category; then a natural transformation \alpha between two functors

\displaystyle F,G:C\to D

can be seen as a functor from the category (A \times C), where A is the category with two objects labelled F and G and one nontrivial arrow between them, labelled \alpha:

\displaystyle F\xrightarrow{\alpha}G

For every morphism f: x \to y in C we get a commuting square in (A \times C):

\left( X^{X^X} \right\downarrow

\displaystyle \begin{array}{ccc}(F,x) & \xrightarrow{(1_F, f)} & (F,y) \\ \left. (\alpha, 1_x)\right\downarrow & & \left\downarrow (\alpha,1_y)\right. \\ (G,x) & \xrightarrow{(1_G, f)} & (G,y)\end{array}

that maps to a commuting square in D.

\displaystyle \begin{array}{ccc}Fx & \xrightarrow{Ff} & Fy \\ \left. \alpha_x\right\downarrow & & \left\downarrow \alpha_y \right. \\ Gx & \xrightarrow{Gf} & Gy \end{array}

In other words, it assigns to each object x in C a morphism α_x in D such that the square above commutes.

Now consider the case where we want a natural transformation α between two functors

         op    F,G: C   × C -> D.

Given f:x->y, g:s->t, we get a commuting cube in (A × C^op × C) that maps to a commuting cube in D.

                        Gtx -------------------> Gty
                        7|                       7|
                       / |                      / |
                 α_tx /  |G(g,1_x)        α_ty /  |
                     /   |                    /   | G(g,1_y)
                    /    |                   /    |
                   /     V      G(1_s,f)    /     V
                  /     Gsx -------------- /---> Gsy
                 /      7                 /      7
                /      /                 /      /
               /      /                 /      /
              /      /  F(1_t,f)       /      / α_sy
            Ftx -------------------> Fty     /
             |     /                  |     /
             |    /                   |    /
F(g,1_x)   |   / α_sx               | F(g,1_y)
             |  /                     |  /
             | /                      | /
             V/         F(1_s,f)      V/
            Fsx -------------------> Fsy

This is bigger, but still straightforward.

To get a dinatural transformation, we set g:=f and then choose a specific route around the cube so that both of the indices are the same on α.

                         ....................... Gyy
                        ..                       7|
                       . .                      / |
                      .  .                α_yy /  |
                     .   .                    /   | G(f,1_y)
                    .    .                   /    |
                   .     .      G(1_x,f)    /     V
                  .     Gxx -------------- /---> Gxy
                 .      7                 /      .
                .      /                 /      .
               .      /                 /      .
              .      /  F(1_y,f)       /      .
            Fyx -------------------> Fyy     .
             |     /                  .     .
             |    /                   .    .
  F(f,1_x)   |   / α_xx               .   .
             |  /                     .  .
             | /                      . .
             V/                       ..
            Fxx .......................

In other words, a dinatural transformation α: F -> G assigns to each object x a morphism α_xx such that the diagram above commutes.

Dinatural transformations come up when you’re considering two of MILL’s inference rules, namely

                               x ⊢ y   y ⊢ z
   ------- (Identity)   and   --------------- (Cut)
    x ⊢ x                          x ⊢ z

These two have the same symbol appearing on both sides of the turnstile, x in the Identity rule and y in the Cut rule. Setting

   Fxy = * ∈ Set,

where * is the one-element set, and

   Gxy = x ⊢ y ∈ Set,

the Identity rule specifies that given f:x->y we have that f o 1_x = 1_y o f:

                         .......................y ⊢ y
                        ..                       7|
                       . .                      / |
                      .  .                α_yy /  |
                     .   .                    /   | 1_y o f
                    .    .                   /    |
                   .     .      f o 1_x     /     V
                  .    x ⊢ x ------------- /--> x ⊢ y
                 .      7                 /      .
                .      /                 /      .
               .      /                 /      .
              .      /     1           /      .
             * ---------------------> *      .
             |     /                  .     .
             |    /                   .    .
          1  |   / α_xx               .   .
             |  /                     .  .
             | /                      . .
             V/                       ..
             * ........................


   α_xx:*  -> x ⊢ x        * |-> 1_x

picks out the identity morphism on x.

In the Cut rule, we let j:x->s, k:t->z, and f:s->t,

   F(t,s) = (x ⊢ s, t ⊢ z)

                j      f         k
   F(1,f) = (x ---> s ---> t, t ---> z)

                j         f      k
   F(f,1) = (x ---> s, s ---> t ---> z)

   G(s,t) = x ⊢ z

and consider the diagram for a morphism f:s->t in C.

                         .......................x ⊢ z
                        ..                       7|
                       . .                      / |
                      .  .         composition /  |
                     .   .                    /   | 1
                    .    .                   /    |
                   .     .          1       /     V
                  .    x ⊢ z ------------- /--> x ⊢ z
                 .      7                 /      .
                .      /                 /      .
               .      /                 /      .
              .      / F(f,1)          /      .
       (x ⊢ s, t ⊢ z) -------> (x ⊢ s, s ⊢ z)
             |     /                  .     .
             |    /                   .    .
             |   / composition        .   .
     F(1,f)  |  /                     .  .
             | /                      . .
             V/                       ..
       (x ⊢ t, t ⊢ z) .................

which says that composition of morphisms is associative.

Quantum lambda calculus, symmetric monoidal closed categories, and TQFTs

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2006 August 22

The last post was all about stuff that’s already known. I’ll be working in my thesis on describing the syntax of what I’ll call simply-typed quantum lambda calculus. Symmetric monoidal closed categories (SMCCs) are a generalization of CCCs; they have tensor products instead of products. One of the immediate effects is the lack of built-in projection morphisms \pi_1 and \pi_2 for extracting the first and second element of a pair. And without these, there’s no duplication morphism $\Delta: A\to A\otimes A$; morphisms of this type can exist, but there’s no way to express the notion that it should make a copy of the input! The no-cloning theorem is built in.

The typical CCC in which models of a lambda theory are interpreted is Set. The typical SMCC in which models of a quantum lambda theory will be interpreted is Hilb, the category of Hilbert spaces and linear transformations between them. Models of lambda theories correspond to functors from the CCC arising from the theory into Set. Similarly, models of a quantum lambda theory should be functors from a SMCC to Hilb.

Two-dimensional topological quantum field theories (TQFTs) are close to being a model of a quantum lambda theory, but not quite.

There’s a simpler syntax than the simply-typed lambda calculus, called universal algebra, in which one writes algebraic theories. These give rise to categories with finite products (no exponentials) , and functors from these categories to Set pick out sets with the given structure. There’s an algebraic theory of groups, and the functors pick out sets with functions that behave like unit, inverse, and multiplication. So our “programs” consist solely of operations on our data type.

TQFTs are functors from 2Cob, the theory of a commutative Frobenius algebra, to Hilb. We can look at 2Cob as defining a data type and a TQFT as a quantum implementation of the type. When we take the free SMCC over 2Cob, we ought to get a full-fledged quantum programming language with a commutative Frobenius algebra data type. A TQFT would be part of the implementation of the language.

CCCs and lambda calculus

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2006 August 22

I finally got it through my thick skull what the connection is between lambda theories and cartesian closed categories.

The lambda theory of a set has a single type, with no extra terms or axioms. This gives rise to a CCC where the objects are generated by products and exponentials of a distinguished object, the free CCC on one object. For example, from a type X we generate objects 1, X, XX, (X^X)^{(X^X)} = X^{XX^X}, etc. The morphisms are reduction-equivalence classes of all the programs we can make out of the appropriately-typed S and K combinators. A product-exponential-preserving functor from this CCC to the category Set (which is also a CCC) picks out a set S_X for X, and because it preserves the structure, maps the product AB to S_A \times S_B and the exponential A^B to {S_A}^{S_B}.

The functor itself, however, can be uncomputable: one could, for example, have S_X be the set of halting programs for some universal Turing machine. This set is only computably enumerable, not computable.

When we have types and axioms involved, then we add structure to the set, constraints that the sets and functions on them have to satisfy. For instance, in the lambda theory of groups, we have:

  • a type G
  • terms
    • m:XX \to X
    • inv:X\to X
    • e:1 \to X
  • axioms for right- and left-unit laws, right-and left-inverses, and associativity

The CCC arising from this theory has all the morphisms from the free CCC on one object and extra morphisms arising from products and compositions of the terms. A structure-preserving functor to Set assigns G to a set and m, inv, and e to functions satisfying the axioms. These functions needn’t be computable, either—they only have to satisfy the group axioms.

So in terminology programmers are more familiar with, the terms and axioms define an abstract data type, an interface. The functor gives a class implementing the interface. But this implementation doesn’t need to be computable! Here’s another example: we start with a lambda theory with a data type \mathbb{N}, along with a term succ:\mathbb{N}\to \mathbb{N} and the axioms of Peano arithmetic; a functor from this lambda theory to Set will give us an implementation of natural numbers. Now we add a term f:\mathbb{N} \to \mathbb{N} to the theory, with no other constraints. One model of this theory is Peano arithmetic with an oracle to \Omega: it assigns f to the function that returns the nth bit of the Omega number for a universal prefix-free Turing machine.

I think that in order to get a computable model, we have to use a “computability functor” (my term). If I’m right, this means that instead of taking a functor directly into Set, we have to take a functor into a CCC with no extra terms to get a “computable theory” (again, my term), and then another from there into Set. This way, since all the morphisms in the category arising from the computable theory are built out of S and K combinators, the functor has to pick an explicit program for the implementation, not just an arbitrary function. From there, whether the implementation of the S and K combinators is computable or not really doesn’t matter; we can’t get at anything uncomputable from within the language.

Now, changing gears and looking at the “programs as proofs” aspect of all this: morphisms in the free CCC on one object are proofs in a minimal intuitionistic logic, where \to now means implication rather than exponentiation. The only two axioms are the ones from S and K. Adding a term of a given type to the theory adds a new axiom to the logic, while adding an axiom to the theory defines an equivalence of proofs in the logic.

A computable theory wouldn’t add any axioms, just assign names to proofs so they can be used as subproofs. And because the programs already satisfy the axioms of the computable theory, asserting the equivalence of two proofs is redundant: they’re already equivalent.