Semantics for the blue calculus

Posted in Category theory, Math, Quantum by Mike Stay on 2011 April 29
Blue calculus MonCat \frac{n \mbox{Cob}_2}{} 2Hilb Set (as a one object bicategory)
Types monoidal categories manifolds 2 Hilbert spaces *
Terms with one free variable monoidal functors manifolds with boundary linear functors sets
Rewrite rules monoidal natural transformation manifolds with corners linear natural transformations functions
Tensor product \boxtimes juxtaposition (disjoint union) tensor product cartesian product
(T | T'):Y where x:X is free ([[T]] \otimes_Y [[T']]):X \to Y formal sum of cobordisms with boundary from X to Y sum of linear functors disjoint union

In the MonCat column, \boxtimes is the categorified version of the tensor product of monoids.

The Sum of Forking Paths

Posted in Borges, Perception, Quantum by Mike Stay on 2010 September 7

Paracelsus threw the rose into the fire; it bent on impact, recoiled, and fell deeper among the logs. In the gold-orange light, the stem was already black. The husk began to shrivel and split as the sap boiled away on its surface. The petals blistered and blackened and fell. The five-fingered star beneath them danced subtly, swaying in the brittle heat. For nearly an hour it lay visibly unchanged save for a gradual loss of hue and a universal greyness, then fell into three large pieces as the log crumbled around it. The ashes glowed orange, then gradually dimmed; the last visible flash of light burst outward from the remains of the stem. Like all light, it carried within it a timepiece.

Once, when the clock read noon, it traveled without hesitation in a straight path to my retina. Once it took another course, only to bend around a molecule of nitrogen and reach the same destination.

Once it traced the signature of a man no one remembers.

Once, at half past three, it decayed into a tiny spark and its abominable opposite image; their mutual horrified fascination drew them together, each a moth in the other’s flame. The last visible flash of light from their fiery consummation was indistinguishable from the one that spawned them.

Once, when the clock ticked in the silence just before dawn, the light decayed into two mirrored worlds, somewhat better than ours due to the fact that I was never born there. Both worlds were consumed by mirrored dragons before collapsing back into the chaos from which they arose; all that remained was an orange flash of light.

Once it traveled to a far distant galaxy, reflected off the waters of a billion worlds and witnessed the death of a thousand stars before returning to the small room in which we sat.

Once it transcribed a short story of Borges in his own cramped scrawl, the six versions he discarded, the corrupt Russian translation by Nabokov, and a version in which the second-to-last word was illegible.

Once it traveled every path, each in its time; once it became and destroyed every possible world. All these summed to form what was: I saw an orange flash, and in that moment, I was enlightened.

An analogy

Posted in Math, Quantum by Mike Stay on 2010 July 24
Stat mech Quant mech
column vector distribution
\displaystyle |p\rangle = \sum_j p_j|j\rangle
amplitude distribution (wave function)
\displaystyle |\psi\rangle = \sum_j \psi_j |j\rangle
row vector population
\displaystyle \langle p| = \sum_j p^*_j \langle j|,
where p_j^* are the coefficients that, when normalized, maximize the inner product with |p\rangle
\displaystyle \langle \psi| = \sum_j \psi_j^* \langle j|,
where \psi_j^* is the complex conjugate of \psi_j.
normalization \displaystyle \sum_j p_j = 1 \displaystyle \sum_j |\psi_j|^2 = 1
transitions stochastic unitary
harmonic oscillator many HOs at temperature T one QHO evolving for time t
uniform distribution over n states \displaystyle |U\rangle = \frac{1}{n}\sum_j |j\rangle \displaystyle |U\rangle = \frac{1}{\sqrt{n}}\sum_j |j\rangle
special distribution Gibbs distribution \displaystyle p(T) = \sum_j e^{-E_j / k_B T} |j\rangle Free evolution \displaystyle \psi(t) = H|U\rangle = \sum_j e^{-E_j \; it/\hbar} |j\rangle
partition function = inner product with \langle U| \displaystyle Z(T) = \langle U|p(T)\rangle = \frac{1}{n}\sum_j e^{-E_j / k_B T} \displaystyle Z(t) = \langle U|\psi(t)\rangle = \frac{1}{\sqrt{n}}\sum_j e^{-E_j \; it/\hbar}
\displaystyle \langle Q\rangle \displaystyle = \frac{1}{Z(T)}\langle U| \; |Qp(T)\rangle \displaystyle = \frac{1}{Z(t)}\langle U| \; |Q\psi(t)\rangle
path integrals E/T = maximum entropy? Et = least action?

Simple activity for teaching about radiometric dating

Posted in General physics, Quantum by Mike Stay on 2010 May 3

This works best with small groups of about 5-10 students and at least thirty dice. Divide the dice evenly among the students.

  1. Count the number of dice held by the students and write it on the board.
  2. Have everyone roll each die once.
  3. Collect all the dice that show a ‘one’, count them, write the sum on the board, then set them aside.
  4. Go back to step 1.

A run with 30 dice will look something like this:

dice number of ones
30 5
25 4
21 4
17 3
14 1
13 3
10 2
8 1
7 1
6 0
6 1
5 0
5 1
4 1
3 0
3 0
3 0
3 1
2 1
1 0
1 0
1 0
1 0
1 1

Point out how the number of dice rolling a one on each turn is about one sixth of the dice that hadn’t yet rolled a one on the previous turn. Also, that you lose about half of the remaining dice after about four turns.

Send someone out of the room; do either four or eight turns, then bring them back and ask them to guess how many turns the group took. The student should be able to see that if half the dice are left, there were only four turns, but if a quarter of the dice are left, there were eight turns.

If the students are advanced enough to use logarithms, try the above with some number other than four or eight and have the student use logarithms to calculate the number of turns:

turns = log(number remaining/total) / log(5/6),

or, equivalently, in terms of the half-life (which is really closer to 3.8 than 4):

turns = 3.8 * log(number remaining/total) / log(1/2).

When Zircon crystals form, they strongly reject lead atoms: new zircon crystals have no lead in them. They easily accept uranium atoms. Each die represents a uranium atom, and rolling a one represents decaying into a lead atom: because uranium atoms are radioactive, they can lose bits of their nucleus and turn into lead–but only randomly, like rolling a die. Instead of four turns, the half-life of U238 is 4.5 billion years.

Zircon forms in almost all rocks and is hard to break down. So to judge the age of a rock, you get the zircon out, throw it in a mass spectrometer, look at the proportion of uranium to (lead plus uranium) and calculate

years = 4.5 billion * log(mass of uranium/mass of (lead+uranium)) / log(1/2).

Problem: given a zircon crystal where there’s one lead atom for every ninety-nine uranium atoms, how long ago was it formed?

4.5 billion * log(99/100) / log(1/2) = 65 million years ago.

In reality, it’s slightly more complicated: there are two isotopes of uranium and several of lead. But this is a good thing, since we know the half-lives of both isotopes and can use them to cross-check each other; it’s as though each student had both six- and twenty-sided dice, and the student guessing the number of turns could use information from both groups to refine her guess.


Posted in Borges, Fun links, General physics, Perception, Quantum by Mike Stay on 2010 April 27

Lazulinos are quasiparticles in a naturally occurring Bose-Einstein condensate first described in 1977 by the Scottish physicist Alexander Craigie while at the University of Lahore [3]. The quasiparticles are weakly bound by an interaction for which neither the position nor number operator commutes with the Hamiltonian. A measurement of a lazulino’s position will cause the condensate to go into a superposition of number states, and a subsequent measurement of the population will return a random number; also, counting the lazulinos at two different times will likely give different results.

Their name derives from the stone lapis lazuli and means, roughly, “little blue stone”. Lazulinos are so named because even though the crystals in which they arise absorb visible light, and would otherwise be jet black, they lose energy through surface plasmons in the form of near-ultraviolet photons, with visible peaks at 380, 402, and 417nm. Optical interference imparts a “laser speckle” quality to the emitted light; Craigie described the effect in a famously poetic way: “Their colour is the blue that we are permitted to see only in our dreams”. What makes lazulinos particularly interesting is that they are massive and macroscopic. Since the number operator does not commute with the Hamiltonian, lazulinos themselves do not have a well-defined mass; if the population is N, then the mass of any particular lazulino is m/N, where m is the total mass of the condensate.

In a recent follow-up to the “quantum mirage” experiment [2], Don Eigler’s group at IBM used a scanning tunneling microscope to implement “quantum mancala”—picking up the lazulino ‘stones’ in a particular location usually changes the number of stones, so the strategy for winning becomes much more complicated. In order to pick up a fixed number of stones, you must choose a superposition of locations [1].

  1. C.P. Lutz and D.M. Eigler, “Quantum Mancala: Manipulating Lazulino Condensates,” Nature 465, 132 (2010).
  2. H.C. Manoharan, C.P. Lutz and D.M. Eigler, “Quantum Mirages: The Coherent Projection of Electronic Structure,” Nature 403, 512 (2000). Images available at
  3. A. Craigie, “Surface plasmons in cobalt-doped Y3Al5O12,” Phys. Rev. D 15 (1977). Also available at


Posted in Category theory, Math, Quantum by Mike Stay on 2010 April 11

Coends are a categorified version of “summing over repeated indices”. We do that when we’re computing the trace of a matrix and when we’re multiplying two matrices. It’s categorified because we’re summing over a bunch of sets instead of a bunch of numbers.

Let C be a small category. The functor \mbox{hom}:C^{\mbox{op}} \times C \to Set assigns

  • to each pair of objects the set of morphisms between them, and
  • to each pair of morphisms (f:c \to c', h:d\to d') a function that takes a morphism g \in \mbox{hom}(c', d) and returns the composite morphism h \circ g \circ f \in \mbox{hom}(c, d'), where c, c', d, d' \in \mbox{Ob}(C).

It turns out that given any functor S:C^{\mbox{op}} \times D \to \mbox{Set}, we can make a new category where C and D are subcategories and S is actually the hom functor; some keywords for more information on this are “collages” and “Artin glueing”. So we can also think of S as assigning

  • to each pair of objects a set of morphisms between them, and
  • to each pair of morphisms (f:c \to c', h:d\to d') a function that takes a morphism g \in S(c', d) and returns the composite morphism h \circ g \circ f \in S(c, d'), where c,c' \in \mbox{Ob}(C) and d,d' \in \mbox{Ob}(D).

We can think of these functors as adjacency matrices, where the two parameters are the row and column, except that instead of counting the number of paths, we’re taking the set of paths. So S is kind of like a matrix whose elements are sets, and we want to do something like sum the diagonals.

The coend of S is the coequalizer of the diagram

\begin{array}{c}\displaystyle \coprod_{f:c \to c'} S(c', c) \\ \\ \displaystyle S(f, c) \downarrow \quad \quad \downarrow S(c', f) \\ \\ \displaystyle \coprod_c S(c, c) \end{array}

The top set consists of all the pairs where

  • the first element is a morphism f \in \mbox{hom}(c, c') and
  • the second element is a morphism g \in S(c', c).

The bottom set is the set of all the endomorphisms in S.

The coequalizer of the diagram, the coend of S, is the bottom set modulo a relation. Starting at the top with a pair (f, g), the two arrows give the relation

\displaystyle c \stackrel{f}{\to} c' \stackrel{g}{\multimap} c \stackrel{c}{\to} c \quad \sim \quad c' \stackrel{c'}{\to} c' \stackrel{g}{\multimap} c \stackrel{f}{\to} c',

where I’m using the lollipop to mean a morphism from S.

So this says take all the endomorphisms that can be chopped up into a morphism f from \mbox{hom} going one way and a g from S going the other, and then set fg \sim gf. For this to make any sense, it has to identify any two objects related by such a pair. So it’s summing over all the endomorphisms of these equivalence classes.

To get the trace of the hom functor, use S = \mbox{hom} in the analysis above and replace the lollipop with a real arrow. If that category is just a group, this is the set of conjugacy classes. If that category is a preorder, then we’re computing the set of isomorphism classes.

The coend is also used when “multiplying matrices”. Let S(c', c) = T(b, c) \times U(c', d). Then the top set consists of triples (f: c\to c',\quad g:b \multimap c,\quad h:c' \multimap d), the bottom set of pairs (g:b \multimap c, \quad h:c \multimap d), and the coend is the bottom set modulo

(\displaystyle b \stackrel{g}{\multimap} c \stackrel{c}{\to} c, \quad c \stackrel{f}{\to} c' \stackrel{h}{\multimap} d) \quad \sim \quad  (\displaystyle b \stackrel{g}{\multimap} c \stackrel{f}{\to} c', \quad c' \stackrel{c'}{\to} c' \stackrel{h}{\multimap} d)

That is, it doesn’t matter if you think of f as connected to g or to h; the connection is associative, so you can go all the way from b to d.

Notice here how a morphism can turn “inside out”: when f and the identities surround a morphism in S, it’s the same as being surrounded by morphisms in T and U; this is the difference between a trace, where we’re repeating indices on the same matrix, and matrix multiplication, where we’re repeating the column of the first matrix in the row of the second matrix.

Renormalization and Computation 4

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2009 October 15

This is the fourth in a series of posts on Yuri Manin’s pair of papers. In the previous posts, I laid out the background; this time I’ll actually get around to his result.

A homomorphism from the Hopf algebra into a target algebra is called a character. The functor that assigns an action to a path, whether classical or quantum, is a character. In the classical case, it’s into the rig (\mathbb{R} \cup \{\infty\}, \min, \infty, +, 0), and we take an infimum over paths; in the quantum it’s into the rig (\mathbb{C}, +, 0, \cdot, 1), and we take an integral over paths. Moving from the quantum to the classical case is called Maslov dequantization.

Manin mentions that the runtime of a parallel program is a character akin to the classical action, with the runtime of the composition of two programs being the sum of the respective runtimes, while the runtime of two parallel programs is the maximum of the two. A similar result holds for nearly any cost function. He also points out that computably enumerable reals \mathbb{R}^{c.e.} form a rig (\mathbb{R}^{c.e.}\cup \{\infty\}, \max, \infty, +, 0). He examines Rota-Baxter operators as a way to generalize what “polar part” means and extend the theorems on Hopf algebra renormalization to such rigs.

In the second paper, he looks at my work with Calude as an example of a character. He uses our same argument to show that lots of measures of program behavior have the property that if the measure hasn’t stopped growing after reaching a certain large amount with respect to the program size, then the density of finite values the measure could take decreases like 1/\log(t). Surprisingly, though he referred to these results as cutoffs, he didn’t actually use them anywhere for doing regularization.

Reading between the lines, he might be suggesting something like approximating the Kolmogorov complexity that he uses later by using a time cutoff, motivated by results from our paper: there’s a constant c depending only on the programming language such that if you run the nth program cn^2 steps and it hasn’t stopped, then the density of times near t > cn^2 at which it could stop is roughly 1/\log(t).

Levin suggested using a computable complexity that’s the sum of the program length and the log of the number of time steps; I suppose you could “regularize” the Kolmogorov complexity by adding \Lambda \log(t) to the length of the program, renormalize, and then let \Lambda go to zero, but that’s not something Manin does.

Instead, he proposed two other constructions suitable for renormalization; here’s the simplest. Given a partial computable function f:\mathbb{Z}^+\to \mathbb{Z}^+, define the computably enumerable \bar{f}:\mathbb{N}\to\mathbb{N} by \bar{f}(k) = f(k) if f(k) is defined, and 0 otherwise. Now define

\displaystyle \Psi(k,f;z) = \sum_{n=0}^{\infty} \frac{z^n}{\left(1+n\bar{f}(k)\right)^2}.

When f(k) is undefined, \Psi(k,f;z) = 1/(1-z), which has a pole at z=1. When f(k) is defined, \Psi(k,f;z) converges everywhere except z=\infty. Birkhoff decomposition would separate these two cases, though I’m not sure what value \Psi_+(f,k;1) would take or what it would mean.

The other construction involves turning \bar{f} into a permutation (x,y) \mapsto (x+\bar{f}(y),y), and inventing a function that has poles when the permutation has fixpoints.

So Manin’s idea of renormalizing the halting problem is to do some uncomputable stuff to get an easy-to-renormalize function and then throw the Birkhoff decomposition at it; since we know the halting problem is undecidable, perhaps the fact that he didn’t come up with a new technique for extracting information about the problem is unsurprising, but after putting in so much effort to understand it, I was left rather disappointed: if you’re going to allow yourself to do uncomputable things, why not just solve the halting problem directly?

I must suppose that his intent was not to tackle this hard problem, but simply to play with the analogy he’d noticed; it’s what I’ve done in other papers. And being forced to learn renormalization was exhilarating! I have a bunch of ideas to follow up; I’ll write them up as I get a chance.

Renormalization and Computation 2

Posted in General physics, Math, Quantum by Mike Stay on 2009 October 10

This is the second in a series of posts covering Yuri Manin’s ideas involving Hopf algebra renormalization of the Halting problem. Last time I showed how perturbing a quantum harmonic oscillator gave a sum over integrals involving n interactions with the perturbation; we can keep track of the integrals using Feynman diagrams, though in the case of a single QHO they weren’t very interesting.

One point about the QHO needs emphasis at this point. Given a wavefunction \psi = \sum_{n=0}^{\infty} \psi_n |n\rangle describing the state of the QHO, it must be the case that we get some value when we measure the energy; so if we sum up the norms of the probability amplitudes, we should get unity:

\displaystyle \langle \psi|\psi \rangle = \sum_{n=0}^{\infty} \langle n | \psi_n^* \psi_n | n \rangle = \sum_{n=0}^{\infty} |\psi_n|^2 = 1.

This is called the normalization condition.

When we perturb the QHO, the states |n\rangle are no longer the energy eigenvectors of the new Hamiltonian. We can express the new eigenvectors |m\rangle in terms of the old ones:

\displaystyle |m\rangle = \sum_{n=0}^{\infty}\lambda^n m_n|n\rangle,

where \lambda is the strength of the perturbation, and we reexpress our wavefunction in this new basis:

\displaystyle \psi = \sum_{m=0}^{\infty} \psi'_m |m\rangle

Since we’re working with a new set of coefficients, we have to make sure they sum up to unity, too:

\displaystyle \sum_{m=0}^{\infty} |\psi'_m|^2 = 1.

This is the renormalization condition. So renormalization is about making sure things sum up right once you perturb the system.

I want to talk about renormalization in quantum field theory; the trouble is, I don’t actually know quantum field theory, so I’ll just be writing up what little I’ve gathered from reading various things and conversations with Dr. Baez. I’ve likely got some things wrong, so please let me know and I’ll fix them.

A field is a function defined on spacetime. Scalar fields are functions with a single output, whereas vector fields are functions with several outputs. The electromagnetic field assigns a single number, called the electric field, and a vector, called the magnetic field, to every point in spacetime. When you have two electrons and move one of them, it feels a reaction force and loses momentum; the other electron doesn’t move until the influence, traveling at the speed of light, reaches it. Conservation of momentum says that the momentum has to be somewhere; it’s useful to consider it to be in the electromagnetic field.

When you take the Fourier transform of the field, you get a function that assigns values to harmonics of the field; in the case of electromagnetism, the transformed field \phi assigns a value to each color k of light. Quantizing this transformed field amounts to making \phi(k) into a creation operator, just like z in the QHO example from last time. So we have a continuum of QHOs, each indexed by a color k. (By the way—the zero-dimensional Fourier transform is the identity function, so the QHO example from last time can be thought of both as the field at the unique point in spacetime and the field at the unique frequency.)

When we move to positive-dimensional fields, we get more interesting pictures, like these from quantum electrodynamics:
Feynman diagrams
Here, our coupling constant is the fine structure constant \alpha = e^2/\hbar c, where e is the charge of the electron. For each vertex, we write down our coupling constant times -i times a delta function saying that the incoming momentum minus the outgoing momentum equals zero. For each internal line, we write down a propagator—a function representing the transfer of momentum from one point to another; it’s a function of the four-momentum k—and multiply all this stuff together. Then we integrate over all four-momenta and get something that looks like

\displaystyle \int_0^\infty f(k) d^4 k.

The trouble is, this integral usually gives infinity for an answer. We try to work around this in two steps: first, we regularize the integral by introducing a length scale \Lambda. This represents the point at which gravity starts being important and we need to move to a more fundamental theory. In the quantum field theory of magnetic domains in iron crystals, the length scale is the inter-atom distance in the lattice. Regularization makes the integral finite for \Lambda away from some singularity.

There are a few different ways of regularizing; one is to use \Lambda as a momentum cutoff:

\displaystyle \int_0^\Lambda f(k) d^4 k.

This obviously converges, and solutions to this are always a sum of three parts:

  • The first part diverges as \Lambda \to \infty, either logarithmically or as a power of \Lambda.
  • The second part is finite and independent of \Lambda.
  • The third part vanishes as \Lambda \to \infty.

Renormalization in this case amounts to getting rid of the first part.

These three parts represent three different length scales: at lengths larger than \Lambda, all quantum or statistical fluctuations are negligible, and we can use the mean field approximation and do classical physics. At lengths between \Lambda and 1/\Lambda, we use QFT to calculate what’s going on. Finally, at lengths smaller than 1/\Lambda, we need a new theory to describe what’s going on. In the case of QED, the new theory is quantum gravity; string theory and loop quantum gravity are the serious contenders for the correct theory.

The problem with this regularization scheme is that it doesn’t preserve gauge invariance, so usually physicists use another regularization scheme, called “dimensional regularization”. Here, we compute

\displaystyle \int_0^\infty f(k) d^\Lambda k

which gives us an expression involving gamma functions of \Lambda, where gamma is the continuous factorial function, and then we set \Lambda = 4 - \epsilon. The solutions to this are also a sum of three terms—a divergent part, a finite part, and a vanishing part—and then renormalization gets rid of the divergent part.

Assume we have some theory with a single free parameter g. We’d like to calculate a function F(x) perturbatively in terms of g, where F represents some physical quantity, and we know F(x_0) = g'. We assume F takes the form

\displaystyle F(x) = g + g^2 F_2(x) + g^3 F_3(x) + \cdots

and assume that this definition gives us divergent integrals for the F_n. The first step is regularization: instead of F we have a new function

\displaystyle F_\Lambda(x) = g + g^2 F_{2,\Lambda}(x) + g^3 F_{3,\Lambda}(x) + \cdots

Now we get to the business of renormalization! We solve this problem at each order; if the theory is renormalizable, knowing the solution at the previous order will give us a constraint for the next order, and we can subtract off all the divergent terms in a consistent way:

  1. Order g.

    Here, F_\Lambda(x) = g + O(g^2). Since it’s a constant, it has to match F(x_0) = g', so g = g' + O(g'^2). In this approximation, the coupling constant takes the classical value.

  2. Order g^2.

    Let g = g' + G_2(g') + G_3(g') + \cdots, where G_n(g') \sim O(g'^n). Plugging this into the definition of F_\Lambda, we get

    \displaystyle F_\Lambda(x) = g' + G_2(g') + g'^2 F_{2,\Lambda}(x) + O(g'^3).

    Using F(x_0) = g', we get G_2(g') = -g'^2F_{2,\Lambda}(x_0), which diverges as \Lambda \to \infty. In the case of QED, this says that the charge on the electron is infinite. While the preferred interpretation these days is that quantum gravity is a more fundamental theory that takes precedence on very small scales (a Planck length is to a proton as a proton is to a meter), when the theory was first introduced, there was no reason to think that we’d need another theory. So the interpretation was that with an infinite charge, an electron would be able to extract an infinite amount of energy from the electromagnetic field. Then the uncertainty principle would create virtual particles of all energies, which would exist for a time inversely proportional to the energy. The particles can be charged, so they line up with the field and dampen the strength just like dielectrics. In this interpretation, the charge on the electron depends on the energy of the particles you’re probing it with.

    So to second order,

    \displaystyle F_\Lambda(x) = g' + g'^2\left(F_{2,\Lambda}(x) - F_{2,\Lambda}(x_0)\right) + O(g'^3).

    A theory is therefore only renormalizable if the divergent part of F_{2,\Lambda}(x) is independent of x. In QED it is. We can now define F(x) as the limit

    \displaystyle F(x) = \lim_{\Lambda \to \infty} F_\Lambda(x).

  3. Higher orders.

    In a renormalizable theory, the process continues, with the counterterms entirely specified by knowing F(x_0).

Renormalization and Computation 1

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2009 October 7

Yuri Manin recently put two papers on the arxiv applying the methods of renormalization to computation and the Halting problem. Grigori Mints invited me to speak on Manin’s results at the weekly Stanford logic seminar because in his second paper, he expands on some of my work.

In these next few posts, I’m going to cover the idea of Feynman diagrams (mostly taken from the lecture notes for the spring 2004 session of John Baez’s Quantum Gravity seminar); next I’ll talk about renormalization (mostly taken from Andrew Blechman’s overview and B. Delamotte’s “hint”); third, I’ll look at the Hopf algebra approach to renormalization (mostly taken from this post by Urs Schreiber on the n-Category Café); and finally I’ll explain how Manin applies this to computation by exploiting the fact that Feynman diagrams and lambda calculus are both examples of symmetric monoidal closed categories (which John Baez and I tried to make easy to understand in our Rosetta stone paper), together with some results on the density of halting times from my paper “Most programs stop quickly or never halt” with Cris Calude. I doubt all of this will make it into the talk, but writing it up will make it clearer for me.

Renormalization is a technique for dealing with the divergent integrals that arise in quantum field theory. The quantum harmonic oscillator is quantum field theory in 0+1 dimensions—it describes what quantum field theory would be like if space consisted of a single point. It doesn’t need renormalization, but I’m going to talk about it first because it introduces the notion of a Feynman diagram.

“Harmonic oscillator” is a fancy name for a rock on a spring. The force exerted by a spring is proportional to how far you stretch it:

\displaystyle F = kx.

The potential energy stored in a stretched spring is the integral of that:

\displaystyle V_0 = \frac{1}{2}kx^2 + C,

and to make things work out nicely, we’re going to choose C = -1/2. The total energy H_0 is the sum of the potential and the kinetic energy:

\displaystyle H_0 = V_0 + T = \frac{1}{2}kx^2 + \frac{1}{2}mv^2 - \frac{1}{2}.

By choosing units so that k = m = 1, we get

\displaystyle H_0 = \frac{x^2}{2} + \frac{p^2}{2} - \frac{1}{2},

where p is momentum.

Next we quantize, getting a quantum harmonic oscillator, or QHO. We set p = -i \frac{\partial}{\partial x}, taking units where \hbar = 1. Now

\begin{array}{rl}\displaystyle [x, p]x^n & \displaystyle = xp - px \\ & = (- x i \frac{\partial}{\partial x} + i \frac{\partial}{\partial x} x)x^n \\\ & \displaystyle = -i(nx^n - (n+1)x^n) \\ & \displaystyle = ix^n.\end{array}

If we define a new observable z = \frac{p + ix}{\sqrt{2}}, then

\begin{array}{rl} \displaystyle z z^* & \displaystyle = \frac{(p + ix)}{\sqrt{2}} \frac{(p - ix)}{\sqrt{2}} \\ & = \frac{1}{2}(p^2 + i(xp - px) + x^2) \\ & = \frac{1}{2}(p^2 -1 + x^2) \\ & = H_0.\end{array}

We can think of z^* as \frac{d}{dz} and write the energy eigenvectors as polynomials in z:

\displaystyle H_0 z^n = z \frac{d}{dz} z^n = n z^n.

The creation operator z adds a photon to the mix; there’s only one way to do that, so z\cdot z^n = 1 z^{n+1}. The annihilation operator \frac{d}{dz} destroys one of the photons; in the state z^n, there are n photons to choose from, so \frac{d}{dz} z^n = n z^{n-1}.

Schrödinger’s equation says i \frac{d}{dt} \psi = H_0 \psi, so

\displaystyle \psi(t) = \sum_{n=0}^{\infty} e^{-itn} a_n z^n.

This way of representing the state of a QHO is known as the “Fock basis”.

Now suppose that we don’t have the ideal system, that the quadratic potential V_0 = \frac{1}{2}kx^2 - \frac{1}{2} is only a good local approximation to the real potential V_0 + \lambda V. Then we can write the total as H = H_0 + \lambda V, where V is a function of position and momentum, or equivalently of z and \frac{d}{dz}, and \lambda is small.

Now we solve Schrödinger’s equation perturbatively. We know that

\displaystyle \psi(t) = e^{-itH} \psi(0),

and we assume that

\displaystyle e^{-itH}\psi(t) \approx e^{-itH_0} \psi(t)

so that it makes sense to solve it perturbatively. Define

\displaystyle \psi_1(t) = e^{itH_0} e^{-itH}\psi(t)


\displaystyle V_1(t) = e^{itH_0} \lambda V e^{-itH_0}.

After a little work, we find that

\displaystyle \frac{d}{dt}\psi_1(t) = -i V_1(t) \psi_1(t),

and integrating, we get

\displaystyle \psi_1(t) = -i\int_0^t V_1(t_0) \psi_1(t_0) dt_0 + \psi(0).

We feed this equation back into itself recursively to get

\begin{array}{rl}\displaystyle \psi_1(t) & \displaystyle = -i \int_0^t V_1(t_0) \left[-i\int_0^{t_0} V_1(t_1) \psi_1(t_1) dt_1 + \psi(0) \right] dt_0 + \psi(0) \\ & \displaystyle = \left[\psi(0)\right] + \left[\int_0^t i^{-1} V_1(t_0)\psi(0) dt_0\right] + \left[\int_0^t\int_0^{t_0} i^{-2} V_1(t_0)V_1(t_1) \psi_1(t_1) dt_1 dt_0\right] \\ & \displaystyle = \sum_{n=0}^{\infty} \int_{t \ge t_0 \ge \ldots \ge t_{n-1} \ge 0} i^{-n} V_1(t_0)\cdots V_1(t_{n-1}) \psi(0) dt_{n-1}\cdots dt_0 \\ & \displaystyle = \sum_{n=0}^{\infty} (-\lambda i)^n \int_{t \ge t_0 \ge \ldots \ge t_{n-1} \ge 0} e^{-i(t-t_0)H_0} V e^{-i(t_0-t_1)H_0} V \cdots V e^{-i(t_{n-1}-0)H_0} \psi(0) dt_{n-1}\cdots dt_0.\end{array}

So here we have a sum of a bunch of terms; the nth term involves n interactions with the potential interspersed with evolving freely between the interactions, and we integrate over all possible times at which those interactions could occur.

Here’s an example Feynman diagram for this simple system, representing the fourth term in the sum above:

Three interactions with the perturbation.

The lines represent evolving under the free Hamiltonian H_0, while the dots are interactions with the potential V.

As an example, let’s consider V = (z + \frac{d}{dz}) and choose \lambda = \frac{1}{\sqrt{2}} so that \lambda V = p. When V acts on a state \psi = z^n, we get V \psi = z^{n+1} + nz^{n-1}. So at each interaction, the system either gains a photon or changes phase and loses a photon.

A particle moving in a quadratic potential in n-dimensional space gives the tensor product of n QHOs, which is QFT in a space where there are n possible harmonics. Quantum electrodynamics (QED) amounts to considering infinitely many QHOs, one for each possible energy-momentum, which forms a continuum. The diagrams for QED start to look more familiar:
Feynman diagrams
The vertices are interactions with the electromagnetic field. The straight lines are electrons and the wiggly ones are photons; between interactions, they propagate under the free Hamiltonian.

My talk at Perimeter Institute

Posted in Category theory, General physics, Math, Quantum by Mike Stay on 2009 June 11

I spent last week at the Perimeter Institute, a Canadian institute founded by Mike Lazaridis (CEO of RIM, maker of the BlackBerry) that sponsors research in cosmology, particle physics, quantum foundations, quantum gravity, quantum information theory, and superstring theory. The conference, Categories, Quanta, Concepts, was organized by Bob Coecke and Andreas Döring. There were lots of great talks, all of which can be found online, and lots of good discussion and presentations, which unfortunately can’t. (But see Jeff Morton’s comments.) My talk was on the Rosetta Stone paper I co-authored with Dr. Baez.

The partition function and Wick rotation

Posted in Math, Quantum by Mike Stay on 2007 September 6

I was trying to understand Wick rotation by applying it in the case of a finite-dimensional Hilbert space, and came up with something strange. The way I’ve worked it out, it seems to map classical observables to quantum states! I’ve never heard anything like that before.

Say we have an n-qubit Hilbert space X=(\mathbb{C}^2)^{\otimes n}. This has the set of n-bit binary strings |0\ldots 00\rangle, |0\ldots 01\rangle, |0\ldots 10\rangle, \ldots |1\ldots 11\rangle as a basis. For brevity’s sake, I’ll write these as |0\rangle, |1\rangle, \ldots |2^n-1\rangle. Let

\displaystyle H = \sum_{j=0}^{2^n-1} E_j |j\rangle\langle j|,

where the E_j are real.

Now define the operators

\displaystyle V=\frac{1}{\sqrt{2}}\left(\begin{array}{cc}1&1\\1&-1\end{array}\right)

and W=V^{\otimes n}. V is the discrete Fourier transform on a 2-dimensional space; W is called the Walsh-Hadamard transform. W defines a conjugate basis to the qubit basis. The most important property of W for my purposes here is the fact that

\displaystyle |\phi\rangle := W|0\rangle = N_\phi\sum_{j=0}^{2^n-1} |j\rangle,

where the normalizing factor N_\phi = 2^{-n/2}. Modulo the normalizing factor, this is a sum over all possible states.

  • What’s the probability amplitude that when you start in the state |\phi\rangle and evolve according to H, the system will still be in the state |\phi\rangle ?

    \displaystyle \langle \phi|e^{-iHt}|\phi\rangle = \langle 0|W \; e^{-iHt} \; W|0\rangle = N_\phi^2 \sum_{j=0}^{2^n-1} \langle j| \; e^{-iHt} \; |j\rangle \displaystyle = N_\phi^2 \sum_{j=0}^{2^n-1} e^{-iE_jt}.

Except for a factor of i in the exponent and some normalization, this is the partition function for H. It’s been “Wick rotated.”

  • What’s the probability amplitude that when you start in the state |\phi\rangle and evolve according to H, the system will move to the arbitrary state \displaystyle |\psi\rangle ? Well,

    \displaystyle |\psi\rangle = N_\psi \sum_{j=0}^{2^n-1} \psi_j |j\rangle,

    where each \psi_j is an arbitrary complex number and N_\psi is a normalizing factor. So

    \displaystyle \langle \psi| \; e^{-iHt} \; |\phi\rangle = N_\psi N_\phi \sum_{j=0}^{2^n-1} \langle j|\psi_j \; e^{-iHt} \; |j\rangle \displaystyle = N_\psi N_\phi  \sum_{j=0}^{2^n-1} \psi_j e^{-iE_jt}.

If we divide this by the quantity above, we get the expectation value of a classical observable \psi at “temperature” 1/it:

\displaystyle \frac{N_\psi N_\phi \sum\limits_{j=0}^{2^n-1} \psi_j e^{-iE_jt}}{N_\phi N_\phi \sum\limits_{j=0}^{2^n-1} e^{-iE_jt}} = \frac{N_\psi}{N_\phi} \langle\psi\rangle.


This mapping from classical to quantum is not quantization. That maps classical observables to Hermetian operators, not to states—although, one might hit the state with the “Currying” isomorphism between states and linear transformations and get something useful.

I’m trying to work out how to connect this to a sum over paths instead of a sum over states; there’s some interesting stuff there, but I haven’t grokked it yet.

MILL, BMCCs, and dinatural transformations

Posted in Category theory, Math, Quantum by Mike Stay on 2007 February 3

I’m looking at multiplicative intuitionistic linear logic (MILL) right now and figuring out how it’s related to braided monoidal closed categories (BMCCs).

The top and bottom lines in the inference rules of MILL are functors that you get from the definition of BMCCs, so they exist in every BMCC. Given a braided monoidal category C, the turnstile symbol

\displaystyle \vdash:C^{\rm{op}} \times C \to \mathbf{Set}

is the external hom functor (that is, x \vdash y is the set of morphisms from x to y). Inference rules are natural or dinatural transformations.

Let D be a category; then a natural transformation \alpha between two functors

\displaystyle F,G:C\to D

can be seen as a functor from the category (A \times C), where A is the category with two objects labelled F and G and one nontrivial arrow between them, labelled \alpha:

\displaystyle F\xrightarrow{\alpha}G

For every morphism f: x \to y in C we get a commuting square in (A \times C):

\left( X^{X^X} \right\downarrow

\displaystyle \begin{array}{ccc}(F,x) & \xrightarrow{(1_F, f)} & (F,y) \\ \left. (\alpha, 1_x)\right\downarrow & & \left\downarrow (\alpha,1_y)\right. \\ (G,x) & \xrightarrow{(1_G, f)} & (G,y)\end{array}

that maps to a commuting square in D.

\displaystyle \begin{array}{ccc}Fx & \xrightarrow{Ff} & Fy \\ \left. \alpha_x\right\downarrow & & \left\downarrow \alpha_y \right. \\ Gx & \xrightarrow{Gf} & Gy \end{array}

In other words, it assigns to each object x in C a morphism α_x in D such that the square above commutes.

Now consider the case where we want a natural transformation α between two functors

         op    F,G: C   × C -> D.

Given f:x->y, g:s->t, we get a commuting cube in (A × C^op × C) that maps to a commuting cube in D.

                        Gtx -------------------> Gty
                        7|                       7|
                       / |                      / |
                 α_tx /  |G(g,1_x)        α_ty /  |
                     /   |                    /   | G(g,1_y)
                    /    |                   /    |
                   /     V      G(1_s,f)    /     V
                  /     Gsx -------------- /---> Gsy
                 /      7                 /      7
                /      /                 /      /
               /      /                 /      /
              /      /  F(1_t,f)       /      / α_sy
            Ftx -------------------> Fty     /
             |     /                  |     /
             |    /                   |    /
F(g,1_x)   |   / α_sx               | F(g,1_y)
             |  /                     |  /
             | /                      | /
             V/         F(1_s,f)      V/
            Fsx -------------------> Fsy

This is bigger, but still straightforward.

To get a dinatural transformation, we set g:=f and then choose a specific route around the cube so that both of the indices are the same on α.

                         ....................... Gyy
                        ..                       7|
                       . .                      / |
                      .  .                α_yy /  |
                     .   .                    /   | G(f,1_y)
                    .    .                   /    |
                   .     .      G(1_x,f)    /     V
                  .     Gxx -------------- /---> Gxy
                 .      7                 /      .
                .      /                 /      .
               .      /                 /      .
              .      /  F(1_y,f)       /      .
            Fyx -------------------> Fyy     .
             |     /                  .     .
             |    /                   .    .
  F(f,1_x)   |   / α_xx               .   .
             |  /                     .  .
             | /                      . .
             V/                       ..
            Fxx .......................

In other words, a dinatural transformation α: F -> G assigns to each object x a morphism α_xx such that the diagram above commutes.

Dinatural transformations come up when you’re considering two of MILL’s inference rules, namely

                               x ⊢ y   y ⊢ z
   ------- (Identity)   and   --------------- (Cut)
    x ⊢ x                          x ⊢ z

These two have the same symbol appearing on both sides of the turnstile, x in the Identity rule and y in the Cut rule. Setting

   Fxy = * ∈ Set,

where * is the one-element set, and

   Gxy = x ⊢ y ∈ Set,

the Identity rule specifies that given f:x->y we have that f o 1_x = 1_y o f:

                         .......................y ⊢ y
                        ..                       7|
                       . .                      / |
                      .  .                α_yy /  |
                     .   .                    /   | 1_y o f
                    .    .                   /    |
                   .     .      f o 1_x     /     V
                  .    x ⊢ x ------------- /--> x ⊢ y
                 .      7                 /      .
                .      /                 /      .
               .      /                 /      .
              .      /     1           /      .
             * ---------------------> *      .
             |     /                  .     .
             |    /                   .    .
          1  |   / α_xx               .   .
             |  /                     .  .
             | /                      . .
             V/                       ..
             * ........................


   α_xx:*  -> x ⊢ x        * |-> 1_x

picks out the identity morphism on x.

In the Cut rule, we let j:x->s, k:t->z, and f:s->t,

   F(t,s) = (x ⊢ s, t ⊢ z)

                j      f         k
   F(1,f) = (x ---> s ---> t, t ---> z)

                j         f      k
   F(f,1) = (x ---> s, s ---> t ---> z)

   G(s,t) = x ⊢ z

and consider the diagram for a morphism f:s->t in C.

                         .......................x ⊢ z
                        ..                       7|
                       . .                      / |
                      .  .         composition /  |
                     .   .                    /   | 1
                    .    .                   /    |
                   .     .          1       /     V
                  .    x ⊢ z ------------- /--> x ⊢ z
                 .      7                 /      .
                .      /                 /      .
               .      /                 /      .
              .      / F(f,1)          /      .
       (x ⊢ s, t ⊢ z) -------> (x ⊢ s, s ⊢ z)
             |     /                  .     .
             |    /                   .    .
             |   / composition        .   .
     F(1,f)  |  /                     .  .
             | /                      . .
             V/                       ..
       (x ⊢ t, t ⊢ z) .................

which says that composition of morphisms is associative.

Dequantization and deformation

Posted in Math, Quantum by Mike Stay on 2007 January 31

A rig is a ring without negatives, like the nonnegative integers. You can add and multiply them, multiplication distributes over addition, and you’ve got additive and multiplicative identities 0 and 1.

There’s another rig, called the “rig of costs,” that everyone uses when planning a trip: given two alternative plane tickets from A to B, we chose the least expensive one. We add the cost of a trip from A to B with the cost of a trip from B to C. This one’s denoted

\displaystyle R_{\min}=(R>0 \cup \{\infty\}, \min, \infty, +, 0).

Notice that “addition” here is min, and the additive identity is \infty: \min(x,\infty)=x. “Multiplication” here is +, and distributes over min:

\displaystyle x+\min(y,z) = \min(x+y, x+z).

As described here, one can deform the rig

\displaystyle (R>0, +, 0, *, 1)

to the rig

\displaystyle R_h=(R>0 \cup \{\infty\}, \oplus_h, \infty, +, 0),


\displaystyle u \oplus_h v = -h \log(e^{-u/h} + e^{-v/h}),

like this:

\displaystyle x \mapsto -h \log x.

As h \to 0, the deformed rig approaches R_{\min}. This is called Maslov dequantization; here’s why.

In quantum mechanics, the path a particle takes is governed by integrating the amplitude, so the probability amplitude of arriving at point b at time t is

\displaystyle \psi(b,t) = \int_{b_0} \int_{\mbox{paths } \gamma:(b_0, t_0) \to (b,t)}e^{-iS(\gamma)} \psi(b_0, t_0)\, D\gamma\, db_0

In classical mechanics, the path a particle takes is governed by the principle of least action, so the action cost of arriving at point b at time t is

\displaystyle \psi(b,t) = \inf_{b_0} \inf_{\mbox{paths } \gamma:(b_0, t_0) \to (b,t)}S(\gamma) + \psi(b_0, t_0)

where “inf” means “infimum,” i.e. the least element of an infinite set. You get from the complex numbers to the rig R_{\min} by taking

\displaystyle z \mapsto -h \log |z|,

and classical mechanics falls out of quantum mechanics as h \to 0. If you take the derivative of those two equations above with respect to time, you get Schroedinger’s equation from the quantum case and the Hamilton-Jacobi equation from the classical case.

No one’s heard of the latter one, but you can describe a classical system with a wavefunction! Instead of the probability amplitude at a given point, it’s the action cost.

Week 241

Posted in History, Math, Quantum by Mike Stay on 2006 December 1

LIGO, the Fool’s Golden Ratio and Stonehenge’s Platonic solids 1600 years before Plato.

Quantum lambda calculus, symmetric monoidal closed categories, and TQFTs

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2006 August 22

The last post was all about stuff that’s already known. I’ll be working in my thesis on describing the syntax of what I’ll call simply-typed quantum lambda calculus. Symmetric monoidal closed categories (SMCCs) are a generalization of CCCs; they have tensor products instead of products. One of the immediate effects is the lack of built-in projection morphisms \pi_1 and \pi_2 for extracting the first and second element of a pair. And without these, there’s no duplication morphism $\Delta: A\to A\otimes A$; morphisms of this type can exist, but there’s no way to express the notion that it should make a copy of the input! The no-cloning theorem is built in.

The typical CCC in which models of a lambda theory are interpreted is Set. The typical SMCC in which models of a quantum lambda theory will be interpreted is Hilb, the category of Hilbert spaces and linear transformations between them. Models of lambda theories correspond to functors from the CCC arising from the theory into Set. Similarly, models of a quantum lambda theory should be functors from a SMCC to Hilb.

Two-dimensional topological quantum field theories (TQFTs) are close to being a model of a quantum lambda theory, but not quite.

There’s a simpler syntax than the simply-typed lambda calculus, called universal algebra, in which one writes algebraic theories. These give rise to categories with finite products (no exponentials) , and functors from these categories to Set pick out sets with the given structure. There’s an algebraic theory of groups, and the functors pick out sets with functions that behave like unit, inverse, and multiplication. So our “programs” consist solely of operations on our data type.

TQFTs are functors from 2Cob, the theory of a commutative Frobenius algebra, to Hilb. We can look at 2Cob as defining a data type and a TQFT as a quantum implementation of the type. When we take the free SMCC over 2Cob, we ought to get a full-fledged quantum programming language with a commutative Frobenius algebra data type. A TQFT would be part of the implementation of the language.

CCCs and lambda calculus

Posted in Category theory, Math, Programming, Quantum by Mike Stay on 2006 August 22

I finally got it through my thick skull what the connection is between lambda theories and cartesian closed categories.

The lambda theory of a set has a single type, with no extra terms or axioms. This gives rise to a CCC where the objects are generated by products and exponentials of a distinguished object, the free CCC on one object. For example, from a type X we generate objects 1, X, XX, (X^X)^{(X^X)} = X^{XX^X}, etc. The morphisms are reduction-equivalence classes of all the programs we can make out of the appropriately-typed S and K combinators. A product-exponential-preserving functor from this CCC to the category Set (which is also a CCC) picks out a set S_X for X, and because it preserves the structure, maps the product AB to S_A \times S_B and the exponential A^B to {S_A}^{S_B}.

The functor itself, however, can be uncomputable: one could, for example, have S_X be the set of halting programs for some universal Turing machine. This set is only computably enumerable, not computable.

When we have types and axioms involved, then we add structure to the set, constraints that the sets and functions on them have to satisfy. For instance, in the lambda theory of groups, we have:

  • a type G
  • terms
    • m:XX \to X
    • inv:X\to X
    • e:1 \to X
  • axioms for right- and left-unit laws, right-and left-inverses, and associativity

The CCC arising from this theory has all the morphisms from the free CCC on one object and extra morphisms arising from products and compositions of the terms. A structure-preserving functor to Set assigns G to a set and m, inv, and e to functions satisfying the axioms. These functions needn’t be computable, either—they only have to satisfy the group axioms.

So in terminology programmers are more familiar with, the terms and axioms define an abstract data type, an interface. The functor gives a class implementing the interface. But this implementation doesn’t need to be computable! Here’s another example: we start with a lambda theory with a data type \mathbb{N}, along with a term succ:\mathbb{N}\to \mathbb{N} and the axioms of Peano arithmetic; a functor from this lambda theory to Set will give us an implementation of natural numbers. Now we add a term f:\mathbb{N} \to \mathbb{N} to the theory, with no other constraints. One model of this theory is Peano arithmetic with an oracle to \Omega: it assigns f to the function that returns the nth bit of the Omega number for a universal prefix-free Turing machine.

I think that in order to get a computable model, we have to use a “computability functor” (my term). If I’m right, this means that instead of taking a functor directly into Set, we have to take a functor into a CCC with no extra terms to get a “computable theory” (again, my term), and then another from there into Set. This way, since all the morphisms in the category arising from the computable theory are built out of S and K combinators, the functor has to pick an explicit program for the implementation, not just an arbitrary function. From there, whether the implementation of the S and K combinators is computable or not really doesn’t matter; we can’t get at anything uncomputable from within the language.

Now, changing gears and looking at the “programs as proofs” aspect of all this: morphisms in the free CCC on one object are proofs in a minimal intuitionistic logic, where \to now means implication rather than exponentiation. The only two axioms are the ones from S and K. Adding a term of a given type to the theory adds a new axiom to the logic, while adding an axiom to the theory defines an equivalence of proofs in the logic.

A computable theory wouldn’t add any axioms, just assign names to proofs so they can be used as subproofs. And because the programs already satisfy the axioms of the computable theory, asserting the equivalence of two proofs is redundant: they’re already equivalent.