# does Everett have a probability problem, or do you?

Everettian quantum mechanics is the attempt to build a coherent, consistent, and empirically adequate theory using only the unitary part of quantum mechanics.

In short, EQM postulates that everything there is to say about any system can be fully said using a Hilbert space^{1} equipped with an algebra of observables and a unitarily evolving state. There is no fundamental classical-quantum split, there is no postulated collapse of the wavefunction, and measurements and observers are not unexplained primitives of the theory but processes that can be modelled within it. Our classical world is to be found within the correlation structure of the observables, selected by interactions and stabilised by decoherence. EQM is much more commonly known as the many-worlds interpretation of quantum mechanics, because the theory tells us that the world we observe is just a sliver of a mind-boggling (to say the least) multiverse of ever-branching semiclassical realities.

Some people dismiss EQM because the worldview is too weird. This is not a sufficient reason, because new scientific worldviews tend to be too weird at first^{2} (we would have come up with them earlier otherwise). The real question to ask is: is the theory *actually* consistent and empirically adequate?

Along this line, there are two main alleged technical problems with EQM:

- **the preferred basis problem:** how do we know in which basis the branching happens? Actually, how do we even know that branching is happening?
- **the probability problem:** how do we recover, or even make sense of, probabilities in a branching world in which all alternatives happen?

These are both well-motivated, important questions whose solutions need to be agreed upon in order for EQM to be a viable theory. There are well-developed attempts at solutions to these problems (see Wallace (2014) and Saunders et al. (2010) for overviews). Of course, they are still a matter of interesting debate.

But, as the title now probably suggests to you, we will focus on only one question here:

do probabilities in a deterministic, branching world even make sense?

If the answer to this question is “no”, then there is no hope for EQM.

## the garden of forking paths

So, let’s take for granted the branching picture of reality that can be derived from EQM, and consider a specific scenario.

An experimenter, Alice, makes a series of measurements on a bunch of systems that she is convinced have all been prepared in the same way.

To be concrete, let’s say that she is measuring $N$ qubits in the computational basis $\{|0\rangle, |1\rangle\}.$ According to EQM, after the $N$ measurements, Alice and her lab will have branched into $2^N$ versions, one for each possible string $s$ of $N$ binary digits. In each branch, Alice observes $N$ experimental outcomes spelling out the binary string $s$; everything she observes around her confirms that. EQM says that the goings-on in each branch are equally real.

Let’s add a twist. At the end of the experiment, Alice is asked to place a bet on the result of a last computational basis measurement on an $(N+1)$th qubit prepared in the same way as all the others.

How should she pick?

### the way we normally do things

We know exactly how *we* would do this in a single—non-branching—world with unpredictable outcomes. We would use the statistics of the $N$ measurements to estimate the *probability* of the $(N+1)$th measurement coming out as 0 or 1, and we would *decide* to place our bet on the more likely outcome.

This strategy is used all the time in all sorts of areas of science, business, and life, to the point that it seems self-evident to most people. But ask yourself:

*do you know how, or why, it works?*

It is just a brute fact of the world that most of us use this standard practice without knowing why it works. We just know that it seems to work well for us. We’ve been doing it for ages, everyone does it, we learn it in school etc.

And it would be nonsensical to do this in the branching world scenario! Alice can’t just use the observations she has made in her branch to infer anything useful about how she should bet at the end of her $N$ experiments, right? Probabilities make no sense when everything happens, do they?

### … don’t they?

If we do not know why the standard practice works we have no basis to say that, while it works in a world with uncertain events, it cannot work in a deterministically branching universe. But also, more simply:

**If we have been OK just using probabilities without knowing how or why they work, how can we really object to their use in a branching world?**

Can we really require Alice to do more work than us to use the same strategy? Especially considering that both we (in our single-outcome world) and Alice (in her branching world) experience exactly the same phenomenology.

This is the main thesis of this post. If you are convinced, you can stop here. If you want to understand whether Alice’s use of probability is less justified than ours, we need to look into the philosophy of probability.

## some (same?) difficulties

People have named two distinct parts of the standard practice of dealing with probabilities I described above:

- using frequencies to estimate probabilities in repeated, exchangeable trials is called the *inferential link*
- using probabilities to evaluate the desirability of different actions is called the *decision-theoretic link*

These two links *operationalise* probabilities (theoretical, not directly measurable) by connecting them to frequencies and actions (observable, concrete).

Philosophers have developed several arguments to justify how we use probabilities in science and daily life. But it’s still a matter of debate and there is nothing like a consensus, law-like, in-principle justification for probabilistic reasoning.^{3} We won’t get into these arguments for now (that’s for another post). Instead, we’ll look at these two links and see how, under closer inspection, they are a little problematic, both in theory and in practice. And we’ll also see that whatever problem they have, they have it in both single and branching world cases!

But first, let’s quickly get something out of the way, just in case it is rattling around your head.

### probabilities and determinism

Yes, it does make sense to use and talk about probabilities in a deterministic setting.

Remember that for about two and a half centuries, from Newton to Born, we (western scientists) were pretty convinced that we lived in a mechanistic world following deterministic laws? This did not stop Cardano, Pascal, Fermat, and Huygens from developing probability theory! Nor did it stop (Daniel) Bernoulli, Boltzmann, Maxwell, Gibbs, and company from building statistical mechanics on it. Clearly, we felt entitled to use probability theory within a deterministic ontology.

And it’s not that things have changed much since then. While the physicist in the street will be pretty comfortable saying that the world is fundamentally random (quantum phenomena show irreducible uncertainty), we still use probabilities all the time in situations where the uncertainty does not come from quantum statistics: coin flips, dice rolls, the risk associated with driving under the influence, whether an earthquake will wreck your building, etc.

Sure, there was (and still is!) a lot to say about how to square the concept of chance and uncertainty with deterministic laws and to make sense of the concept of *decision making* in a deterministic world (Dennett 2004), but this does not change the fact we felt pretty confident in using probabilities when we thought the world was deterministic, and we still use them today in situations with no underlying quantum randomness.

So, Alice is not automatically disqualified from using them in her branching world simply because it is a deterministic setting.

### the inferential link

The inferential link tells us that we can take observed frequencies as indicative of probabilities. Repeat a chancy experiment sufficiently many times, and take the observed frequency of an event to be the probability of that event.

Easy, isn’t it? Certainly, we do it all the time.

But why does it work?

If you do the probability calculus of many (independent, identically distributed) trials, you will get that the observed frequency of an event is close to the probability of that event, **probably!** In other words, after sufficiently many repeated trials, you are *very likely* to have observed frequencies pretty close to the probabilities.

But what if you didn’t, though? What if you were unlucky and observed fluke after fluke? Though *improbable,* it’s always *possible* for you to observe improbable events. And when you do, using the trusty inferential link will lead you to badly misjudge the probabilities. And there is no way for you to know when that happens. Oops! Too bad for you, I guess.
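To see both halves of this at once (estimates are usually good, and flukes still happen), here is a minimal simulation sketch in Python. It is only an illustration: the function names, the value $p_1 = 0.3$, the tolerance of $0.05$, and the fixed seed are all my assumptions. It repeats the whole $N$-trial experiment many times and counts how often the frequency-based estimate misses the true probability by more than the tolerance.

```python
import random


def estimate_probability(n_trials, p_one, rng):
    """Inferential link: use the observed frequency of 1s as the estimate of p_one."""
    ones = sum(1 for _ in range(n_trials) if rng.random() < p_one)
    return ones / n_trials


def fraction_of_bad_runs(n_runs, n_trials, p_one, tolerance, seed=0):
    """Fraction of repeated experiments whose estimate is off by more than `tolerance`."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sketch is reproducible
    bad = 0
    for _ in range(n_runs):
        estimate = estimate_probability(n_trials, p_one, rng)
        if abs(estimate - p_one) > tolerance:
            bad += 1
    return bad / n_runs


if __name__ == "__main__":
    # Hypothetical numbers: p_one = 0.3, tolerance = 0.05.
    for n_trials in (10, 100, 1000):
        print(n_trials, fraction_of_bad_runs(2000, n_trials, 0.3, 0.05))
```

The fraction of bad runs shrinks as $N$ grows but, for any finite $N$, nothing guarantees it is zero. And from inside a single run there is no way to tell whether you are in a good one or a bad one.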

The situation is not much different in EQM. For every possible sequence of outcomes there will be a copy of Alice who observes it. We are thus guaranteed that, for every value of $n$ from $0$ to $N$, there will be an Alice who believes that $n/N$ is the best estimate of the probability of getting 0 in the last measurement. So, many copies of Alice are bound to misestimate the true probability for the last measurement, and there is no way for them to know.

However, if you do the Everettian computation (see below) of the state at the end of the $N$ measurements on qubits each prepared in the state

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle,$$

you will see that most of the weight^{4} of the wavefunction (the absolute value squared of the amplitude) is on branches in which Alice observes about $|\alpha|^2 N$ times the outcome $|0\rangle$. Branches in which Alice observes weird statistics will have low weight.
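The claim about weights can be checked directly for moderate $N$. Below is a minimal sketch, assuming real amplitudes $\alpha = 0.6$ and $\beta = 0.8$ and a tolerance on the frequency of 0s, all chosen by me for illustration. It adds up the weight of every branch whose frequency of 0s is within the tolerance of $|\alpha|^2$, using the fact that the $\binom{N}{k}$ branches with exactly $k$ zeros each carry weight $|\alpha|^{2k}|\beta|^{2(N-k)}$.

```python
from math import comb


def weight_near_expected(n_qubits, alpha, beta, tolerance):
    """Total weight of branches whose frequency of 0s is within `tolerance` of |alpha|^2."""
    w0 = abs(alpha) ** 2  # weight of a single outcome 0
    w1 = abs(beta) ** 2   # weight of a single outcome 1
    total = 0.0
    for zeros in range(n_qubits + 1):
        if abs(zeros / n_qubits - w0) <= tolerance:
            # comb(n_qubits, zeros) branches share this count of 0s,
            # and each of them has weight w0**zeros * w1**(n_qubits - zeros).
            total += comb(n_qubits, zeros) * w0 ** zeros * w1 ** (n_qubits - zeros)
    return total


if __name__ == "__main__":
    # Hypothetical amplitudes; plain floats, so keep n_qubits to a few hundred.
    for n_qubits in (10, 100, 400):
        print(n_qubits, weight_near_expected(n_qubits, 0.6, 0.8, 0.1))
```

The weight on "typical" branches climbs towards 1 as $N$ grows, while the branches with weird statistics are still there, just with less and less weight.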

So, in a possibilistic world, it’s always *possible, but improbable,* that you will observe bad statistics. In a branching world, *it is guaranteed, but low weight,* that some version of you will observe bad statistics. Too bad for them, I guess.

“Wait a second,” you might say, “if I imagine doing the experiment an infinite number of times, then the probability of observing weird statistics is exactly $0.$” This is the frequentist attempt at grounding the inferential link. And it’s true, but (a) we never get to do an infinite number of experiments (so the probability is small, but never $0$), and (b) if Alice did an infinite number of qubit measurements, the wavefunction would also have exactly $0$ support on branches with anomalous statistics, meaning that those branches actually wouldn’t exist and all versions of her would see the correct statistics.
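To put a number on “small, but never $0$”: a standard bound from probability theory (Hoeffding’s inequality, which I am bringing in only as an illustration) says that, for $N$ independent trials with per-trial probability $p$ of the outcome 0, the observed frequency $f_0$ of 0s satisfies

$$P\big(|f_0 - p| \ge \epsilon\big) \le 2\,e^{-2N\epsilon^2}$$

for any tolerance $\epsilon > 0$ (the symbols $f_0$ and $\epsilon$ are my notation). The right-hand side is strictly positive for every finite $N$ and goes to $0$ as $N \to \infty$. Since the weights of Alice’s branches have the same binomial form with $p = |\alpha|^2$, the same inequality bounds the total weight of her branches with anomalous statistics.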

### decision-theoretic link

The second operational part of probabilities is their role in the decision-making process. If there are a bunch of outcomes which we value in different ways, we choose actions that maximise the expectation value of the utility of those outcomes.

When offered a bet on the outcome of the experiment, you will normally bet on the outcome you think is most probable. This is super straightforward, but have you stopped to think *why* we should do that?

In the end, all outcomes are *possible*. Sure, you believe that they have different probabilities, and you discount improbable events. But *why* should we use probabilities as guides to action? Why do we maximise *expected* utility, when in the end we care about *actual* utility?

It’s just what we do.

And if we just do that without further justification, why can’t we do it in a branching theory? Why shouldn’t Alice, at the end of the experiment, bet on $\mathrm{\mid}0\u27e9$ if she assigns to $\mathrm{\mid}0\u27e9$ the higher probability?

“Wait a second,” you might say, “I do this in our single-outcome world because, if I do the same bet many times, I will win most of the time.” First, that is not entirely true, as we often bet on one-off events (which job should I pick?). And, also, no: if you place the same bet on many similar but independent occasions, you will *probably* win most of the time. Same with Everett: if you place a bet on a high weight outcome on many similar but independent occasions, there will be high weight on branches in which you win most of the time.

In a world with uncertain outcomes, we bet on what we judge to have higher probability, so that we have higher probability of winning (even though there is always a *possibility* of losing). In a branching world, Alice will bet on the outcome she judges to have higher weight, so that she has a higher weight of winning the bet (even though there is always a version of Alice that loses).
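The decision rule itself is short enough to write down. Here is a minimal sketch, assuming an even-odds bet (win or lose one unit of stake) and hypothetical numbers $0.36$ and $0.64$ for the two outcomes; the function names are mine. The same code reads as “probabilities” in a single world and as “weights” in a branching one. The last function computes the probability (or weight) of winning more than half of many independent repetitions of the same bet.

```python
from math import comb


def expected_utility(bet, weights, stake=1.0):
    """Weighted average payoff of betting `stake` on outcome `bet` at even odds."""
    return sum(w * (stake if outcome == bet else -stake)
               for outcome, w in weights.items())


def best_bet(weights):
    """Pick the outcome whose bet maximises expected (or weighted) utility."""
    return max(weights, key=lambda outcome: expected_utility(outcome, weights))


def weight_of_winning_majority(n_bets, w_win):
    """Probability (or branch weight) of winning more than half of n_bets independent bets."""
    return sum(comb(n_bets, k) * w_win ** k * (1 - w_win) ** (n_bets - k)
               for k in range(n_bets // 2 + 1, n_bets + 1))


if __name__ == "__main__":
    weights = {0: 0.36, 1: 0.64}  # hypothetical probabilities / weights
    print(best_bet(weights), expected_utility(best_bet(weights), weights))
    for n_bets in (1, 11, 101):
        print(n_bets, weight_of_winning_majority(n_bets, 0.64))
```

Note what the last number is: not a guarantee of winning most bets, but a probability (or a weight) of doing so, which is exactly the circularity discussed next.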

## so who has a problem?

There is a delicious circularity in our use of probability. Our theory of probability tells us the inferential link allows us to deduce the probability of an event with *high probability*, and that if we bet on high probability outcomes we will win with *high probability*; the best we can do is to maximise *expected* utility.

Alice can use probability in EQM with the same exact circularity: it tells her that observed frequencies are informative of weights *with high weight*, and that betting on high weight outcomes allows her to win with *high weight*; the best she can do is to maximise *weighted* utility.

Of course, the *picture* (the ontology) is completely different. In one case only one thing will happen, in the other case every possible thing happens. In one case *probability* has to do with uncertainty and stochasticity, in the other… it’s less clear what it is about. Uncertainty about which future *i* will get to experience? Who is “i”? Or is it just about caring differently about different versions of myself, based on some “past-experience bias”? Lots of questions to explore here (Parfit 1986).

But still—and this kind of blew my mind when it came into focus—the *use* of probability, complete with its circular challenges, is strangely isomorphic in these two cases. The words and the pictures might look very different but, functionally, we have a strong parallel. It’s quite weird, but it’s like this.

Maybe you think that we don’t really need philosophical justification since our direct experience provides empirical evidence that probability works in a single-outcome reality, while we have no experience of probability use in a branching reality. But what if our reality has been a branching one all along?

## appendix: classical vs quantum coin flip

Let us say you call a random number generator $N$ times to generate a binary string $s$. Let us assume that the documentation of the RNG says that each call independently has probability $p_1$ of outputting a 1, and hence probability $p_0 = 1 - p_1$ of outputting a 0. Then the probability of string $s$ is

$$P(s) = p_0^{N-h(s)}\, p_1^{h(s)},$$

where $h(s)$ is the number of 1s in $s$, known as the Hamming weight. We can also ask what is the probability of observing a certain Hamming weight $h$, and this is simply

$$P(h) = \binom{N}{h}\, p_0^{N-h}\, p_1^{h},$$

which is just the binomial distribution.

Let’s look at Alice’s experiment. Say that each qubit is prepared in the state

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle.$$

Then we can denote the state of the $N$ systems as

$$|\psi\rangle^{\otimes N} = |\psi\rangle \otimes \cdots \otimes |\psi\rangle,$$

and can expand it in the computational basis as

$$|\psi\rangle^{\otimes N} = \sum_{s=0}^{2^N-1} \alpha^{N-h(s)} \beta^{h(s)}\, |s\rangle.$$

If we model each measurement as entangling Alice with a qubit in the following way

$$(\alpha|0\rangle + \beta|1\rangle)\,|\mathrm{ready}\rangle_A \longrightarrow \alpha|0\rangle|\mathrm{saw}\ 0\rangle_A + \beta|1\rangle|\mathrm{saw}\ 1\rangle_A,$$

the evolution of the $N$ experiments yields

$$|\psi\rangle^{\otimes N}\,|\mathrm{ready}\rangle_A \longrightarrow \sum_{s=0}^{2^N-1} \alpha^{N-h(s)} \beta^{h(s)}\, |s\rangle|\mathrm{saw}\ s\rangle_A.$$

Due to standard decoherence arguments, once Alice is involved, there is essentially no hope of seeing interference between these branches and it is safe to treat squared amplitudes as probabilities. EQM assigns probability

$$P_{\mathrm{EQM}}(s) = \left|\alpha^{N-h(s)} \beta^{h(s)}\right|^2$$

to the branch in which Alice saw $s$. So, if

$$|\beta|^2 = p_1$$

(and therefore, by normalisation, $|\alpha|^2 = p_0$), EQM gives Alice the same probabilities that classical probability theory gave us. So quantitative statements about probabilities of odd statistics are the same in both single and branching worlds.
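For small $N$ this equality can be checked by brute force. The following is a minimal sketch, with function names and amplitudes that are my own choices: it enumerates all $2^N$ bit strings, computes the classical probability and the Everettian branch weight for each, and groups the weights by Hamming weight. The complex phases on $\alpha$ and $\beta$ are arbitrary and are there only to show that they drop out of the weights.

```python
import cmath
from itertools import product
from math import comb, isclose


def classical_probability(bits, p_one):
    """P(s) = p_0^(N - h(s)) * p_1^h(s) for a string of independent RNG calls."""
    h = sum(bits)
    return (1 - p_one) ** (len(bits) - h) * p_one ** h


def branch_weight(bits, alpha, beta):
    """|alpha^(N - h(s)) * beta^h(s)|^2 for the branch in which Alice saw s."""
    h = sum(bits)
    amplitude = alpha ** (len(bits) - h) * beta ** h
    return abs(amplitude) ** 2


def branch_table(n, alpha, beta):
    """Map every bit string to (branch weight, classical probability with p_1 = |beta|^2)."""
    p_one = abs(beta) ** 2
    return {bits: (branch_weight(bits, alpha, beta),
                   classical_probability(bits, p_one))
            for bits in product((0, 1), repeat=n)}


def hamming_distribution(table, n):
    """Total branch weight for each Hamming weight h = 0..n."""
    totals = [0.0] * (n + 1)
    for bits, (weight, _) in table.items():
        totals[sum(bits)] += weight
    return totals


if __name__ == "__main__":
    # Hypothetical normalised amplitudes with arbitrary phases.
    alpha = 0.6 * cmath.exp(0.3j)
    beta = 0.8 * cmath.exp(-1.1j)
    n = 4
    table = branch_table(n, alpha, beta)
    print(all(isclose(w, p) for w, p in table.values()))
    p_one = abs(beta) ** 2
    for h, total in enumerate(hamming_distribution(table, n)):
        print(h, total, comb(n, h) * (1 - p_one) ** (n - h) * p_one ** h)
```

The two columns printed for each Hamming weight should agree up to floating-point rounding, which is the binomial distribution showing up on both sides.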

### Bibliography

Remember that arXiv, sci-hub, and libgen exist!

Image credits to OpenAI.

Dennett, Daniel C. 2004. *Freedom Evolves*. London: Penguin Books.

Parfit, Derek. 1986. *Reasons and Persons*. Oxford: Oxford University Press. https://doi.org/10.1093/019824908X.001.0001.

Saunders, Simon, Jonathan Barrett, Adrian Kent, and David Wallace, eds. 2010. *Many Worlds? Everett, Quantum Theory, and Reality*. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199560561.001.0001.

Wallace, David. 2014. *The Emergent Multiverse: Quantum Theory According to the Everett Interpretation*. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199546961.001.0001.

1. I think we can do with only a von Neumann algebra of observables and a state on that algebra, maybe not even a flow, but this is a story for another time.

2. The Earth is a sphere? It moves?! Wait, what are you saying about apes? Everyone knows matter is continuous, besides, you just need too many of these “atoms”… Time goes slower closer to a planet? How far are those nebulas? What do you mean the universe is expanding, but it’s not expanding *into* anything?

3. In fact, the problem of the philosophical foundations of probability is still an ongoing area of investigation, and it has some quite striking parallels with the foundations of quantum mechanics: both quantum theory and probability theory have a rich, well agreed-upon mathematical framework, they are used with unparalleled success in all sorts of applications, experts seem to agree on their predictions in every real-world application, and yet, unknown to most users of the theories, the philosophical foundations are controversial and the debate is ongoing, with various vocal camps talking past each other.

4. I am using the word “weight” here, but many authors would simply use “probability”.