Bell’s Theorem: A Nobel Prize For Metaphysics

by Jochen Szangolies

Bells Theorem Crescent in Belfast, John Bell’s birth town.

There has been no shortage of articles on this year’s physics Nobel, which, just in case you’ve been living under a rock, was awarded to Alain Aspect, John Clauser, and Anton Zeilinger “for experiments with entangled photons, establishing the violation of Bell inequalities and pioneering quantum information science”. Why, then, add more to the pile?

A justification is given by John Bell himself in his 1966 review article On the Problem of Hidden Variables in Quantum Mechanics: “[l]ike all authors of noncommissioned reviews [the writer] thinks that he can restate the position with such clarity and simplicity that all previous discussions will be eclipsed”. While I like to think that I’m generally more modest in my ambitions than Bell semi-seriously positions himself here, I feel that there is a lacuna in most of the recent coverage that ought to be addressed. There is much talk about what the prize-winning research implies, from the possibility of groundbreaking new quantum technologies to the refutation of dearly held assumptions about physical reality. There is considerably less talk about what it, and Bell’s theorem specifically, actually is, and why it has had enough impact beyond the scientific world to warrant the unique (to the best of my knowledge) distinction of having a street named after it.

In part, this is certainly owed to the constraints of writing for an audience with a diverse background, and the fear of alienating one’s readers by delving too deeply into what might seem like overly technical matters. Luckily (or not), I have no such scruples. However, I—perhaps foolishly—believe that there is a way to get the essential content of Bell’s theorem across without breaking out its full machinery. Indeed, the bare statement of his result is quite simple. At its core, what Bell did was to derive an inequality—a bound on the magnitude of a certain quantity—such that, when it holds, we can write down a joint probability distribution for the possible values of the inputs of the inequality, where these ‘inputs’ are given by measurement results.

Now let’s unpack what this means.

Flipping A Quantum Coin

The box with a possible result of checking coins 2 and 3.

Suppose you’re given a box that consists of three chambers, each of which contains a coin behind a small window. The view into each chamber is obstructed by a movable panel that can be slid away to reveal the coin behind it. However, the mechanism of the box is designed such that you can only ever open at most two of the panels—the third is then locked tight. Furthermore, after having opened two panels and closing them again, you can only reopen any panels after shaking the box again. Your task is to figure out the probability of all coins coming up heads.

So you shake the box, open up two panels at random—you figure you can just get things done more quickly by taking both values you have access to in each go—and note down the outcome of each coin throw. Sure enough, as you tally the results, counting the number of times each coin has come up heads or tails, you find they all do so about half the time—in other words, the coins seem to be fair. So you figure that the probability for all coins to come up heads together is equal to that of each coming up heads separately—hence, ½ · ½ · ½ = 1/8. That was easy!

However, going back over your notes, you see something curious. Whenever you have noted the result of checking the first and second coins, they agree—when the first comes up heads, so does the second; likewise for tails. The same thing, you notice, holds true for the second and third coins, while for the first and third coins, if you have opened their respective panels together, they always seem to disagree in their values—if the first comes up heads, the third comes up tails, and vice versa. Let’s note this down:

- When you look at the first and second coins together, the outcome is either ‘heads, heads’ or ‘tails, tails’ with equal probability
- Likewise, when you look at the second and third coins together, the outcome is also either ‘heads, heads’ or ‘tails, tails’ with equal probability
- However, when you look at the first and third coins together, the outcome is either ‘heads, tails’ or ‘tails, heads’, again with equal probability

This is not, in and of itself, anything particularly strange: what you’ve discovered is that the outcomes of the coin throws are not independent, but correlated. And correlations aren’t mysterious; they are quite common in everyday life. To illustrate the point, Bell tells the story of Bertlmann’s socks: Bertlmann, who had been Bell’s co-worker at CERN, was fond of wearing socks of different colors on each foot. Thus, if you see one of his socks, and know that fact about Bertlmann, you can immediately conclude that the sock on the other foot must have a different color. Supposing that there are only two different colors of socks in Bertlmann’s closet, say red and green, you could even immediately predict the color of the sock on the other foot, without ever having to see it!

The reason for this is that the combinations ‘green, green’ or ‘red, red’ are simply forbidden. Likewise, through whatever internal mechanism, the box forbids the combinations ‘heads, tails’ or ‘tails, heads’ for the first and second coin. But then, this screws up the above calculation: the probability of ‘heads, heads’ isn’t ¼, as surmised, but ½—if one of the coins comes up heads (probability ½), then so must the other. Multiplying the probabilities only works for independent events!

Correlation: out of four nominal possibilities, two are forbidden.

Where does that leave you? Well, there still ought to be information in the probabilities of the pairs of outcomes you have collected to yield some conclusion about the distribution of outcomes for all three coins together. So suppose that the first coin comes up heads. Then, so must the second. But if the second comes up heads, so must the third. So it seems, the whole set then must come up ‘heads, heads, heads’. But, and here’s the punchline: looking at the first and third coins together, you know they must always come up oppositely—so this state can never occur! Could that then be the answer: the probability of ‘heads, heads, heads’ is zero?

But of course, going through the possible outcomes, any combination of outcomes for all three coins violates one of the rules set down above. Each, you find, can never occur—and thus, can’t be assigned a valid probability. You can’t even set them all to zero, as the sum of probabilities of all possible outcomes must be one—something must always happen. This is what is meant if we say that the three coins—more accurately, their values after flipping—can’t be assigned a joint probability distribution.
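For readers who would rather let a machine do the going-through, here is a minimal sketch of that check. The encoding (coins indexed 0 to 2, and the names `RULES` and `violated_rules`) is my own illustrative choice, not anything prescribed by the box; the code simply lists all eight candidate states and reports which of the three pairwise rules each one breaks.

```python
from itertools import product

# The three pairwise rules recorded above; coins are indexed 0, 1, 2.
# (This encoding is an illustrative assumption, not part of the box.)
RULES = {
    (0, 1): lambda a, b: a == b,  # first and second coins agree
    (1, 2): lambda a, b: a == b,  # second and third coins agree
    (0, 2): lambda a, b: a != b,  # first and third coins disagree
}

def violated_rules(state):
    """Return the coin pairs whose rule the given three-coin state breaks."""
    return [pair for pair, ok in RULES.items()
            if not ok(state[pair[0]], state[pair[1]])]

# Walk through all 2 * 2 * 2 = 8 candidate states.
for state in product("HT", repeat=3):
    print("".join(state), "breaks", violated_rules(state))

# Collect the states that break no rule at all.
consistent_states = [s for s in product("HT", repeat=3)
                     if not violated_rules(s)]
print("states obeying all three rules:", consistent_states)
```

Every state breaks at least one rule, so the final list comes out empty: there is nothing left to which a probability could be assigned.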

This should seem somewhat troubling: after all, it seems reasonable to assume, there must be some state for the three coins after a throw (a principle often called ‘value definiteness’); but any possible such state is incompatible with the correlations we have observed.

Physics Goes Meta

On the other hand, the above discovery might leave you rather unimpressed. The coin we’re not looking at, you might surmise, doesn’t really matter. It’s no problem at all to write down a probability distribution for the three coins if we require that only those we actually look at obey the rules above. If we look only at the first and third coins, we could assign to the possibilities (writing H for ‘heads’ and T for ‘tails’ for brevity) ‘HHT’, ‘HTT’, ‘THH’, ‘TTH’ probabilities of ¼ each (in fact, any assignment in which the first two and the last two each sum to ½ works). The second coin will violate the correlations, but we don’t open the second panel, so we never see it do so. Similar assignments are possible for looking at the other pairs.

The trouble with this is just that usually, we would assume that the box doesn’t know beforehand which of the panels you will open. If you had looked instead at coins one and two, or two and three, in the above setup, there would be a nonzero probability of observing a violation of the correlation rules. So supposing that you can open panels randomly and independently of the configuration present in the box after shaking it, this strategy won’t work. (This assumption is often called ‘free will’ in the literature, but this sometimes invites confusion—what’s really needed is that the choice is independent of the outcome of the coin throws, which need not necessarily imply anything about metaphysical free will; thus, I prefer the less ambiguous ‘free choice’. The violation of this assumption is generally known as ‘superdeterminism’.)
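To put a number on that ‘nonzero probability’, here is a small sketch using the distribution tailored to coins one and three (¼ each for HHT, HTT, THH, TTH). The helper `violation_probability` and the 0-based coin indices are illustrative assumptions of mine; exact fractions are used so that no rounding muddies the result.

```python
from fractions import Fraction

# Hidden distribution tailored to the case where panels 1 and 3 are opened.
quarter = Fraction(1, 4)
distribution = {"HHT": quarter, "HTT": quarter, "THH": quarter, "TTH": quarter}

def violation_probability(i, j, should_agree):
    """Probability that coins i and j (0-based) break their rule."""
    return sum(p for state, p in distribution.items()
               if (state[i] == state[j]) != should_agree)

p_13 = violation_probability(0, 2, should_agree=False)  # the pair we planned for
p_12 = violation_probability(0, 1, should_agree=True)   # a pair we did not plan for
p_23 = violation_probability(1, 2, should_agree=True)   # likewise

print(p_13, p_12, p_23)  # → 0 1/2 1/2
```

So the trick is invisible only as long as the box can count on you opening panels one and three; open either of the other pairs and you catch it breaking the rules half the time.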

But another option is that the box’s internal machinery is more sophisticated than you had assumed. After all, it already can’t merely throw each coin independently, since then there would be no way to account for the correlations between pairs of coins. So suppose that the outcome of each coin toss is only selected once you open up the first panel. Say you decide to look at the third coin, and see that it has come up ‘heads’. Then, immediately, the internal machinery of the box sets up the first coin to display ‘tails’ and the second one to show ‘heads’. Thus, if checking on one coin disturbs the value of the others, correlations such as those observed become possible. Call the assumed lack of such influence the ‘no disturbance principle’.

No matter what value is hidden behind the unopened panel, it must disagree with one of the constraints.

Summing it up, we find that if we assume that

- there is some state the three coins are in, after shaking the box (value definiteness),
- we can choose to check the values of any two coins, independent of those values (free choice), and
- looking at one coin does not disturb the values of the others (no disturbance),

correlations such as the ones we have observed should be impossible. That we have observed them, however, then tells us that one of those assumptions must be thrown out.

Now, for a box with some unknown internal mechanism, there seems a clear candidate: there is no good reason to assume that opening up any of the panels should not influence what’s behind the others. It would be trivial to program a computer simulation of such a box, for instance.
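To back up the claim of triviality, here is one possible sketch of such a simulation. It is only one of many mechanisms that would do the job, and everything in it (the function `shake_and_open`, the 0-based coin indices, the seed) is an assumption made for illustration: the box tosses a fair coin for the first panel you open, and only then fixes the second coin to whatever the rule for that pair demands.

```python
import random

# Pairs (0-based) that must agree; the remaining pair (0, 2) must disagree.
AGREEING_PAIRS = {frozenset({0, 1}), frozenset({1, 2})}

def shake_and_open(first, second, rng=random):
    """One run: the first coin is fair, the second is fixed upon opening."""
    first_value = rng.choice("HT")
    if frozenset({first, second}) in AGREEING_PAIRS:
        second_value = first_value
    else:
        second_value = "T" if first_value == "H" else "H"
    return first_value, second_value

rng = random.Random(0)  # fixed seed so that the run is repeatable
tally = {}
for _ in range(10_000):
    first, second = rng.sample(range(3), 2)  # pick two panels at random
    outcome = shake_and_open(first, second, rng)
    # Record outcomes in coin order, regardless of which panel came first.
    pair = tuple(sorted((first, second)))
    ordered = outcome if first < second else outcome[::-1]
    tally.setdefault(pair, set()).add("".join(ordered))

for pair in sorted(tally):
    print(pair, sorted(tally[pair]))
```

The tally only ever contains ‘HH’ and ‘TT’ for the pairs (0, 1) and (1, 2), and only ‘HT’ and ‘TH’ for the pair (0, 2), exactly as in your notes. The price is that the second coin has no value at all until the first panel is opened.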

However, there are three main aspects of Bell’s result that cement its importance—and that of its conclusive empirical validation by the recent Nobel laureates. First, in the above, we have nowhere had to make any assumptions about the theory we use to describe the physical world. We’re not merely talking about how a given theory tells us the world is, but about the world in a theory-independent way—which is why physicist Abner Shimony, whom we’ll shortly meet again, has coined the term ‘experimental metaphysics’ for such results: literally, they go beyond (any given theory of) physics.

Additionally, while it might be reasonable to expect complex mechanical devices to show properties dependent on the manner of their observation, the same strikes us as profoundly counterintuitive when extended to simple physical objects. Take, for instance, a brick: we consider it perfectly adequate to measure its size, mass, and color in isolation, without having to take account of how these measurements might influence one another. If we measure its mass, we don’t expect that to influence the outcome of a measurement of its color; but with the coins, ‘measuring’ the outcome of one seems to influence that of another. If we thus see such strange correlations in nature on simple objects (e.g., elementary particles), then we find we must let go of an assumption that seems obvious to us in everyday physical objects, namely, that their properties don’t depend on the context of their measurement (what other measurements are carried out simultaneously). This assumption is accordingly termed non-contextuality and is the core of a result intimately related to Bell’s, namely, the Kochen-Specker theorem.

For the final reason why Bell’s result, unlike the behavior of coins in mechanical contraptions, should shock us and force us to reconsider some dearly held assumptions, we must look at how correlations like the above are actually realized in quantum mechanics.

Neither Here Nor There

Two pairs of two coins, such that one from each may be examined.

In quantum mechanics, the situation as described above is not exactly realized. However, there are closely related scenarios, and we can inch up our way to more realistic cases: suppose we have four coins in two pairs, such that we can always only look at one of each pair. Let’s call them A₁, A₂, B₁, and B₂ for short. Doing some rounds of experiments with the box as before, you quickly find the following:

- The pairs (A₁, B₁), (A₁, B₂), and (A₂, B₁), if observed together, always yield the same results—both heads, or both tails
- The pair (A₂, B₂) always yields opposing results—one heads, the other tails

While this situation is, again, not realized in quantum mechanics, it makes things a bit simpler, and the additional subtleties introduced by the correct quantum behavior won’t affect our conclusions. For those willing to dig a little deeper, I give a somewhat more realistic treatment in the excursion below.

We can now reason as before: if A₁ shows H, then B₁ must show H, too, and consequently, so must A₂ and B₂. But A₂ and B₂ must always yield opposing values. Hence, no simultaneous assignment of values to every coin is possible, unless, as before, looking at one coin has the potential to disturb the value of the others (or conversely, to take the superdeterministic option, the coins’ values determine which ones we will look at). But this situation now allows us to conclude something more.

The coins in this setup come neatly packaged in two sets, and whether one panel can be opened is decided only by the state of the other. In quantum mechanics, this is a result of Heisenberg’s uncertainty principle: there are properties of a given system—the canonical example being position and velocity/momentum—such that only ever one or the other can be known at any given time.

By now, you’re determined to get at the heart of this puzzle no matter what. So you whip out a saw, and with grim resolve, cut the box apart right through the middle. Opening one panel each of both part A and B, the other still being blocked by the box’s mechanics, you find agreement with the previously established rules. But of course, that doesn’t tell you much. So you try to repeat the experiment. However, to your dismay, you find that now, the mysterious correlation is broken: the outcomes of box A no longer tell you anything about box B. So, you surmise that whatever influence there may be, something needs to be transferred between A and B to make sure each conforms to the observed distribution of outcomes.

But things are not quite that simple. Suppose you’re supplied with a large number of copies of the original box. Sure enough: shaking each box, then cutting it apart and opening a panel of each half at random, again yields the paradoxical behavior—but only for the first values observed. Whatever ensures that the right values are present in box B after opening a panel on box A, or vice versa, apparently works over a certain distance—if only once.

So, you decide to test the limits of this strange behavior—you prepare a large number of copies of the box, and give the A- and B-parts each to a friend of yours (who, contrary to popular belief, don’t necessarily have to be called Alice and Bob), each marked with a number so that later, you can properly combine the results from the original four-coin boxes. Then, you instruct both to get as far away from each other as possible, while taking the utmost care not to accidentally jostle any of their boxes. (In reality, the difficulties involved in both the ‘getting away from each other as far as possible’ and the ‘not jostling the boxes’ were a large part of the reason for this year’s Nobel win.) Finally, each opens one panel of each box at random, noting down the number of the box, and the time it was opened. Then, they are to return.

Once both are back, you tally up the results—and, sure enough, find the same results as before. Moreover, the relative distance of both does not seem to have any effect on this behavior—indeed, examining the timing of the experiments, you find that not even a signal traveling at the speed of light could have crossed the distance in time!

This should give us pause, for it is here that we learn something about physical reality—reality as such, mind, not just reality according to quantum mechanics. If correlations like the ones observed in the box-experiment exist in the world, then one of our assumptions must fail to hold. Moreover, carrying out the experiments in regions isolated from one another considerably strengthens the no disturbance-assumption: to ensure it, one needs only hold that physics is local, that is, what happens right here does not instantaneously influence what happens over there—it first has to traverse the distance between.

So what this all means, the true import of Bell’s theorem, is that one of our assumptions must go: either we are not free to choose which panel to open; or, there are no definite values associated with certain quantities before looking; or, certain ghostly influences exist regardless of spatial separation.

Which option to take is, to a certain degree, a matter of taste. Some insist that there must be definite values to observable quantities even without observing them, often appealing to auxiliary arguments—most notably the one due to Albert Einstein, Boris Podolsky, and Nathan Rosen—to bolster their case. But this reading is controversial, and should be taken with a grain of salt. Even superdeterminism, for a long time at best an outsider option—for how should one do science if the properties of objects determine our observations thereof?—has recently reemerged as an option. But at least one of these options must be taken: after the conclusive experimental violation of Bell’s inequality, there is no going back to a classical world with objects having definite properties independently of observation.

Excursion: The CHSH-Inequality

As noted, quantum systems do not quite implement the above behavior. The actually observed values do not show the perfect (anti-)correlation posited above. However, with a few clever manipulations, and just a tiny dash of math, we can still reach much the same conclusions.

Suppose we assign numerical values to the possible outcomes: say, ‘heads’ is 1, and ‘tails’ is -1. Then we can look at the following quantity:

$$A_1 (B_1 + B_2) + A_2 (B_1 - B_2)$$

This combination of outcomes can never exceed a total value of 2—if B₁ and B₂ are both equal to 1, the first term can maximally be 2, but the second will be 0; if B₁ is 1 and B₂ is -1, the second term may be 2, but the first must vanish. Any assignment of values to all four coins at once thus leaves the above expression upper bounded by 2.
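If you don’t trust the case analysis, a brute-force check is quick. The sketch below assumes the quantity in question is A₁(B₁ + B₂) + A₂(B₁ − B₂), which is what the description of its two terms indicates; the function names are hypothetical labels of my own. It runs through all sixteen assignments of ±1 to the four coins.

```python
from itertools import product

def chsh_value(a1, a2, b1, b2):
    """The bracketed form of the quantity, for one definite assignment."""
    return a1 * (b1 + b2) + a2 * (b1 - b2)

def chsh_expanded(a1, a2, b1, b2):
    """The same quantity with the brackets multiplied out."""
    return a1 * b1 + a1 * b2 + a2 * b1 - a2 * b2

assignments = list(product((1, -1), repeat=4))  # all 16 definite states

# The two forms agree on every assignment.
same = all(chsh_value(*c) == chsh_expanded(*c) for c in assignments)

# The set of values the quantity actually takes.
values = sorted({chsh_value(*c) for c in assignments})
print(same, values)  # → True [-2, 2]
```

The quantity only ever takes the values 2 and -2, so no definite assignment can push it above 2.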

Unfortunately, we cannot directly evaluate it in this form. But suppose we multiply out the brackets, yielding:

$$A_1 B_1 + A_1 B_2 + A_2 B_1 - A_2 B_2$$

This can still equal at most 2. But each term within this expression is now something that can be evaluated, as it contains one coin from the A-pair, and one from the B-pair. So, you might shake the box, open two panels, and note down the outcomes for A₂ and B₁, say. But how do we get from there to evaluating the expression above? Clearly, we can’t simply combine the results from different runs: as the box is shaken in between, all the values will be reset.

However, since we know that the results in each run can’t produce a value exceeding two, we also know that they can’t do so in the average over many runs. So, we can simply run the box experiment a large number of times, opening one set of two panels at random every time, tally up the results for each pair of panels, and divide by the number of repetitions for that pair. We will find that we still cannot exceed a maximum value of two. This yields an expression first written down by John Clauser, one of the co-laureates of this year’s Nobel prize, together with Michael Horne, Abner Shimony, and Richard Holt, called the ‘CHSH-inequality’ after its originators’ initials:

$$\overline{A_1 B_1} + \overline{A_1 B_2} + \overline{A_2 B_1} - \overline{A_2 B_2} \leq 2$$

Here, the bar above each term indicates taking the average. (Note that this argument is still a bit condensed; but a more careful treatment yields the same result.)
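The averaging procedure can be tried out numerically. The following is a sketch under stated assumptions: the box holds definite values for all four coins after each shake, the panels are chosen independently of those values, and the averaged expression is the sum of the averages of A₁B₁, A₁B₂, and A₂B₁, minus that of A₂B₂. The function `estimate_chsh` and the two example boxes are hypothetical illustrations, not a model of any real experiment.

```python
import random

def estimate_chsh(draw_coins, runs=40_000, seed=1):
    """Estimate the averaged expression for a box with definite coin values.

    draw_coins(rng) returns (a1, a2, b1, b2), each +1 or -1, fixed by the shake.
    """
    rng = random.Random(seed)
    totals = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    counts = {pair: 0 for pair in totals}
    for _ in range(runs):
        coins = draw_coins(rng)                    # shake the box
        i, j = rng.randrange(2), rng.randrange(2)  # free choice of panels
        totals[i, j] += coins[i] * coins[2 + j]    # product of the two outcomes
        counts[i, j] += 1
    mean = {pair: totals[pair] / counts[pair] for pair in totals}
    return mean[0, 0] + mean[0, 1] + mean[1, 0] - mean[1, 1]

def all_equal(rng):
    """A box whose four coins always show the same face."""
    value = rng.choice((1, -1))
    return (value, value, value, value)

def independent(rng):
    """A box whose four coins are tossed independently."""
    return tuple(rng.choice((1, -1)) for _ in range(4))

print(estimate_chsh(all_equal), estimate_chsh(independent))
```

The all-equal box sits exactly at the bound of 2, since every product it produces is 1. The independent box hovers around 0. Whatever rule you substitute for `draw_coins`, the estimate should not climb above 2 by more than statistical noise, as long as the values are fixed before the panels are chosen.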

It’s important to realize that we are still beholden to our earlier assumptions, here. Thus, let’s return to the example presented in the main text. If the values in the first three terms always agree, while those in the last one always disagree, the above expression takes a value of 4—the first three terms being equal to 1, and the last yielding -1. Violation of the above inequality thus indicates violation of the assumptions listed above. In quantum mechanics, the maximum attainable value for the CHSH-expression is equal to 2√2, or about 2.83. Nobody knows why.
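For the curious, both numbers can be reproduced in a few lines. The sketch rests on an assumption not spelled out in the text above: for a suitably prepared pair of polarization-entangled photons measured behind polarizers at angles α and β, the textbook quantum prediction for the average product of outcomes is cos 2(α − β). The particular angles (0°, 45°, 22.5°, and -22.5°) are the standard choice for reaching the maximum; the function names are mine.

```python
import math

def chsh(correlation, a1, a2, b1, b2):
    """Combine four pair correlations into the CHSH expression."""
    return (correlation(a1, b1) + correlation(a1, b2)
            + correlation(a2, b1) - correlation(a2, b2))

def box_correlation(a, b):
    """Idealized box from the main text: only the pair (A2, B2) disagrees."""
    return -1 if (a, b) == (2, 2) else 1

def photon_correlation(alpha, beta):
    """Assumed quantum prediction for polarizer angles given in radians."""
    return math.cos(2 * (alpha - beta))

# Settings for the box are just the labels 1 and 2.
box_value = chsh(box_correlation, 1, 2, 1, 2)

# Settings for the photons: A1 = 0, A2 = 45, B1 = 22.5, B2 = -22.5 degrees.
angles = [math.radians(d) for d in (0, 45, 22.5, -22.5)]
quantum_value = chsh(photon_correlation, *angles)

print(box_value, round(quantum_value, 3), round(2 * math.sqrt(2), 3))
```

The idealized box reaches 4, while the quantum correlations land on 2√2: above the bound of 2 that any assignment of definite values must respect, but well short of what the box from the main text would achieve.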