Decision-making: the fine art of throwing away information

by Yohan J. John

[Image: Albrecht Dürer, Melencolia I]

Human decision-making routinely confounds our attempts at understanding. Right now, the western world is in a state of bafflement and anxiety brought on by unprecedented collective decisions. From the British public's vote to leave the European Union to Republican voters' selection of Donald Trump as their presidential candidate, the past month or so has been a vivid illustration of the unpredictability of human choice.

There is a temptation, particularly among elites who see themselves as well-educated, to see Brexit and the rise of Trump as a failure of intelligence. According to this perspective, disgruntled working class voters are stupid: willfully ignoring the “facts” offered to them by the experts. (Michael Gove has surely captured the spirit of the era with the statement: “People in this country have had enough of experts”.) There may be some truth in the Stupidity Hypothesis, but a far more interesting thing to think about is how statements come to be seen as relevant facts in the first place. And perhaps even more fundamentally, how exactly do facts influence our decisions?

What I would like to say about decision-making strikes me as both obvious and easy to miss. The process of arriving at a decision necessarily involves ignoring information. However much information is available beforehand, at the time a decision is made, much of it is effectively disregarded. And this is not just true of democratic decisions, for which the opinions of the losing side have little or no direct effect on subsequent events. Whenever a situation involves multiple facets, the process of deciding on a course of action involves a simplification that must abstract away from the initial complexity. From a mathematical perspective, decision-making is the process of collapsing a high-dimensional system into a one-dimensional system.

Weighing the evidence

Let me try to reconstruct my line of thinking here. I work in cognitive neuroscience, and one of the things I do is come up with computational models of neural processes such as decision-making. In order to model a process, researchers must try to make explicit all their implicit notions about it. I find that transforming an abstract process into a mechanical metaphor often helps. If I can boil processes down to simple machines like pulleys, levers, and springs, I find I am better able to work back upwards to complexity. This construction of Rube Goldberg machines is in a sense what mathematical modeling is, regardless of the specific scientific field. We try to emulate a complex natural process using simpler computational 'devices' that we create with lines of code. Emulation may not be the only form of understanding, but to my mind it cleaves closest to the spirit of science and engineering. The history of technology is essentially the story of how we figured out how to replace (or rather, displace) human action with the help of simple inanimate mechanisms.

One of the central features of decision-making is encoded in a common metaphor: that of weighing pros and cons, or costs and benefits. I can get an algorithm to do this 'weighing', but in the end, the process is not that different from a pair of scales. The pros go on one side and the cons on the other, and the choice amounts to determining the direction in which the scales tip. Once this has been determined, we throw away even more information: we go from the continuous domain of proportion to the discrete binary world of yes or no, true or false, Leave or Remain. The scales, therefore, serve as a perfectly intuitive mechanical model of comparison and decision-making.

[Image: Weighing of the heart]

The weighing scale metaphor allows us to recognize why comparisons involve throwing away information. When comparing the weight of two objects, all other properties of the objects are rendered irrelevant: their color, their shape, their use. This is what I mean by going from a high-dimensional system to a low-dimensional system: all the properties of the objects being compared are thrown away, except for the one that is 'legible' to the scale.

The phrase “apples to oranges” is often deployed to comment on the unfairness of comparing wildly dissimilar things. But “wildly dissimilar” is a matter of degree and not kind. Or rather, things that differ in kind can always be forced onto a scale of degrees. We can, if we wish, compare apples and oranges in terms of their weight, or their sugar content, or their cost.

We might say that an 'objective' comparison is one in which a clear weighing scale has been used, as opposed to gut instinct, which is largely opaque to introspection. It may not be immediately obvious, but all objective comparisons boil down to weighing one-dimensional magnitudes: scores, ratings, distances, weights, probabilities and so on. These kinds of magnitudes are sometimes called scalar values. The words 'scalar' and 'scale' both come from the Latin word 'scala', which means 'ladder' — a good name for things that can only go up or down. Scalars are distinguished from more complex mathematical entities like vectors and tuples, which group together multiple magnitudes. We can think of a scalar value as a single measurement and a vector or a tuple as a collection of measurements. Given a pair of magnitudes, we can always use a machine to tell if one is greater than the other, or if they are equal. But if we are presented with two tuples, we have to do some more work before we can compare them. Specifically, we have to reduce their dimensionality to one.

[Figure: points A and B on an x-y grid]

Here's an example. Let's say we have two points on an x-y graph: point A is at (1,5) and point B at (4,2). How might we compare them? How would we go about saying one is “greater”, or “better”? The x-position of A is less than that of B, but the y-position is greater. We need to choose a metric, which is a way to calculate distances. One common metric is the Euclidean metric, which is the most intuitive type of distance. It's the distance 'as the crow flies'. Having chosen this metric, we can say that point A is 5.10 units away from the origin, and point B is 4.47 units. So point A is further away, and we can use this to inform a decision process, such as “let's choose the closer location”. But we could have picked a different metric, such as the city-block metric. As the name suggests, the city-block metric only allows paths along the grid lines (the streets and avenues). If we use this metric, then points A and B are the same distance from the origin (6 blocks).
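For readers who want to check the arithmetic, here is a minimal Python sketch of the two metrics applied to points A and B. The function names `euclidean` and `city_block` are my own labels for illustration, not part of any particular library.

```python
import math

def euclidean(point):
    """Straight-line ('as the crow flies') distance from the origin."""
    x, y = point
    return math.hypot(x, y)

def city_block(point):
    """Distance from the origin when paths must follow the grid lines."""
    x, y = point
    return abs(x) + abs(y)

A = (1, 5)
B = (4, 2)

# Under the Euclidean metric, A is further from the origin than B.
print(round(euclidean(A), 2), round(euclidean(B), 2))  # 5.1 4.47

# Under the city-block metric, the two points tie.
print(city_block(A), city_block(B))  # 6 6
```

The same pair of points yields a clear winner under one metric and a dead heat under the other, which is the whole point: the comparison depends on the metric, not just on the points.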

However sophisticated our comparison systems are, their penultimate stage involves cranking out magnitudes like distances or weights. When a deep learning algorithm predicts the identity of the face in a photograph, it assigns a number (typically a probability) to all of the possible identities in the database, and then just picks the largest one. Google's DeepMind uses a similar system to pick its next move in a game of Go.
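A rough sketch of that final step might look like the following. The names and raw scores are invented for illustration, and the use of a softmax to turn raw scores into probabilities is my assumption about a typical setup, not a description of any specific system.

```python
import math

def softmax(raw_scores):
    """Convert raw scores into probabilities that sum to one."""
    largest = max(raw_scores.values())
    # Subtracting the largest score keeps the exponentials numerically stable.
    exps = {name: math.exp(s - largest) for name, s in raw_scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Hypothetical raw scores for three candidate identities.
raw_scores = {"Alice": 1.0, "Bob": 3.0, "Carol": 0.5}

probabilities = softmax(raw_scores)

# The decision itself: keep the label with the largest probability,
# and discard everything else the model computed along the way.
prediction = max(probabilities, key=probabilities.get)
print(prediction)  # Bob
```

However elaborate the computation that produces the scores, the decision is a single comparison of magnitudes.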

Inter-dimensional travel

Now what does this have to do with throwing away information? Let's go back to the two points on the graph. We can use the Euclidean distance to compare points A and B, but we should be aware that there are infinitely many points on the graph that have the same distance from the origin — they all lie on a circle with that distance as its radius. If we use a city-block distance, then all points on a diamond are the same distance away from the origin. So the “subtlety” of the actual, unique position of each point is lost in the comparison.
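To make the loss concrete, here is a small Python sketch. The specific points are ones I picked for illustration: four that sit on a circle of radius 5, and four that sit on a city-block diamond of 'radius' 6 (including points A and B from earlier).

```python
import math

# Four distinct points that lie on the same circle around the origin.
on_circle = [(5, 0), (3, 4), (4, 3), (0, 5)]

# Four distinct points that lie on the same city-block diamond.
on_diamond = [(6, 0), (1, 5), (4, 2), (3, 3)]

# Collect the distinct distances each metric reports.
euclidean_distances = {math.hypot(x, y) for x, y in on_circle}
city_block_distances = {abs(x) + abs(y) for x, y in on_diamond}

print(euclidean_distances)   # {5.0}
print(city_block_distances)  # {6}
```

Each set collapses to a single value. Once the distance has been computed, nothing in the number tells us which of the original points it came from.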

This loss of specificity can be seen in the process of academic scoring and ranking. Let's say we want to decide which student to award a science prize to. One student scores 90% each in Physics, Chemistry and Biology. Another scores 98, 98, and 74. Their average scores are exactly the same. You may wish to give the prize to the most consistent all-rounder, or to the one who was capable of near-perfect scores. But your metric — the average — doesn't capture any of this. Adding another metric only delays the problem. You could quantify the reliability using a measure of the degree of 'spread', such as the variance. But it is possible to have a low average and a low variance — scoring 50 in each subject would result in the same variance (zero) as scoring 90 in each subject. No fancy metric combining the mean and variance will help here: there will in general be infinitely many combinations of means and variances that result in the exact same combined score.
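The numbers in this example are easy to verify with Python's standard `statistics` module. The variable names are mine, and I am assuming the population variance (`pvariance`) as the measure of spread.

```python
from statistics import mean, pvariance

# Scores in Physics, Chemistry and Biology, taken from the example above.
all_rounder = [90, 90, 90]
specialist = [98, 98, 74]
# A further case from the text: low but perfectly consistent scores.
low_scorer = [50, 50, 50]

for name, scores in [("all-rounder", all_rounder),
                     ("specialist", specialist),
                     ("low scorer", low_scorer)]:
    # pvariance is the mean squared deviation from the average.
    print(name, "mean:", mean(scores), "variance:", pvariance(scores))
```

The all-rounder and the specialist share a mean of 90, so the average cannot separate them. The all-rounder and the low scorer share a variance of zero, so the variance cannot separate them either. Only the specialist has a non-zero variance (128), and neither number alone tells us whether that spread should count for or against the student.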

So if we want to compare things 'objectively', we have no choice but to throw away the rich information in our original high-dimensional framework and “drop down” into a one-dimensional landscape. However much we problematize or complexify a subject, if we have to make a decision about it we have to squeeze into a 1D universe. (The various consequences of a decision then travel in the other direction, turning quantities back into qualities that we can perceive.) When we create metrics for comparing things, situations or people, what we are actually doing is creating a kind of priority list for each piece of information. The metric is the process by which incommensurate items of information are translated into a common currency.

Once you've picked a metric, the process of deciding between two scores is as objective as comparing weights. The real mischief, then, comes from picking a metric. Subjectivity arises in the system of translation from qualities to multidimensional quantities to unidimensional scores to binary choices. Of course, sometimes a natural metric suggests itself from the problem we're working on: it doesn't really make sense to use a Euclidean metric to determine walking distance in a city with a grid pattern of streets. But as you might expect, in more complex situations (such as when thrashing through the jungle of big data with an analytical machete) a metric is a way to obscure your biases and preconceived notions in a seemingly objective mathematical form. (A particularly egregious case of metric-based obfuscation lies in the use of the word “optimal”. More on this in the footnotes! [1])

Skin in the game

As I mentioned earlier, to compare things is to translate them into a common currency. Very often this currency is… currency! The overwhelming majority of people on the planet use money for their everyday needs, and are therefore very familiar with how money forces things like health, entertainment and even personal relationships onto a single all-encompassing weighing scale. Money rips everything out of its unique and particular context and into the market of comparison and exchange. Money might be considered the most concrete of all abstractions (with the possible exceptions of time and energy, which are closely related to money). Rather than abstracting to a higher or more complex plane of analysis, money abstracts downwards, from an uncountable multiplicity of qualities to a single quantity.

As David Graeber's book Debt: The First 5,000 Years illustrates, human choice does not have to be governed by the abstractions of money and exchange. Anthropology has uncovered quite a few 'primitive' cultures whose activities were not legible to the process of monetary valuation and subsequent exchange. At the other end of the cultural spectrum, economists and businesspeople are the tribes most likely to denominate everything in terms of profit and loss. But economists are keenly aware of the fact that the monetary metric is ill-suited to measuring many complicated factors. These “externalities” include the various social costs of profit-seeking, including unemployment, environmental degradation, and disruption of cultural institutions. Carbon credits notwithstanding, it is not clear that we can buy our way out of global warming, even if the political will to do so existed.

Many people living in modern industrial and post-industrial societies — well-versed in the art of evaluating objects, activities and experiences in monetary terms — have reservations about treating the bean-counting homo economicus as the paragon of human decision-making. How to factor in the value of science? Or art? Or tradition? Or the environment? So far no one has really come up with an uncontroversial way to make these intangibles tangible. And even if they did, the process would very likely involve some kind of one-dimensional quantity. Even the greatest intangible of all, consciousness itself, has been linked to a singular quantity: phi, a measure of 'information integratedness'. [2]

But recall that the information-destroying approach to making decisions is a feature of 'objective' decision-making. Many of the decisions people make do not have the hallmark of the weighing scale approach: our metrics of comparison are often implicit, and therefore obscure even to ourselves. Surely only very few people who voted for Brexit created a cost-benefit analysis in which the economic and social benefits of staying in the EU were explicitly weighed against concepts like sovereignty, national dignity, or even dislike of foreigners.

“Stretch out with your feelings!”

[Image: Star Wars]

When confronted with decisions that seem motivated by something other than cool 'rational' analysis, it is common to blame emotions. No doubt emotions have the power to distort reality, forcing shades of grey to appear as black and white. This warping may be the source of a lot of trouble, but it may also be the purpose of emotions. The alternative may be information overload. When confronted with volumes of detailed information, our capacity for making good decisions is often degraded. This is reflected in the paradox of choice: if there are too many options on the shelf of the store, we often experience anxiety and may even avoid making a choice at all. Emotions, particularly feelings like fear or competitiveness, drive decision-making, inducing us to stop dithering and go with the gut.

The advertising world long ago realized this, and typically attempts to communicate directly with our emotional circuits, which are far more likely to induce a decision to purchase something than our cognitive information-processing networks. Perhaps Donald Trump has realized this too: he gives off an impression of decisiveness. As “the Decider”, he cuts through a morass of mind-numbing facts with his emotionally resonant simplifications. He seems to engage his supporters at the level of pure id.

We can disagree with the contents of Trump's id(eology), but we may be forced to learn from his general approach. Without emotional signals, humans may be incapable of decision-making. Antonio Damasio, in his book Descartes' Error: Emotion, Reason and the Human Brain, documents cases in which damage to emotion-related parts of the brain has a stultifying effect on choice. Subjects with this kind of brain damage were capable of solving problems when experimenters asked them to (so the decision was effectively made for them), but found it difficult to initiate a choice on their own. They were aware of the information pertaining to each option, but were somehow unable to make the final comparison necessary to act. Emotion seems to be the spark that causes an information-rich cloud of uncertainty to rain down as decisive action. Without emotion to inject value into life, we may experience melancholia. It is striking that Albrecht Dürer's engraving Melencolia I (the first image shown in this essay) includes a pair of empty scales.

The importance of emotion in decision-making gives us an alternative to the claim that poor decisions imply poor intelligence or a lack of awareness of the 'facts'. It may be that what is missing is not some fact or the other, but an emotional reason to admit something as a fact in the first place — many people no longer trust the sources of alleged facts. Getting people to choose 'your side' may require the slow and painstaking process of building trust in you as a person, rather than in your facts. The quick and easy solutions — repetition, ridicule and righteous indignation — don't seem to be working these days.

Here be hobgoblins!

Perhaps some day an enterprising scientist will come up with an 'objective' way to infer people's subjective metrics. Along the way she may be stymied by another striking aspect of human decision-making. It is often wildly inconsistent. And this is common even for decisions involving explicit metrics. In every field it seems as if people are constantly looking for new ways to evaluate success and failure. The criteria used to make a decision today may not apply tomorrow.

[Image: Drawing Hands]

It is easy to dismiss this inconsistency as yet another example of the irrationality of the human species. We might also see this unwillingness to be chained to a particular, explicit, consistent value system as a feature of human intelligence rather than a bug, one that still distinguishes us from the artificial 'intelligences' we have thus far created. We can instill metrics in our machines, and perhaps even 'metrics-of-metrics': automated systems for deciding which metric to use when confronted with a given problem. But we might expect that in machines of the sort we have built so far, there can only be a finite number of levels of 'meta-metric'. (I don't want to claim that it is impossible to emulate the human ability to always escape to a new meta-level of analysis. Perhaps achieving this will require making more explicit what Douglas Hofstadter is getting at when he claims that we are all “strange loops”. On the other hand, perhaps a strange loop is necessarily a mystery unto itself.)

John Maynard Keynes is often credited with saying “When the facts change I change my mind”. We might extend this, and say that when the facts change, a truly flexible intelligence changes not only its decisions, but also its metrics for going about decision-making. Perhaps this is an alternative way of interpreting Ralph Waldo Emerson's line “A foolish consistency is the hobgoblin of little minds”.

We might say that the spirit of decision-making comprises two parts: a robotic but reliable hobgoblin of consistency, and a capricious but creative demon of change. I suppose real intelligence — or what used to be called wisdom — consists in knowing which part to call forth at any given time!

_____

Notes:

[1] On “optimal” solutions

In fields ranging from economics to psychology to evolutionary biology, one often hears the word “optimal” bandied about. Optimization is a mathematical procedure by which the best of several alternatives is found, given a metric. An optimal solution is one that minimizes a loss function (or, equivalently, maximizes a gain function). The concept of optimality gives the impression of mathematical precision: clearly if one economic theory, or cognitive strategy, or biological trait is “optimal”, then the experts must agree that it is better than the alternatives. Very often the word is deployed even when no actual mathematical procedure has been tried.

But even when due mathematical diligence has been observed, the imagined solidity of optimal solutions is often an illusion, or at the very least a distraction from where the actual problem lies. Once you've defined a quantity, such as a cost, a score, or a performance metric, you can go about finding the alternative that optimizes that quantity (though this can be a complicated process, often because searching through all the alternatives is computationally expensive). You can, for example, determine which of several cars has the best fuel efficiency. You can determine which of several paths between two points is the shortest. But as soon as you decide you want to kill two birds with one stone (say, find a car with good fuel efficiency and good safety ratings), the trouble begins. How exactly does one combine fuel efficiency and safety into a single index? You can come up with some kind of weighted sum of the two numbers, and then try to find the car that optimizes your new metric. But there is quite a bit of subjectivity here: you can combine the available factors in countless ways. You can even work backwards: given a choice that you are already biased towards, you can, with a little imagination, cook up a metric for which your solution is optimal.
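Here is a toy Python sketch of how this plays out. The three cars, their scores (on a made-up 0 to 10 scale) and the weights are all invented for illustration. The point is only that each car can be made “optimal” by a suitable choice of metric.

```python
# Hypothetical cars, each scored from 0 to 10 on two criteria.
cars = {
    "Car A": {"fuel": 9, "safety": 5},
    "Car B": {"fuel": 5, "safety": 9},
    "Car C": {"fuel": 7, "safety": 7},
}

def pick(metric):
    """Return the name of the car that maximizes the given metric."""
    return max(cars, key=lambda name: metric(cars[name]))

def favors_fuel(car):
    # Weighted sum that leans heavily on fuel efficiency.
    return 0.8 * car["fuel"] + 0.2 * car["safety"]

def favors_safety(car):
    # The same kind of weighted sum, with the weights swapped.
    return 0.2 * car["fuel"] + 0.8 * car["safety"]

def weakest_link(car):
    # A different combination rule: judge each car by its worst score.
    return min(car["fuel"], car["safety"])

print(pick(favors_fuel))    # Car A
print(pick(favors_safety))  # Car B
print(pick(weakest_link))   # Car C
```

All three procedures are perfectly rigorous once the metric is fixed. The disagreement lives entirely in the choice of metric, which is exactly the part that the word “optimal” tends to hide.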

Also, when you optimize some metric that is itself a combination of other quantities, there is no guarantee that your solution optimizes those other quantities: in general, you can only optimize one quantity at a time. So an optimal solution may end up being a kind of mediocre middle ground, excelling at nothing in particular. This is also why it is highly unlikely that evolution has made us “optimal” at solving any particular sort of problem. Evolution is a satisficing process: it finds solutions that are good enough, but not necessarily optimal for anything specific.

[2] On “phi”: a proposed measure of consciousness

Phi is a theoretical measure of the degree of consciousness exhibited by people, animals and other complex systems. Its utility as a measure of consciousness is the chief proposal of Integrated Information Theory (IIT). The central assumption is that this sort of integratedness is the hallmark of consciousness. I am not alone in having reservations about phi. Even if it could be feasibly measured (and this is debatable), it isn't clear why it has anything to do with most people's conceptions of consciousness. As philosopher Ned Block once said to Giulio Tononi, one of the creators of phi: “You have a theory of something, I am just not sure what it is”. For an excellent in-depth examination of phi, see this paper: The Problem with Phi: A Critique of Integrated Information Theory. I've collected some excerpts from the paper here.