Information: the Measure of All Things? Part I: Communication, Code and Computation

by Yohan J. John

Metaphor is a hallmark of human communication, and a vital tool of scientific thinking. Along with its more formal cousin, analogy, metaphor allows us to create linguistic and conceptual bridges from the known to the unknown. Some of the greatest breakthroughs in science began with an analogical leap between seemingly unrelated concepts. Isaac Newton brought the heavens down to earth when he realized that the movement of the moon around the earth was analogous to the motion of a ball thrown so hard that it just keeps falling perpetually. This line of reasoning led Newton to realize that the same deterministic laws held for both terrestrial and cosmic phenomena. The scientific revolution that took place in the wake of this discovery brought Enlightenment thinkers to the conclusion that the universe worked like clockwork: its components interlocking like cogs and gears that whirred with coordinated mechanical precision. The vibrations of a violin string and the propagation of sound and light were linked by analogy with wave motion. Analogies also helped usher in the atomic age: Ernest Rutherford likened the atom to the solar system, with electrons wheeling around a central nucleus like planets around the sun. The twin revolutions of communication and genetics gave rise to one of the world's most powerful and popular scientific metaphors: the idea that the DNA molecule — the bearer of heredity from one generation to the next — was a code, or a blueprint, or even “the book of life”. Phrases like 'genetic code' have become so common that we frequently forget that it is a metaphor at all. [1]

Over the course of a series of essays, I'd like to investigate the metaphor of code, and how it came to dominate biology. Our investigation of the code metaphor must revolve around two related questions. Firstly, how did the nuts-and-bolts talk of cells, membranes, proteins and chemical bonds become engulfed in a sea of words like 'encoding', 'decoding', 'transcription', 'translation' and 'editing' — the language of telecommunication and cryptography? Secondly, despite the successes of the code metaphor, might it obscure some of the most intriguing and difficult problems in biology? Before we even get to biology, it will be useful to lay some groundwork, and understand what modern scientists mean when they use words like “information” and “code”. So in the first part of this series, I'd like to review the seminal work that led to the modern conception of information. [2]

Beginning in the 19th century, successive waves of technological innovation transformed the way humans communicated with each other. Telegraphs and telephones made the near-instantaneous, global exchange of information possible. The technology, and the sheer pace of change, brought new questions to the forefront of scientific thinking. What exactly is happening when we send a telegram, or make a phone call? When we send a message, what is being transmitted along the wires, or through the ether? In other words, what is information?

'Information' comes from the verb 'inform', and comes from the Old French word informer, which means 'instruct', or 'teach'. This in turn comes from the Latin word informare, which means 'to shape or form', and metaphorically, to 'outline' or 'sketch' an idea. For a message to be informative, it must have meaning — it must convey something instructive, formative, and structured. The concept of information was once inextricably linked with its use: the shaping of ideas in the mind of the recipient.

Our modern understanding of information required breaking this link: information had to be divorced from meaning and apprehended in naked form, as a certain sort of 'something' that could be sent from one place to another. As the pioneers of telecommunication realized, this 'something' was very different from matter or energy, the reliable cogs and gears of the old clockwork universe. The 'content' of a message doesn't seem to depend on the nature of the medium: after all, carrier pigeons, mechanical semaphores, and telegrams can all be used to transmit the same basic letters or words.

Information theory was born in 1948, when Claude Shannon published a paper called A Mathematical Theory of Communication. Shannon held that the “fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” What did Shannon mean by “message”? Was he thinking of telegrams or telephone calls? Those were the two technologies that ushered in the era of ultra-fast communication. Telegraphic codes such as Morse code translate each letter of the alphabet into a sequence of audible 'dots' and 'dashes'. The letter 'A' for example, is encoded as a dot followed by a dash, and the letter 'B' by a dash followed by 3 dots. Telegraph operators listen to the patterns of dots and dashes, decoding them back into English letters.

Morse code, first used in 1844, is sometimes thought of as the first digital signaling system, but strictly speaking, any signaling system that uses discrete symbols is digital. The expressive power of a discrete code comes from creating patterns that combined symbols. Given a fixed set of symbols, the more of them you string together, the more patterns become possible — the combinatoric possibilities explode. There are only 26 possible one-letter strings, but 676 (26^2) two-letter strings, and 17576 (26^3) 3-letter strings. These strings can stand for anything at all — numbers, letters, people, places, or things. To make use of such a code, all you need is a look-up table that lists each string and its corresponding meaning or purpose. The simplest possible discrete code is a binary code, which contains only two symbols: the binary digits or 'bits', 0 and 1. The small number of digits in binary code in no way constrains the number of symbols it can represent — one simply increases the number of bits in a string to increase representation capacity. A two-bit code can represent just four possible messages (2^2). Adding a few more bits to make a seven bit code, like ASCII, allows you to represent 128 (2^7) different symbols.

220px-SteinwayWelte1919 Binary codes are very difficult for most humans to decipher without the help of a look-up table, but they are especially useful when dealing with machines. The zeros and ones of binary code can also be interpreted as 'off' and 'on', or 'absent' and 'present'. Such simple symbols were relatively easy to use as controllers for machines. The punched cards that French weavers invented in the 18th century used a kind of binary code, in which a hole controlled a mechanism in a textile loom that could either be raised or lowered. Weavers could make a variety of different textile patterns, just by changing the pattern of holes on the cards. A century later, a similar binary mechanism was employed in the design of the player piano: a hole at a particular point in the piano roll triggered a hammer that would strike the corresponding key. The punched cards used by the French weavers inspired Herman Hollerith to use them as the inputs to his electric tabulating machines. He used these machines to compile the data from the 1890 US Census, making the process both faster and cheaper. [3]

Numbers and letters are already discrete symbols, so it makes sense that they can be encoded using a different set of symbols. But can the human voice, with all its subtleties of tone and inflection, be reduced to a discrete code? A sound wave is typically understood as analog: it is a continuous variation in energy that seems nothing like a telegraphic stream of discrete symbols. Telephones simply converted one type of continuous signal, sound waves, into another, electromagnetic waves. A continuous signal like a sound wave (or an image) could be reconstructed from a finite set of discrete measurements — called “samples” — of the signal's amplitude. Signal amplitude is a number, and can therefore be encoded using a discrete code. Shannon and other researchers proved that with a large enough number of samples, you could reconstruct a continuous signal without any loss.

Loss of what, though? It wasn't meaning. Shannon believed that “[t]he ‘meaning’ of a message is generally irrelevant”, at least from the perspective of efficient communication. Shannon may not have cared what a message meant, but he needed to quantify the special something that was being conveyed during communication (and that could sometimes be lost because of noise or faulty channels). He came up with a quantity that measured what was really being carried by communication lines. This quantity was denominated in the currency of bits, which could be used to encode any set of symbols. Shannon decided to call the thing he was quantifying 'information', though he knew that this might clash with the colloquial understanding of the word. His conception of information separated the act of communication, which involved transmitting discrete symbols, from the purpose of communication, which was to convey meaning. (I explain some of the details of Shannon's quantification of information in footnote [4].)

This was a radical but extremely useful simplification, helping to reveal the commonality between such disparate forms of communication as text, sound and image. They could all be broken into bits, and transmitted using the same technologies. This is why old phone lines could be repurposed to connect a computer to the internet, and why a hard drive is agnostic about whether it is used to store documents, music, photographs, or movies. And once engineers had a clear picture of what was being transmitted and stored, they could focus on improving the efficiency of telecommunication technology.

This realization also led to a powerful synergy with another great breakthrough of the 20th century: the birth of digital computers. Bits are not just good for faithfully storing and transmitting messages; they can also serve as the basis for creating brand new messages. And the way to achieve this is through computation: the art of using formal rules to manipulate patterns of symbols — such as numerals or bits — in order to create new patterns.

It required the genius of Alan Turing to demonstrate the true power of thinking about computation as symbol manipulation. Until the turn of the 20th century, a computer was a person who performed mathematical calculations. Human computers were greatly assisted by devices — the abacus, for example, had been around for thousands of years. By the 19th century the abacus was joined by a host of increasingly sophisticated mechanical devices: adding machines, stepped reckoners, differential analyzers, predictors and calculators. Mechanical computers were replaced by electro-mechanical machines, and finally by electronic solid state devices. Turing proved that mathematical computations, whether performed by a person or a machine, could all be boiled down to a basic mechanical essence. This essence was captured by a theoretical device now known as a Turing machine, which he unveiled to the world in 1936.

A Turing machine consists of four basic parts:

an infinitely long piece of tape divided into cells. Each cell contains a symbol, such as a letter or a number, drawn from a pre-specified set. The simplest set consists of binary digits or bits: '0' and '1' (or 'off' and 'on'). Sequences of bits can be used to encode any symbol or number you might care to name.
a head, that can read and write symbols onto the cells of the tape, and move along it from left to right.
a register that stores the current state of the Turing machine.
a table of instructions that specify what the machine does. The machine can either be instructed to erase or write a symbol on a cell, or move to a new cell position. Either way, the instruction can move the Turing machine into a new state.

The input to a Turing machine is encoded on the tape, and after the instructions have been executed, the output consists of the new contents of the tape. So a computation consists of using instructions to transform one pattern of symbols, the input, to another, the output. The problem with a Turing machine is that a new one needs to be built whenever a new computation is to be performed. But Turing proved that a modified version of this deceptively simple theoretical machine — now known as the Universal Turing machine — could perform all possible computations. The way he established this was to show that you could encode the instruction table, the input, and the output of any other Turing machine onto the Universal Turing machine's tape. Turing proved that the Universal Turing machine could perform all possible computations [5].

Turing showed that very simple mechanical steps, taken one by one, are sufficient to perform an infinite number of complex mathematical computations. A simple, finite set of symbols could be used to generate virtually any sequence one could think of. The Universal Turing machine was the theoretical seed from which grew the modern digital computer: the table of instructions is now called a program or a code [6], the tape is now called memory, and the roles of the head and the register are subsumed by the processor.

Shannon and the other pioneers of information theory showed that anything that could be communicated could be encoded by symbols, and that binary digits were the simplest possible symbols. They laid a theoretical foundation from which engineers could build more and more efficient systems for transmitting these symbols from point A to point B. Meanwhile, Turing and the computer scientists demonstrated how strings of symbols could be encoded in extremely simple — some might say stupid -— machines that were nevertheless capable of performing even the most complex transformations, thereby generating new strings of symbols.

In the next part of this series, we will examine how the inheritance of biological traits came to be seen as the communication of abstract hereditary symbols, and how this view in turn contributed to the notion that the DNA molecule is life's very own Turing machine.

Notes and References

[1] For an abstract discussion of the power of metaphor, see this blog post I wrote some years ago: Metaphor: the Alchemy of Thought. Incidentally, this was the post I submitted to the editors of 3QD when they were looking for new Monday columnists.

[2] James Gleick's masterfully researched book The Information is my primary guide here, with Wikipedia filling in some of the details. Quotes of Shannon come from these two sources.

[3] Hollerith went on to create the Tabulating Machine Company, which merged with two other companies in 1911 to become the Computing-Tabulating-Recording Company. The company changed its name to International Business Machines (IBM) in 1924.

[4] To work our way to Shannon's mathematical quantification of information, let's start by noting that the number of symbols and the length of the message determine the number of possible messages. If you use an alphabet of 26 letters, then you can create 26x26x26 (26^3) distinct three-letter strings of letters. Adding one more letter increases the possibilities by a factor of 26: there are 26^4 four-letter strings. The number of possible messages of a given length (M) is just the number of possible symbols (N) raised to the power of the length of message (N^M). Any set of symbols could be expressed using binary digits (bits). With four bits you can represent 2^4 possible binary strings. Shannon felt that the length of a message, rather than the number of possible messages, lined up best with most people's intuitions about quantities of information. The length of a tweet is 140 characters. With 27 symbols (26 letters plus a space), there are 27^140 possible messages — most of them meaningless, of course. Shannon would have argued that we typically think of two tweets as containing twice as much information, rather than 27^140 as much. Two volumes of an encyclopedia are typically viewed as containing twice as much information as one volume, not an astronomically larger amount. The quantity that captures this intuition is the logarithm of the number of possible messages. The base-N logarithm of N^M is just M.

Shannon extended the logarithmic intuition by incorporating the knowledge that real messages normally contain some redundancy. For example, in English, the letter “q” is almost always followed by “u”. Intuitively, if “q” is one of the symbols in the message, knowledge that “u” is next doesn't really add any new information. Shannon captured this intuition into his measure of information by incorporating statistical knowledge about the likelihood of different patterns of symbols. The statistical perspective allowed Shannon to make an analogy with the concept of entropy in statistical physics, creating the new concept of information entropy. Shannon developed a quantity called the “self-information”, which was the amount of information in a message drawn from a 'message space', i.e., a corpus of possible messages. The self-information of a message is defined as the negative logarithm of the probability of that message being picked from the corpus. The information entropy of an entire message space is the average self-information, i.e., the sum total of the self-information of each possible message, weighted by their probabilities. (This is about as far as we can go in describing entropy without using graphs and mathematical symbols!)

Entropy captures the disorderliness or unpredictability in a message space. Entropy is highest when all messages are equally probable. This leads to the paradoxical-sounding finding that a white noise image contains more self-information than a meaningful image. In a white noise image, each pixel's value is equally likely. In a meaningful image, there is a lot of redundancy, so large numbers of pixel-values can be predicted by the neighboring pixel-values. The pixels in a white noise image are not predictive of each other: to reproduce a white noise image perfectly you just have to know each pixel's value. But in a meaningful image, many of the pixels values are highly correlated, so they are often predictive of each other. Self-information seems less counter-intuitive when it is interpreted as a measure of how much information you need to gather in order to fully reproduce a message. A message that has low information content can therefore be compressed easily: its statistical structure can be used to reconstruct it, either perfectly or with some loss, and therefore it can be stored and transmitted using far fewer bits. The fact that music is more predictable than noise is one of the two reasons audio signals can be greatly compressed, facilitating among other things, the mp3 revolution. (The other reason is that humans cannot really hear all the information in an auditory signal, so an audio signal can be down-sampled to a great extent without degrading its subjective quality. Recently, reviewers of Neil Young's Pono music player have pointed this out — most people simply cannot appreciate the higher fidelity on offer.)

[5] There are, however, some numbers that cannot be computed (or named). Ironically, Turing introduced his universal computing machine in a paper that demonstrated why certain numbers were uncomputable. His work — as well as that of Alonzo Church, who arrived at an equivalent model of computation via a different route — was partly inspired by Kurt Gödel's incompleteness theorem, and in particular by Gödel numbering, which was used to reduce logic to arithmetic.

[6] The word “code” is derived from the Latin word codex, which meant a “book of laws”, so it is an ideal name for Turing's table of instructions. The word “code” has been used in English for centuries. It picked up the connotation of a secret message or cipher in the early 1800s. People have been sending secret messages — and attempting to intercept and decipher them — for millennia, and by the 19th century cryptography, the study of secret codes, was a rapidly developing field. Cryptographers may have been the first to realize that you could encode any message with the help of some symbols and a set of rules for manipulating them. A substitution cipher, for example, just involved systematically replacing each letter with a different one: a simple enough rule. More sophisticated codes just required more elaborate rules. Perhaps unsurprisingly, both Shannon and Turing were involved in cryptography during WWII — Shannon in making codes, and Turing in breaking them. Shannon said that his insights into communication and cryptography developed at the same time, and that “they were so close together you couldn't separate them”. Shannon and Turing came into contact in 1943, in the cafeteria at Bell Labs. Neither could talk to the other about their cryptography work, which was classified, but Turing did show Shannon his paper that introduced what became known as the Universal Turing machine. Shannon was reportedly impressed, and felt that the paper complemented his own ideas.

Image 1: An artist's rendering of a Turing machine

Image 2: A Steinway Welte-Mignon reproducing piano