Information: the Measure of All Things? Part II: The Genius in the Gene

by Yohan J. John

In Part I of this series, we looked at how the concept of information brought communication and computation together. Claude Shannon and the other pioneers of information theory showed that discrete symbols could be used to encode and transmit almost any sort of message, and that binary digits were the simplest possible symbols. Meanwhile Alan Turing and the computer scientists demonstrated that strings of symbols could serve as the inputs to simple machines that could transform them into new and useful output strings.

Information theory arose from the question of how best to transmit discrete signals from point A to point B, with little to say about the purpose of the signals. Computability theory was born of a complementary quest: the study of how to transform and manipulate symbols in the service of some purpose. The birth of modern genetics reveals a similar complementary relationship. Two broad research questions arose in the tumult of 19th century biology: the question of how hereditary information was communicated from one generation to the next, and the question of how an organism develops, starting from the moment of conception. The first question gave rise to transmission genetics, while the second gave rise to developmental biology. These questions proved to be intimately related: progress in answering one was often contingent on developments in answering the other. The overlap between the answers to these questions was recognized in the twin roles of the DNA molecule: it has been described as both the vector of hereditary transmission, and the bearer of a developmental program that 'specifies' or even 'computes' the organism. We will now follow the path that led to the DNA molecule, a path that emerged from the confluence of evolutionary theory, cell biology, and biochemistry. [1]

The nature of heredity

An awareness of hereditary inheritance must have arisen very early in human culture. It can't have been very difficult to realize that the properties of an organism — its traits — tend to reappear in its offspring. Children typically share many features with their parents. Ancient peoples clearly recognized inheritance of characteristics in plants and animals too. Humans have been selectively breeding plants and animals since prehistoric times, gradually amplifying useful traits with every generation. The dog is believed to have been domesticated from a wolf-like ancestor between 11 and 16 thousand years ago. And rice and wheat were domesticated between 8 and 13 thousand years ago. The ability to make use of hereditary inheritance precedes the dawn of civilization.

Despite having rudimentary practical awareness of inheritance, most ancient cultures don't seem to have attempted to explain it. Only the Greeks offered some vivid speculations. Hippocrates proposed a process called 'pangenesis', through which 'seed material' traveled from all parts of the body to the reproductive organs. He also suggested that fertilization mixed the seed material from both parents. Though not very detailed, the pangenesis idea was a lot more promising than the theories that immediately followed it. Aristotle, for instance, chose to elaborate on an idea proposed by Pythagoras, who claimed that male and female parents made qualitatively distinct contributions to their offspring. Aristotle proposed that a male ordering principle (“eidos”) contributed the essential characteristics of the offspring, while the female merely provided the unformed raw material, menstrual blood (“catamenia”).

No one really tested or built upon these early Greek proposals for centuries. It was only after the Renaissance that mechanical accounts of nature started to gain currency, creating the intellectual climate that eventually gave birth to experimental science. But the scientific revolution didn't dispel all the confusions that had been handed down since antiquity. Well into the 19th century, many scientists maintained that only one or the other parent contributed all the genetic material to the offspring, in open defiance of the relatively straightforward observation that children could resemble both parents. In 1651 William Harvey proposed that all organisms came from eggs, leading to the philosophy of “ovism”, from the Latin word for egg, “ovum”. (This was perhaps a lucky guess, because the egg cell had not yet been discovered.) Ovism was dominant in the 17th century, until a challenger arose: “spermism”. Perhaps unsurprisingly, spermism was promulgated by the two discoverers of sperm, Antonie van Leeuwenhoek and Nicolas Hartsoeker.

Real progress in understanding hereditary inheritance had to wait until the theory of evolution was developed. Ideas about evolution stemmed from what initially seemed like a very different sort of question: the precise nature and origin of species. Under the influence of Western Christian theology, which had absorbed Plato's theory of Forms, many 18th-century thinkers held that a species possessed an essential, archetypical form, and that the variations within a species (including the characteristics that could be passed from parent to offspring) were of secondary importance. Since the early natural philosophers focused on what was common to all members of a given species, the features of individual organisms and their transmission to offspring received very little attention.

The evolution revolution

The idea that species were constant and unchanging was widely held until the 1700s, largely because it fitted with the theological notion of a single unique moment of creation. By contrast, the appearance of new species would suggest that creation could happen multiple times. Perhaps even more threatening to the old world view was the idea that a species could go extinct: this could be construed as a slur against the divine perfection of creation. Despite opposition from natural theology, evolutionary ideas began to gather steam over the course of the late 1700s and early 1800s. The new and intimately linked sciences of geology and paleontology revealed that the world had apparently passed through a series of epochs, during which distinct species of plants and animals emerged and then went extinct. Various lines of evidence for the mutability of species began to converge, providing the raw material for an explosion of evolutionary thinking.

In 1751 Pierre Louis Maupertuis proposed that natural modifications occur during reproduction and accumulate over generations, producing races and new species. In 1767 James Burnett, Lord Monboddo, proposed that man had descended from primates, and that the environment shaped creatures' characteristics over long periods. In 1784 Denis Diderot speculated that species were always changing through a constant process of trial-and-error, anticipating aspects of natural selection. In 1794, Erasmus Darwin, Charles Darwin's grandfather, published Zoonomia, which suggested that “all warm-blooded animals have arisen from one living filament.” In his 1803 poem Temple of Nature, he described the rise of life from minute organisms living in mud to all of its modern diversity. One of the most influential theories of evolution was proposed by Georges-Louis Leclerc, Comte de Buffon (shown on the right), who published an encyclopedic survey of the natural sciences, beginning in 1749. Darwin later wrote that “the first author who in modern times has treated it [evolution] in a scientific spirit was Buffon”. Ernst Mayr referred to him as “the father of evolutionism”. Buffon once wrote “Not only the ass and the horse, but also man, the apes, the quadrupeds, and all the animals might be regarded as constituting but a single family”.

Buffon's theories were expanded upon by Jean-Baptiste Lamarck, who provided one of the first explanations of evolution: the process of inheritance of acquired characteristics. According to Lamarckian theory, traits that an organism might acquire during the course of its life could be transmitted to the offspring. So the length of a giraffe's neck was explained as the result of successive generations trying to stretch higher in order to eat the highest leaves of trees. [2]

Evolutionary theory reached a mature state of development thanks to the celebrated efforts of Charles Darwin (and also Alfred Russel Wallace). Darwin recognized that three simple facts about populations of organisms could be used to explain evolution. Firstly, individuals within a species possess a variety of different traits. Secondly, different traits contribute differently to the ability of individuals to survive and reproduce in a given environment. Thirdly, organisms pass on their traits to their offspring. As long as these three facts hold, evolution by natural selection must take place. A trait that confers a relative advantage in an environment, however slight, will gradually spread through the population, therefore appearing to have been 'selected' by nature. Variability in traits provides the 'raw material' for natural selection, so that in new environments a species can acquire new traits to deal with new challenges, eventually resulting in the creation of brand new species.

Natural selection explains the giraffe's neck without reference to the efforts of the giraffe's ancestors. Instead, one only needs to imagine that in every generation of giraffes, there is variability: some giraffes are born with slightly shorter than average necks, and some with slightly longer than average necks. A longer neck is adaptive, because it just happens to confer an advantage in the giraffe's environment. The longer-necked giraffes will be more suited to their environment, and will therefore have slightly more offspring, on average. If the trait of long-neckedness is heritable, then gradually the population of giraffes will come to have longer and longer necks.
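The claim that even a slight advantage spreads through a population can be made concrete with a toy calculation. The sketch below is only an illustration under simplifying assumptions that are not in the text: there are just two kinds of giraffe (long-necked and short-necked), offspring always resemble their parent, and long-necked giraffes leave (1 + s) offspring for every 1 left by short-necked giraffes. The names `next_frequency`, `generations_until` and the parameter `s` are invented for this example.

```python
def next_frequency(p, s):
    """Share of long-necked giraffes after one generation.

    p is the current share of long-necked giraffes (between 0 and 1).
    s is their assumed reproductive advantage: they leave (1 + s)
    offspring for every 1 left by a short-necked giraffe.
    """
    return p * (1 + s) / (1 + p * s)


def generations_until(p0, s, target):
    """Count the generations needed for the share to grow from p0 to target."""
    if s <= 0:
        raise ValueError("the trait only spreads if s is positive")
    p, generations = p0, 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations


if __name__ == "__main__":
    # A 1% advantage, starting from 1% of the population.
    print(generations_until(0.01, 0.01, 0.99))
```

With a 1% advantage the trait takes on the order of a thousand generations to go from rare to nearly universal in this toy model. That is slow on a human timescale but brief on a geological one.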

Both hereditary inheritance and variability played central roles in Darwin's theory of evolution, but Darwin and the early evolutionary theorists didn't have any solid proposals for how they worked. Darwin was deeply aware of the explanatory holes in his theory, and filled them provisionally with a version of pangenesis. He speculated that invisible particles called “gemmules” arose in every part of an organism, and traveled to the reproductive organs where they accumulated in the germ cells (the sperm in males and the egg cell in females).

Darwin believed that inheritance worked through blending, so that an offspring's traits would simply be a mixture of the traits of the parents. This tied in with his belief that all variability was gradual and continuous, like the distribution of people's heights. But he also realized that there was a serious conceptual weakness in blending inheritance: it would lead to uniformity in the population. Blending inheritance predicts that the offspring of a longer-necked giraffe and a shorter-necked giraffe would have an intermediate-length neck. So blending inheritance in a stable environment would result in a melting pot, eventually erasing variability. But variability was crucial to the ability of natural selection to produce new traits and increased diversity. To preserve variability, Darwin eventually settled on a somewhat ad hoc combination of blending and Lamarckian evolution. The gemmules of Darwin's pangenesis theory could then “pick up” acquired traits, which might then restore the variability lost through blending. So gemmules in a giraffe's neck would somehow pick up on the fact that the giraffe was stretching to reach higher leaves, and would then travel to the sexual organs. Darwin's compromise solution to the problem of inheritance was an unfortunate backpedaling, because it weakened the force of the argument that natural selection alone could account for all evolutionary phenomena.
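The melting-pot problem is easy to see in a small simulation. The following is a rough sketch, not a model Darwin himself wrote down: it assumes that each offspring's neck length is exactly the average of two parents picked at random, that the population size stays fixed, and that there is no selection and no new variation. The function names and the starting population (a bell curve with unit spread) are invented for illustration.

```python
import random
import statistics


def blend_generation(population, rng):
    """Each offspring's trait is the plain average of two randomly chosen parents."""
    return [
        (rng.choice(population) + rng.choice(population)) / 2
        for _ in range(len(population))
    ]


def variance_over_time(generations, size=2000, seed=0):
    """Return the trait variance of the population in each generation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    population = [rng.gauss(0.0, 1.0) for _ in range(size)]
    history = [statistics.pvariance(population)]
    for _ in range(generations):
        population = blend_generation(population, rng)
        history.append(statistics.pvariance(population))
    return history


if __name__ == "__main__":
    for generation, variance in enumerate(variance_over_time(10)):
        print(generation, round(variance, 4))
```

Under these assumptions the variance falls by roughly half with every generation, so after ten generations almost nothing is left for natural selection to work on.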

The Mendelian Casino

Much of the confusion surrounding heredity was resolved by the science which came to be known as genetics. The first principles of genetics were discovered by Gregor Mendel, a German-speaking Moravian who worked at a monastery in Brno. It is tempting to imagine that Mendel was some kind of rustic in the wilderness who just happened to be a good scientist. But this is far from the case. Mendel received an excellent education: he attended the University of Vienna, where he received the qualifications necessary to be a high school physics teacher. Mendel's professor of botany, Franz Unger, had adopted a theory of evolution in 1852 that emphasized the importance of studying varieties. Mendel's training in physics and botany no doubt contributed to his meticulous, quantitative approach to experiments [1]. Between 1856 and 1863 Mendel performed a series of painstaking experiments with the pea plant, through which he discovered certain statistical regularities in the way traits passed from one generation to the next.

Let's consider one of the seven traits that Mendel investigated. Pea seeds can be yellow or green; there are no intermediate yellowish-green pea seeds. Purebred yellow-seeded pea plants will always produce yellow-seeded pea plants as offspring. The same holds for green-seeded pea plants. But when yellow-seeded plants are cross-pollinated with green-seeded plants, the offspring are always yellow-seeded. If these hybrid yellow-seeded plants are self-pollinated, however, 25% of the offspring are green-seeded. The hybrids among those offspring, when self-pollinated in turn, again produce 25% green-seeded offspring. Mendel came to three crucial conclusions from his breeding experiments: (1) that the inheritance of traits is determined by discrete units or factors that are passed on unchanged from one generation to the next, (2) that an individual inherits one such factor from each parent, for each trait, and (3) that a trait may not show up in an individual, but can still show up in the next generation. [3]

To capture the statistical nature of Mendel's findings, it might help to think of heredity as a kind of card game in which the cards are Mendel's factors. For a given trait, the male and female each have two cards which may be the same or different. Each offspring is randomly dealt one card from each parent, to make up its own set of two cards. Which trait actually shows up in the offspring depends on which is the higher card of the two. In the pea-seed color game, yellow always trumps green. So whenever an offspring has a yellow card, it always ends up yellow-seeded. A yellow-seeded plant may still have a green card in its hand; this can only be known in the next generation. Imagine we have such a plant: it has one yellow card and one green card. Let's imagine that this hybrid plant is bred with another hybrid plant that also has one yellow card and one green card. Each of the third-generation offspring has a 1 in 2 chance of getting the green card from the first hybrid parent, and a 1 in 2 chance of getting the green card from the second. The chance of getting two green cards is therefore 1/2 × 1/2, or 1 in 4. So on average, 25% of the offspring will have green seeds.
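The odds in the pea-seed color game can be checked by listing every way the cards can be dealt. The sketch below is just one way of writing the game down; the letters "Y" (yellow card) and "y" (green card) and the function names are choices made for this example.

```python
from fractions import Fraction
from itertools import product


def cross(parent_a, parent_b):
    """Return the probability of each offspring hand.

    A parent is a two-letter string such as "Yy". The offspring gets one
    card from each parent, and each of the four deals is equally likely.
    """
    outcomes = {}
    for card_a, card_b in product(parent_a, parent_b):
        hand = "".join(sorted((card_a, card_b)))  # "yY" and "Yy" are the same hand
        outcomes[hand] = outcomes.get(hand, Fraction(0)) + Fraction(1, 4)
    return outcomes


def seed_color(hand):
    """Yellow trumps green: a single yellow card is enough."""
    return "yellow" if "Y" in hand else "green"


def color_odds(parent_a, parent_b):
    """Return the probability of each seed color among the offspring."""
    odds = {}
    for hand, probability in cross(parent_a, parent_b).items():
        color = seed_color(hand)
        odds[color] = odds.get(color, Fraction(0)) + probability
    return odds


if __name__ == "__main__":
    print(color_odds("YY", "yy"))  # purebred yellow x purebred green
    print(color_odds("Yy", "Yy"))  # hybrid x hybrid
```

Crossing two purebreds gives only yellow-seeded hybrids, while crossing two hybrids gives green seeds with probability 1/4. A hybrid crossed with a purebred yellow plant never produces green seeds, since the purebred parent has no green card to deal.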

To extend this card game metaphor to the whole host of traits that an organism might display, imagine that there are many suits of cards, one for each trait, and that a 'hand' consists of two cards from each suit. In each suit, the highest card is 'played', so that it is expressed as a trait. The genotype — the genetic inheritance — is the full set of cards dealt. The phenotype — the developed organism — is the result of playing the highest card for each suit. Mendel studied seven traits in the pea plant. So the pea plant version of Mendel's card game involves each organism receiving 14 cards: 7 from each parent. This set of 14 cards contains 2 cards from each of the 7 suits. Of the 14 cards, seven cards are 'played'. Each card played is the higher of the two cards. The lower card stays in hand, and may be dealt during the next round of the game, i.e., it may be transmitted to the next generation. (As it turns out there is actually a game called gene rummy, which illustrates genetic principles.)

Mendel's results reveal why information must have seemed like an excellent metaphor for heredity. Transmitting discrete hereditary traits between generations sounds a lot like transmitting discrete symbols between two points. It makes sense that the use of phrases like “genetic information” exploded in the years following Shannon's paper. (See the Google Ngrams result to the left.)

In 1889, still unaware of Mendel's work, the Dutch botanist Hugo de Vries coined the term “pangen” (derived from “pangenesis”) for the smallest particle representing one hereditary trait. Two decades later the Danish botanist Wilhelm Johannsen shortened this to 'gen' ('gene' in English), and applied it to Mendel's discrete factors: the suits in the hereditary card game. The various possible card values in each suit were named 'allelomorphs', later shortened to 'alleles', by William Bateson (who also coined the term 'genetics'). The 'higher' allele in a pair is called dominant, and the 'lower' allele is called recessive. Here are Mendel's laws, expressed in the modern terminology:

  • The law of segregation: During gamete (sex cell) formation, the alleles for each gene segregate from each other so that each gamete carries only one allele for each gene.
  • The law of independent assortment: Genes for different traits can segregate independently during the formation of gametes.
  • The law of dominance: Some alleles are dominant while others are recessive; an organism with at least one dominant allele will display the effect of the dominant allele.

The equivalent laws in the Mendelian card game are as follows:

  • The law of segregation: For each suit, only one of the two cards in each parent's hand is dealt to the offspring.
  • The law of independent assortment: The suits are independent of each other, so dealing is separate for each suit.
  • The law of dominance: In each suit, one card is higher than the other (dominant), and gets played, while the other card doesn't get played (recessive).
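The three card-game laws can be written down together in a few lines. This is only a sketch of the game as described above, with two suits rather than seven. The second suit (round "R" versus wrinkled "r" seeds) is brought in purely for illustration, and the representation of a hand as a list of card pairs is an assumption of this example.

```python
from fractions import Fraction
from itertools import product


def gametes(hand):
    """Deal one card per suit.

    Segregation: only one of the two cards in each suit is dealt.
    Independent assortment: every combination across suits is equally likely.
    """
    return list(product(*hand))


def phenotype(cards_from_mother, cards_from_father):
    """Dominance: in each suit the higher card is the one that gets played.

    Upper-case letters stand for the higher card. They sort before
    lower-case letters, so min() picks the higher card when one is present.
    """
    return tuple(min(m, f) for m, f in zip(cards_from_mother, cards_from_father))


def phenotype_odds(mother, father):
    """Return the probability of each combination of played cards."""
    deals = list(product(gametes(mother), gametes(father)))
    odds = {}
    for from_mother, from_father in deals:
        played = phenotype(from_mother, from_father)
        odds[played] = odds.get(played, Fraction(0)) + Fraction(1, len(deals))
    return odds


if __name__ == "__main__":
    # Both parents are hybrids in both suits: seed color and seed shape.
    hybrid = [("Y", "y"), ("R", "r")]
    for played, probability in sorted(phenotype_odds(hybrid, hybrid).items()):
        print(played, probability)
```

For two double hybrids the four possible outcomes appear in the proportions 9/16, 3/16, 3/16 and 1/16. The green-and-wrinkled combination is the rarest, because it needs the lower card from both parents in both suits.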

Even before Mendel's laws were rediscovered, useful data on the physical basis of heredity began to accumulate. Breeders of plants did perform experiments of the sort that Mendel was so good at, but they failed to recognize any statistical regularities that pointed to discrete units of inheritance. More success was found in approaching the problem from the other end, as it were. Rather than studying the broad statistical properties of traits in populations, some biologists chose to investigate the building blocks of individual organisms: cells. The story of how cell biology and Mendelian ideas fused to give rise to modern genetics is complex, so it may help to keep the card game analogy in mind as we move along. Mendel's laws revealed the broad statistics of the card game, but the details needed to be worked out. Specifically, what were the mechanics of dealing genetic cards? How did the offspring receive exactly one card for each suit from the mother and the father? And perhaps most importantly, what were the cards made of?

From cell to nucleus to chromosome

Biological cells were first seen in the 1660s by Robert Hooke (who coined the term “cell”), and soon after by Antonie van Leeuwenhoek. By the 1830s, microscopes had turned up enough evidence for Theodor Schwann and Matthias Jacob Schleiden to postulate two basic tenets of biology: (1) All living organisms are composed of one or more cells, and (2) The cell is the most basic unit of life. A third postulate to cell theory was added by Rudolf Virchow in 1855: (3) All cells arise only from pre-existing cells. This was inferred from the observation that new cells were formed by binary cell division. Working backwards, this meant that every organism originated in a single cell. It took a few more years to work out how that primordial cell of an organism came into being at the moment of fertilization.

Some of the pieces in the fertilization puzzle had been around for quite some time. Sperm were discovered in 1678 by Antonie van Leeuwenhoek and Nicolas Hartsoeker. Initially Leeuwenhoek thought they were parasites living in the semen, but later came to believe that each sperm contained an embryo in miniature. He used a plant metaphor to understand development: he saw the sperm as a seed, and the female as the nutrient soil from which the seed would grow. The other co-discoverer of sperm, Nicolas Hartsoeker, thought he could see a tiny but fully-formed human infant inside each sperm, even providing diagrams (shown on the right). Spermists like Leeuwenhoek and Hartsoeker debated with the ovists, who thought that it was the egg cell in the mother that carried the preformed embryo. Despite these early glimmers of partial insight, a synthesis of the spermist and ovist positions had to wait for almost two centuries. The idea that sperm were parasites lingered on, and it was only in the 1870s that it was established that the entry of the sperm into the egg cell led to fertilization.

The nucleus of the cell, which was visible even in the days of Leeuwenhoek and Hartsoeker, was eventually recognized as a crucial player in cell division and heredity. Late Victorian biologists observed that when a sperm entered the egg cell, its nucleus fused with the egg cell's nucleus to create a new nucleus. The purpose of the nucleus was gradually uncovered by a pioneering generation of biologists, particularly Oscar Hertwig, Edouard Van Beneden, Walther Flemming, Heinrich von Waldeyer, Eduard Strasburger, Theodor Boveri, and August Weismann. They discovered that nuclei contained a substance that strongly absorbed certain dyes. This substance, named “chromatin” from the Greek word for 'color', arranged itself into threadlike structures in the nucleus. The threadlike structures were later named 'chromosomes' (colored bodies) by von Waldeyer. Chromosomes quickly proved to be the most intriguing elements in the cell. It was discovered that the number of chromosomes remained constant from cell to cell within an organism, from organism to organism within a single species, and from generation to generation within a given species.

Cell division helped explain this striking constancy. Flemming and van Beneden both saw that during normal cell division (mitosis) each chromosome splits in two, and each daughter cell's nucleus receives one of the two parts. On the basis of these discoveries, Flemming and Strasburger proposed a nuclear counterpart to the third postulate of cell theory: all cell nuclei come from a pre-existing nucleus.

Towards the end of the 19th century, the link between chromosomes and heredity was starting to be recognized by several researchers. Discovering the link required asking the right sorts of questions. Why did each chromosome seem to split exactly in two? Why didn't it split into unequal pieces? One person who did ask such questions was Wilhelm Roux, who proposed in 1883 that the chromosomes contain a linear arrangement of qualitatively different genetic particles. He proposed that this structure helps ensure that daughter cells receive equal amounts of chromosomal material. In 1891 Boveri proposed that in the course of cell division, one half of the chromosomes are of strictly paternal origin, and the other half of maternal origin.

The rediscovery of Mendel's work in 1900 by Hugo de Vries, Carl Correns and Erich von Tschermak allowed existing knowledge of cell biology to be seen in a radical new light, and pointed the way forward for biology in the new century. Mendel's work stimulated the search for the material basis for his discrete factors — a basis that could also provide a mechanism for equal contributions from the maternal and paternal genetic material. This line of thinking allowed several puzzle pieces about chromosomes collected over the preceding decades to rapidly fall into place, allowing for a coherent theory of inheritance. Theodor Boveri and Walter Sutton independently proposed that the way each chromosome splits in two during cell division can serve as the 'dealing' mechanism in Mendel's card game of inheritance.

Connecting chromosomes with inheritance helped explain a mystery associated with chromosomes. If the chromosome number of a cell is constant, and the nuclei of the sperm and egg cells fuse during fertilization, then shouldn't the number of chromosomes double with every generation? This was clearly not observed, so an explanation was called for. It was found in the details of a second, special type of cell division: meiosis. During meiosis the number of chromosomes is halved. Meiosis results in the creation of the gametes, the sperm and egg cells, which possess half as many chromosomes as normal (somatic) cells. Meiosis in plants and animals takes place in the reproductive organs. The sex cells are very different from other cells in the body: they only possess a single chromosome of each type, rather than a pair. Meiosis turns out to be an ideal mechanism for the process of dealing cards in the Mendelian card game. An offspring receives only one genetic card per suit from each parent, and not two. If the nuclei of cells are the containers for the cards, there must be special containers that only contain one card per suit rather than the usual two. These special containers are the nuclei of the sex cells, and they are the card dealers in the Mendelian Casino.
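The doubling puzzle comes down to simple bookkeeping, which the following sketch spells out. The starting count of 8 chromosomes is an arbitrary illustrative value, and the function name and the `meiosis` switch are invented for this example.

```python
def chromosome_counts(somatic_count, generations, meiosis=True):
    """Track the chromosome number of body cells across generations.

    With meiosis, each gamete carries half the parent's chromosomes.
    Without it, each gamete would carry the full set.
    """
    counts = [somatic_count]
    for _ in range(generations):
        parent = counts[-1]
        gamete = parent // 2 if meiosis else parent
        counts.append(gamete + gamete)  # fertilization fuses two gamete nuclei
    return counts


if __name__ == "__main__":
    print(chromosome_counts(8, 4, meiosis=True))
    print(chromosome_counts(8, 4, meiosis=False))
```

With the halving step the count stays at 8 in every generation. Without it the count would double each time, reaching 128 after only four generations.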

Gradation, Saltation, and Mutation

The rediscovery of Mendel's work revolutionized cell biology and triggered the rise of a new science: genetics. But initially, genetics did not fit well with Darwinian evolution. Genetics just happened to arise during the “eclipse of Darwinism”, a period when most biologists accepted evolution, but doubted that natural selection was its primary mechanism. Darwin believed that evolution proceeded gradually: he frequently used the aphorism natura non facit saltum (Latin for “nature does not make a jump”). Saltation, the sudden, discontinuous appearance of new traits and of new species, was a popular alternative to gradual evolution, and seemed to be supported by the fossil record. There did not seem to be any transitional forms. Early discoveries in genetics also seemed to line up more closely with saltation. Mendel's laws, for instance, only made sense if the units of hereditary transmission were discrete. The manner in which variability arose also seemed to support discrete jumps. In 1886 de Vries discovered new forms of the evening primrose plant; he gave the name “mutation” to the process by which such discrete genetic changes took place.

A solution to the saltation/mutation challenge for natural selection was discovered over the first few decades of the 20th century, helping lay the groundwork for the Modern Evolutionary Synthesis that was produced in the late 1930s and early 1940s. The first step towards the synthesis was made by the pioneers of population genetics: Ronald Fisher, J.B.S. Haldane and Sewall Wright. Ronald Fisher published a paper in 1918 that showed how continuous variation could arise as the result of many discrete genes. (In Part I, we saw how the Nyquist-Shannon sampling theorem bridged the gap between continuous signals and discrete signals. In an analogous manner, the population geneticists showed how multiple genes could bridge the gap between continuous gradual variation and discrete Mendelian genes.)
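A toy version of this idea shows how discrete genes can add up to something that looks continuous. The sketch is not Fisher's actual model. It assumes that every gene has two equally common alleles, that each copy of a 'plus' allele adds exactly one unit to the trait, and that all genes are inherited independently. The function name is invented for this example.

```python
from fractions import Fraction
from math import comb


def trait_distribution(num_genes):
    """Probability of each trait score in a population.

    Every individual holds two alleles per gene. Each allele is a 'plus'
    with probability 1/2, and the trait score is the number of 'plus'
    alleles, so the scores follow a binomial distribution.
    """
    cards = 2 * num_genes
    return {k: Fraction(comb(cards, k), 2 ** cards) for k in range(cards + 1)}


if __name__ == "__main__":
    for num_genes in (1, 3, 10):
        distribution = trait_distribution(num_genes)
        print(num_genes, "genes:", len(distribution), "distinct trait values")
```

One gene gives only three possible trait values, in the Mendelian proportions 1/4, 1/2, 1/4. Ten genes already give twenty-one closely spaced values arranged in a bell shape, which would be hard to tell apart from a continuous trait such as height.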

The Chromosomal Shuffle

Just like natural selection, Mendelian genetics needed some work in order to be brought into the 20th century. The law of independent assortment, for instance, turned out to be wrong in general. It was discovered that some traits had much higher correlations than would be expected if they were inherited independently of each other. For example, in 1911 T. H. Morgan discovered that some traits in the fruit fly are sex-linked. A few years later his collaborator Calvin Bridges discovered that sex-linked genes were carried by the X chromosome. Morgan's group made the brilliant realization that the correlations in expression of traits could be used to create a concept of genetic distance: genes that were closer together on a chromosome had a higher probability of being inherited together. Armed with the idea of genetic distance, Morgan and his student Alfred Sturtevant created the first chromosome map in 1913, providing powerful evidence that genes are arranged along the chromosome in a linear sequence.

At this point the Mendelian card game analogy starts to break down. The suits in a deck of cards are necessarily independent: they're not attached to each other, after all. If assortment is not independent, then there is less randomness when dealing the genetic cards. An important question now arises: if traits can often be linked, how did Mendel manage to come up with his laws? As it turns out, the pea plant has exactly seven pairs of chromosomes, and the seven traits Mendel studied were located on separate chromosomes. It may seem as if Mendel was supremely lucky, but there is evidence to suggest that before even commencing his breeding experiments he had conducted preliminary studies to pick the most promising traits.

Genes on the same chromosome are more highly correlated than would be expected by chance, but the correlation is not 100%. If this were the case, then distinct sets of Mendelian traits would occur together or not at all. A fascinating property of chromosomes accounts for the presence of variability even in the face of chromosomal linkage. During meiosis (the special cell division that results in the sex cells), genetic material is exchanged between corresponding maternal and paternal chromosomes through a process called chromosomal crossover. When the sperm and egg cells are created, the parental genetic material is shuffled a bit. This prevents genes on the same chromosome from being 100% correlated. It was the variability of genetic correlations that provided Morgan and his colleagues with a way to define genetic distance on a chromosome: the more highly correlated two genes were, the closer they were on the chromosome.
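The logic of a chromosome map can be sketched in a few lines. The example below is a simplified illustration rather than a reconstruction of the 1913 analysis: it measures how often two genes are separated by crossover (the recombination frequency), treats that as their distance, and places three genes in order by noting that the two genes separated most often must sit at the ends. The gene names and offspring counts are invented.

```python
def recombination_frequency(recombinant_offspring, total_offspring):
    """Fraction of offspring in which the two genes were separated by crossover."""
    return recombinant_offspring / total_offspring


def order_three_genes(frequencies):
    """Arrange three linked genes along a chromosome.

    frequencies maps a pair of gene names (as a frozenset) to the
    recombination frequency of that pair. The most weakly correlated pair
    lies at the two ends, and the remaining gene sits between them.
    """
    outer_pair = max(frequencies, key=frequencies.get)
    all_genes = set().union(*frequencies)
    (middle,) = all_genes - outer_pair
    first, last = sorted(outer_pair)
    return [first, middle, last]


if __name__ == "__main__":
    # Hypothetical counts out of 1000 offspring for three made-up genes.
    frequencies = {
        frozenset({"a", "b"}): recombination_frequency(50, 1000),
        frozenset({"b", "c"}): recombination_frequency(120, 1000),
        frozenset({"a", "c"}): recombination_frequency(160, 1000),
    }
    print(order_three_genes(frequencies))
```

With these made-up numbers, genes a and c are separated most often, so b must lie between them. The two shorter distances (0.05 and 0.12) also add up to roughly the longer one (0.16), which is what a linear arrangement along the chromosome would lead one to expect.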

The Road to the Double Helix

The chromosomal theory of inheritance made considerable progress in the early decades of the 20th century, but only so much could be understood through cytology and breeding experiments. The focus eventually switched to the structure of chromosomes, which required the techniques of chemistry, rather than biology. Biochemists had determined that chromosomes contained DNA (deoxyribonucleic acid) and proteins, but it took some time to realize that heredity depended on DNA and not proteins. In the meantime, the chemical composition of the nucleic acids (DNA and the related molecule RNA) was elucidated. Between 1885 and 1901 Albrecht Kossel isolated and named the four nucleobases that are crucial to the structure of DNA: adenine (A), cytosine (C), guanine (G) and thymine (T). In 1919 Phoebus Levene elucidated the basic nucleotide structure of nucleic acid: the phosphate-sugar-nucleobase unit. (In RNA, the sugar is ribose, and in DNA it is 2-deoxyribose.) Levene was unable to provide an accurate model of the structure of the DNA molecule, however. The sheer size of the DNA molecule could not be appreciated until the rise of polymer chemistry after 1922.

As soon as polymer chemistry raised the possibility of long organic macromolecules, speculative ideas about the molecular basis of heredity began to pop up. Ernst Mayr writes that the polymer idea 'seemed to fulfill the old dream of so many mechanistic biologists that all biological material “ultimately consists of crystals.”' In 1927 Nikolai Koltsov suggested that traits are inherited via a “giant hereditary molecule” made of “two mirror strands that would replicate in a semi-conservative fashion using each strand as a template”. In 1935, one of Koltsov's collaborators, Nikolay Timofeev-Ressovsky, published a paper with Karl Zimmer and Max Delbrück on the nature of genetics and mutation. This paper in turn influenced the physicist Erwin Schrödinger, who wrote the popular book 'What is Life?', which contained the idea that genetic information was mediated by some kind of 'aperiodic crystal'. 'What is Life?' may also have contained one of the first descriptions of genetic information as a 'code'. Schrödinger's vision of an aperiodic crystal, as well as his metaphor of a genetic code, would provide inspiration to both Francis Crick and James Watson, the celebrated co-discoverers of the DNA structure.

The first definitive evidence for a role of DNA in heredity was provided in 1944 by the Avery–MacLeod–McCarty experiment. This experiment was based on an earlier experiment conducted by Frederick Griffith in 1928. Griffith had shown that bacteria could transmit genetic information. Griffith used two strains of the pneumococcus bacteria: a lethal strain that killed mice and a harmless strain that didn't. He then investigated what heat-killed bacteria of the lethal strain would do. Neither the dead lethal bacteria nor the living harmless bacteria harmed the mice on their own, but the two in combination did kill the mice. This showed that some “transforming principle” from the lethal strain had been taken up by the harmless strain. The Avery–MacLeod–McCarty experiment was very similar, but instead of dead bacteria, purified DNA was employed. In this way DNA was shown to be the “transforming principle”. Further evidence of the genetic role of DNA was provided by the Hershey–Chase experiments, published in 1952. They showed that when bacteriophages (a type of virus) infect a bacterium, the DNA enters the host cell, but most of the protein does not.

With DNA now at the center stage of biochemistry, the race was on to determine its structure. How did the basic nucleotide units link together to form the DNA macromolecule? Three groups vied with each other to answer this question: Linus Pauling's team at Caltech, Maurice Wilkins and Rosalind Franklin's team at King's College, London, and the Cambridge duo of James Watson and Francis Crick. Linus Pauling may have seemed the likely winner, since in 1948 he discovered that many proteins possessed helical shapes. But two crucial experimental findings helped Watson and Crick win the race. The first was 'Photo 51': an x-ray diffraction image of a DNA molecule. Wilkins and his collaborators were the first research group that was able to take clear x-ray diffraction images of DNA. Photo 51 was one such image, taken in 1952 by Raymond Gosling, a PhD student working under Franklin. Wilkins showed this image (shown on the right) to Watson and Crick.

The second crucial finding was Chargaff's rules. In 1947 Erwin Chargaff showed that the proportions of the four nucleotides vary between one DNA sample and the next, but that for particular pairs of nucleotides (adenine and thymine, guanine and cytosine) the two nucleotides are always present in equal proportions. He explained these findings to Watson and Crick in 1952. Watson and Crick realized that this suggested a fixed relationship between the nucleobases: adenine paired with thymine and guanine paired with cytosine. Each nucleobase had a complementary nucleobase. With the help of Photo 51, Chargaff's rules, and some pieces of wire and flat metal, Watson and Crick were able to construct the famous double helix model of DNA structure, which they announced to the world in 1953 in the journal Nature. Their model proposed that the paired bases were arranged like the steps of a spiral staircase. Experimental evidence for their model was published in the very same issue of Nature, in a series of five papers. These included one report by Franklin and Gosling, and another by Wilkins and two colleagues.
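
The link between complementary pairing and Chargaff's rules can be checked with a few lines of code. The following is only a minimal sketch: the sequence and the function names are made up for illustration, and the direction of the two strands is ignored. It builds the partner strand using the A–T and G–C pairing described above, then counts bases across both strands.

```python
from collections import Counter

# Pairing proposed by Watson and Crick: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand: str) -> str:
    """Return the base-by-base partner strand (strand direction is ignored)."""
    return "".join(PAIR[base] for base in strand)

def duplex_base_counts(strand: str) -> Counter:
    """Count each base across both strands of the double-stranded molecule."""
    return Counter(strand + complement_strand(strand))

example = "GATTACAGGC"  # hypothetical sequence, not real data
counts = duplex_base_counts(example)

print(complement_strand(example))
print("A == T:", counts["A"] == counts["T"])
print("G == C:", counts["G"] == counts["C"])
```

Whatever sequence is used for the first strand, the counts of A and T match, as do the counts of G and C, while the ratio of A+T to G+C is free to vary from one sequence to the next. That is the pattern Chargaff reported.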

Like all great scientific breakthroughs, the double helix structure helped answer multiple questions simultaneously. The double helix was not just a chemical structure. As Watson and Crick themselves stated in their 1953 paper: “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.” This mechanism received experimental support in 1958. (The process can be quite complicated to describe, but the diagram on the right conveys the basic intuition. Animated videos on YouTube are also quite helpful.)
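
The copying idea can also be sketched in code. This is a deliberately simplified toy model, assuming nothing about the enzymes or the chemistry involved, and all names in it are hypothetical. The two strands separate, and each old strand serves as a template for a new partner, so every daughter molecule keeps one old strand and gains one new one.

```python
# Pairing rule: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def build_partner(template: str) -> str:
    """Assemble a new strand by pairing each template base with its complement."""
    return "".join(PAIR[base] for base in template)

def replicate(duplex):
    """Split a duplex and rebuild a partner for each old strand.

    Returns two daughter duplexes; each keeps one strand of the parent
    (semi-conservative copying).
    """
    old_first, old_second = duplex
    daughter_one = (old_first, build_partner(old_first))
    daughter_two = (build_partner(old_second), old_second)
    return [daughter_one, daughter_two]

first_strand = "ATGCCG"  # hypothetical sequence
parent = (first_strand, build_partner(first_strand))
daughters = replicate(parent)

for daughter in daughters:
    print(daughter, "identical to parent:", daughter == parent)
```

Both daughters carry the same sequence as the parent, which is why specific pairing is enough to suggest a copying mechanism.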

Once the details of genetic replication were understood, Ernst Mayr felt confident enough to declare that the “story of transmission genetics was completed.” [1] And what a story it was! Over the course of a century, researchers progressively zoomed in on the material basis of heredity: from the cell to the nucleus to the chromosomes to the DNA molecule.

The genetic code

With transmission genetics seemingly a solved problem, the field of molecular biology could devote most of its time to the second major question in genetics: development. Once the genetic material has been transmitted, how does it contribute to the structure and behavior of the cell, and by extension, the organism? In other words, what else does DNA do besides carry genetic information?

To discover how genetic information was put to use in an organism, the link between DNA, RNA and proteins (the three essential macromolecules) had to be established. All three molecules are polymers, but their distinctive properties support a division of labor among them. Proteins are composed of amino acids, of which there are 21 in eukaryotes (organisms whose cells have a nucleus, such as plants and animals). Proteins can fold into a vast number of three-dimensional shapes, so they can interact with other molecules in a variety of ways. The chemical diversity of amino acids also allows proteins to act as enzymes, the catalysts of biochemical reactions. Since proteins can perform both physical and chemical actions, it makes sense that they are the main actors on the cellular stage. The name “protein” comes from the Greek word “proteios”, which means “primary”, “in the lead” or “standing in front”. So biologists recognized early on that proteins are the workhorses of biology. As one resource on proteins puts it: “Proteins do everything. If we pretend that your body is a house then proteins are the walls, floor, ceiling, wiring, plumbing, furniture, builders, repairmen, and residents.” [4]

RNA molecules are similar to DNA molecules, except that the nucleobase thymine is replaced by uracil (U), and instead of being double-stranded, they are single-stranded. Like proteins, RNA molecules can fold into myriad 3-D forms, allowing them to serve several different functions. But since RNA molecules are less chemically diverse than proteins, they are generally less effective as catalysts. DNA molecules are the least chemically reactive of the three, and the most stable. This stability may be one reason why DNA serves as such an effective carrier of hereditary information. [5]

By the early 1960s it was recognized that DNA is central to the production of proteins, acting in concert with RNA. The nucleotides were referred to by the first letters of the corresponding nucleobases: G, A, C, T for DNA, and G, A, C, U for RNA. These were interpreted as the discrete symbols that comprised the genetic code. The fact that permutations of discrete symbols seemed to be central to the new science of molecular biology led to the importing of ideas and terminology from the other new sciences of the mid-20th century: information theory and computer science. The ideas may have assisted the research, but the terminology unfortunately contributed to the increasingly hyperbolic descriptions of the scope of molecular biology. The genetic letters were frequently viewed as the script in which the “language of life” is written.

In 1961 Crick, along with colleagues Sydney Brenner, Leslie Barnett and R.J. Watts-Tobin, established that each amino acid corresponded to three bases. The size of the genetic 'word' had been determined. The next step was to determine what the words meant. Framed as a quest to 'decipher' the genetic code, the project drew mathematicians, physicists and cryptographers into biology in larger numbers than ever before. But the eventual 'decoding' was completed by the biochemists Har Gobind Khorana, Marshall W. Nirenberg and Robert W. Holley. Their work determined which 3-letter sequence corresponded with each amino acid.
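
A short sketch shows why a three-letter word is a natural size for the code and what 'decoding' amounts to. With four letters, one-letter and two-letter words give only 4 and 16 combinations, too few to name roughly twenty amino acids, while three-letter words give 64. The lookup table below is a small, deliberately partial excerpt of the standard codon table, and the RNA string and function names are hypothetical, chosen only for illustration.

```python
from itertools import product

BASES = "ACGU"  # the four RNA letters

# How many distinct 'words' can be spelled with 1, 2 or 3 letters?
for word_size in (1, 2, 3):
    print(word_size, "letter(s):", len(BASES) ** word_size, "possible words")

# Enumerate every three-letter word (codon).
all_codons = ["".join(letters) for letters in product(BASES, repeat=3)]

# Partial excerpt of the standard codon table (illustration only).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": "Stop",
}

def split_codons(rna: str) -> list:
    """Cut an RNA string into consecutive three-letter words, dropping leftovers."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]

def translate(rna: str) -> list:
    """Look up each codon; stop at a stop codon; '?' marks codons not in the excerpt."""
    amino_acids = []
    for codon in split_codons(rna):
        amino_acid = CODON_TABLE.get(codon, "?")
        if amino_acid == "Stop":
            break
        amino_acids.append(amino_acid)
    return amino_acids

print(translate("AUGUUUGGCUAA"))
```

Because 64 words are available for about twenty amino acids, several different codons can stand for the same amino acid, and a few are left over to act as stop signals.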

The process of turning a gene into a protein is called gene expression, and involves several stages with names like 'transcription', 'editing' and 'translation'. Clearly the metaphors of language, information and code have become deeply embedded in molecular biology. It seems clear that these metaphors have helped researchers frame questions and understand the results of experiments. But all metaphors and analogies break down eventually. Is there any scientific meaning in the claim that the DNA molecule encodes a message written in the “language of life”? And does it make sense to say that, armed with an organism's DNA sequence and a powerful enough computer, we will one day be able to “compute the organism”? [6]

As we have seen, the development of genetics revolves around two questions: (1) how hereditary information is communicated from parent to offspring and from cell to cell, and (2) how hereditary information contributes to the structure and behavior of the organism. The discovery of the DNA structure was the culmination of a century of progress in answering the first question. This has led many researchers to the conclusion that transmission genetics is a closed chapter. The subsequent cracking of the genetic code was the first tentative step towards answering the second question. No one thinks that developmental or functional genetics is a closed chapter, but many researchers nevertheless promote the idea that humanity is on the verge of deciphering an even more elaborate code: the language (or the book, or the blueprint) of life, which determines everything important about an organism's form and function.

In Part III of this series, we will look at why both these chapters in the biology textbooks may be in need of some editing. There may be more to transmission genetics than the letters of the DNA molecule. And more importantly, there appears to be quite a bit of 'information' relevant to the structure and behavior of an organism that does not reside in the DNA molecule.

Nature, it seems, always has a few cards up her sleeve.


Notes and References:

[1] Ernst Mayr's book The Growth of Biological Thought is my primary historical reference. This is supplemented by Wikipedia.

[2] Lamarckian evolution was deemed as good as dead in the 20th century, but recent discoveries in epigenetic inheritance raise the possibility of a resurgence of some forms of Lamarckism in the 21st century.

[3] Mendel's Genetics

[4] Proteins in the Cell

[5] DNA, RNA, and proteins

[6] Richard Lewontin tells a story in which Sydney Brenner makes a comment of this sort. You can hear it at the start of this excellent lecture: Gene, Organism and Environment.


Structure of the Gametes

Celebrating an unsung hero of genomics: how Albrecht Kossel saved bioinformatics from a world of hurt

Crick and Watson's 1953 paper


All images taken from Wikipedia, except the photograph of Watson and Crick in front of their DNA model, which was found at The History Blog, and the diagram contrasting mitosis and meiosis, which was found at