Sorting Out the Genome

To put your genes in order, flip them like pancakes.

Brian Hayes in American Scientist:

All through the 1930s, members of the famous Drosophila group at Caltech roamed the American West collecting fruit flies for genetic analysis. One discovery to come from these expeditions was an abundance of genetic inversions: blocks of genes that had flipped end-over-end. Flies from one geographic region might have a certain set of genes in the order a b c d e f g, but a population elsewhere could harbor the sequence d c b a e f g, with the first four genes inverted. A further reversal, affecting a different block of genes, could produce an ordering such as d c f e a b g. Theodosius Dobzhansky and Alfred H. Sturtevant, two of the leading Drosophilists, pointed out that such genetic rearrangements could help in reconstructing the family tree of the flies. More reversals would indicate greater evolutionary distance.
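
For readers who like to see the operation spelled out, here is a minimal Python sketch of a single block reversal, applied twice to reproduce the two orderings in the fruit-fly example. The function name `reverse_block` and the choice of zero-based, end-exclusive indices are illustrative assumptions, not anything taken from the article.

```python
def reverse_block(genes, start, end):
    """Return a copy of genes with the block genes[start:end] flipped.

    start is inclusive and end is exclusive (an assumed convention).
    """
    return genes[:start] + genes[start:end][::-1] + genes[end:]


ancestral = list("abcdefg")

# Flip the first four genes: a b c d -> d c b a
first = reverse_block(ancestral, 0, 4)

# Flip a different block (positions 2 through 5): b a e f -> f e a b
second = reverse_block(first, 2, 6)

print(" ".join(first))   # d c b a e f g
print(" ".join(second))  # d c f e a b g
```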

The variations discovered by Dobzhansky and Sturtevant could be explained by reversing just one or two blocks of genes. Later, when gene order was studied in a broader range of organisms, more complex patterns emerged. In the 1980s Jeffrey D. Palmer and Laura A. Herbon of the University of Michigan were measuring the pace of evolutionary change in plants of the cabbage family. Looking at the DNA in mitochondria (the energy-producing organelles), they found that the genes had been jumbled by multiple random reversals. Transforming cabbage into turnip took at least three reversals. More distant relatives such as cabbage and mustard appeared to be separated by a dozen or more reversal events—they could only estimate how many.

If these genetic flip-flops are to serve as an evolutionary clock, we need a reliable way to count them. Given two arrangements of a set of genes—say a b c d e f g and f e b a g c d—how do you determine what sequence of reversals produced the transformation? This example has a three-step solution, which you can probably find in a few minutes with pencil and paper. For larger genomes and longer chains of reversals, however, trial-and-error methods soon falter. Is there an efficient algorithm for identifying a sequence of reversals that converts one permutation into another?
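
The pencil-and-paper search can be mimicked by brute force. The sketch below is a plain breadth-first search over every ordering reachable by block reversals; it is not the efficient algorithm the article goes on to describe, only a small illustration that assumes the genes carry no orientation and that the genome is short enough to enumerate. The name `reversal_path` is hypothetical.

```python
from collections import deque


def reversal_path(source, target):
    """Find a shortest chain of block reversals from source to target.

    Brute-force breadth-first search: the number of orderings grows
    factorially, so this is only practical for a handful of genes.
    Returns the list of orderings visited, source first, target last.
    """
    source, target = tuple(source), tuple(target)
    if sorted(source) != sorted(target):
        raise ValueError("both orderings must contain the same genes")

    parent = {source: None}      # maps each ordering to its predecessor
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        n = len(current)
        # Try every block of at least two adjacent genes.
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                nxt = current[:i] + current[i:j][::-1] + current[j:]
                if nxt not in parent:
                    parent[nxt] = current
                    queue.append(nxt)
    return None


path = reversal_path("abcdefg", "febagcd")
for step in path:
    print(" ".join(step))
print("reversals needed:", len(path) - 1)
```

For the seven-gene example in the text, the search examines at most 5,040 orderings and reports three reversals, in line with the three-step solution mentioned above. The particular intermediate orderings it prints are one shortest route among possibly several.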

The genetic reversal problem lies at the intersection of biology, mathematics and computer science. For some time, the prospects for finding a simple and efficient solution seemed dim, even with the most powerful tools of all three disciplines. But the story has a happy ending. A little more than a decade ago, computing gene reversals was still a subtle research problem; now it can be done with such ease that it’s a matter of routine technology. If you need to know the “reversal distance” between two genomes, you can go to a Web site and get the answer in seconds.

More here.