by Jonathan Kujawa
(this is the sequel to last month's 3QD essay on the Pancake Problems)
I frequently come across a rafter of wild turkeys on bike rides through the countryside near my home. This particular group is recognizable thanks to having a peahen as an honorary member. Just this morning I was treated to a startling surprise: the peahen was busily herding a brood of chicks! I would have thought peacocks and turkeys were too distantly related to successfully breed. Apparently nobody told the peahen. I haven't seen any other peacocks in the neighborhood, so it would seem that she is more than friends with one of her turkey buddies. According to the internet, peacock/turkey hybrids (turcocks? peakeys?) are a thing which can happen.
Going by their looks and natural geographic ranges, I had guessed (wrongly) that peacocks and turkeys would sit pretty far apart on the tree of life. In the not-too-distant past, the classification of species depended on just this sort of observational data.
Nowadays we can dig directly into the DNA to look for answers about relatedness. In the past decade it has become possible to sequence an organism's entire genome, and to do it quickly and cheaply. In fifteen years we've gone from the Human Genome Project taking thirteen years and $2.7 billion to sequence the human genome to being able to do it in days for $1,000. The progress in this field puts Moore's Law to shame.
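To put a rough number on "puts Moore's Law to shame," here is a back-of-the-envelope check. It takes the essay's figures at face value and assumes (our assumption, not the essay's) that Moore's Law means a doubling every two years:

```latex
\[
\frac{\$2.7\times 10^{9}}{\$10^{3}} = 2.7\times 10^{6} \approx 2^{21.4},
\qquad
\underbrace{2^{15/2}}_{\text{Moore's Law over 15 years}} = 2^{7.5} \approx 181 .
\]
```

In other words, the cost of sequencing a genome was halved about 21 times in the span where Moore's Law would allow only seven or eight doublings.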
It's one thing to have the data; it's another to put it to use. To deal with the flood of information pouring out of DNA sequencers, an entirely new field called computational molecular biology has sprung up. It's a wonderful combination of biology, mathematics, and computer science.
Turnips are a good example. Looking at them in the garden, you might guess that they are more closely related to radishes than to cabbage. In the 1980s, Jeffrey Palmer and his collaborators looked at the mitochondrial genomes of turnips and cabbage and found that the genes they contained were nearly identical. What differed was the order of those genes [1]. The random mutations that occurred over the years changed not the genes themselves but only their positions in the DNA.
Even better, Palmer and company saw that these rearrangements are of a specific kind. When a mutation occurs, a segment of DNA containing some number of genes is snipped out, flipped around, and put back in, now in reverse order. For example, if genes were the numbers one through five, a typical sequence of mutations might look like:
Here at each step the segment of genes to be snipped out and reversed is indicated with an underline. Because each mutation reverses the order of some of the genes, folks call it a reversal.
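To make the operation concrete, here is a minimal Python sketch that models a genome as a list of gene labels and a reversal as flipping one contiguous slice of that list. The function names `reverse_segment` and `apply_reversals`, and the particular segments reversed below, are hypothetical choices for illustration; they are not taken from Palmer's data.

```python
def reverse_segment(genes, i, j):
    """Return a copy of `genes` with the block genes[i..j] reversed.

    Indices are 0-based and inclusive on both ends.
    """
    if not 0 <= i <= j < len(genes):
        raise ValueError("segment indices out of range")
    # Keep the genes before and after the segment; flip the segment itself.
    return genes[:i] + genes[i:j + 1][::-1] + genes[j + 1:]


def apply_reversals(genes, reversals):
    """Apply a sequence of (i, j) reversals, returning every intermediate order."""
    history = [list(genes)]
    for i, j in reversals:
        history.append(reverse_segment(history[-1], i, j))
    return history


# A hypothetical mutation history starting from genes 1 through 5.
start = [1, 2, 3, 4, 5]
for order in apply_reversals(start, [(1, 3), (0, 2), (2, 4)]):
    print(order)
```

Running it prints the successive gene orders [1, 2, 3, 4, 5], [1, 4, 3, 2, 5], [3, 4, 1, 2, 5], and [3, 4, 5, 2, 1]. Every gene survives each step and only the order changes, which is the kind of difference Palmer saw between turnip and cabbage. Note also that applying the same reversal twice undoes it.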
