Veronique Greenwood in Nautilus:
When Francisco Mojica was 25, he supported himself by tracking bacteria in the Mediterranean off the coast of a tourist haven in southeastern Spain. At the time, he was a doctoral candidate at the University of Alicante, where he focused on a much stranger microorganism than those he was searching for in the ocean: Haloferax mediterranei, a single-celled creature that thrives in water so salty it kills almost everything else. “Even sea water is not salty enough for them,” he says.
To understand this peculiar creature, Mojica, his advisor, and another graduate student were painstakingly sequencing bits of H. mediterranei DNA. This was the early 1990s (pre-Human Genome Project, pre-modern genomics), and it was frustrating work. When Mojica found bizarre, stuttering repeats of DNA bases, he assumed they’d screwed up somehow. “It’s impossible that you get exactly the same sequence many times!” he recalls thinking. But every time Mojica and his colleagues repeated the experiment, the same pattern reappeared: 30 or so bases that appeared over and over again, separated by lengths of seemingly unrelated DNA.
Reading journal articles in the library, Mojica learned that a Japanese group had noticed something similar in the genome of E. coli a few years before. Despite the fact that the repetitions did not seem to be connected to H. mediterranei’s predilection for salt, he put a chapter on them at the end of his Ph.D. thesis. And by the time he turned the thesis in, he couldn’t stop thinking about them.
There weren’t many people who shared his interest. Unexplained oddities are common in the genomes of most organisms, from humans to archaea, the group of microorganisms to which H. mediterranei belongs. Even after moving on to other subjects, Mojica remained fascinated by the fact that E. coli and H. mediterranei, which were only distantly related, both had repeats, and that the “spacer DNA” between the repeats was always about the same length despite having a wide variety of different sequences. What were these things for?
In 1994, between a pair of short-term positions, he returned to these single-celled curiosities and inserted extra copies of the repeats and spacers into H. mediterranei to see what would happen. The cells promptly died (“It was amazing!” he recalls fondly), and he wrote a paper suggesting the extra copies interfered with the cells’ ability to reproduce correctly. (He was wrong.) After he was hired to teach at the University of Alicante in 1997, he tried, fruitlessly, to see if the same thing would happen in E. coli. Perhaps the repeats formed small loops in the genome for proteins to attach to? (Wrong again.) “Nothing worked,” he says.
Still, in the years since he’d started his work, genome sequencing had gotten much easier. By the early 2000s, other people were starting to wonder about both the patterns that had intrigued Mojica and the genes around them, including Roger Garrett at the University of Copenhagen, Ruud Jansen at Utrecht University in the Netherlands, and Eugene Koonin at the United States’ National Center for Biotechnology Information (NCBI). When Mojica and Jansen struck up a correspondence, they began tossing around catchy names for the patterns, and on Nov. 21, 2001, they settled on CRISPR, an acronym for Clustered Regularly Interspaced Short Palindromic Repeats.
In 2003, Mojica finally found the key that was to unlock the origin of the spacers and, along with it, the meaning of the puzzle that had transfixed him for so long. While studying E. coli, he realized that the spacers were pieces of DNA from viruses that were retained in the genome of the host species, and that some of the microbial strains that carried the spacers were either already known to be resistant to infection or had no record of ever being infected.