How DNA could store all the world’s data

Andy Extance in Nature:

It was Wednesday 16 February 2011, and Goldman was at a hotel in Hamburg, Germany, talking with some of his fellow bioinformaticists about how they could afford to store the reams of genome sequences and other data the world was throwing at them. He remembers the scientists getting so frustrated by the expense and limitations of conventional computing technology that they started kidding about sci-fi alternatives. “We thought, 'What's to stop us using DNA to store information?'” Then the laughter stopped. “It was a lightbulb moment,” says Goldman, a group leader at the European Bioinformatics Institute (EBI) in Hinxton, UK. True, DNA storage would be pathetically slow compared with the microsecond timescales for reading or writing bits in a silicon memory chip. It would take hours to encode data by synthesizing DNA strings with a specific pattern of bases, and still more hours to recover that information using a sequencing machine. But with DNA, a whole human genome fits into a cell that is invisible to the naked eye. For sheer density of information storage, DNA could be orders of magnitude beyond silicon: perfect for long-term archiving.

“We sat down in the bar with napkins and biros,” says Goldman, and started scribbling ideas: “What would you have to do to make that work?” The researchers' biggest worry was that DNA synthesis and sequencing made mistakes as often as 1 in every 100 nucleotides. This would render large-scale data storage hopelessly unreliable, unless they could find a workable error-correction scheme. Could they encode bits into base pairs in a way that would allow them to detect and undo the mistakes? “Within the course of an evening,” says Goldman, “we knew that you could.” He and his EBI colleague Ewan Birney took the idea back to their labs, and two years later announced that they had successfully used DNA to encode five files, including Shakespeare's sonnets and a snippet of Martin Luther King's 'I have a dream' speech¹. By then, biologist George Church and his team at Harvard University in Cambridge, Massachusetts, had unveiled an independent demonstration of DNA encoding². But at 739 kilobytes (kB), the EBI files comprised the largest DNA archive ever produced, until July 2016, when researchers from Microsoft and the University of Washington claimed a leap to 200 megabytes (MB).
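
To make the idea of encoding bits into bases and undoing mistakes concrete, here is a toy sketch in Python. It is not the scheme the EBI or Harvard teams used, which the excerpt does not describe. It assumes a plain two-bits-per-base mapping (A=00, C=01, G=10, T=11), assumes errors are substitutions only (no inserted or deleted bases), and corrects them by majority vote over several reads of the same strand. The function names are made up for illustration.

```python
from collections import Counter

# Assumed mapping for illustration: index in this string = 2-bit value.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def majority_vote(reads) -> str:
    """Build a consensus strand from equal-length reads, position by position.

    Only substitution errors are handled; insertions and deletions would
    shift the columns and need a real alignment step.
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )
```

With three reads that each carry a substitution at a different position, the consensus matches the original strand and `dna_to_bytes` recovers the message. Real systems need far more than this, since repeated bases and insertions or deletions also cause trouble, but the sketch shows why redundancy makes a 1-in-100 error rate survivable.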
