Revolutionizing Data Storage with DNA Technology

With a smartphone, you can look up facts, stream videos, check out Facebook, read tweets and listen to music. But all of those data aren’t stored on your phone. They are kept somewhere else, perhaps half a world away. For now, companies like Microsoft, Amazon and Facebook store those data on magnetic tapes or other media. It’s an ever-growing library of data that takes up lots of space in sprawling data centers. And even the best storage media last only a few decades at most. Then they need to be replaced. But there may be a better way to keep and guard information, some researchers say. Store and retrieve it — with DNA.

DNA holds the genetic information that tells each cell inside a living being what to do. Each side of a DNA molecule’s twisted, ladder-like structure is made of four chemical building blocks. They’re called nucleotides and are known as A, T, C and G. (The letters stand for adenine, thymine, cytosine and guanine.) In various combinations, these letters spell out the code for our genes.

Computers currently store data as series of 0s and 1s. But data also can be written using the four building blocks of DNA, says Luis Ceze. As a computer architect at the University of Washington in Seattle, he studies how computers and data systems should be designed and how they should function. Labs can make strands of synthetic DNA, one nucleotide block at a time. Combinations can be developed, as a code, to stand for numbers, letters or other digital information. Later, other lab equipment can read the sequence of building blocks along a strand of DNA. In that way, it can decode the original data.
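To make the idea of a code concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, in which every pair of bits becomes one nucleotide. That mapping and the function names are assumptions made for illustration only. The article does not say which code Ceze's team uses, and a real code would also have to respect the limits of lab chemistry.

```python
# Hypothetical 2-bits-per-nucleotide mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotide letters."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of nucleotide letters back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four letters here, so a two-letter message turns into a strand of eight nucleotides.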

Why bother? DNA can hold lots of information in a tiny space. In theory, a volume of DNA the size of a sugar cube could hold as much data as a Walmart-sized storage center. Plus, Ceze says, unlike magnetic tape, DNA can last unchanged for thousands of years.

Work on DNA data storage started years ago. Ceze’s team has just added what’s known as “random access” to the method. It offers a way to find a specific file. Each data file gets its own unique “address.” It works in much the same way as a house number, street name and zip code guide a mail carrier to your home. The researchers add those digital addresses to each DNA strand holding data for a particular file.
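The addressing idea can be sketched in a few lines of Python. The file names, address sequences and payloads below are all made up for illustration, and the sketch puts the address only at the front of each strand to keep things short. It shows the bookkeeping, not the lab chemistry.

```python
# Hypothetical address sequences; real ones would be designed in the lab.
ADDRESSES = {"cat.jpg": "ACGTAC", "notes.txt": "TGCATG"}


def tag(file_name, payloads):
    """Attach a file's address to every strand that holds its data."""
    address = ADDRESSES[file_name]
    return [address + payload for payload in payloads]


def strands_for(file_name, pool):
    """Pick out the payloads of one file from a mixed pool of strands."""
    address = ADDRESSES[file_name]
    return [s[len(address):] for s in pool if s.startswith(address)]


# One pool holds the strands of two different files.
pool = tag("cat.jpg", ["GGAT", "CCTA"]) + tag("notes.txt", ["TTAG"])

print(strands_for("cat.jpg", pool))    # ['GGAT', 'CCTA']
print(strands_for("notes.txt", pool))  # ['TTAG']
```

In a test tube there is no way to loop over strands one by one, which is why the team needed a lab tool to do the selecting.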

Ceze’s team, which included people from Microsoft, reported its new advance on April 6, 2016, in Atlanta, Ga., at the International Conference on Architectural Support for Programming Languages and Operating Systems.

A borrowed tool

To search for a specific file in a large quantity of DNA, the Seattle team uses a tool called PCR. It’s short for polymerase (Puh-LIM-er-ase) chain reaction. Here’s how PCR works: DNA goes into a test tube, along with strings of nucleotides known as primers. Each primer is chosen to match the address sequences at the ends of selected DNA strands. Single nucleotides and a few other things are in the mix, too. The test tube then goes into a machine that heats and cools the soup of genetic material over and over.

Heating up double-stranded DNA separates it into single threads. After the sample cools down, the primers seek out and bind to the ends of the specific strands that scientists are interested in. Single nucleotides in the mix then bind to the rest of the strand.

Each time the heating and cooling cycle repeats, it’s like pressing start on a copying machine; the PCR duplicates DNA. These cycles repeat over and over and over, making millions of copies of the target DNA. Scientists describe this as “amplifying” the DNA.

PCR will copy desired snippets of DNA so many times that soon they greatly outnumber all of the rest of the genetic material in a sample.
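A little arithmetic shows how quickly that happens. The Python sketch below assumes an ideal case: every cycle exactly doubles the target strands and copies nothing else. The starting numbers are invented for illustration, and real PCR is less tidy than this.

```python
def target_fraction(target, other, cycles):
    """Share of all strands that are target strands after ideal doubling."""
    amplified = target * 2 ** cycles
    return amplified / (amplified + other)


# Hypothetical sample: 1,000 target strands hidden among a billion others.
for cycles in (0, 10, 20, 30):
    share = target_fraction(1_000, 1_000_000_000, cycles)
    print(f"{cycles:2d} cycles: {share:.6f}")
```

Twenty doublings multiply the target about a million times (2 to the 20th power is 1,048,576). In this made-up sample, that is enough for the target strands to make up just over half of the mix. After 30 cycles they account for more than 99 percent of it.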

Many scientists already use PCR. It’s used to copy the DNA found at a crime scene, for instance. That lets forensic scientists work with the DNA and compare it to other samples, such as one from a suspect. Similarly, environmental scientists might use PCR to amplify the foreign DNA they find in a river in hopes of matching it to a particular species of fish.

Making lots of copies of a specific bit of DNA can now help pick out a data file, Ceze says.

He compares the idea to trying to get a bowl of alphabet soup with only certain letters. Picking out individual letters would take a really long time. But suppose you were able to selectively copy, over and over, just the letters you liked. Eventually, nearly every scoop you took out of the bowl would contain just what you wanted. Likewise, PCR can make sure that the DNA picked out after the process is pretty much just what you had been looking for. Then lab equipment can read that DNA to decode its stored data.

PCR is a pretty standard tool in genetics research. But borrowing that tool to find specific DNA data files didn’t happen until Ceze took a break from his regular work and spent time in a microbiology lab. There, he learned about PCR. And that led to the team’s idea for random access. “You see two things, and then all of a sudden you see that they could be connected,” he explains.

Avoiding errors

Making and copying large amounts of DNA is “hard to control exactly,” Ceze says. So his team also built in a way to deal with errors. When data are encoded into the synthetic DNA, overlapping parts of each section go onto three separate DNA strands. To decode a file, a computer needs data from at least two of the three strands. That way, even if one strand has errors, the other two strands will still hold the data.
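One common way to get this "any two out of three" property is a trick called exclusive-or, or XOR. The Python sketch below assumes that approach: one strand holds chunk A, one holds chunk B, and the third holds A XOR B. The article does not spell out the team's exact scheme, so treat the XOR choice and all of the names here as assumptions. A strand with errors is modeled simply as a missing strand.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length chunks of data, byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_strands(a: bytes, b: bytes) -> dict:
    """Spread two chunks over three strands (labels are hypothetical)."""
    return {"A": a, "B": b, "AxB": xor_bytes(a, b)}


def recover(strands: dict):
    """Rebuild both chunks from any two of the three strands."""
    a, b, x = strands.get("A"), strands.get("B"), strands.get("AxB")
    if a is None and b is not None and x is not None:
        a = xor_bytes(b, x)  # B XOR (A XOR B) gives back A
    if b is None and a is not None and x is not None:
        b = xor_bytes(a, x)  # A XOR (A XOR B) gives back B
    if a is None or b is None:
        raise ValueError("need at least two of the three strands")
    return a, b


strands = make_strands(b"cat!", b"data")
del strands["A"]         # pretend this strand was lost or damaged
print(recover(strands))  # (b'cat!', b'data')
```

The cost of this safety net is extra DNA: three strands are made to protect the contents of two.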

The new system also doesn’t require the same accuracy for all types of data. Relaxing standards for some types of material makes it easier to store large files. For example, text files might require a very high level of precision. In contrast, most people won’t notice if a few pixels are off in yet another picture of their cat.

In lab tests, the system worked very well. The researchers successfully encoded video files of people talking about war crimes in the African nation of Rwanda. When they later searched for those files, they found them easily. The group also encoded and reconstructed four image files.

Dean Tullsen is a computer science engineer at the University of California, San Diego. He chaired the meeting’s session in which Ceze’s group described its new system for DNA storage and file retrieval. He says that it’s not clear whether or when DNA data storage might become common. But the University of Washington team has “shown some very exciting potential,” he says. “The best part of the work is that they have actually stored some pictures in synthesized DNA” in the lab, he adds. The team then “read the data back out with no errors.”

References:

Kathiann Kowalski (May 4, 2016). DNA can now store images, video and other types of data. Science News for Students. Retrieved from https://buff.ly/3bh0g9B