Ivars Peterson's MathTrek

June 17, 1996

## DNA, Computers, and Killer Apps

The second annual meeting on DNA-based computers, held last week at Princeton University, was a smaller, quieter affair than the inaugural event a year earlier. At that time, the dramatic news was still fresh in everyone's mind that computer scientist Leonard M. Adleman of the University of Southern California had solved a genuine, though simple mathematical problem by manipulating molecules of DNA in the laboratory.

A DNA molecule consists of two intertwined chains, each made up of four kinds of much simpler molecules called bases, which protrude from a sugar-phosphate backbone. These bases are known as adenine, thymine, guanine, and cytosine, and they are generally referred to by the four letters A, T, G, and C. These four bases also tend to form complementary pairs: A pairs with T; and G pairs with C. The links between pairs of bases are responsible for binding together the two strands to form the characteristic double helix of a DNA molecule.

Over the years, biochemists and molecular biologists have developed a variety of techniques for manipulating DNA molecules, from the synthesis of particular sequences of bases to the extraction of certain types of strands. The profusion of advertisements related to DNA and protein handling in such journals as Science and Nature attests to the success and sophistication of these applications of biotechnology.

A short strand of synthetic DNA, known as an oligonucleotide, may consist of 10 to a few hundred bases. A typical DNA molecule for a human being is about 3 billion bases long. The order of the bases along such a molecule constitutes the "genetic code."

When computer scientists look at DNA, they see information storage and processing. They view the four bases of DNA as something akin to the zeros and ones of the binary code by which computers store and process information. They look for algorithms, theorems, data structures, and combinatorial patterns.

Adleman was able to find a particular path through an array of points and one-way, point-to-point links by assigning each point a unique code name. Each code name was in the form of a short, single-stranded DNA sequence made up of 20 bases. Each one-way link between two points was represented by a strand consisting of the complements of the last 10 bases of the starting point and the complements of the first 10 bases of the destination point. By using various biochemical operations, Adleman could tie together these strands to create longer molecules representing all the possible paths from point to point and then select the one corresponding to the answer he was looking for.

Adleman's achievement attracted widespread attention, particularly because DNA-based computers appeared to offer a way of solving certain types of mathematical problems more rapidly than conventional computers. With each strand of DNA in a test tube corresponding to a single, independently operating processor, huge numbers could be put to work simultaneously on a single problem.

Nearly 200 people showed up for the first workshop on DNA-based computers, held at Princeton just a few months after the publication of Adleman's paper in Science. Nearly all the presentations came from computer scientists, who were suddenly caught up in the intricacies of biotechnology.

Since then, researchers have settled down to the hard work of exploring alternative approaches to DNA computing. They are examining the efficiency and accuracy of DNA manipulations, studying ways of scaling up these operations to solve larger problems, and identifying possible applications.

The program for this year's workshop reflected these efforts. It included presentations by members of Adleman's group of a new model of DNA computation involving molecular "stickers" and the use of this type of computation for breaking a widely used cryptographic scheme known as the Data Encryption Standard. Other papers focused on possible ways of circumventing the errors that inevitably accompany DNA manipulations; improved strategies for selecting which sequences of bases to use as information units, or bits; and alternative suites of DNA operations that could be used for different types of computations.

The diversity of approaches and ideas exemplifies both the great breadth of DNA biochemistry and the lack of consensus at this stage on what strategy to use for solving problems of interest to computer scientists. "It's a very rich design space," says Princeton's Richard J. Lipton, one of the meeting organizers. "It's so rich that people are going to continue to come up with pretty wild ideas, and it's going to take a while to see which ones really make sense."

One notable change from the first meeting was the significant increase in the number of presentations by researchers from fields other than computer science, particularly chemistry and biology. Such interactions between biology and computer science are very valuable for the scientific community in general, Lipton says.

"I'm sure good things will come of this," he notes, "whether we will ever solve big [mathematical] problems on DNA machines or whether we just figure out better how DNA works."

Still lacking, however, is a significant body of lab work clearly demonstrating the feasibility of DNA operations on a scale anywhere near that contemplated by the computer scientists. As the biochemists pointed out, getting precisely the results that one wants is no simple matter, and pioneering lab work can require months, if not years, of effort. Computer science graduate students who have ventured into the lab have learned the same lesson.

The big underlying question in DNA computing, however, concerns its potential application. "The trouble with current examples of DNA computations is that none of the applications is compelling yet," Lipton argues. "None is clearly so important that it by itself justifies the construction of DNA computers."

What Lipton and his colleagues would like to find is the "killer application" -- the one use that makes it worthwhile for someone to spend the money to design and develop DNA-based computers. In the early days of personal computers, for example, it was the invention of spreadsheet programs that created a real need for PCs. A computer program known as Visicalc was one of the "killer apps" that got the personal computer revolution rolling.

"What we need is a real problem that cannot fit an electronic machine, a problem that someone really wants to solve," Lipton says. "So, we've put a lot of effort into thinking about what we are going to do with this technology, if it works."

Current models of DNA computation involve huge numbers of molecules, each one corresponding to a processor in a certain state and operating independently of all the others. In effect, these molecules work simultaneously but individually.

It may be desirable, however, to have some way of passing information from one strand to another, establishing communication among the DNA strands. Such global communication and memory could considerably speed up searches and other types of computer operations.

Of course, the addition of a communication capability could potentially increase the complexity of the DNA processing required to implement such a strategy. It isn't clear yet that large-scale processing could occur efficiently and accurately enough to meet the needs of even the simpler models already being considered.

Nonetheless, "we hope that the addition of global memory to the capabilities of DNA computations will help us to discover the killer app," Lipton remarks. "One outcome of the meeting was that we're now a little more optimistic than we were before that we will find something compelling to do with DNA."

After three days of listening to formal presentations and corridor conversations at the meeting, I came away with a renewed appreciation of DNA as a truly remarkable material. For something so complicated, it behaves with amazing reliability. Its molecular structure allows it just the right blend of rigidity and flexibility to perform its functions consistently, to correct its own mistakes, and to evolve and adapt to changing circumstances. C'est la vie.

### References:

Adleman, Leonard M. 1994. Molecular Computation of Solutions to Combinatorial Problems. Science 266 (Nov. 11): 1021-1024.

Bass, Thomas A. 1995. Gene Genie. Wired (August): 114.

Devlin, Keith. 1995. Test Tube Computing with DNA. Math Horizons (April): 14-21.

Fallis, Don. 1996. Mathematical Proof and the Reliability of DNA Evidence. American Mathematical Monthly (June-July): 491-497.

Gifford, David K. 1994. On the Path to Computation with DNA. Science 266 (Nov. 11): 993-994.

Leutwyler, Kristin. 1995. Calculating with DNA. Scientific American (September): 20.

Lipton, Richard J. 1995. DNA solution of hard computational problems. Science 268 (Apr. 28): 542-545.

Peterson, I. 1994. Molecular computing in a DNA soup. Science News 146 (Nov. 12): 308.

Pool, Robert. 1995. A Boom in plans for DNA computing. Science 268 (Apr. 28): 498-499.

Information about DNA-based computing is available at a number of web sites, including http://www.neci.nj.nec.com/homepages/eric/eric.html,
http://www.cs.princeton.edu/~dabo/biocomp.html, and
http://hope.caltech.edu/roweis.html.

Richard Lipton can be reached at rjl@cs.princeton.edu.