Ivars Peterson's MathTrek

April 10, 2000

# Hiding in DNA

Spies might have to start boning up on molecular biology to pass along and decipher secret messages.

During World War II, German spies used microdots to hide information in plain view. Consisting of a greatly reduced photograph of a typed page, a microdot could be pasted on top of a printed period at the end of a sentence in an otherwise innocuous missive sent between a spy and headquarters personnel. Now, it’s possible to encode a message in a strand of DNA, camouflage it among an enormous number of similar molecules, and confine the sample to an area no larger than a microdot.

This use of DNA molecules to hide secret messages won Viviana I. Risca of Paul D. Schreiber High School in Port Washington, N.Y., the top prize at this year's Intel Science Talent Search. Risca worked with Carter Bancroft and Catherine Taylor Clelland of the Mount Sinai School of Medicine in New York City to demonstrate the scheme’s feasibility. In earlier research, Bancroft and his coworkers had shown how DNA molecules can be used to add binary numbers (see DNA Adds Up, July 22, 1996).

A single strand of DNA consists of a chain of simpler molecules called bases, which protrude from a sugar-phosphate backbone. The four varieties of bases are known as adenine (A), thymine (T), guanine (G), and cytosine (C). Any strand of DNA will adhere tightly to its complementary strand, in which T substitutes for A, G for C, and vice versa. For example, a single-stranded DNA segment consisting of the base sequence TAGCCT will stick to a section of another strand made up of the complementary sequence ATCGGA. The links between pairs of bases are responsible for binding together two strands to form the characteristic double helix of a DNA molecule.
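The pairing rule is simple enough to express in a few lines of Python. This is only a sketch: the function name `complement` is an illustrative choice, and it pairs bases position by position, as in the TAGCCT example, without modeling strand orientation.

```python
# Base-pairing rule: A pairs with T, G pairs with C.
PAIRS = str.maketrans("ATGC", "TACG")

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return strand.upper().translate(PAIRS)

print(complement("TAGCCT"))  # ATCGGA
```

Applying the function twice returns the original strand, which reflects the "vice versa" in the pairing rule.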

The researchers first assigned 3-base units to letters of the alphabet, numerals, and punctuation marks.

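Text to DNA Encryption Key

| Letters A-J | Letters K-T | Letters U-Z and punctuation | Numerals |
|---|---|---|---|
| A = CGA | K = AAG | U = CTG | 0 = ACT |
| B = CCA | L = TGC | V = CCT | 1 = ACC |
| C = GTT | M = TCC | W = CCG | 2 = TAG |
| D = TTG | N = TCT | X = CTA | 3 = GAC |
| E = GGC | O = GGA | Y = AAA | 4 = GAG |
| F = GGT | P = GTG | Z = CTT | 5 = AGA |
| G = TTT | Q = AAC | _ = ATA | 6 = TTA |
| H = CGC | R = TCA | , = TCG | 7 = ACA |
| I = ATG | S = ACG | . = GAT | 8 = AGG |
| J = AGT | T = TTC | : = GCT | 9 = GCG |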

They used the encryption key to encode a message reading "JUNE6_INVASION:NORMANDY" as a sequence of 69 bases and synthesized the following DNA strand:

AGTCTGTCTGGCTTAATAATGTCTCCTCGAACGATGGGATCTGCTTCTGGATCATCCCGATCTTTGAAA.
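The encoding step can be checked with a short Python sketch. The dictionary below transcribes the encryption key given above. The names `KEY`, `encode`, and `decode` are illustrative and not taken from the researchers' work.

```python
# Encryption key from the article: one 3-base unit per character.
KEY = {
    "A": "CGA", "B": "CCA", "C": "GTT", "D": "TTG", "E": "GGC",
    "F": "GGT", "G": "TTT", "H": "CGC", "I": "ATG", "J": "AGT",
    "K": "AAG", "L": "TGC", "M": "TCC", "N": "TCT", "O": "GGA",
    "P": "GTG", "Q": "AAC", "R": "TCA", "S": "ACG", "T": "TTC",
    "U": "CTG", "V": "CCT", "W": "CCG", "X": "CTA", "Y": "AAA",
    "Z": "CTT", "_": "ATA", ",": "TCG", ".": "GAT", ":": "GCT",
    "0": "ACT", "1": "ACC", "2": "TAG", "3": "GAC", "4": "GAG",
    "5": "AGA", "6": "TTA", "7": "ACA", "8": "AGG", "9": "GCG",
}
# Reverse lookup for decoding; every 3-base unit in the key is distinct.
DECODE = {triplet: char for char, triplet in KEY.items()}

def encode(text: str) -> str:
    """Translate each character into its 3-base unit."""
    return "".join(KEY[char] for char in text)

def decode(strand: str) -> str:
    """Read the strand three bases at a time and look up each unit."""
    return "".join(DECODE[strand[i:i + 3]] for i in range(0, len(strand), 3))

message = "JUNE6_INVASION:NORMANDY"
strand = encode(message)
print(strand)          # the 69-base sequence quoted above
print(len(strand))     # 69
print(decode(strand))  # JUNE6_INVASION:NORMANDY
```

The message has 23 characters, so the encoded strand has 23 × 3 = 69 bases.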

The message sequence was then sandwiched between two carefully selected oligonucleotide units (primers) consisting of 20 bases each, known only to the sender and the intended recipient:

TCCCTCTTCGTCGAGTAGCA and the complement of TCTCATGTACGGCCGTGAAT.

The total length of a single-stranded message molecule was 109 bases. A few copies of this molecule were mixed with a huge number of similarly sized fragments of human DNA.
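The length arithmetic (20 + 69 + 20 = 109) can be reproduced with a rough sketch. It assumes the first primer sits in front of the message and a base-by-base complement of the second primer sits behind it. The article does not specify strand orientation, so the exact arrangement here is an assumption, and the variable names are illustrative.

```python
# Base-pairing rule: A pairs with T, G pairs with C.
PAIRS = str.maketrans("ATGC", "TACG")

def complement(strand: str) -> str:
    """Base-by-base complement (orientation is not modeled here)."""
    return strand.translate(PAIRS)

PRIMER_1 = "TCCCTCTTCGTCGAGTAGCA"
PRIMER_2 = "TCTCATGTACGGCCGTGAAT"
MESSAGE = "AGTCTGTCTGGCTTAATAATGTCTCCTCGAACGATGGGATCTGCTTCTGGATCATCCCGATCTTTGAAA"

# Assumed layout: primer 1, then the message, then the complement of primer 2.
molecule = PRIMER_1 + MESSAGE + complement(PRIMER_2)

print(len(PRIMER_1), len(MESSAGE), len(PRIMER_2))  # 20 69 20
print(len(molecule))                               # 109
```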

Only a recipient knowing the sequences of both primers would be able to extract the message, using the polymerase chain reaction (PCR) to isolate and make copies of (amplify) the message-containing DNA strand. It would then be a simple matter to determine the sequence of nucleotides in the relevant strand and decode the message. In contrast, an eavesdropper would have to undertake the virtually impossible task of sifting through 4^20 (about 1.1 trillion) possible primer sequences to find the correct pair.
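The size of the eavesdropper's search is easy to work out: each primer has 20 positions, and each position can hold any of four bases. The sketch below counts the possibilities for one primer. It also counts ordered pairs, on the assumption (not stated in the article) that the two primers are chosen independently.

```python
BASES = 4           # A, T, G, C
PRIMER_LENGTH = 20  # bases per primer

# Number of distinct sequences for a single 20-base primer.
single_primer = BASES ** PRIMER_LENGTH
# Ordered pairs of primers, assuming each is chosen independently.
primer_pairs = single_primer ** 2

print(f"{single_primer:,}")   # 1,099,511,627,776
print(f"{primer_pairs:.2e}")  # 1.21e+24
```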

Because DNA is a very stable molecule under normal conditions and PCR is a very sensitive analytic technique, a DNA message can be hidden almost anywhere, Risca notes.

In their proof-of-principle experiment, the researchers dripped a small quantity of DNA-containing solution onto a small dot printed on filter paper. They cut out the dot, taped it over the period in a typed letter, and mailed the letter. The recipient recovered the dot, performed the analysis, and successfully decoded the secret message.

Molecular computing had first attracted Risca's attention when she was in the 10th grade. Reading a paper by a group of researchers who had created logic gates out of DNA molecules, Risca had noted that there was a need for more accurate and dependable biochemical analytic techniques. She decided to do a science project on optimizing PCR for DNA-based computation.

Though most of her results were inconclusive, Risca made one significant discovery. She obtained the best PCR yields when solutions contained surprisingly high concentrations of magnesium chloride--concentrations far higher than those generally recommended or considered acceptable. That discovery proved useful later, especially in cases where high PCR sensitivity was required.

Risca entered her project in several science fairs. Bancroft happened to be a judge at one of those fairs, and he ended up inviting Risca to work in his lab on DNA-based schemes for hiding information.

"I loved the concept and immediately began brainstorming ideas as to how it would be implemented," Risca says. "From then on, I worked with Dr. Bancroft's original concept and designed a concrete research plan that I carried out and modified where necessary."