Statisticians not wanted

Devlin's Angle

September 2006

On August 16, 2006, the California Supreme Court made it official: in certain legal cases that hinge on statistical calculations, it is not the business of professional statisticians to decide how to evaluate the statistical data and to judge what method is most suited to analyze that data. From now on, in California at least, the courts will decide what statistical analysis is appropriate and what is not.

It gets worse - especially if you are a professional statistician. By upholding a ruling by a lower state court, the California Supreme Court also affirmed that, in their view, the proper job for statisticians is simply to plug numbers into a formula and turn the crank to produce an answer. Not any formula will do, mind; if you want your calculation to play a role in a California legal proceeding, it will have to be the formula chosen by the court. As a professional statistician, you may believe that it is precisely your job to make that call, and that no other profession has the knowledge, expertise, experience, and skill to make such a decision in your stead. But the California Supreme Court says otherwise. They say the decision is theirs to make. You don't believe me? Read on.

The issue is one of suspect identification and conviction based on DNA profiling. Since both the reliability of a "DNA (profile) match" as a means to identify a suspect in a criminal investigation and its efficacy as evidence in court depend upon the likelihood of two different individuals sharing the same profile, use of the technique is crucially dependent on calculation of that likelihood. That's where statistics comes into the picture.

It turns out, however, to be not at all an obvious matter how to compute the appropriate statistic - or more precisely, to decide what exactly is the appropriate statistic. To my mind, though apparently not to the mind of the California Supreme Court, that is where statisticians should come into the picture.

To take the story any further, I need to provide a brief summary of the DNA profiling technique.

DNA profiling

The DNA molecule comprises two long strands, twisted around each other in the now familiar double-helix structure, joined together in rope-ladder fashion by chemical building blocks called bases. (The two strands constitute the "ropes" of the "ladder", the bonds between the bases its "rungs".) There are four different bases, adenine (A), thymine (T), guanine (G), and cytosine (C). The human genome is made of a sequence of roughly three billion of these base-pairs. Proceeding along the DNA molecule, the sequence of letters denoting the order of the bases (a portion might be ... AATGGGCATTTTGAC ...) provides a "readout" of the genetic code of the person (or other living entity). It is this "readout" that provides the basis for DNA profiling.

Using today's techniques, it would be totally impractical to do a DNA comparison by determining all three billion letters. What is done instead is to examine a very small handful of sites of variation.

DNA is arranged into large structural bodies called chromosomes. Humans have 23 pairs of chromosomes which together make up the human genome. One chromosome in each pair is inherited from the mother and the other from the father. This means that an individual will have two complete sets of genetic material. A "gene" is really a location (locus) on a chromosome. Some genes may have different versions, which are referred to as "alleles." A pair of chromosomes have the same loci all the way along their length, but may have different alleles at some of the loci. Alleles are characterized by their slightly different base sequences and are distinguished by their different phenotypic effects. Some of the genes studied in forensic DNA tests have as many as 35 different alleles in the population.

Most people share very similar gene sequences, but some regions of DNA sequence vary from person to person with high frequency. Comparing variation in these regions allows scientists to answer the question of whether two different DNA samples come from the same person.

The profiling technique used by the FBI and other law enforcement authorities depends on the fact that the variability is manifested by differences in the length, measured by the number of bases or the number of times a given sequence repeats, between pre-specified locations. This procedure yields two measurements for each sample for each locus, one for the father's side and one for the mother's side. The length of DNA fragments can be measured precisely. In comparing two samples at a given locus, if the pair of measurements from one sample is the same as the pair of measurements from the other, the profiles are said to match at that locus; otherwise, they are said not to match at that locus. If the two profiles match at each of the loci examined, the profiles are said to match. If the profiles fail to match at one or more loci, then the profiles do not match, and it is virtually certain that the samples do not come from the same person.
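The match rule just described is simple enough to state as code. The sketch below is illustrative only: it assumes (our assumption, not the article's) that a profile can be represented as a mapping from a locus name to the pair of repeat-length measurements at that locus, and it compares each pair as an unordered pair, on the assumption that the test does not record which measurement came from which parent. The locus names are placeholders.

```python
# Illustrative sketch of the locus-by-locus match rule.
# Hypothetical representation: locus name -> pair of repeat-length measurements.
Profile = dict[str, tuple[int, int]]


def locus_match(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Two measurement pairs match if they agree as unordered pairs."""
    return sorted(a) == sorted(b)


def profiles_match(p: Profile, q: Profile) -> bool:
    """Profiles match only if every examined locus matches; one mismatch excludes."""
    if p.keys() != q.keys():
        raise ValueError("profiles must cover the same loci")
    return all(locus_match(p[locus], q[locus]) for locus in p)


# Placeholder locus names and made-up measurements.
crime_scene = {"locus_1": (4, 6), "locus_2": (8, 5), "locus_3": (11, 11)}
suspect = {"locus_1": (6, 4), "locus_2": (5, 8), "locus_3": (11, 11)}
other = {"locus_1": (6, 4), "locus_2": (5, 9), "locus_3": (11, 11)}

print(profiles_match(crime_scene, suspect))  # True
print(profiles_match(crime_scene, other))    # False
```

A single mismatching locus is enough to make `profiles_match` return False, mirroring the exclusion rule in the text.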

A match does not mean that the two samples must absolutely have come from the same source; all that can be said is that, so far as the test was able to determine, the two profiles were identical, but it is possible for more than one person to have the same profile across several loci. At any given locus, the percentage of people having DNA fragments of a given length, in terms of base pairs, is small but not zero. DNA tests gain their power from the conjunction of matches at each of several loci; it is extremely rare for two samples taken from unrelated individuals to show such congruence over many loci.

The FBI's forensic DNA identification system (called CODIS) examines thirteen such regions in the genome. Sequences in these special regions involve multiple repetitions of short combinations of letters, such as GATA. Easily detectable differences between people lie in the number of repeats that occur in both copies of their DNA in these regions. For example, at one of these regions a person might have inherited four repeats (GATAGATAGATAGATA) from their father and six repeats (GATAGATAGATAGATAGATAGATA) from their mother at the same location in the genome. Another person might inherit eight repeats (GATAGATAGATAGATAGATAGATAGATAGATA) from their father and five repeats (GATAGATAGATAGATAGATA) from their mother.
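As a toy illustration of what "number of repeats" means, the function below counts the longest run of back-to-back copies of a motif such as GATA in a string of bases. This is only a sketch on text strings: actual forensic typing infers repeat numbers from measured fragment lengths rather than by reading off the sequence letter by letter, and the flanking bases used here are made up.

```python
def count_tandem_repeats(seq: str, motif: str = "GATA") -> int:
    """Return the length of the longest run of consecutive copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(seq)):
        run, pos = 0, start
        # Extend the run while the next len(motif) bases are another copy of the motif.
        while seq.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# The two people from the example, with arbitrary (hypothetical) flanking bases.
person_a = ("TTC" + "GATA" * 4 + "CCA", "TTC" + "GATA" * 6 + "CCA")  # father, mother
person_b = ("TTC" + "GATA" * 8 + "CCA", "TTC" + "GATA" * 5 + "CCA")

print(tuple(count_tandem_repeats(s) for s in person_a))  # (4, 6)
print(tuple(count_tandem_repeats(s) for s in person_b))  # (8, 5)
```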

How reliable is DNA profile matching?

When two randomly chosen DNA samples match completely in a large number of regions, such as the 13 used in the FBI's system, the probability that they could have come from two unrelated people is very small. This fact makes DNA identification extremely reliable (when performed correctly). The degree of reliability is generally measured by using the product rule of probability theory to determine the likelihood of finding a particular profile among a random selection of the population.

For example, consider a profile based on just three sites. The probability that someone would match a random DNA sample at any one site is roughly one in ten (1/10). So the probability that someone would match a random sample at three sites would be about one in a thousand:

1/10 x 1/10 x 1/10 = 1/1,000.
Applying the same probability calculation to all 13 sites used in the FBI's CODIS system would mean that the chances of matching a given DNA sample at random in the population are about one in ten trillion:
(1/10)^13 = 1/10,000,000,000,000.
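The product-rule arithmetic above can be checked with exact fractions. This minimal sketch uses the article's illustrative figure of 1/10 per locus; it is only as good as the independence assumption discussed next, and real per-locus frequencies differ from locus to locus and from profile to profile.

```python
from fractions import Fraction
from math import prod


def random_match_probability(per_locus: list[Fraction]) -> Fraction:
    """Product rule: multiply the per-locus match probabilities.

    Valid only if matches at different loci are independent events.
    """
    return prod(per_locus, start=Fraction(1))


three = random_match_probability([Fraction(1, 10)] * 3)
thirteen = random_match_probability([Fraction(1, 10)] * 13)
print(three)     # 1/1000
print(thirteen)  # 1/10000000000000
```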
This figure is known as the random match probability (RMP). Since it is computed using the product rule for multiplying probabilities, it assumes that the patterns found in two distinct sites are independent. Is this assumption justified? Personally, I find this a particularly worrying assumption, and it very definitely is an assumption, but genetics is not my area of expertise, and (unlike the California Supreme Court) I do not feel comfortable stepping into the specialties of other professionals. Overall those specialists seem reasonably confident in the independence assumption. In any event, the courts regularly accept the independence assumption, and my present focus lies elsewhere, so for the purpose of this essay, I'll simply accept it too.

Using DNA profiling

Here is one way that DNA profiling is often used in the criminal justice system. The authorities investigating a crime obtain evidence that points to a particular individual as the criminal, but fails to identify the suspect with sufficient certainty to obtain a conviction. If the suspect's DNA profile is in the CODIS database, or else a sample is taken and a profile prepared, it may be compared with a profile taken from a sample collected at the crime scene. If the two profiles agree on all thirteen loci, then for all practical - and all legal - purposes, the suspect is assumed to have been identified with certainty. The random match probability (one in ten trillion) provides an estimate of the likelihood that the two profiles came from different individuals.

Of course, all that a DNA match does is identify - within a certain degree of confidence - an individual whose DNA profile was the same as that of a sample (or samples) found at the crime scene. In and of itself, it does not imply that the individual committed the crime. There could be any number of ways for a person's DNA to end up at a crime scene. (If your spouse or close friend were murdered, very likely some of your DNA would be found on the victim's body or clothing. It does not follow automatically that you are the killer.) Other evidence is required to determine guilt of the crime in question.

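As to the degree of confidence that can be vested in the identification of an individual by means of a DNA profile match obtained in the above manner, the issues to be considered are: (1) the possibility that an error was made in collecting or handling the samples or in carrying out the laboratory analysis, and (2) the possibility that the two profiles match purely by coincidence.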

A likelihood of one in ten trillion attached to the second of these two possibilities (such as is given by the RMP for a 13-loci match) would clearly imply that the former possibility is far more likely, since hardly any human procedure can claim a one in ten trillion fallibility rate. Put differently, if there is no reason to doubt the accuracy of the sample collection procedures and the laboratory analyses, the DNA profile identification could surely be viewed with considerable confidence.

[I have already expressed my doubt regarding the use of the RMP to obtain a reliable indicator of an accidental match, computed as it is on the basis of our current scientific understanding of genetics. The RMP calculation does, after all, require mathematical independence of the loci - an extremely demanding condition - in order to be able to apply the product rule. I'd feel a lot more confident if there were some empirical data to buttress the accepted assumption. What empirical data there is seems if anything to support my doubt. A recent analysis of the Arizona convicted offender database (a database that uses the 13 CODIS loci) revealed that among the approximately 65,000 entries listed there were 144 individuals whose DNA profiles match at 9 loci (including one match between individuals of different races, one Caucasian, the other African American), another few who match at 10 loci, one pair that match at 11, and one pair that match at 12. The 11 and 12 loci matches were siblings, hence not random. But matches on 9 or 10 loci among a database as small as 65,000 entries cast considerable doubt in my mind on figures such as the oft-cited "one in ten trillion" for a match that extends to just 3 or 4 additional loci. But again, this is off my current target.]

Of course, the one-in-ten-trillion likelihood figure is massive overkill. Absent any confounding factors, a figure of one in a million or one in ten million (say) would surely be enough to determine identity with virtual certainty.

Hence, all of the above cautions notwithstanding, it seems reasonable to assume that (blood relatives aside) a 13-loci match can be taken as definitive identification - provided that, and this is absolutely critical to the calculation and use of the RMP, the match is arrived at by comparing a profile from a sample from the crime scene with a profile taken from a sample from a suspect who has already been identified by means other than his or her DNA profile. But this is not what happened in the case that led to the recent decision by the California Supreme Court. The case before them involved a so-called "cold hit identification."

Cold Hit searches

Increasingly, when criminal investigation authorities find themselves with crime scene DNA evidence but no suspects, they resort to using the DNA profile as a tool to identify a possible culprit, by searching DNA profile databases of previous offenders (such as the CODIS database) to see if a match can be found. A "cold hit" identification is one that results from such a search. A match obtained in this way would be a "cold hit" because, prior to the match, the individual concerned was not a suspect.

As in the case where DNA profiling is used to provide identification of an individual who was already a suspect, the principal question that has to be (or at least should be) asked after a cold hit search has led to a match (a "hit") is: Does the match indicate that the profile in the database belongs to the same person whose sample formed the basis of the search, or is the match purely coincidental? At this point, the waters rapidly become very murky.

To illustrate the problems inherent in the Cold Hit procedure, consider the following analogy. A typical state lottery will have a probability of winning a major jackpot of around 1 in 35,000,000. To any single individual, buying a ticket is clearly a waste of time. Those odds are effectively nil. But suppose that each week, at least 35,000,000 people actually do buy a ticket. (This is a realistic example.) Then every one to three weeks, on average, someone will win. The news reporters will go out and interview that lucky person. What is special about that person? Absolutely nothing. The only thing you can say about that individual is that he or she is the one who had the winning numbers. You can draw absolutely no other conclusion. The 1 in 35,000,000 odds tell you nothing about any other feature of that person. The fact that there is a winner reflects the fact that 35,000,000 people bought a ticket - and nothing else.
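The "every one to three weeks" figure can be checked with a short calculation. The sketch assumes, as an idealization not stated in the article, that each of the 35,000,000 tickets independently picks its numbers at random, so each wins with probability 1/35,000,000; real players often choose the same popular numbers, which this ignores.

```python
import math


def prob_at_least_one_winner(tickets: int, p_win: float) -> float:
    """P(at least one winning ticket) when each of `tickets` independent tickets wins with p_win."""
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate when p is tiny.
    return -math.expm1(tickets * math.log1p(-p_win))


p_week = prob_at_least_one_winner(35_000_000, 1 / 35_000_000)
expected_weeks = 1 / p_week  # mean of a geometric waiting time, in weeks

print(f"{p_week:.3f}")          # 0.632
print(f"{expected_weeks:.2f}")  # 1.58
```

Under these assumptions there is roughly a 63% chance of at least one winner in a given week, so on average a winner turns up about every 1.6 weeks.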

Compare this to a reporter who hears about a person with a reputation for being unusually lucky, goes along with them as they buy their ticket, and sits alongside them as they watch the lottery result announced on TV. Lo and behold, that person wins. What would you conclude? Most likely, that there has been a swindle. With odds of 1 in 35,000,000, it's impossible to conclude anything else in this situation.

In the first case, the long odds tell you nothing about the winning person, other than that they won. In the second case, the long odds tell you a lot.

To my mind, a Cold Hit measured by RMP is like the first case. All it tells you is that there is a DNA profile match. It does not, in and of itself, tell you anything else, and certainly not that that person is guilty of the crime.

On the other hand, if an individual is identified as a crime suspect by means other than a DNA match, then a subsequent DNA match is like the second case. It tells you a lot. Indeed, assuming the initial identification had a rational, relevant basis (like a reputation for being lucky in the lottery case), the long RMP odds against a match could be taken as conclusive. But as with the lottery example, in order for the long odds to have (any) weight, the initial identification has to be made before the DNA comparison is run (or at least be demonstrably independent thereof). Do the DNA comparison first, and those impressive-sounding long odds may be totally meaningless, simply reflecting the size of the relevant population, just as in the lottery case.

It has to be admitted that not everyone agrees with the above analogy - at least, they do not agree with the conclusions regarding the inapplicability of the RMP in the case of a cold hit match. In particular, the FBI has argued repeatedly that the RMP remains the only statistic that needs to be presented in court to provide a metric for the efficacy of a DNA cold hit match.

Unfortunately, attempts to resolve the issue by obtaining expert opinion have so far served only to muddy the waters still further.

The NRC reports

In 1989, the FBI urged the National Research Council to carry out a study of the matter. The NRC formed the Committee on DNA Technology in Forensic Science, which issued its report in 1992. Titled DNA Technology in Forensic Science, and published by the National Academy Press, the report is often referred to as "NRC I". The committee's main recommendation regarding the cold hit process is given on page 124 of the report:

"The distinction between finding a match between an evidence sample and a suspect sample and finding a match between an evidence sample and one of many entries in a DNA profile databank is important. The chance of finding a match in the second case is considerably higher. ... The initial match should be used as probable cause to obtain a blood sample from the suspect, but only the statistical frequency associated with the additional loci should be presented at trial (to prevent the selection bias that is inherent in searching a databank)."
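The NRC I recommendation amounts to computing the product-rule statistic over only those loci that were not used in the database search. The sketch below is hypothetical in its particulars: the locus names are placeholders, the 1/10 per-locus figure is the article's illustrative value, and the split into 9 searched loci plus 4 confirmation loci is invented for the example.

```python
from fractions import Fraction
from math import prod


def nrc1_statistic(per_locus: dict[str, Fraction], search_loci: set[str]) -> Fraction:
    """Product-rule match probability over only the loci NOT used in the database search."""
    confirm = [p for locus, p in per_locus.items() if locus not in search_loci]
    if not confirm:
        raise ValueError("no loci left for confirmation")
    return prod(confirm, start=Fraction(1))


# Hypothetical: 13 placeholder loci at the illustrative 1/10 each; the search used 9 of them.
freqs = {f"locus_{i:02d}": Fraction(1, 10) for i in range(1, 14)}
searched = {f"locus_{i:02d}" for i in range(1, 10)}

print(nrc1_statistic(freqs, searched))  # 1/10000
```

The 9 searched loci serve only as probable cause; the figure presented at trial comes from the remaining 4.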

In part because of the controversy the NRC I report generated among scientists regarding the methodology proposed, and in part because courts were observed to misinterpret or misapply some of the statements in the report, in 1993, Judge William Sessions, then the Director of the FBI, asked the NRC to carry out a follow-up study. A second committee was assembled, and it issued its report in 1996. Often referred to as "NRC II", the second report, The Evaluation of Forensic DNA Evidence, was published by the National Academy Press in 1996.

The NRC II committee's main recommendation regarding cold hit probabilities is:

"Recommendation 5.1. When the suspect is found by a search of DNA databases, the random-match probability should be multiplied by N, the number of persons in the database."

The statistic NRC II recommends using is generally referred to as the "database match probability", DMP. This is an unfortunate choice of name, since the DMP is not a probability - although in all actual instances it is a number between 0 and 1, and it does (in my view as well as that of the NRC II committee) provide a good indication of the likelihood of getting an accidental match when a cold hit search is carried out. (The intuition is fairly clear. In a search for a match in a database of N entries, there are N chances of finding such a match.) For a true probability measure, if an event has probability 1, then it is certain to happen. However, consider a hypothetical case where a DNA database of 1,000,000 entries is searched for a profile having a RMP of 1/1,000,000. In that case, the DMP is

1,000,000 x 1/1,000,000 = 1
However, in this case the probability that the search will result in a match is not 1 but approximately 0.6321, namely 1 - (1 - 1/1,000,000)^1,000,000, which is very close to 1 - 1/e.
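The gap between the DMP and the actual chance of a coincidental hit can be tabulated directly. This is a sketch under the idealized assumption used in the example above: the database entries are unrelated people whose profiles are independent, each matching the crime-scene profile by chance with probability equal to the RMP.

```python
import math


def database_match_probability(rmp: float, n: int) -> float:
    """NRC II Recommendation 5.1: multiply the RMP by the database size N."""
    return n * rmp


def prob_some_match(rmp: float, n: int) -> float:
    """P(at least one of n independent, unrelated profiles matches by chance) = 1 - (1 - rmp)^n."""
    return -math.expm1(n * math.log1p(-rmp))


rmp = 1e-6  # the RMP from the hypothetical example
for n in (1_000, 100_000, 1_000_000):
    print(f"N={n:>9,}  DMP={database_match_probability(rmp, n):.4f}  "
          f"P(some match)={prob_some_match(rmp, n):.4f}")
```

For small values of N x RMP the two numbers nearly coincide; they diverge only as N x RMP approaches 1, where the DMP reaches 1 while the probability of a match is about 0.6321.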

The committee's explanation for recommending the use of the DMP to provide a scientific measure of the accuracy of a cold hit match reads as follows:

"A special circumstance arises when the suspect is identified not by an eyewitness or by circumstantial evidence but rather by a search through a large DNA database. If the only reason that the person becomes a suspect is that his DNA profile turned up in a database, the calculations must be modified. There are several approaches, of which we discuss two. The first, advocated by the 1992 NRC report, is to base probability calculations solely on loci not used in the search. That is a sound procedure, but it wastes information, and if too many loci are used for identification of the suspect, not enough might be left for an adequate subsequent analysis. ... A second procedure is to apply a simple correction: Multiply the match probability by the size of the database searched. This is the procedure we recommend." [p.32].

This is essentially the same logic as I presented for my analogy with the state lottery.

The controversy

Since two reports by committees of acknowledged experts in DNA profiling technology and statistical analysis, with each report commissioned by the FBI, came out strongly against the admissibility of the RMP, one might have imagined that would be the end of the matter, and that judges in a cold hit trial would rule in favor of admitting either the RMP for loci not used in the initial identification (à la NRC I) or else (à la NRC II) the DMP but not the RMP calculated on the full match.

However, not all statisticians agreed with the conclusions of the second NRC committee. Most notably, Dr. Peter Donnelly, Professor of Statistical Science at the University of Oxford, took a view diametrically opposed to that of NRC II. In an affidavit to the Court of the District of Columbia, in connection with a cold hit case (the Jenkins case), titled "DNA Evidence after a database hit" and dated October 3, 2004, Donnelly observed that during the preparation of the NRC II report, he had substantive discussions about the issues with four members of the committee whom he knew professionally, and went on to say:

"I had argued, and have subsequently argued, that after a database search, the DNA evidence ... is somewhat stronger than in the setting in which the suspect is identified by non-DNA evidence and subsequently found to match the profile of the crime sample. ... I disagree fundamentally with the position of NRC II. Where they argue that the DNA evidence becomes less incriminating as the size of the database increases, I (and others) have argued that in fact the DNA evidence becomes stronger. ... The effect of the DNA evidence after a database search is two-fold: (i) the individual on trial has a profile which matches that of the crime sample, and (ii) every other person in the database has been eliminated as a possible perpetrator because their DNA profile differs from that of the crime sample. It is the second effect, of ruling out others, which makes the DNA evidence stronger after a database search..."

Donnelly advocated using a Bayesian analysis to determine the probability of a random match, which method he outlined in a paper co-written with David Balding in 1996, titled "Evaluating DNA Profile Evidence When the Suspect is Identified Through a Database Search" (J. Forensic Science 603), and again in a subsequent article co-written with Richard Friedman, "DNA Database Searches And The Legal Consumption Of Scientific Evidence", Michigan Law Review, Vol. 97, Issue 4 (February 1999).

The statistic generated by the Donnelly/Balding method is generally close to the RMP, although it results from a very different calculation.
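Donnelly's point (ii), that ruling out the other database entries strengthens the case against the one who matches, can be seen in a deliberately simplified Bayesian model. The model is our illustrative assumption, not the published Balding-Donnelly method: exactly one member of a closed population of size M is the source, everyone is equally likely a priori, profiles are independent, each non-source matches by chance with probability equal to the RMP, and relatives and laboratory error are ignored. Under those assumptions the posterior probability that the matching suspect is the source works out to 1/(1 + k x RMP), where k is the number of people other than the suspect who have been neither tested nor excluded.

```python
def posterior_source_probability(rmp: float, population: int, excluded: int = 0) -> float:
    """P(matching suspect is the source) in a toy model (hypothetical, see lead-in).

    population: size M of a closed population assumed to contain the true source
    excluded:   number of other people already tested and found NOT to match
    """
    # People other than the suspect who could still be the source.
    untested_others = population - 1 - excluded
    if untested_others < 0:
        raise ValueError("excluded cannot exceed population - 1")
    return 1.0 / (1.0 + untested_others * rmp)


RMP = 1e-6      # hypothetical random match probability
M = 10_000_000  # hypothetical population size
N = 1_000_000   # hypothetical database size: the suspect plus N - 1 non-matching entries

no_search = posterior_source_probability(RMP, M)
after_search = posterior_source_probability(RMP, M, excluded=N - 1)
print(f"{no_search:.3f} {after_search:.3f}")
```

With these made-up numbers the search raises the posterior from about 0.091 to 0.100, because the 999,999 excluded database entries no longer count as alternative sources. Whether this is the right way to frame a cold hit is exactly what NRC II and Donnelly disagree about.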

The Donnelly/Balding method was considered by NRC II and expressly rejected. (Readers knowledgeable in probability theory will recognize at once that this is yet another manifestation of the ongoing debate between frequentist and Bayesian approaches to probability calculations.)

We thus have a fascinating situation: two groups of highly qualified experts in statistical reasoning, each proposing a different way to calculate the likelihood that a cold hit search will identify an innocent person, and each claiming that its method is correct and the other is dead wrong.

Scarcely any wonder then that the courts have become confused as to what number or numbers should be presented in court as evidence.

Personally, I (together with the collective opinion of the NRC II committee) find it hard to accept Donnelly's argument, but his view does seem to establish quite clearly that the relevant scientific community (in this case statisticians) have not yet reached consensus on how best to compute the reliability metric for a cold hit.

As I understand it (as a non-lawyer), the accepted procedure for the courts to follow when there is no consensus regarding a scientific procedure is to rule inadmissible the introduction as evidence of results obtained by the disputed procedure. In this case, that would, I believe, make it very difficult to provide the RMP as the sole numerical indicator of the reliability of a DNA profile match obtained from a cold hit search, an outcome that the FBI, for one, appears keen to avoid.

The question then is, what should the courts do? My personal view, as a mathematician, is that they should adopt one of the approaches recommended by the NRC, preferably NRC I (which is free of controversy), taking advantage of much improved DNA testing technology to extend the match process to more than 13 loci, a move that would more than compensate for the increase in the accidental match probability, however it is calculated, that results from a cold hit search.
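To give a rough sense of scale for the suggestion above: if each additional locus had the article's illustrative match probability of 1/10 and loci were independent, the factor of N that the NRC II correction introduces would be cancelled by about log10(N) extra loci. The helper below is a hypothetical illustration of that arithmetic, not a forensic procedure.

```python
from fractions import Fraction


def extra_loci_to_offset(database_size: int, per_locus: Fraction = Fraction(1, 10)) -> int:
    """Smallest k with database_size * per_locus**k <= 1.

    Assumes every extra locus is independent of the others and has the same
    match probability `per_locus` (an illustrative simplification).
    """
    if not 0 < per_locus < 1:
        raise ValueError("per_locus must lie strictly between 0 and 1")
    k = 0
    # Exact Fraction arithmetic avoids floating-point rounding at the boundary.
    while database_size * per_locus**k > 1:
        k += 1
    return k


for n in (65_000, 1_000_000, 10_000_000):
    print(n, extra_loci_to_offset(n))
```

For a database the size of the Arizona one mentioned earlier (about 65,000 entries) five extra loci would suffice under these assumptions; a database of ten million would need seven.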

What the courts should definitely not do, in my opinion (and let me stress that what you are reading is, as always in "Devlin's Angle", an opinion), is simply take it upon themselves to decide, as a matter of law rather than scientific accuracy, which calculation should be used. That is not how the courts normally act in matters of scientific evidence, and in my view it is not how they should act here.

Yet this is exactly what the California Supreme Court has just done with its recent ruling. (The decision went 4 to 2, with one justice recusing himself. The court did not give the reasoning behind its decision.)

Two test cases

There are, to my knowledge, two cases currently before the California courts where there is dispute as to the admissibility of cold hit calculations, one of which led directly to the recent decision.

In one, People of the State of California versus Christopher Goree, the Los Angeles District Attorney, in his submission to the Los Angeles Superior Court dated 5/19/06, opposed the defendant's motion to exclude DNA cold hit statistics resulting from a method currently in dispute, stating ". . .any argument regarding the relevance in a 'cold hit case' of a rarity/random match probability statistics is a legal argument, not a [. . .] scientific argument."

Statisticians reading this may be shocked, but the DA meant what he said. Later in his motion, he argues: "Whether evidence has less probative value or more probative value is a legal evaluation, not a scientific one. Nothing prevents scientists from debating the issue, but its evaluation and resolution is reserved for the judiciary alone."

While the first sentence in the above claim may well be correct from a legal standpoint, think very carefully about what is implied by the second sentence. We are, after all, talking here not about opinions, but about what number, as a matter of actual fact, most accurately measures the mathematical likelihood of a false conviction. The fact that at present different groups of statisticians do not agree on the answer does not make this any less a matter of actual fact. It just means that the relevant professional community has not yet reached consensus on what that actual fact is.

This kind of thing is hardly unknown in science. Physicists are currently in disagreement as to whether string theory correctly describes the universe we live in. But should that too be a matter for the courts to resolve?

Yes, of course the courts are where decisions must be made as to what evidence may or may not be admitted. But when that evidence is a result of the application of science, they should do so in an informed way, upon the advice of the appropriate scientists. In cold hit DNA cases, that means professional statisticians. If the statisticians agree on a number or numbers that describe a certain situation, the court must, if it decides such numbers are relevant, use that number or numbers - and definitely no others. If the statisticians express disagreement, the court would be wise to act on the assumption that either view may be correct. (Correct here does not mean which calculation is correct as a calculation. In the present cold hit controversy, no one argues that any particular calculation is incorrect. Rather, the "correctness" in dispute is which calculation (and hence the result of that calculation) best describes the actual situation before the court.)

The LA District Attorney goes on to say: "Defendant then postulates that when a single suspect is identified in a DNA database search, the significance of a subsequent one-to-one DNA profile comparison between the suspect and the perpetrator should not be described using the rarity/random match probability in the general population. He is wrong, however. [ . . . ] The exact means by which the suspect was initially identified are irrelevant."

I know of no statistician, be he or she frequentist or Bayesian, who would agree to that last claim. Still, the DA is trying to secure a conviction. It is not his job to be faithful to science, rather to make the best case he can. What I find far more worrying are the decisions being made by the courts. For their job, after all, is to get at the truth.

In the second case I shall discuss, The People versus Michael Johnson, the Court of Appeal of the State of California Fifth Appellate District issued an opinion on May 25 of this year, in which they state: "In our view, the means by which a particular person comes to be suspected of a crime - the reason law enforcement's investigation focuses on him - is irrelevant to the issue to be decided at trial, i.e., that person's guilt or innocence."

The court continues a short while later: " . . . the fact that here, the genetic profile from the evidence sample (the perpetrator's profile) matched the profile of someone in a database of criminal offenders, does not affect the strength of the evidence against appellant. [ . . . ] The fact appellant was first identified as a possible suspect based on a database search simply does not matter."

Oh dear, oh dear, oh dear.

Again, I urge you to play out the above line of reasoning with my lottery example. Every week, millions and millions of lottery entrants are reminded of the huge difference between "the probability that someone will win" and "the probability that YOU will win."

Subsequent to the court's opinion in the Johnson case, I was one of several scientists who wrote an Amicus Brief to the California Supreme Court requesting that in cold hit cases, as in other cases involving scientific issues, the courts should seek expert opinion from, in this case, statisticians.

[Incidentally, my only involvement in the general issue of probability calculations in DNA cold hit cases is that of a citizen concerned that justice be properly done, and a mathematician who believes I have a duty to ensure that the professional opinion of mathematicians is taken into account when it is relevant. I have no connection with any case currently before the courts, and know nothing about any of them other than what is available in public documents. I have no personal interest vested in whether the court accepts or denies a brief to which I am a cosignatory.]

In our brief, we state:

"By way of background, we make clear that we are scholars and scientists, not attorneys. Our professional interest is in the proper understanding of the role that science in general, and statistics in particular, can and should play in legal cases involving forensic DNA evidence. Assuming that criminal trials are a search for the truth, evidence presented before juries should be accurate. Forensic DNA evidence is grounded in statistical expressions that measure the likelihood of coincidence. Statisticians and other professional scientists with an interest in, and knowledge about, statistics and genetics are uniquely empowered to advise how to derive those statistical expressions. Science matters, and court decisions that treat statistical questions such as how to express match evidence in DNA database match cases as purely legal ones are, respectfully, irresponsible."

By denying the petition for review in the Johnson case, of which our brief was part, the California Supreme Court has ruled that, for now at least, it is for the courts to decide which statistical calculations to accept and which to keep out in cold hit cases. That strikes me as a scandalous affront to all professional statisticians, both those who regularly testify for the prosecution and those who testify for the defendants in DNA profile cases.

Given the system of checks and balances in our legal system, I am hopeful that in due course the matter will be resolved correctly. In the meantime, I fear that the very DNA profiling procedure that has been used so successfully to overturn many previous false convictions (as well as put behind bars individuals I for one am glad are no longer roaming the streets) will, in cold hit cases as they are currently being adjudicated, lead to another collection of wrongful convictions that later courts will have to undo.


Devlin's Angle is updated at the beginning of each month.
Mathematician Keith Devlin (email: devlin@csli.stanford.edu) is the Executive Director of the Center for the Study of Language and Information at Stanford University and The Math Guy on NPR's Weekend Edition. Devlin's newest book, THE MATH INSTINCT: Why You're a Mathematical Genius (along with Lobsters, Birds, Cats, and Dogs) was published recently by Thunder's Mouth Press.