United States v. Shea, 957 F. Supp. 331 (D.N.H. 1997)
March 18, 1997
Anthony Mark SHEA.
United States District Court, D. New Hampshire.
*332 Gary Milano, Asst. U.S. Atty., Concord, NH, for Plaintiff.
Bjorn Lange, Federal Defender, Concord, NH, for Defendant.
BARBADORO, District Judge.
Two men wearing masks and gloves broke into the Londonderry branch of the First New Hampshire Bank about an hour after closing on August 4, 1995. One of the robbers apparently cut himself when he entered the building, as bloodstains were discovered inside the bank and in a stolen minivan believed to have been used as a getaway vehicle.
The government later charged Anthony Shea with the robbery and proposed to base its case in part on expert testimony comparing Shea's DNA with DNA extracted from several of the bloodstains. The government's expert, a forensic scientist employed by the FBI, used a method of DNA analysis known as Polymerase Chain Reaction ("PCR") in determining that Shea has the same DNA profile as the person who left several of the bloodstains at the crime scene and in the getaway vehicle. The expert also concluded that the probability of finding a similar profile match if a DNA sample were drawn randomly from the Caucasian population is 1 in 200,000.
Shea moved to exclude the DNA evidence prior to trial. Although he conceded that the scientific principles underlying PCR are generally accepted in the fields of molecular biology and forensic science, he argued that the evidence is inadmissible pursuant to Fed.R.Evid. 702 because the FBI's PCR methods are unreliable. He also challenged the government's random match probability estimate for similar reasons. Finally, he argued that evidence of a random match probability is barred by Fed.R.Evid. 403 because the risk that the jury would be misled by the evidence substantially outweighs its probative value.
After holding an evidentiary hearing and carefully considering Shea's arguments, I denied his motion to exclude. Shea subsequently was convicted of attempted bank robbery and several related charges. In this opinion, I explain why I admitted the DNA evidence.
In order to appreciate Shea's contentions, one must understand certain generally accepted principles and methodologies used in the fields of molecular biology and population genetics. Accordingly, I begin by describing several basic concepts used in human genetics, the DNA typing methodology at issue in this case, and the statistical methods the government's expert used in attempting to determine the probability of a random match.
*333 A. Some Basic Concepts Used in Human Genetics
DNA, an acronym for deoxyribonucleic acid, is the chemical blueprint for life. Most human cells other than reproductive cells contain identical copies of a person's DNA. Although 99.9% of human DNA does not vary from person to person, no two persons other than identical twins have the same DNA. NRC II, supra, at 63.
Human DNA is organized into 23 pairs of chromosomes and each chromosome contains a DNA molecule. DNA molecules have a double stranded helical structure that can be envisioned as a spiral staircase. NRC I, supra, at 2. See Figure 1. Running between the two sugar-phosphate strands forming the handrails of the staircase are millions of steps comprised of two loosely bound nitrogen bases. Each step is referred to as a base pair. There are four types of bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A's ordinarily pair only with T's, and C's ordinarily pair only with G's. Thus, if the sequence of bases on one side of a DNA molecule is known, the corresponding sequence of bases on the other side can be deduced. The arrangement of base pairs in chromosomal DNA comprises the genetic code that differentiates humans from non-humans and makes every person unique. Mange, supra, at 19-20.
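The pairing rule described above (A with T, C with G) can be illustrated with a minimal sketch. The function name and the sample sequence are hypothetical and chosen only for illustration; the sketch reads the second strand position by position and ignores strand direction.

```python
# Base-pairing rule from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the base found opposite each position of the given strand."""
    return "".join(PAIRS[base] for base in strand.upper())


# Hypothetical six-base sequence, used only as an example.
print(complement_strand("ATGCCG"))  # TACGGC
```

Applying the function twice returns the original sequence, which is the sense in which one strand determines the other.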
In total, the DNA molecules in the 23 pairs of human chromosomes contain approximately 3.3 billion base pairs. Most of the base pairs are arranged in the same sequence in all humans. NRC II, supra, at 62-63. However, every DNA molecule has regions known as polymorphic sites where variability is found in the human population. Each possible arrangement of base pairs that occurs at a polymorphic site is referred to as an allele. Alleles can result from differences in a single base pair, differences in multiple base pairs, or differences in the number of base pairs that comprise a site.
The combination of alleles from corresponding sites on a chromosome pair is sometimes referred to as the site's genotype. NRC II, supra, at 216. One allele for each single locus genotype is inherited from each parent. If both parents contribute the same type of allele, the child's genotype is considered to be homozygous. If each parent contributes a different type of allele, the child's genotype is considered to be heterozygous. To illustrate, if only two alleles for a locus are found in the population, A and a, two homozygous genotypes, AA and aa, and one heterozygous genotype, Aa, will be found in the population. Although an individual's genotype consists of either two copies of the same allele or one copy of each of two different alleles, many different alleles may be found in the population for a single locus. NRC II, supra, at 15.
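The two-allele illustration above generalizes to any number of alleles at a locus. The following is a small sketch, not drawn from the record, that enumerates the unordered genotypes for a list of allele labels; the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """List every unordered pair of alleles that can occur at one locus."""
    return list(combinations_with_replacement(alleles, 2))


def is_homozygous(genotype):
    """A genotype is homozygous when both inherited alleles are the same."""
    first, second = genotype
    return first == second


# The two-allele example from the text: A and a.
for genotype in possible_genotypes(["A", "a"]):
    label = "homozygous" if is_homozygous(genotype) else "heterozygous"
    print("".join(genotype), label)
```

With n alleles in the population the list contains n homozygous genotypes and n(n-1)/2 heterozygous genotypes, even though any one person carries only one of them.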
B. PCR Amplification and Typing
PCR and Restriction Fragment Length Polymorphism ("RFLP") are the two methods most often used in forensic DNA typing. In this case, the government relies exclusively on PCR. PCR has two aspects, amplification and allele identification.
*334 1. PCR Amplification
PCR amplification is a process for making many copies of selected portions of a DNA sample. NRC I, supra, at 40. The process requires the use of a primer for each end of a polymorphic site. Primers are synthetic single-stranded DNA molecules consisting of approximately 20 bases. They are arranged in a sequence that complements the bases on one strand of the double-stranded DNA molecule in a known region flanking the site at issue. Amplification is commenced by adding two corresponding primers for each end of a site, an enzyme known as DNA polymerase, and many free floating copies of the four bases (A, C, T and G) to a purified DNA sample. The double-stranded DNA molecules in the sample are then denatured. Denaturing separates double-stranded DNA molecules into single-stranded molecules with complementary base sequences. After the DNA is denatured, the primers bind with the denatured DNA at their complementary sites such that one primer binds to one strand at one end of the studied site and the other primer binds to a complementary strand at the other end of the site. Mange, supra, at 287. See Figure 2. Primers have 3' and 5' ends. The denatured DNA is replicated only from each primer's 3' end, leaving the portion of a molecule on the 5' end single-stranded. Mange, supra, at 256, 287. Each step in the process after the primers are added is accomplished through carefully controlled changes in temperature in a device known as a thermal cycler. Mange, supra, at 288.
The amplification process is repeated many times. After the third cycle, some copies are produced that contain only the polymorphic region and its flanking primers. See Figure 3. The number of these copies grows exponentially with each cycle. Eventually, enough copies of the shorter segments are produced to permit the amplified alleles to be identified. Mange, supra, at 288.
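The growth described above can be sketched with a simple bookkeeping model. The model is an idealization that is not part of the record: it assumes one double-stranded starting molecule and perfect copying of every strand in every cycle, which real reactions do not achieve. The function name is hypothetical.

```python
def target_only_duplexes(cycles: int) -> int:
    """Count double-stranded copies made in the final cycle that contain
    only the studied region and its flanking primers (idealized model)."""
    originals = 2       # the two strands of the starting molecule
    long_strands = 0    # copies of an original: bounded at one end only
    short_strands = 0   # copies bounded by a primer at both ends
    duplexes = 0
    for _ in range(cycles):
        # Every short strand present templates another short strand,
        # producing one duplex that is short on both strands.
        duplexes = short_strands
        new_long = originals                      # originals yield long copies
        new_short = long_strands + short_strands  # long and short yield short copies
        long_strands += new_long
        short_strands += new_short
    return duplexes


for n in (1, 2, 3, 4, 10, 20):
    print(n, target_only_duplexes(n))
```

Under these assumptions no such copies exist after the first two cycles, the first ones appear in the third cycle, and the count then roughly doubles with each further cycle.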
Seven different polymorphic sites were analyzed in this case: DQ Alpha; Low Density Lipoprotein Receptor (LDLR); Glycophorin A (GYPA); Hemoglobin G Gammaglobin (HBGG); D7S8; Group-Specific Component (Gc); and D1S80. The DQ Alpha site and the five sites collectively known as the Polymarker sites (LDLR, GYPA, HBGG, D7S8 and Gc) were amplified simultaneously using a commercially available test kit known as the "AmpliType PM PCR Amplification and Typing Kit." The D1S80 site was amplified separately.
2. Identification of Amplified Alleles
Once a DNA sample is amplified, the specific polymorphic sites must be typed. The DQ Alpha and Polymarker sites are typed as follows. The amplified DNA is denatured once more and washed over strips of allele-specific probes. Each probe itself contains denatured DNA segments comprising an allele that is known to exist at the studied site. Both the DQ Alpha and Polymarker test strips also have control probes that contain many copies of a denatured portion of the DQ Alpha site that all of the DQ Alpha alleles have in common. The amplified DNA binds with the denatured DNA in the control probes and those allele-specific probes that contain DNA segments with complementary sets of bases. NRC I, supra, at 42. A reagent is then applied which causes colored dots to appear at any probes where binding has occurred. Because the control probes contain DNA segments that will bind with all of the DQ Alpha alleles, the control probes will always show a positive response if the sample contains human DNA and the tests are performed properly. If the sample contains DNA from only one person and the *335 person is homozygous at a locus, the test will show a positive reading for only one allele at that locus. If the person is heterozygous, the test will be positive for two alleles. Because the chemical reactions occurring in this process also are sensitive to temperature, they are conducted in a water bath at designated temperatures.
The D1S80 site is typed differently because alleles for this site result from variations in the number of times that a contiguous sequence of 16 base pairs is repeated. NRC II, supra, at 74. The process used to type amplified D1S80 is known as gel electrophoresis. In this process, amplified D1S80 is deposited at one end of a thin slab of a gel material. The gel is then placed in an electric field and this field causes the amplified D1S80 to migrate through the gel. The rate at which the D1S80 segments travel through the gel depends upon the length of the amplified alleles. Shorter alleles travel further in a given time than longer alleles. After a predetermined time, the gel is removed from the electric field and the amplified D1S80 in the gel is stained. The sample can then be typed based on the distance the amplified DNA has traveled through the gel. Mange, supra, at 299-301.
PCR is a very potent process which can result in the amplification of very small amounts of DNA. Accordingly, special care must be taken to minimize the possibility that samples become contaminated through mishandling. Among the other issues that are sometimes raised when considering the PCR process are: (1) the potential that primers or probes might bind at points other than the areas flanking the site under study; (2) concerns that the process has a limited capacity to identify mixtures of more than one person's DNA; and (3) suggestions that the laboratory performing the test in a particular case might fail to detect and correct erroneous results.
C. Population Genetics
The PCR analysis conducted in this case allegedly demonstrates that Shea's DNA matches DNA extracted from several of the bloodstains at seven studied sites. To put this finding in context, the government offered evidence that the probability of finding a similar match if a DNA sample were drawn randomly from the Caucasian population is 1 in 200,000. This random match probability essentially expresses the expected frequency of the observed DNA profile in a pertinent population.
The process of calculating a random match probability begins with a determination of the allele frequencies comprising the DNA profile. An allele frequency is simply a statement of relative proportion that is customarily expressed as a decimal fraction. Genotype frequencies are calculated by squaring the frequency of the single allele comprising each homozygous genotype ($P^2$) and by doubling the product of the two allele frequencies comprising each heterozygous genotype ($2P_iP_j$). NRC I, supra, at 4; NRC II, supra, at 92. The law of genetics that permits genotype frequencies to be determined in this way under proper conditions is called the Hardy-Weinberg law. Mange, supra, at 408-11. Once genotype frequencies are determined, the probability of a random match of genotypes at multiple sites is calculated by multiplying the frequencies of the sample's genotypes at each site.
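The calculation just described can be sketched in a few lines. The allele frequencies below are hypothetical values chosen for illustration and are not taken from the FBI database or the record; the function names are likewise assumptions.

```python
from math import prod


def genotype_frequency(p_i, p_j=None):
    """Hardy-Weinberg genotype frequency.

    One allele frequency means a homozygous genotype (P squared);
    two allele frequencies mean a heterozygous genotype (2 * Pi * Pj).
    """
    if p_j is None:
        return p_i ** 2
    return 2 * p_i * p_j


def profile_frequency(genotypes):
    """Multiply the genotype frequencies across all sites in the profile."""
    return prod(genotype_frequency(*g) for g in genotypes)


# Hypothetical two-site profile: homozygous at the first site (allele
# frequency 0.2) and heterozygous at the second (frequencies 0.3 and 0.1).
frequency = profile_frequency([(0.2,), (0.3, 0.1)])
print(f"about 1 in {1 / frequency:,.0f}")
```

Each additional site multiplies in another fraction below one, which is why a profile typed at several sites can yield a very small estimated frequency even when every individual allele is common.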
*336 The rule that the joint probability of multiple independent events can be determined by multiplying the frequencies of the individual events is known as the product rule. Mange, supra, at 61. The product rule can be applied reliably in the manner described above only if the estimate of allele frequencies is reasonably accurate and the conditions in the population approximate what are known as Hardy-Weinberg equilibrium and linkage equilibrium.
(1) Accuracy of Allele Frequencies Estimates
Because it is not practical to test an entire population, allele frequencies are derived from databases of DNA samples. If these databases do not accurately reflect the distribution of alleles in the population, either because the sample size is too small or because of a bias in the way in which samples were selected for inclusion, the calculation of a random match probability may be unreliable. NRC I, supra, at 10.
(2) Hardy-Weinberg Equilibrium
Hardy-Weinberg equilibrium is the state in which genotype frequencies can be calculated reliably using the Hardy-Weinberg law. Hardy-Weinberg equilibrium will exist for a large population if there is approximately random mating within the population, a negligible amount of biased mutation occurs in the alleles comprising the genotypes under study, migration is limited and unbiased, and natural selection is insignificant. Large deviations from Hardy-Weinberg equilibrium make it difficult to reliably determine genotype frequencies from allele frequencies. Mange, supra, at 410-411.
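One common textbook way to look for a deviation from Hardy-Weinberg proportions is a chi-square comparison of observed and expected genotype counts. The sketch below is offered only as an illustration of that general idea for a site with two alleles; it is not represented to be the test used in any study discussed in this opinion, and the counts are hypothetical. It assumes both alleles appear at least once in the sample.

```python
def hardy_weinberg_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Chi-square statistic comparing observed genotype counts with the
    counts expected under Hardy-Weinberg proportions (two-allele site)."""
    n = n_AA + n_Aa + n_aa
    # Estimate the frequency of allele A from the sample itself.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical samples of 100 people each.
print(hardy_weinberg_chi_square(25, 50, 25))  # counts in exact proportions
print(hardy_weinberg_chi_square(40, 20, 40))  # excess of homozygotes
```

A statistic near zero is consistent with Hardy-Weinberg proportions, while a large value signals a departure such as the excess of homozygous genotypes that population substructure can produce. With small samples a test of this kind has limited power to detect modest departures.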
(3) Linkage Equilibrium
The product rule can be used reliably only if the events considered in a joint probability calculation are independent. Thus, to the extent that genotypes at multiple sites are linked, it becomes more difficult to calculate a random match probability using the product rule. Linkage equilibrium exists in a population when alleles comprising the genotypes at one site are not associated with the alleles comprising genotypes at other sites. NRC II, supra, at 106. If Hardy-Weinberg equilibrium persists in a population over several generations, the population will approach linkage equilibrium. NRC II, supra, at 27. Linkage equilibrium will be approached more quickly for sites on different chromosomes than for sites on the same chromosome, and widely spaced sites on the *337 same chromosome will approach linkage equilibrium more quickly than sites that are close together. NRC II, supra, at 64, 106. Whereas Hardy-Weinberg equilibrium will produce alleles in Hardy-Weinberg proportions after a single generation, it takes several generations before linkage equilibrium is approached. NRC II, supra, at 106.
Hardy-Weinberg equilibrium and linkage equilibrium are rarely attained in real populations, most significantly because real populations are finite and contain subgroups that are perpetuated by non-random mating. Accordingly, debate about whether the product rule can be used reliably often focuses on the power of the statistical methods used to detect deviations from Hardy-Weinberg equilibrium and linkage equilibrium and the adequacy of the measures that are used to account for potential deviations.
A. Rule 702
Expert testimony must satisfy three requirements in order to survive a Rule 702 objection: (1) the witness must be qualified by "knowledge, skill, experience, training or education;" (2) the witness's testimony must concern "scientific, technical or other specialized knowledge;" and (3) the testimony must "assist the trier of fact to understand the evidence or to determine a fact in issue." United States v. Shay, 57 F.3d 126, 132 (1st Cir. 1995) (quoting Fed.R.Evid. 702). In this case, only the second of Rule 702's requirements is in serious dispute.
When an expert bases opinion testimony on scientific knowledge, the testimony will not be admitted unless it is derived by the scientific method and is supported by "appropriate validation." Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 590, 113 S. Ct. 2786, 2795, 125 L. Ed. 2d 469 (1993). Rule 702 thus establishes a standard of evidentiary reliability that focuses on the scientific validity of the expert's methods rather than the soundness of his specific conclusions. United States v. Bonds, 12 F.3d 540, 566 (6th Cir.1993). Moreover, each logical step in the expert's analysis must be scientifically valid because, as the Supreme Court observed in Daubert, "scientific validity for one purpose is not necessarily scientific validity for other, unrelated purposes." 509 U.S. at 591, 113 S. Ct. at 2796; see also In re Paoli R.R. Yard PCB Litig., 35 F.3d 717, 743 (3d Cir.1994) ("Paoli II"), cert. denied, ___ U.S. ___, 115 S. Ct. 1253, 131 L. Ed. 2d 134 (1995). In Daubert, the Supreme Court described this consideration as "fit." Daubert, 509 U.S. at 591, 113 S. Ct. at 2795-96.
Almost any challenge to an expert's conclusions can be redefined as a dispute over methods. However, Rule 702's reliability requirement distinguishes between a claim that an expert's methods are unsound and a claim that scientifically sound methods have been applied improperly in a particular case. A claim that scientific methods are unsound must be addressed initially by the trial judge, while a claim that scientifically sound methods have been applied improperly ordinarily should be left for the jury to resolve unless the alleged "error negates the basis for the reliability of the principle itself." United States v. Martinez, 3 F.3d 1191, 1198 (8th Cir.1993), cert. denied, 510 U.S. 1062, 114 S. Ct. 734, 126 L. Ed. 2d 697 (1994).
Among the factors that a court should consider in determining whether scientific testimony is reliable are: (1) whether the expert's opinion can be or has been tested; (2) whether the theory or technique on which the opinion is based has been subjected to peer review and publication; (3) the technique's *338 known or potential error rate; (4) the existence and maintenance of standards controlling the technique's operations; and (5) "general acceptance." Daubert, 509 U.S. at 592-95, 113 S. Ct. at 2796-98; Paoli II, 35 F.3d at 742. No single factor is necessarily dispositive in this analysis and other factors might also warrant consideration in the appropriate case. Daubert, 509 U.S. at 594, 113 S. Ct. at 2797.
B. Rule 403
Rule 403 requires the exclusion of otherwise admissible expert testimony if the probative value of the evidence is substantially outweighed by "the danger of unfair prejudice, confusion of the issues, or misleading the jury, or by considerations of undue delay, waste of time, or needless presentation of cumulative evidence." Fed.R.Evid. 403. Expert testimony must be closely scrutinized for compliance with Rule 403 because, as the Court in Daubert recognized, "[e]xpert evidence can be both powerful and quite misleading...." Daubert, 509 U.S. at 595, 113 S. Ct. at 2798 (quoting Jack B. Weinstein, Rule 702 of the Federal Rules of Evidence is Sound; It Should Not be Amended, 138 F.R.D. 631, 632 (1991)); see also United States v. Fosher, 590 F.2d 381, 383 (1st Cir.1979). Nevertheless, relevant and reliable expert testimony ordinarily should be admitted notwithstanding Rule 403 unless the potential that it will be used improperly substantially outweighs any legitimate persuasive value that the evidence may have. See Paoli II, 35 F.3d at 747 (expert testimony should not be excluded simply because it is complex unless there is something about the particular technique at issue that overwhelms the jury's ability to independently assess the evidence).
Shea's challenges to the DNA evidence fall into three categories. First, he argues that the FBI's PCR testing protocols contain errors and omissions that render the methodology suspect. Second, he contends that the product rule cannot be used to calculate the probability of a random match because the databases on which the calculation is based are too small. Finally, he asserts that the government should be barred from informing the jury of the probability of a random match because it will mislead the jury. I address each class of contentions in turn.
A. PCR Typing Protocols
The PCR typing methods used by the FBI in this case readily satisfy Rule 702's reliability requirement. First, although PCR is a relatively new technology, it is based on sound scientific methods and it has quickly become a generally accepted technique in both forensic and non-forensic settings. Perhaps the strongest evidence on this point is the conclusion reached by the National Research Council's Committee on Forensic DNA Science that "the molecular technology [on which PCR is based] is thoroughly sound and ... the results are highly reproducible when appropriate quality-control methods *339 are followed." NRC II, supra, at 23; see also Mange, supra, at 287 (noting PCR's "widespread and growing applications [in the field of molecular biology]"). Second, the tests used to type each of the 7 sites examined in this case were validated in a carefully constructed series of experiments and the results were later published in peer-reviewed publications. Finally, the FBI followed detailed testing protocols and quality control procedures in this case that conform to industry standards.
Notwithstanding the considerable evidence supporting a finding that the FBI's PCR test methods are scientifically valid, Shea argues that the DNA evidence must be excluded because the FBI's PCR tests will produce an unacceptably high percentage of erroneous results even if evidence samples are properly handled and the tests are properly performed. Shea bases this argument primarily on the testimony of Dr. Donald Riley. Dr. Riley claims that the FBI's testing protocols could result in typing errors because the testing protocols specify incorrect amplification and typing temperatures. He also states that this problem is particularly significant with the amplification and typing of the DQ Alpha region. Because the control probes for both the DQ Alpha and Polymarker tests are intended to detect DQ Alpha alleles, Dr. Riley theorizes, the FBI's testing protocols could produce erroneous results on both tests.
I reject Dr. Riley's testimony for two reasons. First, although he claimed that he has tested his theory, he has not subjected his conclusions to peer review, nor has he described his test methods in sufficient detail to permit a conclusion that they are scientifically valid. Second, even if the testing protocols specify the wrong amplification and typing temperatures, Dr. Riley has offered no scientific support for his theory that this methodological flaw could produce false positive signals at the control probes on the DQ Alpha and Polymarker test strips. In the face of such poorly supported testimony, I have no difficulty in finding that the published validation studies relied on by the government persuasively establish the evidentiary reliability of the FBI's PCR testing protocols.
Shea also argues that the DNA evidence should be excluded because PCR cannot reliably detect mixtures of more than one person's DNA. The government concedes that a mixture theoretically could result in the declaration of a false match. However, Richard Guerrieri, one of the government's expert *340 witnesses, testified that such errors are exceedingly unlikely because an examiner will be able to identify a mixture from observable differences in the relative strengths of the signals indicated on the PCR test strips, except in extremely unusual circumstances. I reject Shea's argument because I find Mr. Guerrieri's testimony persuasive on this point.
Shea next argues that the DNA evidence must be excluded because the government did not establish that the FBI laboratory has an acceptably low PCR error rate. Testing errors can occur either because a test has inherent limitations or because the people involved in collecting, handling or testing samples are not sufficiently skilled. See Edward J. Imwinkelried, Coming to Grips with Scientific Research in Daubert's "Brave New World": The Courts' Need to Appreciate the Evidentiary Differences Between Validity and Proficiency Studies, 61 Brook.L.Rev. 1247 (1995) (explaining the difference between a validation study which evaluates whether a test produces accurate results if performed properly and a proficiency study which evaluates a laboratory's ability to correctly perform the test). A laboratory's error rate is a measure of its past proficiency that is of limited value in determining whether a test has methodological flaws. Since Rule 702's reliability requirement focuses on the validity of the test rather than the proficiency of the tester, the absence of a laboratory error rate will rarely be dispositive if the rest of the evidence establishes that the test has been properly validated. In this case, the government produced substantial persuasive evidence to support its claim that its PCR tests are reliable. Accordingly, the absence of a known PCR error rate for the FBI laboratory does not warrant the exclusion of the government's evidence.
Shea finally challenges the reliability of the DNA evidence by pointing to several alleged deficiencies in the FBI's evidence handling and quality control procedures. Shea contends that the FBI laboratory mishandled the evidence by packaging the dried blood samples in individual paper coin envelopes and storing them together. Dr. Riley theorizes that this practice is fatally flawed because DNA from one sample could migrate through a paper coin envelope and contaminate other similarly packaged samples. Shea also contends that the laboratory's quality control procedures are deficient because substrate control samples were not taken and a positive control sample was not tested for each possible allele. The government responds by noting that Shea failed to produce any scientific evidence to support Dr. Riley's contamination theory and by explaining that the laboratory's quality control procedures conform to industry standards.
I need not address the merits of Shea's arguments. Instead, I join the many *341 courts that have addressed similar issues by concluding that because such arguments concern the way in which a method is applied in a particular case rather than the validity of the method, they affect the weight that should be given to the evidence rather than its admissibility. See, e.g., United States v. Beasley, 102 F.3d 1440, 1448 (8th Cir.1996); United States v. Hicks, 103 F.3d 837, 848 (9th Cir.1996); United States v. Chischilly, 30 F.3d 1144, 1154 (9th Cir.1994), cert. denied, ___ U.S. ___, 115 S. Ct. 946, 130 L. Ed. 2d 890 (1995); United States v. Bonds, 12 F.3d 540, 563 (6th Cir.1993); United States v. Jakobetz, 955 F.2d 786, 800 (2d Cir.), cert. denied, 506 U.S. 834, 113 S. Ct. 104, 121 L. Ed. 2d 63 (1992).
B. Random Match Probability
The government's estimate of a 1 in 200,000 random match probability is based primarily on information drawn from a PCR database comprised of DNA profiles for 148 Caucasians, 145 African Americans, 94 Southeastern Hispanics, and 96 Southwestern Hispanics. Bruce Budowle, et al., Validation and Population Studies of the Loci LDLR, GYPA, HBGG, D7S8, and Gc (PM loci), Procedure, 40 Journal of Forensic Sciences 45, 50 (1995). Shea contends that this database is simply too small to be used reliably in estimating random match probabilities with the product rule.
The government cites a study published in a peer-reviewed journal to refute Shea's claim. Id. This study analyzes the government's database using several statistical tests in an effort to identify significant departures from Hardy-Weinberg and linkage equilibrium. The study states that the distribution of the various genotypes found at the 7 loci at issue in this case meets Hardy-Weinberg expectations and exhibits little evidence of deviation from linkage equilibrium. Accordingly, it concludes that "[t]he data demonstrate that valid estimates of a multiple locus profile frequency can be derived for identity testing purposes using the product rule under the assumption of independence." Id. at 53.
Notwithstanding the study cited by the government, legitimate questions can be raised concerning the reliability of a random match probability that is estimated with the product rule from a database as small as the one used here. Because such databases are comprised of a limited number of samples, the possibility of random error ordinarily must be considered. Further, legitimate *342 questions can be raised concerning the power of existing statistical methods to detect deviations from Hardy-Weinberg and linkage equilibrium when small databases are used. If random error is not accounted for and if the likely potential effects of factors such as population substructuring are not identified and addressed, a random match probability estimated with the product rule may be unreliable.
The recently released NRC II report addresses these issues by acknowledging the potential for error and suggesting several ways to conservatively account for the problem. First, the report describes alternative adjustments to the product rule to account for the systematic over-representation of homozygous genotypes that is caused by undetected population substructuring. NRC II, supra, at 99-100. If, as is the case with most PCR-based systems, the method used to identify alleles does not present significant ambiguity, the report recommends that homozygous frequencies be determined by using $P^2 + P(1-P)\theta$ rather than by $P^2$, where P is the allele frequency and θ is the percentage of excess homozygosity that is expected because of undetected population substructuring. Id. at 122. After examining empirical data from several sources, the report concludes that a θ value of .01 will conservatively address the likely potential systematic effect of undetected population substructuring except for cases involving small, isolated populations, where a θ value of .03 may be appropriate. Id. at 122. If an allele identification system such as one based on VNTRs is used where there is a potential that a heterozygous genotype may be misidentified as homozygous, the report recommends that homozygous genotype frequencies be calculated using $2P$ rather than $P^2$. Id. at 122. The report concludes that these two methods will account in a conservative way for the likely systematic effect of undetected population substructuring. Id.
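The two homozygote adjustments described above can be compared side by side in a short sketch. The allele frequency of 0.1 is a hypothetical value used only for illustration, and the function names are assumptions; the θ values of .01 and .03 are the ones mentioned in the text.

```python
def unadjusted_homozygote(p: float) -> float:
    """Plain Hardy-Weinberg homozygote frequency: P squared."""
    return p * p


def theta_adjusted_homozygote(p: float, theta: float = 0.01) -> float:
    """Adjustment for excess homozygosity: P^2 + P(1 - P) * theta."""
    return p * p + p * (1.0 - p) * theta


def two_p_homozygote(p: float) -> float:
    """Alternative used where a heterozygote might be read as a
    homozygote: 2P in place of P squared."""
    return 2.0 * p


p = 0.1  # hypothetical allele frequency
print(unadjusted_homozygote(p))
print(theta_adjusted_homozygote(p, theta=0.01))
print(theta_adjusted_homozygote(p, theta=0.03))
print(two_p_homozygote(p))
```

For any allele frequency between 0 and 1, each adjustment produces a homozygote frequency at least as large as the unadjusted value, so the resulting profile frequency is higher and the evidence is presented as somewhat less rare.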
Undetected population substructuring and random error can also affect individual random match probability calculations in ways that are difficult to predict. NRC II, supra, at 112. Thus, the NRC II report also suggests a way of qualifying random match probability estimates to account for such uncertainties. After considering empirical data comparing genotype frequencies observed in a number of aggregate population databases with genotype frequencies found in known regional and ethnic subpopulation databases, the report concludes that likely uncertainties caused by random error and undetected population substructuring can be conservatively accounted for if the database used in calculating the random match probability contains samples from "at least several hundred persons" and the estimate obtained by using the product rule is qualified by stating that the true value is likely to be within a factor of 10 above or below the estimated value. Id. at 156.
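The factor of 10 qualification is simple arithmetic, sketched below using the government's estimate of 1 in 200,000. The function name is hypothetical, and the estimate is expressed as the "1 in N" denominator rather than as a decimal fraction.

```python
def factor_of_ten_range(one_in: float, factor: float = 10.0):
    """Return the (more common, rarer) bounds of a '1 in N' estimate
    qualified by the stated factor above and below."""
    return one_in / factor, one_in * factor


low, high = factor_of_ten_range(200_000)
print(f"between 1 in {low:,.0f} and 1 in {high:,.0f}")
```

The bound most favorable to a defendant is the smaller denominator, since it treats the profile as ten times more common than the point estimate suggests.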
The government agreed to adjust its random match probability estimate in the manner suggested in the NRC II report. Accordingly, *343 I consider whether the method used by the government's expert in estimating the probability of a random match, when adjusted in accordance with the recommendations contained in the NRC II report, satisfies Daubert's reliability standard.
Shea relies primarily on the testimony of Dr. William Shields in claiming that the FBI's methodology for estimating the probability of a random match is unreliable even if it is adjusted to conform to the recommendations contained in the NRC II report. Dr. Shields challenged the FBI's methodology by claiming that: (1) a θ value of .01 is insufficient to capture the likely systematic effects of population substructuring; (2) the NRC II's recommended factor of 10 correction is based on VNTR data that cannot reliably be applied to the PCR loci at issue here; and (3) the PCR database used in this case is too small, even when judged by the standards of the NRC II report, for the factor of 10 correction to account for potential error. Rather than using the adjustment to the product rule suggested in the NRC II report, Dr. Shields proposed an alternative method of accounting for potential error. Using his method, Dr. Shields claimed that the probability of a random match should be estimated at 1 in 49,000. If, as Dr. Shields suggests, a 95% confidence interval is then calculated to account for the potential effect of random error, the bottom end of the range in his estimate would be 1 in 23,000, rather than the bottom range of 1 in 20,000 proposed by the government.
The government countered Dr. Shields' testimony with testimony from Dr. Martin Tracey. Dr. Tracey (1) endorsed the use of both P² + P(1-P)θ with a θ value of .01 and 2P as adjustments to homozygous genotypes, (2) opined that there is no reason to expect that the factor of 10 correction recommended in the NRC II report will be insufficiently conservative if it is applied to PCR loci, and (3) concluded that a database of 148 is sufficiently large to reliably permit the use of the factor of 10 correction recommended in the NRC II report.
Whether the adjustments to the product rule suggested in the NRC II report are sufficiently conservative and whether a database of 148 is of sufficient size to serve as the basis for a reliable random match probability estimate are important questions about which population geneticists can legitimately disagree. However, Rule 702 does not require scientific consensus. The government has produced a peer-reviewed study using accepted statistical methods to support its position that the estimation of a random match probability from the database used in this case will produce a reliable result. It has further qualified its estimate in accordance with the recommendations of a distinguished committee of scientists and academicians that included leading population geneticists as members. Under these circumstances, the concerns raised by Dr. Shields affect the weight that should be given to the evidence rather than its admissibility. See Bonds, 12 F.3d at 564 (substructuring *344 argument affects weight rather than admissibility); see also Jakobetz, 955 F.2d at 792.
C. Juror Confusion
Evidence that a defendant's DNA profile matches DNA extracted from an evidence sample suggests that the defendant cannot be excluded as a potential contributor, but it is of little value, standing alone, in proving the defendant's guilt. Giving the jury a random match probability estimate for the profile is one way of helping it assess the potential significance of a DNA profile match. However, because such evidence also has the potential to mislead, Rule 403 requires that the probative value of the evidence must be carefully balanced against the danger of unfair prejudice.
Shea argues that a jury would be so overwhelmed by evidence of a random match probability that it could not properly assess the possibility that a profile match is false unless a laboratory or industry error rate is calculated and combined with the random match probability estimate. Shea bases this argument on the following reasoning: (1) a random match probability estimate is meaningless if the declared DNA profile match is false; (2) the best evidence of whether a match is false in a particular case is the laboratory's false match error rate; (3) if the laboratory's false match error rate cannot be determined, the next best evidence is the industry's false match error rate; and (4) jurors cannot understand the significance of a laboratory's error rate unless it is combined with the random match probability estimate. Because the FBI laboratory does not calculate a PCR false match error rate and the government refuses to combine what Shea suggests is the industry's PCR error rate with the random match probability estimate, Shea argues that the estimate is inherently misleading.
I reject Shea's argument because it is built on several flawed premises. First, I cannot accept Shea's contention that a laboratory or industry error rate is the best evidence of whether a test was properly performed in a particular case. Juries must decide whether a particular test was performed correctly based on all of the relevant evidence. This determination can never be precisely quantified because it will often depend in part on subjective factors such as the credibility of the person who performed the test. At best, evidence of a laboratory's past proficiency should be considered as one of several factors in making this important judgment. See NRC II, supra, at 85-86 ("[t]he risk of error in any particular case depends on many variables (such as the number of samples, redundancy in testing, and analyst proficiency), and there is no simple equation to translate these variables into the probability that a reported match is spurious"). Shea's method for dealing with the probability of a false match is thus seriously flawed because it would deprive the jury of the opportunity to determine the probability of a false match based on all of the pertinent evidence.
Second, I am unconvinced by Shea's claim that a jury cannot properly assess the potential of a false match unless a false match error rate is calculated and combined with the random match probability estimate. Shea relies on testimony and research conducted by Dr. Jay Koehler to support this contention. Although Dr. Koehler's research suggests that jurors could become confused if evidence of a false match error rate and a random match probability estimate are presented with little or no explanation, it does not support Shea's broader contention that jurors cannot be made to understand such evidence even if it is properly explained. See NRC II, supra, at 199 (noting that "[t]he argument that jurors will make better use of *345 a single figure for the probability that an innocent suspect would be reported to match never has been tested adequately"). In a real trial setting, the parties are given an opportunity to explain the significance of statistical evidence through expert testimony. Further, if a trial judge concludes that jurors could be confused by statistical evidence, the judge can deliver carefully crafted instructions to insure that the evidence is properly understood. Notwithstanding Dr. Koehler's research, I am confident that the concerns Shea raises can be properly addressed through expert testimony and, if necessary, clarifying jury instructions.
Shea next argues that a random match probability estimate is inherently misleading because the jury inevitably will confuse the probability of a random match with the potentially very different probability that the defendant is not the source of the matching samples. This type of incorrect reasoning is often referred to as the fallacy of the transposed conditional, or the prosecutor's fallacy. NRC II, supra, at 133. The probability of a random match is the conditional probability of a random match given that someone other than the defendant contributed the evidence sample. The potentially different probability that someone other than the defendant contributed the sample given the existence of a match can only be determined by considering all of the evidence in the case. Shea argues, based on research conducted by Dr. Koehler, that the jury will inevitably confuse these two probabilities.
Although I acknowledge that a jury could become confused concerning the meaning and potential significance of a random match probability estimate, I am confident that the risk of confusion is acceptably small if the concept is properly explained. Moreover, because such an estimate can be extremely valuable in helping the jury appreciate the potential significance of a DNA profile match, it should not be excluded merely because the concept requires explanation. Accordingly, I decline to exclude the government's random match probability estimate pursuant to Rule 403.
After carefully considering Shea's motion to exclude the DNA evidence, I reached the following conclusions:
(1) PCR is a scientifically sound technology that can be extremely helpful in resolving questions of guilt or innocence. The theory and techniques used in PCR are sufficiently established that a court may take judicial notice of their general reliability. See Beasley, 102 F.3d at 1448 (taking judicial notice of general reliability of PCR testing); see also United States v. Martinez, 3 F.3d 1191, 1197 (8th Cir.1993) (taking judicial notice of general reliability of DNA testing), cert. denied, 510 U.S. 1062, 114 S. Ct. 734, 126 L. Ed. 2d 697 (1994); Jakobetz, 955 F.2d at 799 (taking judicial notice of reliability of DNA testing).
(2) The PCR tests used in this case readily satisfy Rule 702's reliability requirement. Accordingly, disputes concerning the way in which the tests were conducted, while vitally important, are matters that should be left for the jury to resolve.
(3) Random match probability estimates calculated with the product rule provide an important means of placing the significance of a DNA profile match in an appropriate context. However, such estimates must be qualified to account for potential errors such as in the manner suggested by the NRC II report. The government satisfied this requirement.
(4) When the significance of a random match probability estimate is properly explained, the probative value of the evidence is *346 not substantially outweighed by the limited potential that jurors could be misled.
Accordingly, I denied the defendant's motion to exclude (document no. 15).
 The information contained in this section is undisputed. Thus, I have relied on published sources to supplement testimony offered during the evidentiary hearing. See, e.g., Elaine J. Mange and Arthur P. Mange, Basic Human Genetics (1994); National Research Council, DNA Technology in Forensic Science (1992) ("NRC I"); Lorne T. Kirby, DNA Fingerprinting: An Introduction (1992); National Research Council, The Evaluation of Forensic DNA Evidence (1996) ("NRC II").
 I refer in this opinion to sites or loci rather than genes. Genes are sites on a DNA molecule containing sequences of base pairs that provide instructions used to produce something, usually a protein. Mange, supra, at 517. Genes are often found at polymorphic sites. However, the base pair sequences at many polymorphic sites have no known function.
 The term genotype is most often used to refer to an organism's entire genetic makeup. NRC II, supra, at 216. However, it can also be used to describe the combination of alleles at one or more loci. Throughout this opinion, I use the term to describe the alleles for corresponding sites at a single locus.
 RFLP targets sites on DNA molecules that are known to have different lengths because of variations in the number of times that a sequence of base pairs is repeated. Such sites are referred to as Variable Number Tandem Repeats ("VNTRs"). RFLP uses restriction enzymes to cut DNA into fragments at the boundaries of a studied site. The relative lengths of the alleles for the site are then identified by a process known as gel electrophoresis. Mange, supra, at 306; NRC II, supra, at 65-67. Electrophoresis is described later as it is also used in typing one of the sites at issue in this case.
 DQ Alpha is located on chromosome six. GYPA and Gc are both located on chromosome four. LDLR is located on chromosome nineteen. HBGG is located on chromosome eleven. D7S8 is located on chromosome seven and D1S80 is located on chromosome one.
 For example, there are eight different alleles for the DQ Alpha site: 1.1; 1.2; 1.3; 2; 3; 4.1 and 4.2/4.3. The DQ Alpha test strip has a composite probe for alleles 1.1, 1.2 and 1.3 (it appears as 1 on the test strip). It also has separate probes for alleles 1.1 and 1.3. A positive response for the 1.2 allele is inferred from a positive response on the 1 probe and the absence of a response on either the 1.1 or the 1.3 probe. The test contains separate probes for the 2, 3, and 4.1 alleles. It uses a composite probe for the 4.2 and 4.3 alleles and does not otherwise attempt to distinguish between the two alleles.
 The sum of the allele frequencies for a site totals 1. Thus, if a site has two alleles with equal frequencies, each allele will have a frequency of .5.
 For example, if a site has two alleles with equal frequencies, A and a, the genotype frequencies will be: AA = .25 (or .5 × .5); aa = .25 (or .5 × .5); and Aa = .5 (or 2(.5 × .5)). The product of allele frequencies for heterozygous genotypes must be doubled because the genotype Aa can be comprised either of an A from the father and an a from the mother, or vice versa.
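The two-allele example above can be reproduced with a short Python sketch. The function name `hardy_weinberg` is hypothetical, and the sketch assumes a site with exactly two alleles whose frequencies sum to 1.

```python
def hardy_weinberg(p_A: float) -> dict:
    """Expected genotype frequencies for a two-allele site (A and a)."""
    p_a = 1.0 - p_A  # the allele frequencies for a site total 1
    return {
        "AA": p_A * p_A,      # homozygous: product of the allele frequencies
        "Aa": 2 * p_A * p_a,  # heterozygous: doubled, since either parent may supply A
        "aa": p_a * p_a,
    }


print(hardy_weinberg(0.5))  # {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}
```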
 The distribution of alleles in a population will occur in Hardy-Weinberg proportions under appropriate conditions because genotypes are formed in accordance with the first law of genetics. This law, which is also known as the law of segregation, recognizes that observable traits are the product of two alleles which are segregated in reproductive cells so that a child inherits one allele from each parent. Mange, supra, at 52-53.
 Thus, if a sample has three genotypes with equal frequencies of .5, the probability of a random match will be 1 in 8 (.5 × .5 × .5 = .125 or 1/8 ).
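As a minimal sketch of the product rule, assuming the genotype frequencies at the tested sites are independent, the arithmetic in this example can be written in Python as follows. The function name is hypothetical.

```python
from math import prod


def random_match_probability(genotype_frequencies):
    """Multiply per-site genotype frequencies (assumes the sites are independent)."""
    return prod(genotype_frequencies)


rmp = random_match_probability([0.5, 0.5, 0.5])
print(rmp, f"1 in {1 / rmp:.0f}")  # 0.125 1 in 8
```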
 To understand why this is so, consider an extreme example: if a site has two alleles with equal frequencies, A and a, but no mating occurs with any person having the homozygous genotype aa, genotype frequencies in subsequent generations will quickly begin to change from the frequencies that would be predicted using the Hardy-Weinberg law.
 People v. Collins, 68 Cal. 2d 319, 66 Cal. Rptr. 497, 438 P.2d 33 (1968) is frequently cited as an example of how dependence skews the results obtained by using the product rule. In that case involving eyewitness identification, the prosecutor applied the product rule to variables that were not independent. The prosecutor used individual probabilities of a man with mustache (25%), a Negro man with beard (10%), a girl with ponytail (10%), a girl with blond hair (33%), a partly yellow automobile (10%) and an interracial couple in car (.1%), and using the product rule arrived at a 1 in 12 million random match probability. Because the characteristics were not independent, the product rule yielded a drastically exaggerated result. Id. at 501, 438 P.2d at 37.
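The multiplication attributed to the prosecutor in Collins can be checked with exact fractions. The sketch below rests on two assumptions that are not stated in this footnote: that the 33% figure stands for one in three, and that the figure for the interracial couple is one in a thousand, which is the value needed to reach a product of 1 in 12 million. The dictionary keys are shorthand labels chosen for illustration.

```python
from fractions import Fraction
from math import prod

# Individual probabilities as exact fractions (see the assumptions in the lead-in).
collins = {
    "mustache": Fraction(1, 4),
    "beard": Fraction(1, 10),
    "ponytail": Fraction(1, 10),
    "blond_hair": Fraction(1, 3),         # assumed reading of "33%"
    "yellow_car": Fraction(1, 10),
    "couple_in_car": Fraction(1, 1000),   # assumed; needed for the stated total
}

# The product rule is only valid if the characteristics are independent,
# which was the flaw identified in the case.
product = prod(collins.values())
print(product)  # 1/12000000
```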
 Linkage equilibrium for sites on different chromosomes is possible under proper conditions because of the second law of genetics. This law, which is also known as the law of independent assortment, holds that chromosome pairs sort randomly during the process of reproductive cell formation. Mange, supra, at 518. Thus, parents with an Aa genotype on the first chromosome pair and a Bb genotype on the second chromosome pair will randomly produce reproductive cells with the following combinations of alleles for both sites AB, Ab, aB, ab. Random mating produces linkage equilibrium for these two sites because they are on different chromosomes.
Linkage equilibrium also is possible under proper conditions for sites on the same chromosome because of a phenomenon known as crossing over. Crossing over occurs when corresponding parts of a chromosome pair are exchanged during the production of reproductive cells. If equilibrium conditions persist in a population over a number of generations, crossing over will result in the independence in the population of genotypes on the same pair of chromosomes. Mange, supra, at 47, 196; NRC II, supra, at 64.
 I address Shea's claim that the evidence is too misleading to assist the jury in ruling on his Rule 403 objection. Shea does not otherwise argue that the evidence is inadmissible under Rule 702.
 Evidentiary reliability must be distinguished from scientific reliability. The latter concept concerns the extent to which a test produces consistent results whereas the former concept depends more on scientific validity; i.e., does a principle support what it purports to support. Daubert, 509 U.S. at 590 n. 9, 113 S. Ct. at 2795 n. 9.
 I treat the concept of fit as an aspect of Rule 702's reliability requirement because a scientific opinion cannot fit the facts of the case even if it is based on scientifically sound methods unless a scientifically valid connection also exists between the opinion and the issue it is intended to address.
 The concept of general acceptance was first applied to expert testimony in Frye v. United States, 293 F. 1013, 1014 (D.C.Cir.1923). There, the court stated that "while courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs." Id. at 1014.
 The Third Circuit has identified additional factors such as "the degree to which the expert testifying is qualified, the relationship of a technique to `more established modes of scientific analysis,' and the `non-judicial uses to which the scientific techniques are put.'" Paoli II, 35 F.3d at 742 (internal quotations omitted). The Ninth Circuit has similarly suggested that courts can consider "whether the experts are proposing to testify about matters growing naturally and directly out of research they have conducted independent of the litigation, or whether they have developed their opinions expressly for the purpose of testifying." Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F.3d 1311, 1317 (9th Cir.), cert. denied, ___ U.S. ___, 116 S. Ct. 189, 133 L. Ed. 2d 126 (1995).
 In addressing Shea's arguments, I am mindful that the government bears the burden of proving that the predicates for admission have been satisfied by a preponderance of the evidence. Daubert, 509 U.S. at 592 n. 10, 113 S. Ct. at 2796 n. 10.
 The general acceptance of PCR technology is further underscored by the fact that at least two federal circuit courts and at least 16 state courts have approved the admission of expert testimony based on PCR analysis. See United States v. Beasley, 102 F.3d 1440, 1448 (8th Cir.1996) (concluding that courts in the 8th Circuit can take judicial notice of the general reliability of PCR testing); United States v. Hicks, 103 F.3d 837, 844 (9th Cir.1996); United States v. Lowe, No. 95-10404-PBS, 1996 WL 774905, at *16 (D.Mass. Dec.10, 1996) (collecting state court cases).
 Catherine Theisen Comey and Bruce Budowle, Validation Studies on the Analysis of the HLA DQα Locus Using the Polymerase Chain Reaction, 36 Journal of Forensic Sciences 1633 (1991); Bruce Budowle, et al., Validation and Population Studies of the Loci LDLR, GYPA, HBGG, D7S8, and Gc (PM loci), and HLA-DQα Using a Multiplex Amplification and Typing Procedure, 40 Journal of Forensic Sciences 45 (1995); Susan Cosso and Rebecca Reynolds, Validation of the AmpliFLP D1S80 PCR Amplification Kit for Forensic Casework Analysis According to TWGDAM Guidelines, 40 Journal of Forensic Sciences 424 (1995).
 The DNA Identification Act of 1994, 42 U.S.C.A. § 14131(a) (West 1995), requires the Director of the FBI to convene an advisory board to develop quality assurance standards for DNA testing. Until such standards are developed and approved by the Director, the FBI is obligated to follow standards developed by the Technical Working Group on DNA Analysis Methods (TWGDAM), a group comprised of analysts working in government and private laboratories. The FBI's PCR testing protocols and quality control standards conform to TWGDAM guidelines.
 Dr. Riley is an Associate Professor at the University of Washington's School of Medicine and School of Public Health. He holds a Ph.D. in biochemistry.
 I express no opinion concerning the admissibility of Dr. Riley's trial testimony on this point because the government did not seek to exclude his testimony pursuant to Rule 702.
 Guerrieri is a forensic scientist employed by the FBI. He has also worked for Roche Biomedical Laboratories as the assistant director of Roche's Forensic Identity Laboratory, and for the Commonwealth of Virginia as a forensic scientist in the first state DNA laboratory in the country. He has performed over one thousand PCR DNA typing tests. He holds a Master of Science in forensic chemistry.
 Mr. Guerrieri testified that the FBI's PCR laboratory follows TWGDAM standards by requiring its PCR examiners to submit to two open external proficiency tests and one blind proficiency test per year. Although he claims that no FBI examiner has ever failed a PCR proficiency test, he stated that the FBI does not calculate a laboratory error rate because its current proficiency testing program does not produce enough samples to serve as the basis for calculating a meaningful error rate and it would be impractical to develop a proficiency testing program that could produce a meaningful calculation. See generally, NRC II, supra, at 85-86 (discussing difficulties in attempting to calculate a laboratory error rate).
 If a laboratory failed to adhere to industry proficiency testing standards, it might call into question the validity of the laboratory's methods. See NRC II, supra, at 88 (recommending that laboratories establish proficiency testing programs). I need not address this issue, however, since it is undisputed that the FBI's proficiency testing program conforms to industry standards.
 A substrate control tests the substance from which a sample is obtained. For example, if an evidentiary blood sample is obtained from a steering wheel, a substrate control would test a different part of the steering wheel to detect DNA from another source which would compromise the evidentiary sample.
 A positive control is DNA of a known type. The FBI uses positive controls, but does not use a positive control for each possible allele.
 Shea also argues that the PCR test results should be excluded pursuant to Fed.R.Evid. 901 because the government failed to sufficiently demonstrate that the tests produced an accurate result in this case. In addition to claiming that PCR tests are inherently unreliable, Shea argues that the test results are invalid because the DQ Alpha test strips used in typing Shea's blood and a positive control sample do not show a positive response at the control probes. I reject Shea's argument because I am persuaded by Mr. Guerrieri's contrary testimony. Although it is not determinative, I also note that I saw a positive response at the DQ Alpha test strip for Shea's blood sample when I inspected the strip. I could not independently determine whether the test strip for the positive control sample showed a positive response because the government produced only an inconclusive photograph of this test strip.
 Data is collected for separate racial groups because significant population substructuring is known to exist for such groups. The NRC II report recommends that random match probabilities should be calculated for all potentially applicable racial groups when the race of the perpetrator is unknown. NRC II, supra, at 122. However, the report also recognizes that it may be appropriate to provide only the estimate for the major racial group that gives the largest probability of a match. NRC II, supra, at 114. In this case, the report detailing the PCR typing concludes that, "[t]he probability of selecting an unrelated individual at random having the same DQ Alpha, PM, and D1S80 types ... is approximately 1 in 700,000 in Blacks, 1 in 200,000 in Caucasians, 1 in 600,000 in Southeastern Hispanics, and 1 in 1.3 million in Southwestern Hispanics." The government informed the jury only of the random match probability estimate for Caucasians since the match probability is highest for this group. Shea did not challenge the government's position on this point.
 The government also relies in part on a peer-reviewed study of a somewhat larger database to support its claim that the distribution of genotypes for the D1S80 locus meets Hardy-Weinberg expectations. Bruce Budowle, et al., D1S80 Population Data in African Americans, Caucasians, Southeastern Hispanics, Southwestern Hispanics, and Orientals, 40 Journal of Forensic Sciences 38, 40 (1995).
 Random error is error that can occur by chance because a database does not contain the entire population being studied. Daniel L. Rubinfeld, Reference Guide on Multiple Regression, in Moore's Federal Practice: Reference Manual on Scientific Evidence 415, 466 (1994).
 Substructuring also usually produces an under-representation of heterozygous genotype frequencies from those that would be predicted using the Hardy-Weinberg law. NRC II, supra, at 122. Since this effect will tend to favor defendants, however, the report does not recommend any adjustment to the formula used to estimate heterozygous genotype frequencies.
 The report alternatively states that "[a] more conservative value of θ = .03 might be chosen for PCR-based systems in view of the greater uncertainty of calculations for such systems because of less extensive and less varied population data than for VNTRs." NRC II, supra, at 122. This even more conservative approach is not offered as a formal recommendation. Id.
 The report notes that the uncertainty can be greater for very small profile frequencies. Id. at 160. Further, the factor of 10 qualification obviously is not of great value in qualifying random match probabilities that are greater than 10⁻³.
 The genotypes for the D1S80 site and 5 of the remaining 6 sites examined in this case were found to be heterozygous. When the genotype frequency for the single homozygous locus is recalculated using a θ value of .01, it does not significantly affect the government's random match probability estimate because the FBI routinely rounds down when calculating the probability of a random match and that rounding process more than captured the effect of the adjustment that needed to be made to the single homozygous genotype frequency. When the NRC II's factor of 10 adjustment is applied to the random match probability estimate, it results in a range of possible results from 1 in 20,000 to 1 in 2,000,000.
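The factor of 10 qualification is simple arithmetic on the denominator of the estimate. A minimal Python sketch, using a hypothetical function name and the 1 in 200,000 estimate for Caucasians, is:

```python
def factor_of_ten_range(one_in: int, factor: int = 10):
    """Return the 'more common' and 'less common' bounds as '1 in N' denominators."""
    return one_in // factor, one_in * factor


low, high = factor_of_ten_range(200_000)
print(f"1 in {low:,} to 1 in {high:,}")  # 1 in 20,000 to 1 in 2,000,000
```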
 Dr. Shields is a professor at the State University of New York's College of Environmental Science and Forestry. He holds both a Masters and a Ph.D. in Zoology and has written extensively on inbreeding and population structure, particularly regarding birds.
 The NRC II report identifies equations that can be used in estimating genotype frequencies when the person who contributed a crime scene sample is known to come from the same subpopulation as the suspect. NRC II, supra, at 113-16. Dr. Shields argues that these equations should be used in all cases. Moreover, he claims that a θ value of .05 rather than .01 or .03 should be used in all cases. Once genotype frequencies are calculated using these equations, Dr. Shields claims, the adjusted frequencies can be multiplied using the product rule and a confidence interval can be calculated to account for random error. The NRC II report considered a similar approach and concluded that it is "unnecessarily conservative." Id. at 114.
 Confidence intervals qualify a conclusion in an effort to account for the effect of random error by describing a range of possible results that is expected to contain the true result a given percentage of the time. Id. at 146.
 Dr. Tracey is a Professor of Biological Sciences at Florida International University with a Ph.D. in biology. He has written extensively in the field of population genetics and has served on the editorial boards of a number of peer-reviewed journals.
 The parties assume that error rate information is admissible at trial. This assumption may well be incorrect. Even though a laboratory or industry error rate may be logically relevant, a strong argument can be made that such evidence is barred by Fed.R.Evid. 404 because it is inadmissible propensity evidence. Imwinkelried, supra, at 1271-81. I need not determine whether error rate information is ever admissible, however, because the point is not essential to my analysis, and the government did not object to Shea's effort to introduce error rate information at trial.
 Dr. Koehler is an Associate Professor of Behavioral Decision Making at the University of Texas. He holds a Masters and Ph.D. in behavioral science.
 To illustrate how these two probabilities can be very different, consider a hypothetical case where: (1) the defendant's DNA profile is correctly found to match DNA left by the perpetrator at the crime scene during the commission of the crime; (2) the random match probability estimate for the observed DNA profile is 1 in 1,000,000; and (3) undisputed evidence establishes that the defendant did not commit the crime. In this hypothetical case, the random match probability estimate is 1 in 1,000,000 even though the probability that someone other than the defendant contributed the evidence sample is 1.
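The difference between the two probabilities can also be shown numerically with Bayes' rule. The sketch below is an illustration only and rests on assumptions that are not part of the record: the weight of all the other evidence is summarized in a single hypothetical number, `prior_defendant`, a true source is assumed always to match, and laboratory error is ignored.

```python
def prob_other_source_given_match(prior_defendant: float, rmp: float) -> float:
    """P(someone else left the sample | a match is declared), by Bayes' rule.

    prior_defendant: hypothetical probability, from all other evidence,
                     that the defendant is the source.
    rmp:             random match probability, i.e. P(match | someone else is the source).
    Assumes P(match | defendant is the source) = 1 and no laboratory error.
    """
    other_and_match = (1.0 - prior_defendant) * rmp
    defendant_and_match = prior_defendant * 1.0
    return other_and_match / (defendant_and_match + other_and_match)


rmp = 1e-6  # 1 in 1,000,000, as in the hypothetical above

# Undisputed evidence rules the defendant out: the answer is 1 despite the tiny rmp.
print(prob_other_source_given_match(0.0, rmp))

# With other evidence evenly balanced, the answer is close to the rmp itself.
print(prob_other_source_given_match(0.5, rmp))
```

The same random match probability therefore yields very different answers depending on the rest of the evidence, which is why the two figures should not be equated.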