The Ongoing Debate: Presenting Forensic Voice Comparison Evidence in Court
Kurt Lenz

U.S. courts have been, at best, skeptical of forensic voice comparison evidence. Two factors appear to explain this skepticism. First, judges perceive expert testimony about voice comparison as infringing on the province of the fact-finder because it involves comparing two voices to determine whether they sound alike;1 to many judges, any layperson can make that comparison by simply listening to the speech evidence. Second, a prominent voice comparison technique, spectrographic analysis, has been widely discredited as a reliable means of establishing speaker identity.2 Although more sophisticated and scientifically valid means of voice comparison exist, experts in the field have thus far failed to articulate a methodological framework for presenting such evidence in a manner that U.S. courts have consistently deemed sufficient under Daubert.3

Thus, academics in the field are currently debating how expert testimony may be properly framed to ensure reliable and credible evidence of voice comparison. Featuring prominently in this debate is a position statement on forensic speaker comparison methodology recently published in the United Kingdom (the “UK Position Statement” or “Statement”)4, the product of a collaborative effort by leading UK researchers. A recent response5 has offered a detailed critique of the Statement. Together, these papers may lead to a consensus on a framework for the presentation of forensic voice comparison evidence that will make expert testimony on the subject more likely to be admitted as evidence in U.S. courtrooms.

The views reflected in the UK Position Statement were first delivered at the August 2005 meeting of the International Association for Forensic Phonetics and Acoustics in Marrakech, Morocco, and resulted from a collaborative exercise among a number of researchers and forensic practitioners working in the United Kingdom.6 It was circulated to all practicing forensic speech scientists and interested academics in the UK, and all but a handful became co-signatories to the Statement. The Statement was thereafter submitted to prosecutorial authorities throughout Great Britain.

    A. Motivations and Goals for the UK Position Statement
The UK Position Statement was motivated by a concern about the framework in which conclusions are typically expressed in forensic speaker comparison cases.7 The controversy had been brewing among forensic professionals since at least 1996, when the Appeal Court of England and Wales ruled that DNA findings in a criminal case had been presented in a way that gave undue weight to the evidence.8 In that case, the particular approach to representing DNA evidence was ruled improper. On considering the likelihood scales used by most practitioners in speaker comparison cases, it became clear that those scales gave speech findings the same false weighting that the outlawed framework had given DNA findings.9 The authors of the UK Position Statement therefore sought to develop a new framework for speech findings that is, at a conceptual level, identical to the revised approach used today in the presentation of DNA evidence.10

The central aim of the UK Position Statement is to fundamentally change the roles of analyst and evidence.11 In the past, forensic speech scientists were presented in court as identifying speakers. Under the new approach, experts do not make identifications. Rather, their role becomes that of providing an assessment of whether the voice in a questioned recording is more or less comparable to the defendant’s voice. The expert’s principal activity, therefore, is not identification but comparison. This new approach would bring the field of voice comparison into line with other fields of forensic science.12

    B. The Proposed Framework
Typically, in forensic voice comparison, a recording of an unknown voice, usually of an offender, is compared with one or more recordings of a known voice, usually of a suspect or defendant.13 The parties want to know if the unknown (or questioned) voice comes from the same speaker as the known voice. The expert is asked to offer an opinion as to how probable it is that the samples have been produced by the same person. The UK Position Statement proposes a methodological framework for answering this question.

In the framework proposed in the UK Position Statement, speech samples are to be compared in terms of two serially ordered factors: consistency and distinctiveness. Consistency is characterized as “whether the known and questioned samples are compatible, or consistent, with having been produced by the same speaker.”14 It is assessed by “the degree to which observable features [are] similar or different.”15 Differences between known and questioned samples count against consistency unless “they can be explained by models of acoustic, phonetic, or linguistic variation (e.g. by reference to differential channel characteristics, [or within-speaker] sociolinguistics, psychological and/or physical factors).”16 Pursuant to this framework, consistency is quantified on a three-point scale. If the expert concludes that the samples are consistent, then the expert turns to the question of distinctiveness. Thus, a judgment on distinctiveness is only made if there has first been a positive determination of consistency.

The UK Position Statement emphasizes that a positive determination of consistency does not imply that the known and questioned samples were necessarily spoken by the same person, since “the cluster of features leading to the consistency decision . . . [may] be shared by a substantial number of other people in the population.”17 In other words, two very similar, yet typical, samples will not be valued as highly in terms of strength of evidence in favor of identity as two very similar, yet atypical, samples.18 The likelihood that the samples have been produced by the same speaker will be greater if their shared features are distinctive or unusual. Under the framework, distinctiveness is assessed on a five-point scale ranging from “not-distinctive” to “exceptionally-distinctive,” the latter reflecting that “the possibility of this combination of features being shared by other speakers is considered to be remote.”19 The presumption is that an expert witness will report that the samples are consistent with having been produced by the same speaker, and then explain the degree of distinctiveness as an indicator of how unusual it would be to find this consistency if the two samples were not produced by the same speaker.

In July 2009, two linguists named Phil Rose and Geoffrey Morrison jointly published a detailed critique (the “Response”) of the UK Position Statement in the International Journal of Speech Language and the Law.20 While they applauded the motivation behind the UK Position Statement and welcomed its general direction, they observed that the proposed framework itself reflected an imperfect compromise that failed to fully incorporate what is, in their view, the logically and legally correct framework for the evaluation of forensic comparison evidence: the likelihood-ratio framework.21 The likelihood-ratio framework essentially represents the current practice in other fields such as the evaluation of DNA evidence, and has been endorsed by a large number of forensic statisticians, legal experts, and forensic scientists.22

    A. The Likelihood-Ratio Framework
The central question to be answered by the likelihood-ratio framework is how probable it is, given the voice evidence, that the questioned and known samples have been spoken by the same person. The likelihood-ratio framework is based on Bayes’ Theorem23, which provides that the probability of a hypothesis being true, given the evidence, can be estimated from two things: (1) how probable the hypothesis is, before the evidence is adduced; and (2) the strength of the evidence.

Thus, the likelihood-ratio framework provides that the odds of two samples being from the same speaker, given the speech evidence, are derived quantitatively by multiplying the “prior odds” in favor of both samples being from the same speaker by the strength of that evidence, expressed as a likelihood ratio. The “prior odds” are the odds in favor of the hypothesis before the voice evidence is adduced. These are simply the probability that it is the same speaker divided by the probability that it is a different speaker. At its broadest, this ratio could include anyone in the world, but the prior odds can usually be considerably narrowed by taking into account obvious information in the voice like sex and accent, as well as other pragmatic information. The likelihood ratio is the most important metric in this equation because it is a measure of the strength of evidence in favor of the hypothesis. The likelihood ratio quantifies how much more likely one is to observe the differences between the two speech samples assuming they have come from the same speaker than assuming they have come from different speakers.24
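
The arithmetic described above can be illustrated with a minimal sketch. All figures below are hypothetical and chosen only for illustration: a pool of 100 plausible speakers (giving prior odds of 1 to 99) and speech evidence assumed to be 50 times more probable if the samples share a speaker than if they do not. The function names are likewise illustrative and are not drawn from the UK Position Statement or the Response.

```python
from fractions import Fraction


def likelihood_ratio(p_evidence_if_same, p_evidence_if_different):
    """Strength of evidence: probability of the evidence under the
    same-speaker hypothesis divided by its probability under the
    different-speaker hypothesis."""
    return p_evidence_if_same / p_evidence_if_different


def posterior_odds(prior_odds, lr):
    """Odds form of Bayes' Theorem: posterior odds = prior odds x LR."""
    return prior_odds * lr


def odds_to_probability(odds):
    """Convert odds in favor of a hypothesis into a probability."""
    return odds / (1 + odds)


# Hypothetical prior: the suspect is one of 100 plausible speakers.
prior = Fraction(1, 99)

# Hypothetical evidence: 50 times more probable under the
# same-speaker hypothesis than under the different-speaker hypothesis.
lr = likelihood_ratio(Fraction(1, 2), Fraction(1, 100))

post = posterior_odds(prior, lr)
print("likelihood ratio:", lr)
print("posterior odds:", post)
print("posterior probability:", float(odds_to_probability(post)))
```

In this sketch a likelihood ratio of 50 still leaves the same-speaker hypothesis less probable than not, because the assumed prior odds are low. The example is meant only to show why the framework treats the strength of the evidence and the prior odds as separate quantities.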

This approach has been endorsed by the main textbooks on the evaluation of forensic evidence and forensic statistics, which stress that it is the role of the forensic expert to quantify the strength of the evidence by estimating its likelihood ratio:
    “The case made for this approach, whether the subject matter is DNA, glass fragments, clothing fibres or whatever, is overwhelming. . . Statistical evaluation, and particularly Bayesian methods such as the calculation of likelihood ratios . . . are the only demonstrably rational means of quantifying the value of evidence available at the moment: anything else is just intuition and guess-work.”25

    B. Criticisms of the UK Position Statement
Chief among the criticisms of the UK Position Statement is the fact that it would permit experts to deviate from the likelihood-ratio framework at several key points. The UK Position Statement provides that experts should not testify as to the identity of the speaker of a given sample (i.e. that the defendant produced the questioned sample), but rather should confine their testimony to comparing the speech samples (i.e. that the two speech samples originate from the same source).26 However, the UK Position Statement would permit an expert, where the samples are not consistent, to state that the samples are spoken by different speakers.27 Also, the UK Position Statement would permit an expert to identify the speaker where independent evidence shows that a closed set of known speakers was present and participating in the conversation.28 As the Response points out, there is no logical or scientific basis for carving out these two exceptions from the general rule that experts should confine their testimony to the mere probability, based on the likelihood ratio, that the two samples came from the same source.29

Another significant criticism of the UK Position Statement is that it fails to account for the necessity of comparing multiple features in speech samples.30 Forensic speech comparison consists of:
    “separating out the samples into their constituent phonetic and acoustic ‘strands’ (e.g. voice quality, intonation, rhythm, tempo, articulation rate, consonant and vowel realizations) and analyzing each one separately.”31
Indeed, multidimensionality is one of the speech attributes that contributes to the ability to discriminate between voices. However, the UK Position Statement gives no indication of how ultimately to combine the individual evidence from each of these “strands,” even though the available literature already contains a number of proposed procedures for dealing with such multivariate data within the likelihood-ratio framework.32
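
One simple way to picture the combination of “strands” is sketched below. It is not one of the multivariate procedures referred to above, which are not reproduced here. It rests on a strong assumption that the strands are statistically independent under each hypothesis, in which case their likelihood ratios multiply (equivalently, their logarithms add). Real speech features are often correlated, so this naive product would tend to overstate or understate the combined evidence. The strand names and values are hypothetical.

```python
import math


def combine_independent_lrs(strand_lrs):
    """Combine per-strand likelihood ratios ASSUMING independence.

    Sums log10 likelihood ratios, which is equivalent to multiplying
    the ratios themselves. Returns (combined log10 LR, combined LR).
    """
    log10_lr = sum(math.log10(lr) for lr in strand_lrs.values())
    return log10_lr, 10 ** log10_lr


# Hypothetical per-strand likelihood ratios. Values above 1 support
# the same-speaker hypothesis; values below 1 support the
# different-speaker hypothesis.
strand_lrs = {
    "vowel realizations": 8.0,
    "articulation rate": 2.0,
    "intonation": 0.5,
}

log10_lr, combined_lr = combine_independent_lrs(strand_lrs)
print("combined log10 LR:", round(log10_lr, 3))
print("combined LR:", round(combined_lr, 3))
```

Note that a strand pointing against the same-speaker hypothesis (here, intonation) pulls the combined figure down rather than being ignored, so every strand contributes in proportion to its own likelihood ratio.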

Finally, Rose and Morrison acknowledge that there is a real problem, for the purposes of conducting a quantitative likelihood-ratio-based analysis, of defining a relevant reference population and obtaining sufficient demographic data. The first problem is theoretical and relates to the choice of the relevant population to sample for comparison purposes, and the size of that sample. The second problem is practical and relates to the actual collection of language data. The authors argue, however, that neither of these problems prevents the use of likelihood ratios.33

In summary, the Response contends that the UK Position Statement’s proposed framework is not, in fact, a likelihood-ratio framework, even though likelihood-ratio frameworks characterize current thinking on the evaluation of forensic evidence.34 The Response therefore rejects the UK Position Statement’s claim that its proposed framework, “at a conceptual level, [is] identical to that used nowadays in the presentation of DNA evidence.”35 As a result, the UK Position Statement has not achieved its purported goal of “. . . bring[ing] the field [of forensic voice comparison] into line with modern thinking in other areas of forensic science.”36 The authors of the Response instead urge forensic voice comparison researchers and expert witnesses to rapidly move towards adopting quantitative likelihood-ratio statements as the standard.37

The UK Position Statement proposes a framework for forensic voice comparison that would represent a substantial step forward in voice comparison methodology by bringing the field more in line with the majority of other forensic sciences and by offering an approach that should be acceptable to U.S. courts under Daubert. Rose and Morrison’s Response agrees with the UK Position Statement on more points than it disagrees. The Response does, however, highlight some significant shortcomings and suggests alternative approaches that would address those shortcomings, principally by adhering more strictly to the likelihood-ratio framework. Continued engagement among the authors of these two documents and their colleagues in the field should soon lead to a consensus on these issues, and forensic speaker comparison evidence should soon thereafter become more widely used in the courts.


• Kurt W. Lenz is a visiting assistant professor of legal skills at Stetson University’s College of Law. He has written on the subjects of expert testimony and forensic evidence, and is currently pursuing a graduate degree in applied linguistics at the University of South Florida.

