Eleven DNA sequences and an initially underappreciated scientific revolution

Martin Kreitman’s (1983, Nature 304: 412–417) ground-breaking paper on previously unobserved DNA polymorphism was an appropriate opener for this quarter’s reading group on “Classics in Evolution”. Alisa Sedghifar led our discussion. She reminded us of the impressive amount of theory about sequence evolution that had been developed before actual sequence data were available. Kreitman’s publication of variation in the sequence of the alcohol dehydrogenase (Adh) gene among populations of Drosophila melanogaster initiated a significant transition. For decades after the population genetic synthesis in the 1920s, the field had been largely dominated by theory. The 1970s and 80s mark the turning point after which population genetics became more and more data-driven. Although the proper revolution was not sparked until powerful next-generation sequencing technologies were developed in the 2000s, these two decades saw the first RNA and DNA sequence data, and the birth of coalescent theory.

Prior to the 1970s, genetic diversity could be measured only indirectly via gel electrophoresis of proteins. This method could resolve different protein variants (allozymes) based on their electric charge. Although the amount of variation discovered this way was considerable (Lewontin and Hubby 1966 would be another great Classic), DNA sequence polymorphisms that did not change the amino acid sequence of enzymes, or that changed it without altering their physical properties, went unnoticed. So did mutations in non-coding and untranslated regions. Because it was impossible to distinguish between silent mutations and those that alter the phenotype, reliable estimates of rates of neutral substitutions could not be obtained. Hence, in the absence of empirical evidence, the long-standing debate about the relative importance of (nearly) neutral evolution versus selection remained a hypothetical one.

Kreitman used the Maxam-Gilbert method (Maxam and Gilbert 1977) to obtain eleven DNA sequences of the Adh gene from five geographically distinct D. melanogaster populations. It was previously known that almost all populations were polymorphic for two alleles, the slow (Adh-s) and fast (Adh-f) alleles, which differ by a single amino acid. They were so called because they migrate slowly and quickly on the gel, respectively. Kreitman’s aim was to find out whether there was genetic heterogeneity among proteins with identical electrophoretic mobilities, e.g. among the Adh-s alleles. And there was. Comparing five Adh-f and six Adh-s alleles, he detected many silent DNA changes in coding and non-coding regions, but no amino acid change within the two allelic classes. Across all classes of annotation (flanking regions, introns, exons, untranslated regions), there were 43 sequence polymorphisms: 14 in the coding regions (translated regions of exons), 18 in introns, 3 in untranslated regions, and 8 in the flanking regions. Thirteen of the 43 sites carried singletons with respect to the sample. The single nonsynonymous polymorphism in the coding sequence resides in exon 4 and accounts for the difference between threonine (Adh-f) and lysine (Adh-s). Besides sequence polymorphisms, Kreitman also found length polymorphisms (indels, repeats) in one intron and in the 3’ flanking region. This type of variation, too, could not have been detected using allozyme data.
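As a quick sanity check on these numbers, the tally can be written out in a few lines of Python. This is only a bookkeeping sketch using the counts quoted above; the variable names are mine, not Kreitman's.

```python
# Polymorphic sites per annotation class, as quoted in the paragraph above.
counts = {
    "coding": 14,
    "introns": 18,
    "untranslated": 3,
    "flanking": 8,
}

total = sum(counts.values())

# Only one coding polymorphism changes an amino acid (Thr/Lys in exon 4),
# so the remaining coding polymorphisms are silent.
replacement = 1
silent_coding = counts["coding"] - replacement

# Thirteen of the polymorphic sites are singletons in the sample of eleven.
singletons = 13
singleton_fraction = singletons / total

for region, n in counts.items():
    print(f"{region:>12}: {n:2d} ({n / total:.0%} of all polymorphisms)")
print(f"{'total':>12}: {total}")
print(f"silent coding polymorphisms: {silent_coding}")
print(f"singleton fraction: {singleton_fraction:.2f}")
```

The per-class counts add up to the 43 polymorphisms reported, and 13 of the 14 coding polymorphisms are silent.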

Kreitman goes on to summarise relative rates of nucleotide substitution. He found no evidence for an unequal distribution of possible substitutions, and no significant deviation from a 1:1 ratio of transitions and transversions. However, the effective proportion of silent sites in translated exonic regions was significantly higher than in introns and other regions. Kreitman interprets this as evidence for selective constraints; in the absence of these constraints, a higher proportion of non-silent sites would be expected. This pattern is slightly confounded by high effective proportions in two introns, though. Kreitman speculates that these introns might also face some constraints. Interestingly, he also takes very low levels of effective silent polymorphisms in flanking regions as potential evidence for strong functional constraints. Hence, both very low and very high effective proportions of silent sites are interpreted as evidence for constraints. This left me somewhat puzzled.

To me, the most interesting part of the paper is where Kreitman sets out to argue about the strength of natural selection against replacement substitutions. Again, this is triggered by the fact that only one out of 14 polymorphisms in the coding region was nonsynonymous. Under neutrality, the expected number of replacement polymorphisms would be 39, instead of one, which brings us to the quote of this paper:

“The large discrepancy between the expected (39) and observed (1) number of replacement polymorphisms clearly indicates that the overwhelming majority of amino acid changes in the Adh polypeptide sequence are constrained by natural selection.”
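One way to see where a number like 39 can come from is sketched below. It assumes that roughly three quarters of random point mutations in a coding sequence change the amino acid; that fraction is my assumption for illustration, not a figure taken from the summary above. Given 13 silent coding polymorphisms, neutrality would then predict about three times as many replacement polymorphisms. The binomial tail is an additional illustration of how surprising a single replacement would be, not a test taken from the paper.

```python
from math import comb


def expected_replacements(silent_observed, replacement_fraction):
    """Replacement polymorphisms expected under neutrality, scaled from the
    silent count by the ratio of replacement to silent mutation targets."""
    return silent_observed * replacement_fraction / (1.0 - replacement_fraction)


def prob_at_most(k_max, n, p):
    """Binomial probability of observing k_max or fewer successes in n trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))


SILENT_OBSERVED = 13         # silent polymorphisms in the coding region
REPLACEMENT_OBSERVED = 1     # the Thr/Lys polymorphism
REPLACEMENT_FRACTION = 0.75  # ASSUMPTION: share of random changes that alter an amino acid

expected = expected_replacements(SILENT_OBSERVED, REPLACEMENT_FRACTION)

# Chance of seeing at most one replacement among all coding polymorphisms
# if each one were a replacement with probability REPLACEMENT_FRACTION.
n_coding = SILENT_OBSERVED + REPLACEMENT_OBSERVED
p_value = prob_at_most(REPLACEMENT_OBSERVED, n_coding, REPLACEMENT_FRACTION)

print(f"expected replacement polymorphisms: {expected:.0f}")
print(f"P(<= {REPLACEMENT_OBSERVED} replacement among {n_coding}): {p_value:.1e}")
```

Under this assumed fraction the expectation is 39, and the chance of seeing one or fewer replacements among 14 coding polymorphisms is well below one in a million.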

However, does this mean that the selective disadvantage of such mutations must have been large? Kreitman reminds us that it is the product of the selection coefficient and the effective population size that matters. Given a large enough effective size, selection coefficients might have been small and, hence, in the realm of what Kimura (1968, 1969; two more Classics!) defined as ‘effectively neutral’. To answer this, Kreitman does some back-of-the-envelope calculations and, using his observed number of silent polymorphisms, comes up with an estimated effective population size of about 3 million. This is about 300 times higher than previous estimates at that time. Therefore, the effective population size in D. melanogaster seemed high enough to argue that the discrepancy between observed and expected numbers of replacement polymorphisms at the Adh locus was a consequence of purifying selection against mildly deleterious mutations.
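To make this logic concrete, here is a minimal sketch of how an effective population size can be backed out of a count of silent polymorphisms. It uses Watterson's estimator and the neutral expectation θ = 4·Ne·μ, which is a standard textbook route and not necessarily the calculation Kreitman performed. The number of silent sites and the mutation rate are hypothetical placeholders, so the printed Ne is illustrative only. The last function applies the rule of thumb that selection is ineffective when |s| is well below 1/(2·Ne).

```python
def harmonic_number(n_sequences):
    """a_n = sum_{i=1}^{n-1} 1/i, the scaling constant in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta_per_site(segregating_sites, n_sequences, n_sites):
    """Watterson's estimate of theta per site from the number of segregating sites."""
    return segregating_sites / (harmonic_number(n_sequences) * n_sites)


def effective_size(theta_per_site, mutation_rate):
    """Solve theta = 4 * Ne * mu for Ne (diploid, neutral sites)."""
    return theta_per_site / (4.0 * mutation_rate)


def neutral_threshold(n_e):
    """Rule-of-thumb selection coefficient below which drift dominates."""
    return 1.0 / (2.0 * n_e)


N_SEQUENCES = 11        # from the paper: eleven Adh sequences
SILENT_SEGREGATING = 42  # 43 polymorphisms minus the one replacement
N_SILENT_SITES = 2000    # HYPOTHETICAL placeholder, not taken from the paper
MUTATION_RATE = 1e-9     # HYPOTHETICAL per-site, per-generation rate

theta = watterson_theta_per_site(SILENT_SEGREGATING, N_SEQUENCES, N_SILENT_SITES)
n_e = effective_size(theta, MUTATION_RATE)

print(f"theta per silent site: {theta:.4f}")
print(f"implied Ne (placeholder inputs): {n_e:.2e}")
print(f"|s| threshold for Ne = 3 million: {neutral_threshold(3e6):.1e}")
```

With Ne on the order of millions, even selection coefficients far below one in ten thousand exceed the threshold, which is why mildly deleterious replacement mutations can still be removed efficiently.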

Kreitman closes his article with a phylogenetic comparison of the Adh alleles. The Adh-f alleles are more homogeneous as a group than the Adh-s alleles, and the two sibling species of D. melanogaster, D. simulans and D. mauritiana, both carry only the slow allele. Therefore, Kreitman concludes that Adh-f is derived from the Adh-s allele.

From a historical perspective, it is interesting that this article barely made it into Nature. It was initially rejected by an editor. Read more on the story in a blog article by C. Bergman: http://caseybergman.wordpress.com/2013/08/04/on-the-30th-anniversary-of-dna-sequencing-in-population-genetics/.
