Why do evolutionary rates vary?

Regions were chosen so as to maximize their length. The statistical significance of a correlation was assessed by a permutation test. The values for one variable were randomly permuted, so that each value for one variable was randomly paired with a value for the other variable.

The correlation for the randomized data was then calculated. This process was repeated 10, times. The P value was taken to be the fraction of these correlations that had an absolute value at least as large as that of the correlation of the actual unpermuted data.
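
The permutation test described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It uses Pearson's correlation by default (a rank correlation can be passed in through `corr`). The helper names `pearson` and `permutation_p_value` are hypothetical, and the number of permutations is left as a parameter.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=1000, corr=pearson, seed=0):
    """Two-sided permutation P value for the correlation between x and y.

    The values of y are shuffled, which randomly re-pairs them with x. P is the
    fraction of shuffled correlations whose absolute value is at least as
    large as that of the observed (unpermuted) correlation.
    """
    rng = random.Random(seed)
    observed = abs(corr(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(corr(x, y_perm)) >= observed:
            hits += 1
    return hits / n_perm


x = [1.2, 2.3, 2.9, 4.1, 5.5, 6.0, 7.4, 8.8]
y = [0.9, 2.0, 3.5, 3.9, 5.0, 6.8, 7.1, 9.0]
print(pearson(x, y), permutation_p_value(x, y))
```

Shuffling only one variable preserves both marginal distributions while breaking any pairing between them. That is what makes the shuffled correlations a valid null distribution.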

For the correlations of ER with rate variability, the regression line was calculated separately for every permutation and, in the sampling analysis discussed later, for every combination of a permutation and a random sample. As discussed in Results, genes that evolve more slowly exhibit more sampling variance in their rate estimates, which leads to greater deviations of the observed ratio of rates from the central value.

This may make a variable appear to correlate with rate variability when it merely correlates with rate. To eliminate this effect, we equalized sampling variance by sampling a fixed total number, n, of amino acid changes from each gene with at least n changes in the terminal branches. These changes were drawn at random, without replacement, from the changes in the two lineages. Equivalently, the number of the n changes assigned to one lineage was drawn from a hypergeometric distribution, with the remainder assigned to the other.

Consider a mammalian ortholog set with 7 inferred amino acid changes in the primate lineage and 15 in the rodent lineage. After sampling, the number of changes in the primate lineage will be an integer between 0 and 7, inclusive, with probabilities given by a hypergeometric distribution. The number in the rodent lineage will correspondingly range from 10 to 3, so that the sum for the two lineages is necessarily 10. This sampling process was performed repeatedly, and the resulting correlations were averaged.
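
The thinning step and the worked example above can be reproduced with a short Python sketch. The function names `thinned_pmf` and `thin_counts` are hypothetical. The first computes the hypergeometric probabilities exactly with `math.comb`, and the second performs one random draw without replacement. In the full analysis, such a draw would be repeated for every gene, the correlation recomputed each time, and the correlations averaged.

```python
import math
import random


def thinned_pmf(k1, k2, n):
    """Hypergeometric probabilities for the lineage-1 count after thinning.

    k1, k2: substitution counts in the two terminal lineages.
    n: fixed total number of changes kept per gene (requires k1 + k2 >= n).
    Returns {j: P(j of the n kept changes come from lineage 1)}.
    """
    total = math.comb(k1 + k2, n)
    lo, hi = max(0, n - k2), min(k1, n)
    return {j: math.comb(k1, j) * math.comb(k2, n - j) / total
            for j in range(lo, hi + 1)}


def thin_counts(k1, k2, n, rng):
    """One random thinning of (k1, k2) down to n changes, without replacement."""
    pool = [1] * k1 + [2] * k2        # label each change by its lineage
    kept = rng.sample(pool, n)        # draw n changes without replacement
    j = kept.count(1)
    return j, n - j


# Worked example from the text: 7 primate and 15 rodent changes, n = 10.
pmf = thinned_pmf(7, 15, 10)
for primate, prob in pmf.items():
    print(f"primate={primate} rodent={10 - primate} P={prob:.4f}")
print(thin_counts(7, 15, 10, random.Random(1)))
```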

The statistical significance of the result was evaluated using a permutation test as described earlier. Our analyses involved ortholog sets from either four mammalian species or four Drosophila species, related as shown in figure 1. We found that these analyses were compromised by the quality of the originally reported genome sequences of the rhesus macaque Rhesus Macaque Genome Sequencing and Analysis Consortium et al.

For the rhesus macaque, we used a more recent genome sequence Yan et al. We found that we could improve upon the annotations of this genome; apparently, propagating gene models from the earlier, less complete rhesus genome sequence had produced some incorrect gene models (see supplementary information, Supplementary Material online, for an example).

We therefore used coding sequences generated by our own gene-finding techniques. The same holds for the two Drosophila species pairs correlation 0. S5, Supplementary Material online.

Figure: Relationships between protein ERs in different lineages. Each point represents a set of orthologous genes.

One source of deviation from a perfect correlation is sampling variance due to the finite number of substitutions observed.

We assessed this effect by simulating Poisson sampling of estimated rates. The inferred substitutions were partitioned between the two lineages according to branch lengths, corresponding to a nonfluctuating ER and a perfect correlation. We then drew Poisson-distributed samples using these substitution numbers as the means.

This produced correlations significantly higher than those observed for the real data: 0. Thus, sampling variance does not explain all of the deviation from a perfect correlation, and ERs indeed vary between lineages. The results presented later provide information about the nature and causes of this rate variation.

Table 1 shows the relationship between phylogenetic depth and variation in ER. Little or no correlation is observed. What little correlation is observed for flies can apparently be explained largely by greater sampling error for highly expressed proteins because they tend to evolve more slowly, as demonstrated by the sampling analysis shown in table 3.

Thus, despite the fact that expression level is the best known predictor of ER, it does not correlate strongly with the tendency of that rate to fluctuate. If, on the other hand, changes in ER are due to factors external to the protein, no such relationship is expected.

The internal branch value served as an estimate of the mean of the rate parameter. This is, we believe, preferable to using the external branch values for this purpose: because these values are used to calculate the ratio, using them for the rate estimate as well could lead to artificial correlations.

The relationship between rate and variability was measured by the rank-order correlation between the deviations and the internal branch parameters.

Figure: Analysis of the relationship between ER and its temporal variation. The method is illustrated using a set of hypothetical genes, each represented by a point. The horizontal axis represents the logarithm of the estimated rate parameter. (A) The vertical axis represents the logarithm of the ratio of the rate parameter estimates for the two terminal lineages. A least-squares line is fit, and the absolute deviations from the line are taken as estimates of rate variability. (B) The vertical axis represents the magnitude of these deviations. The coefficient of correlation between the variables in B indicates the relationship between rate and rate variability.

As shown in table 4, the correlation between protein ER and its variability is weak. Even this weak correlation might be artifactual: slowly evolving proteins accumulate fewer substitutions, so their rate ratios carry more sampling variance and deviate more from the regression line. This statistical fact would lead to a negative correlation even in the absence of a biological effect. To investigate the contribution of this effect, we performed computations that eliminated it by randomly thinning the substitution counts in the branches used to calculate the ratios. This leaves all genes with the same total number of substitutions in these branches, but with a variable distribution between branches that reflects the ratio of rates.

Hence, it largely equalizes sampling variance among proteins with different ERs. For simplicity, this analysis uses protein p-distances based on a most parsimonious reconstruction. The sampling procedure eliminates much or all of the small negative correlations described earlier (table 5). Thus, those correlations are, in part or in whole, artifacts of sampling variance. The true correlation between protein ER and its tendency to fluctuate appears to be close to zero. We also performed a test based on the index of dispersion.

This quantity will be independent of the mean ER if the variability of the rate, as measured by the coefficient of variation (ratio of standard deviation to mean), does not depend on the rate. We estimated this quantity from the terminal substitution counts using the method of Gillespie and calculated its correlation with the internal branch count.
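
A simplified version of the dispersion calculation is sketched below. It is not Gillespie's estimator, which also corrects for unequal lineage lengths. It assumes that the lineages compared span equal times and computes the plain variance-to-mean ratio R. The companion quantity (R - 1)/mean follows from the law of total variance for a Poisson process whose rate varies: Var = m + CV²m², so (R - 1)/m estimates CV² regardless of the mean. Treating it as the quantity meant in the text is an assumption. Function names are hypothetical.

```python
import statistics


def index_of_dispersion(counts):
    """Variance-to-mean ratio R of substitution counts across lineages.

    Unweighted: assumes every lineage spans the same amount of time.
    """
    return statistics.variance(counts) / statistics.fmean(counts)


def cv_squared(counts):
    """Estimate CV**2 of the rate as (R - 1) / mean.

    If counts are Poisson given a rate that varies among lineages with
    coefficient of variation CV, then Var = m + CV**2 * m**2, so
    R - 1 = CV**2 * m, and dividing by m removes the dependence on the mean.
    """
    m = statistics.fmean(counts)
    return (index_of_dispersion(counts) - 1) / m


print(index_of_dispersion([7, 15]), cv_squared([7, 15]))
```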

For mammals, the Spearman rank-order correlation was 0. For flies, the Spearman correlation was 0.

Explanations for this observation include the ideas that conserved, functionally important genes are more likely to be retained as duplicates Davis and Petrov and that retained duplicates are highly structurally constrained Yang et al. In contrast to duplicates, singleton genes are more poorly annotated and may have less critical functions Jordan et al.

Gene duplication is particularly common in plants because of the prevalence of whole-genome duplication (WGD) through polyploidy.

WGD has occurred throughout the evolutionary history of flowering plants Soltis PS and Soltis DE, including a putative event at the base of the angiosperms Jaillon et al. In addition, many species, such as Arabidopsis and maize, have experienced multiple WGD events over their evolutionary history Vision et al.

However, WGD is not the only mode of gene duplication. Genes may also be duplicated by segmental events that encompass large chromosomal regions or by dispersed duplication of single genes Akhunov et al.

Different modes of duplication also retain different kinds of genes. For example, transcription factors and signal transducers are overrepresented among genes retained as duplicates after WGD events Blanc and Wolfe; Maere et al. In contrast, tandemly duplicated genes, which represent one type of non-WGD duplicate, are biased toward membrane proteins and genes involved in stress response Rizzon et al. Given these functional differences, it seems plausible that evolutionary rates vary not only between singletons and duplicates but also between WGD and non-WGD duplicates.

To date, there have been no genome-wide studies of evolutionary rates among plant nuclear genes. As a result, there has been no accurate description of rate variation among genes, little investigation as to whether rates vary along chromosomes, and few attempts to correlate evolutionary rate with duplication status or other important evolutionary characteristics.

The dearth of rate studies stems from a lack of sequenced, closely related genomes that permit accurate ortholog identification Gaut and Ross-Ibarra. The A. Moreover, the two genomes have shared WGD events, including one or two events near the base of the angiosperms Simillion et al. As a result, the duplication status of individual genes should be well conserved between species. In this study, we seek to characterize genome-wide patterns of rate variation among plant nuclear genes in the hope of inferring some of the evolutionary forces that shape this variation.

We then use these rate estimates to address the following three questions. Second, what are the major correlates of evolutionary rates? Finally, do rates vary as a function of duplication status? The orthologs and alignments for this study are the same as those used in Yang et al. To identify duplicated and singleton genes within species, an all-against-all BlastP Altschul et al.

Duplicated genes were identified according to previous methods Gu et al. Next, we submitted each protein as the query to search against the A. Proteins were removed from further consideration if they formed a link due to their homology with the same repetitive element. Finally, a single-linkage algorithm was used to group proteins into families. Each duplicated gene was then classified by whether or not it was derived from a WGD event, according to the assignments of Blanc et al.
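
Grouping proteins into families by single linkage amounts to finding connected components in the graph of retained BLAST links. A union-find sketch in Python follows. It assumes the links have already been filtered, including removal of links caused by a shared repetitive element. `single_linkage_families` and the protein identifiers are hypothetical.

```python
def single_linkage_families(proteins, links):
    """Group proteins into families by single linkage (connected components).

    proteins: iterable of protein identifiers.
    links: (a, b) pairs that survived filtering; any chain of links joins
    proteins into one family, so families of size 1 are singletons.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent pointers to the family representative (path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a   # merge the two families

    families = {}
    for p in parent:
        families.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in families.values())


families = single_linkage_families(
    ["At1", "At2", "At3", "At4", "At5"],
    [("At1", "At2"), ("At2", "At3")],
)
print(families)
```

Single linkage joins At1 and At3 even though they share no direct link, because both link to At2.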

The Blanc et al. Because some genes were found in both age classes, we restricted our data set to genes with only one age annotation. We also repeated all analyses using the WGD duplication definitions of Bowers et al. All the results were qualitatively identical to those based on the Blanc et al. assignments. We used principal component regression (PCR) analysis to explore the potential contribution of evolutionary parameters to the total variance in evolutionary rate among genes Jolliffe; Drummond et al.
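
The PCR analysis can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline, and it assumes that "contribution to the total variance" means the fraction of rate variance explained by each principal component. Predictors are standardized, components come from a singular value decomposition, and the rate is regressed on each component. Because the components are uncorrelated, the per-component fractions add up to the R² of the full regression. The function name and the synthetic example data are hypothetical.

```python
import numpy as np


def pcr_variance_explained(X, y):
    """Fraction of the variance in y explained by each principal component of X.

    X: (n_genes, n_predictors) array of gene characteristics.
    y: (n_genes,) array of evolutionary rates (for example, Ka).
    Assumes no predictor is constant and X has full column rank.
    Returns (fractions, loadings); loadings[k] gives component k's weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize predictors
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                                 # per-gene component scores
    yc = y - y.mean()
    # Scores are mutually orthogonal, so each can be regressed on separately.
    coef = scores.T @ yc / S**2
    fractions = coef**2 * S**2 / (yc @ yc)
    return fractions, Vt


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
fractions, loadings = pcr_variance_explained(X, y)
print(fractions, fractions.sum())
```

The example uses three synthetic predictors; the study's model used 14 gene characteristics.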

We log transformed the predictor variables if log transformation led to a higher correlation coefficient, added a constant 0. We incorporated 14 gene characteristics into our PCR model based on availability and precedence in the literature. For each A. We obtained A. Given these markers, local recombination rates were estimated by using MareyMap Rezvoy et al.

The LOESS procedure depends on two parameters: the polynomial degree and the span, which describes the number of points used to calculate the local polynomial around a marker. We assessed the multifunctionality of a gene by counting the number of biological processes in which a gene is involved Salathe et al.
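
The role of the two LOESS parameters can be seen in a minimal degree-1 sketch. This is not MareyMap's implementation. It assumes marker positions `xs` on the physical map (e.g. Mb) and `ys` on the genetic map (e.g. cM). It fits a tricube-weighted local line through the `ceil(span * n)` nearest markers and reads the local slope as the recombination rate. Names are hypothetical.

```python
import math


def loess_local_line(x0, xs, ys, span=0.3):
    """Degree-1 LOESS fit at x0; returns (fitted value, local slope).

    Uses the ceil(span * n) markers nearest to x0 (at least 3), weighted by
    a tricube kernel of their distance. The bandwidth is enlarged slightly so
    that the farthest marker keeps a small positive weight.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    h = 1.001 * max(abs(xs[i] - x0) for i in nearest) or 1.0
    w = [(1 - (abs(xs[i] - x0) / h) ** 3) ** 3 for i in nearest]
    sw = sum(w)
    mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0          # weighted least-squares slope
    return my + slope * (x0 - mx), slope
```

A larger span averages over more markers and gives a smoother recombination profile. A smaller span follows local changes more closely but is noisier.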

Yang et al. For each orthologous gene pair, we obtained the divergence score dSM, defined as the fraction of both sequences that does not contain a region of significant local similarity Castillo-Davis et al.

A dSM value of 0 indicates complete sharing of motifs between sequences, whereas a dSM value of 1 indicates an absence of shared motifs. To include information about chromosomal location, we scaled the distance of each gene from the centromere.
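
The divergence score can be computed once the regions of significant local similarity are known. The sketch below assumes that these regions are given as half-open `(start, end)` intervals on each sequence. It reads "fraction of both sequences" as the uncovered length pooled over the two sequences. Detecting the regions themselves is outside its scope, and the function names are hypothetical.

```python
def covered_length(intervals, length):
    """Positions in [0, length) covered by at least one half-open interval."""
    covered, reach = 0, 0
    for start, end in sorted(intervals):
        # Skip any part already counted; clip to the sequence.
        start, end = max(start, reach, 0), min(end, length)
        if end > start:
            covered += end - start
            reach = end
    return covered


def d_sm(len_a, regions_a, len_b, regions_b):
    """Fraction of the two sequences, pooled, lying outside shared-motif regions.

    0 means every position is in a region of significant local similarity;
    1 means no such region exists.
    """
    uncovered = ((len_a - covered_length(regions_a, len_a))
                 + (len_b - covered_length(regions_b, len_b)))
    return uncovered / (len_a + len_b)


print(d_sm(100, [(0, 50), (25, 75)], 100, [(0, 50)]))
```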

On each chromosomal arm, values ranged from 0 to 1, with higher values indicating greater physical distance from the centromere. We began with 19, orthologous pairs Yang et al. First, we retained only the orthologous pairs that were defined as duplicated in both species or deemed as singletons in both species.
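
The centromere scaling can be written as one rule per arm. This sketch assumes that the scaling is linear in physical position and that each arm runs from the centromere to the corresponding chromosome end. The function name and coordinate conventions are hypothetical.

```python
def scaled_centromere_distance(pos, centromere, chrom_length):
    """Scale a gene's distance from the centromere to [0, 1] within its arm.

    pos, centromere, chrom_length: physical coordinates on one chromosome,
    with the chromosome spanning [0, chrom_length].
    0 = at the centromere; 1 = at the end of the arm.
    """
    if pos < centromere:
        return (centromere - pos) / centromere               # arm [0, centromere)
    return (pos - centromere) / (chrom_length - centromere)  # arm [centromere, end]


print(scaled_centromere_distance(15, 10, 30))
```

Scaling within each arm separately makes genes on arms of different lengths comparable.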

Second, we retained only duplicated genes that had a single unambiguous assignment to early or recent WGD events. Our final data set consisted of 11, orthologous pairs, including 9, duplicated genes and 1, singletons. The Ks distribution had a mean of 0. The coefficient of variation (CV) of Ks was 0. The Ka estimates had a lower mean, at 0. S1, Supplementary Data, Supplementary Material online, for chromosomes 2-5, respectively. In general, there were few marked peaks for Ks fig.

To test whether divergence values within windows were higher than expected, we randomly permuted Ks values among genes, holding gene location and window definitions constant.

Over 10, permutations, we determined whether an observed Ks value for a window was extreme. Figure 2 provides an example in which Ks values in a window are elevated for some regions near the centromeres and in the region spanning 24-27 Mb on chromosome 1. Generally, when Ks values were extreme, they tended to be elevated in arm regions proximal to centromeres and reduced near telomeres (fig. S1, Supplementary Data, Supplementary Material online).
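
The window test can be sketched as a permutation of per-gene values across fixed gene positions. The sketch assumes that the window statistic is the mean Ks of the genes in the window and that `window_of` assigns each gene to exactly one window. It returns both one-sided P values, one for higher-than-expected and one for lower-than-expected window values. Names and defaults are hypothetical, and `n_perm` is left as a parameter.

```python
import random
import statistics


def window_permutation_pvalues(values, window_of, n_perm=1000, seed=0):
    """One-sided permutation P values for the mean value in each window.

    values: per-gene Ks (or Ka) values.
    window_of: window label for each gene; held fixed across permutations.
    Returns {window: (p_high, p_low)}.
    """
    rng = random.Random(seed)
    windows = sorted(set(window_of))

    def window_means(vals):
        groups = {w: [] for w in windows}
        for v, w in zip(vals, window_of):
            groups[w].append(v)
        return {w: statistics.fmean(g) for w, g in groups.items()}

    observed = window_means(values)
    high = dict.fromkeys(windows, 0)
    low = dict.fromkeys(windows, 0)
    shuffled = list(values)
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # reassign values to fixed positions
        perm = window_means(shuffled)
        for w in windows:
            high[w] += perm[w] >= observed[w]
            low[w] += perm[w] <= observed[w]
    return {w: (high[w] / n_perm, low[w] / n_perm) for w in windows}


ks = [0.1] * 10 + [0.5] * 5 + [0.1] * 10
win = [0] * 10 + [1] * 5 + [2] * 10
print(window_permutation_pvalues(ks, win, n_perm=500))
```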

Figure: To test whether divergence values within windows were higher or lower than expected, we randomly permuted the K values among genes, holding gene location and window definitions constant. Over 10, permutations, we determined whether the observed value for a window was extreme. The top bar in each plot shows the P values for the observed value being higher than expected; the bottom bar shows the P values for the observed value being lower than expected. The dotted lines indicate the mean values of evolutionary rates for all genes on chromosome 1.

To perform a general analysis of the factors that contribute to evolutionary rates, we selected 14 variables that might correlate with or contribute to evolutionary rates (table 2). The 14 variables were available for orthologs. Most of these variables were correlated with evolutionary rates in pairwise fashion (table 2). Most variables were correlated with Ks as well, but the pattern differed slightly from that for Ka (table 2); for example, duplication mode was correlated with Ka but not with Ks.

Of course, many of these factors are intercorrelated, making it difficult to identify the contribution of individual factors to evolutionary rates.

However, this experimental approach did not provide evidence for an increase in mitochondrial DNA mutation due to metabolic oxidative stress Joyner-Matos et al. The relationship between life history and molecular evolutionary rates is more apparent in animal nuclear genomes.

Although a study of the effect of 14 life-history traits on the molecular evolutionary rates of mammalian mitochondria did not provide strong support for the generation time hypothesis Nabholz et al., evidence suggests that mammalian nuclear genomes evolve according to the expectations of neutral theory. For example, human nuclear DNA has a lower molecular evolutionary rate than that of primates with shorter generation times. Specifically, in mammals, neutral evolutionary rates depend on generation time, whereas non-synonymous rates depend on population size Nikolaev et al.

Similar patterns have been demonstrated in invertebrate animals, with the exception that generation time was correlated with non-synonymous rates Thomas et al. These studies further support the generation time hypothesis, but we must keep in mind that it is not mutually exclusive with population size effects. We have discussed evidence supporting the idea that molecular evolutionary rates are driven by life history. By comparing a wide variety of organisms, biologists can test the prediction that DNA sequences do indeed evolve at a rate that depends, at least partially, on organism-level traits.

Generation time and metabolism, each to some degree or in combination, affect the mutation rates of some organisms and, thus, their molecular evolutionary rates.

Even so, some relationships among generation time, metabolism, and molecular evolution depend on whether the organism is a plant or an animal and on the location of the genome within the cell. Also, differences in neutral vs.

Bazin, E. Population size does not influence mitochondrial genetic diversity in animals. Science.
Galtier, N. Mitochondrial whims: Metabolic rate, longevity and the rate of molecular evolution. Biology Letters 5.
Gillooly, J. The rate of DNA evolution: Effects of body size and temperature on the molecular clock.
Graur, D. Fundamentals of Molecular Evolution. Sunderland, MA: Sinauer Associates.
Joyner-Matos, J. No evidence of elevated germline mutation accumulation under oxidative stress in Caenorhabditis elegans. Genetics.
Kimura, M. The Neutral Theory of Molecular Evolution.
Muller, K. Evolutionary rates in Veronica L. (Plantaginaceae): Disentangling the influence of life history and breeding system. Journal of Molecular Evolution 70.
Nabholz, B. Strong variations of mitochondrial mutation rate across mammals: The longevity hypothesis. Molecular Biology and Evolution 25.
Determination of mitochondrial genetic diversity in mammals. Genetics.
Nikolaev, S. Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements.
Smith, S. Rates of molecular evolution are linked to life history in flowering plants.
Thomas, J.

Mutations do not seem to be a major factor limiting evolution, because diversity in morphological evolution (the evolution of physical characteristics) does not correlate well with DNA mutation rates.

However, in some cases, evolutionary rates can depend on mutation rates. A good example is antibiotic resistance: bacterial mutation rates can affect how readily bacteria become resistant to antibiotics. If certain characteristics are more efficiently selected against in one species than in another, the rates of evolutionary change will differ between them.

Natural selection is more efficient in larger populations, where genetic drift is weaker. Therefore, small populations might not be able to evolve rapidly enough in a rapidly changing environment. In this scenario, inefficient natural selection limits the rate of evolution. Finally, the rate of evolution can be limited by constraints that arise when a mutation in a gene produces a beneficial characteristic for the species but impedes the function of other gene products.

These architectural constraints can be bypassed if genes or whole genomes are duplicated. When this occurs, the extra gene copies can compensate for the negative effects on gene function that accompany the beneficial new function of the mutated gene.

In a sense, these extra genes can speed up the rate of evolutionary change.