The purpose of this forum is to introduce notable papers and books published by you and others. The work can be new or old, but it should be of wide interest and high quality. A brief comment on the significance of the work should be attached. The current subject categories are (1) adaptation, (2) behavioral evolution, (3) dosage compensation, (4) evo-devo, (5) gene evolution, (6) genomic evolution, (7) molecular phylogeny, (8) natural selection, (9) phenotypic evolution, (10) sensory receptors, (11) sex chromosomes, (12) sex determination, (13) speciation, (14) symbiosis and evolution, and (15) horizontal gene transfer. However, new categories can be added if necessary. Emphasis will be placed on biological rather than mathematical work. Anyone may post a paper by sending it to one of the editors listed below. We also welcome your comments on posted work, but we moderate all comments to control spam. This forum is primarily for scientific discussion and for building a database of good molecular evolution papers.

Wednesday, December 12, 2012

Ortholog Conjecture Debated

Contributed by: Jianzhi Zhang
Most molecular biologists would agree that a gene tends to be more similar in function to its orthologs than to its paralogs.  This fundamental tenet, recently termed the ortholog conjecture, is a cornerstone of phylogenomics and is used by both computational and experimental biologists in predicting, interpreting, and understanding gene functions.  But is this conjecture empirically founded, or is it wishful thinking?
Orthologous genes arise via speciation, whereas paralogous genes are generated by gene duplication.
Orthologs: A1 and A2; B1 and B2.
Within-species paralogs: A1 and B1; A2 and B2.
Between-species paralogs: A1 and B2; A2 and B1.
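
The three relationships listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each gene is written as a hypothetical (copy, species) pair, so A1 is ("A", 1), and the duplication that produced copies A and B is assumed to predate the speciation that separated species 1 and 2. The function name classify_pair is made up for this example.

```python
from itertools import combinations

def classify_pair(gene_a, gene_b):
    """Classify two homologous genes, each given as a (copy, species) tuple.

    Assumption for this sketch: the duplication producing copies "A" and "B"
    happened before the speciation that separated species 1 and 2.
    """
    copy_a, species_a = gene_a
    copy_b, species_b = gene_b
    same_copy = copy_a == copy_b
    same_species = species_a == species_b
    if same_copy and same_species:
        raise ValueError("both arguments describe the same gene")
    if same_copy:
        # Same copy in different species: the genes were separated by speciation.
        return "ortholog"
    if same_species:
        # Different copies in one species: separated by duplication.
        return "within-species paralog"
    # Different copies in different species: still separated by the duplication.
    return "between-species paralog"

if __name__ == "__main__":
    genes = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
    for g1, g2 in combinations(genes, 2):
        name1 = f"{g1[0]}{g1[1]}"
        name2 = f"{g2[0]}{g2[1]}"
        print(name1, name2, classify_pair(g1, g2))
```

Running the script prints one line per pair of the four genes. With a more complicated history (for example, a duplication after speciation), this simple labeling rule would no longer apply and a gene tree would be needed.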

In a pioneering study, Nehrt et al. (3) attempted to test the ortholog conjecture using Gene Ontology (GO) annotations that were based on experimental data.  Contrary to everyone’s expectation, they found that the functional similarity between orthologs is lower than that between paralogs, when the level of sequence divergence is controlled.  Based on this and other findings, the authors proposed that protein function evolution is primarily determined by “the cellular context in which proteins act”.  This would explain why within-species paralogs, which are always in the same organism, were found functionally more similar than orthologs, which by definition reside in different organisms.

Nehrt et al.’s (3) finding stirred considerable controversy in cyberspace when it was published in the summer of 2011, as evidenced by numerous discussions in various blogs.  The last 10 months have seen three papers that challenged Nehrt et al.’s conclusion from different angles, although the three papers do not completely agree with one another either.

First, Thomas and colleagues, representing the group that annotated GO, claimed that GO annotation differences between homologous genes “do not reflect differences in biological function, but rather complementarity in experimental approaches” (4).  That is, gene function data are so sparse at present that GO annotations reflect ascertainment biases in experiments rather than true functional differences.

Second, Altenhoff et al. (1) identified a number of biases in GO.  After correcting for these biases, they found weak but significant evidence for the ortholog conjecture.

Most recently, Chen and Zhang (2) reanalyzed GO annotations and confirmed some of the biases identified by Altenhoff and colleagues.  Most disturbing, however, was the finding of many errors in GO annotation.  Even in so-called experiment-based annotations, across-species functional inferences were frequently made.  For example, an experiment was conducted on a monkey gene, but the function was annotated in GO for its human ortholog, based, ironically, on the ortholog conjecture.

In one part of their study, Chen and Zhang (2) focused on pairs of orthologs or paralogs that have identical protein sequences and were studied in the same papers.  Surprisingly, while all nine such paralogous pairs have 100% GO-based functional similarity, only nine of 31 such orthologous pairs have 100% functional similarity.  Even more strikingly, eight of the 31 orthologous pairs show 0% functional similarity, yet none of the papers that studied them explicitly mentioned any functional dissimilarity.  Apparently, these differences reflect ascertainment biases rather than true functional differences.  The authors also noted an upward trend in the functional similarity of orthologs, relative to that of paralogs, when analyzing time series data of GO from the last five years.
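
To see how two genes with identical protein sequences can end up with 0% annotation-based similarity, consider the following minimal sketch. It is not the similarity measure used in any of the papers discussed here; it simply assumes, for illustration, that functional similarity is the Jaccard index of two sets of annotation terms. The term labels and the function name go_similarity are hypothetical placeholders, not real GO identifiers.

```python
def go_similarity(terms_a, terms_b):
    """Jaccard index of two annotation term sets (an assumed, simplified measure).

    Returns a value between 0.0 (no shared terms) and 1.0 (identical term sets).
    """
    set_a, set_b = set(terms_a), set(terms_b)
    if not set_a and not set_b:
        raise ValueError("at least one gene must have an annotation")
    return len(set_a & set_b) / len(set_a | set_b)

if __name__ == "__main__":
    # Hypothetical case: two orthologs with identical protein sequences.
    # One was assayed for binding, the other for localization, so the
    # annotations differ even though the proteins may do the same thing.
    gene_species_1 = {"placeholder_binding_term"}
    gene_species_2 = {"placeholder_localization_term"}
    print(go_similarity(gene_species_1, gene_species_2))  # no shared terms

    # Hypothetical case: two paralogs assayed with the same experiment.
    paralog_a = {"placeholder_binding_term"}
    paralog_b = {"placeholder_binding_term"}
    print(go_similarity(paralog_a, paralog_b))  # identical term sets
```

Under this toy measure, the score depends entirely on which experiments happened to be run and annotated, which is the kind of ascertainment bias described above.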

These and other findings led Chen and Zhang (2) to conclude that the current GO is unsuitable for testing the ortholog conjecture.  They thus turned to RNA-Seq gene expression data, which should be relatively immune to ascertainment bias and annotation error.  They reported that orthologs are more similar to each other than to paralogs in gene expression.  But, regarding gene function, the jury is still out.  The sheer difficulty of proving or rejecting the ortholog conjecture, one of the most widely assumed principles of molecular evolution, was completely unexpected, and it still amazes me to this day.


1. Altenhoff AM, Studer RA, Robinson-Rechavi M, Dessimoz C (2012) Resolving the Ortholog Conjecture: Orthologs Tend to Be Weakly, but Significantly, More Similar in Function than Paralogs. PLoS Comput Biol 8(5): e1002514.

3. Nehrt NL, Clark WT, Radivojac P, Hahn MW (2011) Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals. PLoS Comput Biol 7(6): e1002073.

4. Thomas PD, Wood V, Mungall CJ, Lewis SE, Blake JA, et al. (2012) On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report. PLoS Comput Biol 8(2): e1002386.

Tuesday, December 4, 2012

Speciation Driven by Divergence in Heterochromatic Repeats

Contributed by: Zhenguo Lin

Darwin used the title "On the origin of species" for his most famous book, published in 1859. In this book he explained how a single species changes over time, but did not provide a proper explanation of how a species splits into two or more different species. The problem of speciation has since become an important subject in evolutionary biology. From Hugo de Vries, Theodosius Dobzhansky, and Ernst Mayr to contemporary workers such as Jerry Coyne and Allen Orr, this problem has been studied extensively.

To understand this problem, it seems crucial to study speciation at the molecular level. In their recent review article, Nei and Nozawa (1) emphasized the importance of mutations in speciation by presenting many cases from molecular studies.  One of the mechanisms they considered is hybrid incapacity associated with heterochromatin. Specifically, they stated that hybrid sterility or inviability may result from changes in repeat DNA elements in heterochromatic regions of the genome. Two representative examples were presented in the review article. (1) Differences in the number of 359-bp repeats (the zygotic hybrid rescue locus, Zhr) cause hybrid inviability between Drosophila melanogaster males and D. simulans females.  (2) The localization of the Odysseus homeobox (OdsH) protein to the heterochromatic Y chromosome causes hybrid male sterility between D. mauritiana females and D. simulans males.  More recently, in a comparative study of genomic sequences from two closely related flycatcher species, Ellegren et al. (2) suggested that the divergence of complex genomic repeat structures (centromeres and telomeres) may have generated the two species.

Figure 1. a, Male collared flycatcher. b, Male pied flycatcher. (From Ellegren et al. (2)).

The collared flycatcher Ficedula albicollis and the pied flycatcher Ficedula hypoleuca diverged less than 2 million years ago.  They look very similar except for the presence of a white collar in the former species (Figure 1). The authors, from Uppsala University in Sweden, sequenced ~1.1 Gb of the genome from 10 unrelated males of each species. By comparing these sequences, the authors identified 50 "divergence islands", which show significantly elevated levels of sequence divergence between the two species. The length of an "island" ranges from 100 kb to 3 Mb, with a mean of 625 kb. Interestingly, these "islands" are over-represented in the telomere or centromere regions, which are rich in repeat structures (Figure 2).  After detailed analyses of various evolutionary patterns in these "divergence islands", such as local mutation rates, levels of nucleotide diversity, allele-frequency spectra, levels of linkage disequilibrium, and shared polymorphisms, the authors concluded that these islands have experienced parallel selection in each species. Although no direct evidence was provided to show how these "divergence islands" contributed to speciation, the authors believed that these observations "raise the possibility that centromeres or other heterochromatic repeats themselves are the driver of speciation" (2).


Figure 2. Distribution of divergence measured as the density of fixed differences per bp for 200-kb windows across the genome. Chromosomes are listed in numerical order and are separated by gaps. Red horizontal bars show the approximate location of centromeres in homologous chromosomes of zebra finch. Open red symbols are used to indicate that avian microchromosomes are generally acro- or telocentric. Both ends of these chromosomes are labeled as the orientation is not known. For chromosomes 4, 6 and 8, there is a lack of an in situ mapped marker 5′ of the centromere in zebra finch. (From Ellegren et al. (2)).
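
The quantity plotted in Figure 2, the density of fixed differences per bp in 200-kb windows, can be illustrated with a short Python sketch. This is not the authors' pipeline. It assumes that the positions of fixed differences (sites where the two species are fixed for different alleles) on one chromosome are already known, that the positions are 0-based, and that the windows do not overlap. The function name and the example numbers are made up for illustration.

```python
def fixed_difference_density(positions, chrom_length, window=200_000):
    """Density of fixed differences per bp in consecutive non-overlapping windows.

    positions: 0-based coordinates of fixed differences on one chromosome
               (assumed to be precomputed; not derived here).
    chrom_length: chromosome length in bp.
    window: window size in bp (200 kb, as in the figure caption).
    """
    if chrom_length <= 0 or window <= 0:
        raise ValueError("chrom_length and window must be positive")
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[pos // window] += 1
    densities = []
    for i, count in enumerate(counts):
        start = i * window
        end = min(start + window, chrom_length)  # last window may be shorter
        densities.append(count / (end - start))
    return densities

if __name__ == "__main__":
    # Hypothetical toy chromosome of 500 kb with three fixed differences.
    toy_positions = [10, 150_000, 250_000]
    for i, d in enumerate(fixed_difference_density(toy_positions, 500_000)):
        print(f"window {i}: {d:.2e} fixed differences per bp")
```

In a scan of this kind, a "divergence island" would appear as a run of windows whose density is much higher than the genomic background. How that threshold is chosen is a separate statistical question not covered by this sketch.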

1. Nei, M. and Nozawa, M. (2011), 'Roles of mutation and selection in speciation: from Hugo de Vries to the modern genomic era', Genome Biol Evol, 3, 812-29.
2. Ellegren, H., et al. (2012), 'The genomic landscape of species divergence in Ficedula flycatchers', Nature. doi:10.1038/nature11584