Earlier discussions of genome evolution on this and other blogs reminded me of a 2009 Trends in Genetics paper by Khalturin et al. (1) on ‘orphan’ genes. Two more recent papers on the topic, by Tautz & Domazet-Loso (2) and Ranz & Parsch (3), seem worthy of comment here. Orphan genes are individual genes or small gene families that are restricted to specific taxonomic groups and have no known related genes outside them. The term is of course a misnomer, and could be highly misleading: unless you’re a Creationist, the DNA that is a gene today had to be in some genome somewhere ancestrally. But in what form, and how did it arise?
An important generalization at the core of modern molecular evolution, associated especially with Susumu Ohno’s 1970 book Evolution by Gene Duplication, is that genomes evolve largely through duplication events. The idea is that genes have been structured, from early in the history of life, in ways that cannot arise simply by random mutation of single nucleotides. But duplication is clearly only part of the story. Every gene needs to be regulated, and regulatory elements are shorter than the genes they regulate and more fluid in number and location, so they can arise by mutation alone far more easily than whole genes can. Thus in modern theory a combination of duplication and ‘ordinary’ mutation is responsible for genome evolution. But if genes themselves (and/or their exons) arise by duplication, each duplication creates a family of related sequences. So how can there be orphan genes, without a trace of relatives?
The above authors point out that in every taxonomic group studied so far, up to 20% or even more of the genes have no recognizable homologues in other species. More properly, they are ‘Taxonomically Restricted Genes’ (TRGs): they may form a small gene family, but one found only within a specific taxonomic clade. Given the idea that genomes evolve by duplication, the prevalence of orphans needs some explanation, and these authors essentially offer three.
First, the authors argue that the existing evidence suggests orphan genes fulfill restricted, taxon-specific adaptive needs (e.g., specific functional cell types in cnidarians). If that is the case, their relatives in collateral taxa must have lost their function, and their traces were then erased by mutation or deletion that selection did not oppose. Khalturin et al. (1) suggest that “once a certain evolutionary time has elapsed” sequence similarity to the ancestral gene will be erased.
Mutational erasure is clearly possible over time. Marshall et al. (4) tried to quantify the idea in 1994, concluding that after about 10 million years, genes that had decayed into pseudogenes could no longer be revived by mutation (though they would retain enough sequence to still be recognized as pseudogenes).
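To give a feel for how quickly neutral change erodes recognizable similarity, here is a minimal sketch using the standard Jukes-Cantor (JC69) substitution model. This is not the method of Marshall et al.; it is a textbook model swapped in for illustration. It assumes equal base frequencies, equally likely substitutions, no selection on the decaying copy, and a hypothetical neutral rate of 2e-9 substitutions per site per year. The helper name `expected_identity` is likewise made up for this example. Real homology searches usually compare protein sequences and apply statistical score thresholds, so raw DNA identity is only a rough proxy.

```python
import math


def expected_identity(rate_per_site_per_year: float, years: float) -> float:
    """Expected fraction of sites still matching the ancestral base after
    `years` of neutral change under the Jukes-Cantor (JC69) model.

    Identity decays from 1.0 toward 0.25, the match rate expected between
    two unrelated random DNA sequences with equal base frequencies.
    """
    return 0.25 + 0.75 * math.exp(-4.0 * rate_per_site_per_year * years / 3.0)


# Hypothetical neutral substitution rate, chosen only for illustration.
RATE = 2e-9  # substitutions per site per year

for my in (1, 10, 100, 500, 1000):
    ident = expected_identity(RATE, my * 1e6)
    print(f"{my:>5} My: {ident:.3f} expected identity to the ancestor")
```

Under these assumptions, a decaying copy still shares about 98% identity with its ancestor after 10 million years, so it is easily recognizable. After a billion years, identity drops to about 30%, close to the 25% background, which is the kind of erasure the ‘lost relatives’ explanation needs.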
For this to be the case, we have to assume that the ‘parental’ gene(s) existed, and were important, at the time of the taxonomic split when the new species branched off. They were later removed by drift or selection from the descendants of the parental clade, while serving some strong or new adaptive function in the new clade. Presumably the parental taxa did not include a large gene family related to the orphan. A large gene family would have been serving one, or many, important functions at the time, and at least some of its members would likely still be around today.
We know that gene families can persist across widespread branches of life without sequence similarity that BLAST or other homology searches can easily recognize. This is based on work by Kazz Kawasaki (5) in my group on unusual kinds of proteins, such as the disordered proteins involved in biomineralization, whose 3D structure matters less for function than their ion-binding capacity. At least one of the genes in the SCPP (secretory calcium-binding phosphoprotein) gene family, amelogenin, which is responsible for capturing Ca²⁺ ions in forming dental enamel crystals, was considered an orphan gene until we identified its relatives (6).
Second, the orphan-paper authors acknowledge that BLAST searching and our genome databases are imperfect, so relatives of some of these orphans may eventually be identified. However, it seems unlikely that this will account for all of them. There should also be detectable gene-age consequences of the adaptive explanation, both in the adaptive function itself (something new in the clade, for example) and in the time needed for the ancestral genes to be erased. Some evidence cited by Tautz & Domazet-Loso (2) is ambiguous regarding the estimated ages and functions of orphans, so the picture is not wholly clear. Is it more plausible that in so many cases the homologies just haven’t been recognized for some reason, including the incompleteness of currently available genomic data?
Third, and most interesting, is the explanation that orphan genes really are orphans in the sense of having arisen de novo, rather than as copies of functional genes. Tautz & Domazet-Loso (2) and Ranz & Parsch (3) both suggest that regulatory sequences might arise near DNA that already has enough of the structure of a gene to be transcribed as well as translated: start, stop, and splice sequences, proper coding exons, and polyA addition sites. Such a sequence would then need to serve some function that outweighed any toxicity a new protein might have in the cell. Regulatory sequences usually involve many different transcription-factor (TF) binding elements, so they may have been placed at these sites by translocation of such elements from other genes. Or, in some of the examples cited, the new gene may lie within an exon of an existing (and hence already regulated) gene.
Tautz & Domazet-Loso (2) provide a step-by-step scenario for de novo gene creation. These authors recognize that such ideas stretch plausibility, given the seemingly minuscule probability that functional genes with strongly advantageous effects could arise this way. This is certainly a challenge to Ohno!
One possibility mentioned draws on the recent discovery that much or most of the genome, including non-coding DNA, is transcribed into RNA. The cell obviously tolerates this RNA litter, which could make it more likely that such an RNA occasionally has translatable properties. Of course, one might suggest that any such RNA sequences are actually the unrecognized fragmentary trash of long-dead genes. Even if de novo creation happened often, selection would perhaps remove most of its products. But over millions of years, maybe it happens enough.
Could it be that the history of discovery has misled us into becoming Ohno-ized? We discovered interrupted genes, which led to theories of gene origin by exon duplication (and many genes have repeated exons that are prone to duplication and deletion). Then we discovered gene families, and this led to the seemingly obvious conclusion that duplication was ‘the’ mode of genome evolution. We made an exception for enhancers, which can easily come and go by ordinary point mutation. The emphasis on duplication then led to the discovery of how general gene families are. It focused attention on them, on the networks in which they participate, and on the related but diverging functions they fill.
Could it be that instead, new genes often really do arise by de novo mechanisms, and disappear by deletion before they generate large gene families? If such genes are old enough, they and their paralogs would be less taxonomically restricted than if they were recent. After all, if you had a phylogenetic dartboard and threw darts at random, most would land on some branch rather than at the very top (the root), and the descendants of that branch would be ‘taxonomically restricted’. So it’s the lack of collateral relatives, rather than the taxonomic restriction, that seems most curious to me.
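The dartboard argument can be made concrete with a toy simulation. This is a minimal sketch under a deliberately unrealistic assumption: the tree is perfectly balanced and bifurcating, with equal-length branches at every level. Real phylogenies are neither balanced nor evenly timed. Each dart marks the point where a new gene originates, and that gene is inherited only by the tips below the branch it hits. The function name `dart_clade_sizes` and its parameters are hypothetical, made up for this illustration.

```python
import random


def dart_clade_sizes(depth: int, n_darts: int, seed: int = 1) -> list[float]:
    """Throw darts uniformly along the branches of a perfectly balanced
    binary tree with `depth` levels of equal-length branches and
    2**depth living tips. For each dart, return the fraction of tips
    descended from the branch it hits, i.e. the share of living taxa
    that would inherit a gene originating at that point."""
    rng = random.Random(seed)
    # Level k (1 = the two branches leaving the root, depth = tip branches)
    # holds 2**k equal-length branches, so a uniformly thrown dart lands
    # on level k with probability proportional to 2**k.
    levels = list(range(1, depth + 1))
    weights = [2 ** k for k in levels]
    hits = rng.choices(levels, weights=weights, k=n_darts)
    n_tips = 2 ** depth
    # A branch at level k has 2**(depth - k) tips below it.
    return [2 ** (depth - k) / n_tips for k in hits]


fractions = dart_clade_sizes(depth=10, n_darts=10_000)
restricted = sum(f <= 0.01 for f in fractions) / len(fractions)
print(f"share of darts whose descendants are <=1% of living taxa: {restricted:.2f}")
```

With 1,024 tips, a dart restricts its gene to at most 1% of living taxa whenever it lands on level 7 or deeper. Analytically that happens with probability (128 + 256 + 512 + 1024) / 2046, about 0.94, so the printed share should be close to that value. Under these assumptions, taxonomic restriction is the expected outcome for a randomly placed origin, which is why the missing collateral relatives are the more puzzling part.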
Other authors have suggested that human orphan genes are often expressed in the brain, but that seems to me yet another kind of forced human exceptionalism, because most genes, old or new, are expressed in the brain. Likewise, Khalturin et al. (1) propose that taxon-specific genes “drive morphological specification,” as part of “rewiring” of the networks of regulatory genes. But isn’t this always the case? Except in some special circumstances, traits are usually affected by many interacting genes. They seem to evolve through gradually diverging functions, which emerge from selection acting (again gradually) on the diversity made possible by the redundancy that gene duplication generates. It seems rather unlikely that a newcomer of essentially random structure could participate in such a network (or be properly expressed in a relevant tissue context) well enough to experience strong positive selection.
Orphan genes may simply be lucky genes in complex systems that happened to survive for us to observe them: different contributing genes surviving in different taxa, for different reasons including drift. The roughly 20% of genes that are orphans may just be the normal passengers on the train of duplication and deletion.
If de novo evolution is common, or more common relative to gene duplication than we thought, we may have to revisit the strong evidence for gene evolution by exon duplication and for the proliferation of ancient gene families. Have we missed something?
This could be a startling realization. I’m sure many Mol. Evol. readers will know more about this than I do, and I’d like to see what you think.
1. Khalturin, K, Hemmrich, G, Fraune, S, Augustin, R, and Bosch, TCG. More than just orphans: are taxonomically-restricted genes important in evolution? Trends in Genetics 25(9): 404-413, 2009.
2. Tautz, D, and Domazet-Loso, T. The evolutionary origin of orphan genes. Nature Reviews Genetics 12(10): 692-702, 2011.
3. Ranz, J, and Parsch, J. Newly evolved genes: moving from comparative genomics to functional studies in model systems. BioEssays 34: 477-483, 2012.
4. Marshall, CR, Raff, EC, and Raff, RA. Dollo’s law and the death and resurrection of genes. PNAS 91(25): 12283-12287, 1994.
5. Kawasaki, K, Buchanan, AV, and Weiss, KM. Biomineralization in humans: making the hard choices in life. Annual Review of Genetics 43: 119-142, 2009.
6. Kawasaki, K, and Weiss, KM. Mineralized tissue and vertebrate evolution: the secretory calcium-binding phosphoprotein gene cluster. PNAS 100: 4060-4065, 2003.