Indefinable Genes and the “Wild West” Genomic Landscape

This article is supplemental to “Genes and the Central Fallacy of Evolutionary Theory”, and can best be read in conjunction with that essay. Original publication of this article: February 28, 2013. Date of last revision: February 28, 2013. Copyright 2013 The Nature Institute. All rights reserved.


The idea that discrete, stable, reliably replicated genes are the essential heritable material, making up the organism’s contribution to evolutionary change, becomes difficult to defend if we cannot find any functional genomic entities that are discrete, stable, and reliably replicated. And we cannot. This is true whether we are looking for genes or any other functional units of the genome.

For a good while the concept of the gene has been on life support. Writing in 2000 as the Human Genome Project was drawing toward completion, MIT science historian Evelyn Fox Keller foresaw that “the primacy of the gene as the core explanatory concept of biological structure and function is more a feature of the twentieth century than it will be of the twenty first”. She added: “The prowess of new analytical techniques in molecular biology and the sheer weight of the findings they have enabled have brought the concept of the gene to the verge of collapse” (Keller 2000, pp. 9, 69).

The formal pronouncement of death, as we will see shortly, seems to have occurred with the late 2012 release of results from Project ENCODE (“Encyclopedia of DNA Elements”), published in several of the leading scientific journals.

It was once thought that a chromosome’s DNA consisted primarily of neatly arranged protein-coding genes. These genes were transcribed (by certain enzymes) from one of the two strands (the “template” strand) of the double helix, producing messenger RNAs, which in turn were translated into protein.

Eventually it was realized that the conventional, protein-coding sequences constitute only about 1.5 percent of the human genome, and yet that most of the rest of the genome is in fact transcribed. And not just transcribed, but transcribed “every which way”, resulting in a “Wild West landscape” of transcription (Lee 2012). It turns out that both strands of the double helix are transcribed, even though the strand sequence opposite to a protein-coding sequence could not reasonably be expected to produce a protein-coding RNA. In general, transcription for production of proteins is only a small part of the total transcription that takes place within the human genome.
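To see why the strand opposite a protein-coding sequence would not be expected to yield a protein-coding RNA, a small sketch may help. It is an illustration only: the nine-base sequence and the function names `reverse_complement` and `transcribe` are invented for the purpose, and real transcription involves far more than base pairing.

```python
# Toy illustration only: the sequence and the function names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(template: str) -> str:
    """Return the RNA (5'->3') formed by base pairing along a template strand (given 5'->3')."""
    return reverse_complement(template).replace("T", "U")


# Made-up coding strand: a start codon, one further codon, and a stop codon.
coding_strand = "ATGGCCTAA"
template_strand = reverse_complement(coding_strand)

sense_rna = transcribe(template_strand)    # the conventional, protein-coding transcript
antisense_rna = transcribe(coding_strand)  # a transcript read from the opposite strand

print(sense_rna)
print(antisense_rna)
```

In this toy case the first transcript reads AUGGCCUAA, beginning with the AUG start codon, while the second reads UUAGGCCAU. The two are reverse complements of each other, so a sequence that spells out a protein on one strand spells out something quite different on the other.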

What else is transcribed?

Extensive intronic tracts within most genes — tracts whose RNA transcripts are discarded from the final, protein-coding RNA, but that are now known to have a large range of functions in their own right.
Vast intergenic regions — regions between genes or gene clusters.
Pseudogenes — genes that are thought to have lost their primary functions, but that are now being found to fulfill various roles in gene regulation.
Retrotransposons — DNA sequences that can be duplicated and inserted at new locations in the genome.

There are, in addition, numerous classes of small RNAs, some transcribed from the coding and noncoding portions of traditional genes, some from regulatory loci such as promoters and enhancers, some from repetitive DNA, and some from just about every other kind of genomic real estate. These small RNAs are now being found to have wide-ranging functions in the cell, especially as regulators of gene expression.

Transcription of a given gene may start from different places, may end in the middle of the gene, may continue past the end for an indefinite distance, and may occur simultaneously from a given promoter in opposite directions along the two strands of the double helix. Remarkably, “an average of 10 transcription units, the vast majority of which make long noncoding RNAs, may overlap each traditional coding gene” (Lee 2012). And then there is this:

Beyond the linear organization of genes and transcripts on chromosomes lies a more complex (and still poorly understood) network of chromosome loops and twists through which promoters and more distal elements, such as enhancers, can communicate their regulatory information to each other. ... [These] findings begin to overturn the long-held (and probably oversimplified) prediction that the regulation of a gene is dominated by its proximity to the closest regulatory elements. (Ecker, Bickmore, Barroso et al. 2012)

In other words, what constitutes a functional element of the genome is not defined merely by a local sequence, whether faithfully replicated or not, but by an active genomic and cellular context. As a result, “even seemingly simple gene structures may be hiding an astonishing variety of transcript forms” (Stamatoyannopoulos 2012).
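The overlap of transcription units with a traditional coding gene can be pictured as overlapping intervals along a chromosome. The following sketch is a hypothetical illustration only: the coordinates, the unit names, and the `Unit` type are assumptions invented here, not ENCODE data, and it uses four made-up units rather than the average of ten cited above.

```python
from typing import List, NamedTuple


class Unit(NamedTuple):
    """A stretch of a chromosome that gets transcribed (hypothetical toy model)."""
    name: str
    start: int   # first position, inclusive
    end: int     # last position, exclusive
    strand: str  # "+" or "-"


def overlaps(a: Unit, b: Unit) -> bool:
    """True if the two intervals share at least one position, on either strand."""
    return a.start < b.end and b.start < a.end


def overlapping_units(gene: Unit, units: List[Unit]) -> List[str]:
    """Names of the transcription units that overlap the given gene."""
    return [u.name for u in units if overlaps(gene, u)]


# All coordinates below are made up for illustration.
gene = Unit("coding_gene", 1000, 5000, "+")
units = [
    Unit("upstream_antisense", 500, 1200, "-"),     # begins before the gene, opposite strand
    Unit("intronic_noncoding", 2000, 2500, "+"),    # lies wholly inside the gene
    Unit("readthrough", 4800, 9000, "+"),           # continues well past the gene's end
    Unit("distant_intergenic", 10000, 12000, "-"),  # does not touch the gene
]

hits = overlapping_units(gene, units)
print(hits)
```

Here three of the four units overlap the gene. The check deliberately ignores strand, since the overlaps described above occur along both strands of the double helix.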

We come, then, to the “pronouncement of death” mentioned above. It was repeated in a number of Project ENCODE papers, generally in somewhat tortured form, typified by the following quotations:

Although the gene has conventionally been viewed as the fundamental unit of genomic organization, on the basis of ENCODE data it is now compellingly argued that this unit is not the gene but rather the transcript. On this view, genes represent a higher-order framework around which individual transcripts coalesce, creating a polyfunctional entity that assumes different forms under different cellular states, guided by differential utilization of regulatory DNA. (Stamatoyannopoulos 2012)

The assignation of a gene is based on functional data superimposed on a physical region of the genome, which can have multiple and complex functional elements operating under distinct conditions or in distinct biological contexts. (Chanock 2012)

We would propose that the transcript be considered as the basic atomic unit of inheritance. Concomitantly, the term gene would then denote a higher-order concept intended to capture all those transcripts (eventually divorced from their genomic locations) that contribute to a given phenotypic trait. [With further work we can expect to see] the overlap of most genes previously assumed to be distinct genetic loci. This supports and is consistent with earlier observations of a highly interleaved transcribed genome, but more importantly, prompts the reconsideration of the definition of a gene. (Djebali et al. 2012)

That is all fine, but a few things require noting:

The “polyfunctional entity that assumes different forms under different cellular states” is obviously not an entity at all. The phrasing abstracts from an almost infinitely complex, functionally interrelated set of activities that penetrate to the furthest reaches of the cell and organism, and then reifies the abstraction into an entity.
It doesn’t really need arguing that RNA transcripts, with their unbounded potential for variation, regulation, and molecular association — all bearing on function — are not “atomic” elements at all. And if we claim them as the “basic units of inheritance”, then we cast aside the conventionally asserted requirements for evolutionarily significant inheritance cited in the main article (requirements for, among other things, generation-to-generation stability and digital reliability of replication).
It is impossible to enumerate “all those transcripts that contribute to a given phenotypic trait”¹. This follows immediately from the integral, interwoven character of the organism, and is one of those “lessons of context” that nearly every article in the contemporary literature of molecular biology serves to reinforce. And if we cannot enumerate the transcripts contributing to the proposed “higher-order” redefinition of the gene, then how much is that redefinition worth?
In any case, RNA transcripts are not “functional units of genomic organization” at all, because they are not parts of the genome — certainly not parts of the genome that evolutionary biologists speak of as being replicated and slowly mutated in order to produce step-by-step change down through the generations. RNA transcripts are products of elaborate biological activity — activity through which the cell can synthesize up to thousands of different transcripts (and therefore thousands of different proteins) from the same DNA sequence.
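How a single DNA sequence can give rise to many different transcripts can be suggested by simple counting. The sketch below assumes, purely for illustration, that the only source of variety is alternative splicing in which certain exons are independently kept or skipped. The exon labels and the function `splice_isoforms` are hypothetical, and actual transcript diversity arises through many further processes in the cell.

```python
from itertools import product
from typing import Iterator, Sequence, Set, Tuple


def splice_isoforms(exons: Sequence[str], optional: Set[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every transcript obtained by keeping or skipping each optional exon.

    Exons not listed as optional are always retained, and exon order is preserved.
    """
    optional_in_order = [e for e in exons if e in optional]
    for choices in product([True, False], repeat=len(optional_in_order)):
        skipped = {e for e, keep in zip(optional_in_order, choices) if not keep}
        yield tuple(e for e in exons if e not in skipped)


# A made-up gene with four exons, two of which may be skipped.
exons = ["E1", "E2", "E3", "E4"]
isoforms = list(splice_isoforms(exons, optional={"E2", "E3"}))
for isoform in isoforms:
    print("-".join(isoform))

# A made-up gene with twelve exons, ten of which (E2 to E11) may be skipped.
many_exons = [f"E{i}" for i in range(1, 13)]
many_optional = set(many_exons[1:11])
isoform_count = len(list(splice_isoforms(many_exons, many_optional)))
print(isoform_count)
```

Two optional exons give four possible transcripts, and ten give 1024. Even under this crude assumption the numbers quickly reach the thousands, which is one reason a transcript cannot simply be read off from a fixed stretch of DNA.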

We cannot help asking, then: What is the stable, reliably replicated, and functionally significant unit of inheritance that evolutionary theory has demanded for most of the past century? What fixed substance lends itself to accurate replication and to step-by-step modification in any biologically significant (functional) sense? Yes, there are stretches of the DNA sequence that get replicated and passed on with good fidelity for some number of generations. But what the organism makes of those rather arbitrary and indefinable (because functionally plastic) stretches depends on the coherent activity of the organism as a whole. Without the coordinated evolution of this entire activity, changes in isolated sequences could not mean anything at all.

Notes

1. The usual procedure for relating a DNA sequence to a given trait involves modifying the sequence and checking for any readily observable differences in the trait. Regarding this “method of differences”, see the supplemental article, “Missing Heritability — Or Whole-Organism Inheritance?”.

References

Chanock, Stephen (2012). “Toward Mapping the Biology of the Genome”, Genome Research vol. 22, pp. 1612-5. doi:10.1101/gr.144980.112

Djebali, Sarah, Carrie A. Davis, Angelika Merkel et al. (2012). “Landscape of Transcription in Human Cells”, Nature vol. 489 (Sep. 6), pp. 101-8. doi:10.1038/nature11233

Ecker, Joseph R., Wendy A. Bickmore, Inês Barroso et al. (2012). “ENCODE Explained”, Nature vol. 489 (Sep. 6), pp. 52-5. doi:10.1038/489052a

Keller, Evelyn Fox (2000). The Century of the Gene. Cambridge MA: Harvard University Press.

Lee, Jeannie T. (2012). “Epigenetic Regulation by Long Noncoding RNAs”, Science vol. 338 (Dec. 14), pp. 1435-9. doi:10.1126/science.1231776

Stamatoyannopoulos, John A. (2012). “What Does Our Genome Encode?”, Genome Research vol. 22, pp. 1602-11. doi:10.1101/gr.146506.112

Steve Talbott :: Indefinable Genes and the “Wild West” Genomic Landscape