
Biology Worthy of Life
An experiment in revivifying biology

Supplemental Text

Look What’s Happened to Genetic Synonyms!

This article is supplemental to “The Poverty of the Instructed Organism: Are You and Your Cells Programmed?”, and should be read in conjunction with that essay. Both pieces are part of the Biology Worthy of Life project. Original publication of this article: October 24, 2013. Date of last revision (material added): November 4, 2013. Copyright 2013 The Nature Institute. All rights reserved.


Tags: DNA; DNA/junk; form/molecular; gene regulation; machine idea/code; translation

We highly recommend that you first read the brief preface and commentary relating to this article, entitled Why Genetic Synonyms Are Not Synonymous.

Contents

Introduction

The supposed redundancy of the genetic code

The genetic code, which maps successive elements (“codons”) of protein-coding DNA to successive amino acids of a protein, has been called “redundant” and “inefficient” because multiple codons represent the same amino acid. Such codons are said to be synonymous.

The unique voices of synonymous codons

But now it’s being found that synonymous codons are far from synonymous. They play wide-ranging and profound roles in the regulation of gene expression and protein activity.

A distortion of biological thinking

The code-centered view of DNA is an extreme simplification and misrepresentation, perhaps best exemplified in the thinking of Richard Dawkins. His take on the matter is here shown to be root-and-branch false.

Era of the code: sweeping away “messy molecular complexities”

The history of the genetic code reveals the strong sway of clean logic over the mind of the biologist — and the ability of this logic to put the lives of organisms out of sight.

A code, or a living speech?

The kind of code, or computer program, that has been projected onto DNA is a highly constricted and abstract — you might almost say “artificial” — manifestation of human thought. It would be much more revealing of the organism if we were to think, not of this constrained, computer-adapted expression of our minds, but rather of the full-bodied, rich-textured, many-layered character of human speech.


The structure of a protein has a great deal to do with its function. Structure and function, as biologists now have good reason to recognize, always express themselves as dynamic, interweaving form, not an interaction of machine parts. But the habits of thought inherited by contemporary biologists are another matter.

Protein structure was long thought to be rigidly dictated by its constituent sequence of amino acids, and the sequence of amino acids in a protein was in turn thought to be specified by the DNA sequence, or gene, “coding for” that protein. Of course, how one looks determines what one sees, and during the heyday of the genetic code (see accompanying box) biologists were looking not so much for the play of living form as for elements of logic and computation. What they chose to see as a logically cut-and-dried relation between “letters” of the DNA sequence and the succession of amino acids in a protein led to the notion that DNA holds the information or program governing the organism as a whole.

The “Genetic Code”

An organism’s genome consists, speaking approximately¹, of four distinct “letters” (nucleotide bases — about 6.4 billion of them) strung out in innumerable combinations. The successive letters of a protein-coding DNA sequence, or gene — when those letters are grouped in threes — correspond more or less closely to the successive amino acids in a protein. The DNA triplets are known as “codons”, and are fundamental elements of the so-called “genetic code”. A protein may consist of up to thousands of amino acids encoded in this way, with one three-letter codon in the DNA sequence representing one amino acid in the protein sequence.

An additional participant among many in the endlessly complex processes leading from DNA to protein is an intermediary molecule known as “messenger RNA”, or “mRNA”, which possesses a triplet-encoding derived from and mirroring that of the DNA sequence. In the standard terminology, DNA is transcribed (by an enzyme complex) into RNA, and RNA is translated (by one of the most elaborate molecular complexes in the body, the ribosome) into protein.
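For readers who find it helpful, the textbook scheme described in this box can be sketched in a few lines of Python. This is only an illustration of the abstract mapping itself, which is the very picture this article goes on to question. The toy sequence and the function names `transcribe` and `translate` are hypothetical, and the sketch assumes the common convention of writing the gene as its coding strand, so that the mRNA differs only in having U where the DNA has T.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in one-letter amino acid symbols ('*' marks a stop),
# listed in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(coding_dna):
    """Coding-strand DNA -> mRNA (T becomes U). A deliberate simplification."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Read the mRNA three letters at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "ATGGCCAAATGA"   # hypothetical toy sequence: Met, Ala, Lys, stop
mrna = transcribe(gene)
print(mrna)             # AUGGCCAAAUGA
print(translate(mrna))  # MAK
```

Everything the sketch knows about a gene is a string of letters; splicing, folding, timing, and cellular context do not appear in it at all.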

Pure, abstract sequence, somehow decontaminated of all complex molecular variation, was thought to be everything. This then enabled biologists to analogize the genetic code to the bits and bytes of computer code. The latter consists of a digital sequence of “zeros” and “ones” carefully engineered so as to be conceivable in complete abstraction from the details of the physical medium, whether transistor or magnetic tape or pitted optical disk.

Not many stopped to ask whether the organism — eating, metabolizing, excreting, growing, mating, moving about on earth, participating in diurnal and seasonal rhythms, enduring the extremes of weather and other environmental challenges — could possibly be understood at the molecular level in such radical detachment from its physical becoming, as if the forms of its bodily existence were as incidental as the choice between magnetic tape and optical disk.

The classical view of the relation between a DNA sequence (gene) and its corresponding protein was crisply stated by Richard Dawkins in his bestselling book, The Blind Watchmaker*. The chain of amino acids in a protein, Dawkins wrote, commonly coils up into a kind of complex knot, “the precise shape of which is determined by the order of amino acids. This knot shape therefore never varies for any given sequence of amino acids”.

The sequence of amino acids in turn is precisely determined by the code symbols [codons] in a length of DNA. ... There is a sense, therefore, in which the three-dimensional coiled shape of a protein is determined by the one-dimensional sequence of code symbols in the DNA (p. 171).

With his computer-influenced mindset, Dawkins went on to claim that “the whole translation, from strictly sequential DNA ROM [read-only memory] to precisely invariant three-dimensional protein shape, is a remarkable feat of digital information technology”.

Yes, it would indeed be remarkable — and remarkably machine-like — if this were true. But it is not true². In fact, the deterministic relations described by Dawkins are today known to be so egregiously false that I can only assume (never having checked) that by now he readily acknowledges as much himself. But the kind of thinking at work in the mistake remains powerful, with disastrous results for biology. I would like to say something further here to characterize that mistake — a mistake that amounts essentially to an abandonment of the organism in favor of historically recent computer-thinking. And of the numberless pathways through the cell and organism one might choose to illustrate the point, I will follow just one.

The supposed redundancy of the genetic code. The protein-coding portions of a genome present us with 64 distinct codons (see box), whereas there are, in general, only 20 different amino acids. So it’s possible — and is actually the case — that multiple codons specify the same amino acid. As many as six different codons correlate with — or code for, as the jargon goes — a single amino acid.

That’s where the problem of synonymous codons comes in: when multiple codons map to the same amino acid, they are — or rather were, under the dominant coding metaphor — considered equivalent. After all, this mapping of DNA codons to amino acids is precisely what the genetic code was supposed to achieve.

Given their apparently meaningless differences, synonymous codons and the synonymous mutations giving rise to them received little attention for many years. If, as we heard Dawkins claim, the sequence of amino acids in a protein determines its function, and if multiple codons specify the same amino acid, then what difference could it make which synonym happens to be employed in a particular case? And what would it matter if a mutation changed a given codon into a synonymous one? Since the genetic code would remain unchanged, such mutations were termed “silent”. Likewise when a synonymous substitution resulted from the cell’s editing of mRNA molecules.
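The reasoning behind the term “silent” can be made concrete with a small Python sketch. It is offered only as an illustration of the coding view, not as a claim about what cells do. The two sequences are hypothetical toy examples, and `translate` and `is_synonymous` are names introduced here for the purpose.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one-letter amino acid symbols, '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Map successive DNA codons to amino acids until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def is_synonymous(codon_a, codon_b):
    """True when two different codons map to the same amino acid."""
    return codon_a != codon_b and CODON_TABLE[codon_a] == CODON_TABLE[codon_b]

original = "ATGGCCAAATGA"  # hypothetical toy gene
variant = "ATGGCTAAGTGA"   # GCC -> GCT and AAA -> AAG, both synonymous

print(translate(original), translate(variant))  # MAK MAK
print(is_synonymous("GCC", "GCT"))              # True  (both alanine)
print(is_synonymous("GCC", "GAC"))              # False (alanine vs. aspartate)
```

Judged by the table alone, the two sequences are indistinguishable, which is exactly why such changes were long assumed to make no difference.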

Note how abstract and airy this informational view of DNA and RNA had become. The fluid patterns of interpenetrating molecular forces that somehow “carried” the code hardly mattered, and were often said to be arbitrary; many other molecules would have done just as well. Everything depended only on an abstract iteration of four digital “letters” whose dynamically nuanced patterns of form and force and function in specific circumstances could hardly interfere (so it was thought) with their unambiguous coding role. The same reduction of form to a kind of code was applied to many proteins, whose functioning was reduced to the yes-or-no, zero-or-one mechanical fit of lock and key (to use what was long the dominant terminology).

The unique voices of synonymous codons. But the story has been dramatically changing over the past few years — a change you can already hear in article titles such as “Breaking the Silence”, “A Hidden Genetic Code”, and “Sounds of Silence: Synonymous Nucleotides as a Key to Biological Regulation and Complexity”. Synonymous codons, it turns out, are not silent at all. They can speak with distinctive voices in the organism, performing in radically non-synonymous ways. Why? Because the genetic code does not begin to capture the full reality of the cell’s living interactions with a DNA sequence. As one group of researchers has summarized it:

Emerging evidence shows that “silent” substitutions carry a wealth of information, which is written over the encoded amino acid sequence, and that this information can be used to regulate translation speed, protein homeostasis, metabolic fate, and even post-translational modifications. (Shabalina et al. 2013*)

Allow me to explain. (Readers should feel free to skim through or pass over the more technical remarks in the following bullet items.)

- Switching between synonymous codons can make a huge difference in the amount of protein produced in a cell, and therefore in the functional effect of that protein. In one experiment, varying a bacterial gene in many different (but synonymous) ways produced a 250-fold difference between the highest- and lowest-expressing forms of the “same” gene (Kudla et al. 2009*). Another team of investigators has shown that “the use of particular codons can increase the expression of a gene by more than 1000-fold” (cited in Novoa and de Pouplana 2012*).

- As with virtually everything in the organism, the influences of synonymous codons are context-specific. Harvard molecular biologists recently compared synthesis of a particular protein in bacteria (Escherichia coli) that lived in nutrient-rich and nutrient-poor environments. When the nutrients (amino acids) were abundant, synonymous variants of the gene under study behaved similarly. But when the environment became nutrient-poor, the researchers found up to a 100-fold difference in protein synthesis associated with different synonymous codons (Subramaniam et al. 2013*).

Under stressed conditions, not only was protein synthesis from certain mRNAs greatly accelerated, but synthesis of whole classes of other proteins [with different “biases” in use of synonymous codons] was shut down, thus preserving critical resources. The study suggests that organisms exploit differences between synonymous codons as “a general strategy to adapt protein synthesis to their environment”.

The lead writer of the paper commented for a Harvard University news release that “many researchers have tried to determine whether using different codons affects protein levels, but no one had thought that maybe you need to look at it under the right conditions to see this”. In other words: context matters, a fact that apparently came as a revelation to many.

- A number of effects associated with the distinct forms of synonymous codons are mediated by differences in the folding of the mRNAs. The dynamic, mutually adjusting conformation and sculptural interaction of complex molecules such as DNA, RNA, and protein are now being intensely explored by molecular biologists, and their findings reveal living processes to be, in a straightforward meaning of the term, more like a dance than the meshing of mechanical parts.

In the case at hand, an important group of effects has to do with the way mRNA folds under the effect of synonymous codon choice, and then, with its given folds, interacts with the intricately responsive structure of the ribosome. This ribosome — a molecular complex that employs mRNAs for the production of protein — consists in humans of four preprocessed ribosomal RNAs, more than seventy proteins, and over two hundred other factors.

The detailed interplay of parts among this ribosomal complex, the messenger RNA it is currently translating, and the various regulatory proteins and other factors associated with that mRNA — all in the interest of producing a protein in the right place, with the right protein folding structure (see immediately below), and capable of fulfilling a needed function — beggars our current powers of imagination. (This assumes, of course, that we even try to imagine the interweaving forms involved, as opposed to pursuing a simplistic machine-logic³.)

- The substitution of one synonymous codon for another together with any resulting changes in mRNA conformation can, strikingly, also affect the folding of the protein produced from the mRNA. Even more strikingly, such substitutions can influence the so-called “post-translational modifications” whereby the protein is conjoined with various, often small, molecular groups. Both the protein-folding and the chemical modifications can alter and even completely redirect the functioning of a protein. Synonymous codons can exert such influence by affecting the speed and pausing of the translation process, both of which in turn bear on the folding and modification of the protein — processes that can occur simultaneously with translation.

- A synonymous codon can change the sites where microRNAs bind to messenger RNAs. These very small microRNA regulatory molecules (roughly twenty-two “letters” in length) play a vast and sprawling role throughout the cell in RNA degradation, repression of translation, and in the tuning and balancing of protein expression among collections of distinct mRNA molecules. A single microRNA may be capable of modifying the expression of hundreds of mRNAs, and so these small molecules help to regulate a major part of the genome. microRNAs also figure prominently in many diseases, including cancer. And it is now becoming clear that differences among synonymous codons in turn help to regulate microRNAs by altering microRNA binding sites (and immediately flanking regions) on mRNA molecules.

This is a huge topic worthy of a lengthy article in its own right. (Here is where you can skim through a survey of the diverse roles of microRNAs in the life of the organism.)

- Again, a synonymous codon can alter binding sites for the various factors involved in RNA splicing. It is through such splicing that the cell stitches together up to thousands of different proteins from a single gene. There is virtually nothing, from embryonic development to brain functioning, that is not affected by RNA splicing. (Read the notes here and here for an overview of the stunning array of regulatory processes involving RNA splicing.)

- Then there is the matter of disease. “Upwards of 50 disorders — including depression, schizophrenia, multiple cancers, cystic fibrosis and Crohn’s disease — have now been linked to synonymous mutations. ... In one recent inspection of more than 2,000 human genome studies, for example, a team from Stanford University School of Medicine in California found that synonymous mutations were just as likely as nonsynonymous ones to play a part in disease mechanisms” (emphasis added). So much for the decisive, all-important role of the genetic code, traditionally understood! “At the moment”, according to Laurence Hurst at the University of Bath in the UK, “we are discovering the major mechanisms by which synonymous mutations can be associated with disease. And they are vastly more diverse than people thought” (Katsnelson 2013*).

- Also relating to health: when pharmaceutical companies introduce a gene from one organism into a different (host) organism in order to produce a novel protein in the host, they typically try to “tune” the gene for maximal production by recoding it so as to employ codons favored by maximally expressed genes in the host. “The results, however, are notoriously hit or miss. ... Lots of people play around with silent positions in the genome, but nobody has derived a clear-cut system for predicting expression. ... Fiddling with codons in therapeutic proteins could have unpredictable effects on people’s health. ‘From our experiments now, we do not believe that you can do that to any protein and have the protein behave as it did in its native form’, says Chava Kimchi-Sarfaty” of the Food and Drug Administration (Katsnelson 2013*).

- There is more, but one investigator sums up the matter this way: the choice of synonymous codons is “related to many biological processes, such as DNA stability, nucleosome positioning, mRNA stability, mRNA splicing, nonsense mediated mRNA decay, translation initiation, translation elongation and co-translational protein folding” (Gu et al. 2013*) — and much else.

The major caveat to all the foregoing is that (as with so much current research in molecular biology) the situation is rapidly becoming more complex, with many factors involved that I have not mentioned here⁴. Cause-and-effect relations become ever muddier. It all underscores a truth we find illustrated on every hand: organisms display something like the principles of coherence we see in a series of moving images or in the progressive unfolding of the content of speech. The shifting, unstable causal relations depend from this primary reality, much as the lawful muscular activity of the vocal apparatus is more naturally seen as a result of the meanings being expressed than a cause of them.

I note in passing that, while a great deal is being learned about the effect of synonymous codons upon the dynamic form and functioning of both RNA molecules and proteins, rather less attention has been directed to their effect upon DNA. This seems surprising. Given what we’ve been learning about the myriad ways the cell engages the inert, nearly crystalline structure of DNA in order to incorporate it into the life of the organism, and given all the ways researchers have been characterizing the “ballet” and “choreography” of chromosomes in the three-dimensional space of the nucleus — a choreography that can be altered by the cell’s response to a single “letter” of the DNA sequence — it is safe to predict that the coming years will yield many revelations about the role of synonymous codons in shaping the functional performance of the chromosome.

A distortion of biological thinking. In any case, synonymous codons afford us insight into some of what’s gone wrong in biology. Take, for example, what we heard from Richard Dawkins. It is all wrong:

The “precise shape” of a protein, you will recall, was, in Dawkins’ view, “determined by the order of amino acids”, and this shape “never varies”. A less true statement could hardly be made. A great deal of the literature on proteins today has to do with all the ways the shape of particular proteins changes, together with the timing or rhythm of the changes and their functional implications. And, as we have seen, the form of a protein can vary simply as a result of the choice of synonymous codons — a choice that doesn’t alter the amino acid sequence at all. Other influences not discussed here can come to bear upon protein shape as well.
Moreover, many proteins remain wholly or in part “unstructured” or “disordered”, with the ability to adopt a particular shape at need — and this flexibility is particularly important for the most important, most widely interacting “hub” proteins. Proteins, as I have remarked before, are the true shape-changers of the cell, responding and adapting to an ever-varying context — so much so that the “same” proteins with the same amino acid sequences can, in different environments, “be viewed as totally different molecules” with distinct physical and chemical properties (Rothman 2002, p. 265*).
Dawkins: “The sequence of amino acids in turn is precisely determined by the code symbols [codons] in a length of DNA”. Completely false. It is routinely understood today that RNA splicing (to take just a single process) commonly results in many different amino acid sequences — that is, many different proteins — despite the fact that these proteins derive from a single DNA sequence. The different, splicing-derived proteins are now known to affect nearly everything going on in the organism, and the lack of splicing, or improper splicing, in humans can make the difference between life and death.
Dawkins: “The whole translation, from strictly sequential DNA ROM [read-only memory] to precisely invariant three-dimensional protein shape, is a remarkable feat of digital information technology”. Since, as we have just seen, the strict, code-like aspects of “the whole translation” he refers to have vanished from biological understanding, his summary statement here is left hanging in mid-air. Further, the insistence on a digital code, which strongly infects Dawkins’ thinking to this day, shows how far he lags behind the amazing work over the past decade or two on the plastic and dynamic form of chromosomes as a decisive factor in gene expression.

Dawkins’ individual peculiarities aside, this kind of code-centered, machine- and computer-like thinking has dominated biology for many decades, and all the more during the era of molecular biology. It retains its hold upon biologists’ minds even today in the face of overwhelming counter-evidence — evidence ranging from the way physicists actually conceive the foundations of the molecular world, to the news now flooding out of molecular biological laboratories. This persistence of untenable thinking is sobering for anyone naïve enough to hope for more or less continuous scientific progress, represented not by the amassing of data, but by deeper understanding.

Era of the code: sweeping away “messy molecular complexities”. If you want to understand how the notion of a precise and unambiguous genetic code has subverted biological understanding, a good place to begin is with the origin of the modern idea of the code. The story has often been told how, after the 1953 discovery of the double helix with its iterated “letters” lining up (apparently) in perfect marching order along its spiraling length, many of our brightest scientists worked for the next decade or so at deciphering the code. How, they wanted to know, did a sequence of “letters” in DNA map to a sequence of amino acids in a protein?

A perfectly reasonable inquiry, if taken with real biological substance in mind. Not so reasonable when one felt compelled to reduce the biology of DNA and proteins to an “informational” puzzle in the narrow, computational sense of that term. In the latter case, the redundancy of 64 three-letter codons relative to just 20 amino acids became a pressing problem. A two-letter code, yielding 16 codons, would not be enough; a three-letter code seemed far too much, and therefore inefficient. A considerable time was wasted trying to come up with an elegant “mapping” that could make an engineer proud.

But none of those efforts succeeded. The relationships between DNA and protein were eventually discovered, not by the coding theorists, but by experimental biochemists. There proved to be 64 codons, 61 of which “coded for” 20 amino acids. (There are 3 “stop” codons, which, in an mRNA, normally terminate the translation into protein.) A good number of amino acids, as we have seen, were represented by multiple codons.
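The arithmetic behind these numbers is easy to check. The following Python sketch, included only as an illustration, counts the possible two- and three-letter “words” over four letters and then tallies the standard codon table. The variable names are introduced here for the purpose; the table string is the standard genetic code in the conventional T, C, A, G ordering.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one-letter amino acid symbols, '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

two_letter_words = len(BASES) ** 2    # 16: too few for 20 amino acids
three_letter_words = len(BASES) ** 3  # 64: more than enough

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
# How many codons map to each amino acid (stops excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
sense_codons = sum(degeneracy.values())
six_fold = sorted(aa for aa, n in degeneracy.items() if n == 6)

print(two_letter_words, three_letter_words)  # 16 64
print(stop_codons)                           # ['TAA', 'TAG', 'TGA']
print(sense_codons, len(degeneracy))         # 61 20
print(six_fold)                              # ['L', 'R', 'S']
```

Leucine, arginine, and serine are the amino acids with six codons each, while methionine and tryptophan have only one.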

Looking back at that eventful, code-breaking decade, Brian Hayes, one-time editor of American Scientist, wrote:

It was hard not to feel a twinge of regret on coming to the end of the story and learning the right answer. Compared with the elegant inventions of the theorists, nature’s code seemed a bit of a kludge. (Hayes 1998*)

A “kludge” is computerese for “an inelegant, messy, quick-and-dirty way of making something work”. So why did the biological reality look like an unappealing kludge? Because of such things as the occurrence of synonymous codons. As a venture in computer programming, the cell’s efforts were disappointingly redundant (or, as engineers like to put it, the code was “degenerate”). Yet all this could have been dismaying only for scientists who wanted a sharp-edged logical clarity instead of the fluid, living transactions now being characterized in every field of molecular biological research.

For his part, Hayes seemed slightly at odds with himself. Immediately before offering the judgment just quoted, he said this:

What fascinated me about the code-breaking effort was how quickly a biochemical puzzle — the relation between DNA structure and protein structure — was reduced to an abstract problem in symbol manipulation. Within a few months, all the messy molecular complexities were swept away, and the goal was understood to be a mathematical mapping between messages in two different alphabets. ... The proposed solutions were judged largely by the criteria of information theory. Efficient storage and transmission of information seemed all-important. The coding theorists were trying to learn the language of the genes, but might as well have been designing a communications protocol for a computer network.

Hayes also quotes Francis Crick, co-discoverer of the double helix, as saying that the importance of much of the coding work “was that it was really an abstract theory of coding, and was not cluttered up by a lot of unnecessary chemical details”. Well, if Crick were still with us today and capable of appreciating the last decade’s work in molecular biology, presumably at least some of his peers would be saying, “Welcome to the details!” They would have good reason also to say (but probably would not), “Let’s try to recover our bearings after the last decades of badly misdirected effort”.

Hayes, himself, was writing in part to show the pitfalls when neat theorizing outruns scientific observation. “The code resembled none of the theoretical notions”. Nevertheless, writing in 1998, while the Human Genome Project was at full throttle, he still considered the essence of DNA to be, with minor qualifications, a “redundant” genetic code (those unnecessary synonymous codons!), adding that “No one could have guessed the awful truth — that nature is wildly profligate, that genomes are stuffed with gobs of ‘junk DNA,’ that storage efficiency just doesn’t seem to be an issue ...”

The “awful truth”, it now turns out, is the living, holistic, integral vitality of the organism — which can disappoint only when one is still holding up the information-processing machine, with its one-dimensional code, as the ideal of life.

We do not lack “storage efficiency” in DNA and all the other elements of the cell. Rather, we have what may seem to be an impossible over-abundance of efficient process. If we want to think about efficiency in a more fitting sense, we might consider the expressive potential of a dance troupe — so long as we do not reduce our understanding of individual movements to a few discrete, robot-like digital alternatives. (That kind of reduction is the irresistible temptation following upon “information” talk.) Everywhere in the organism we see metamorphosing form — “form forming”.

The truth of the matter is tending more and more to leak out, even if, for now, in safe metaphors that are immediately contradicted by an accompanying invocation (as in the following quotation) of such things as circuits and switches:

Early X-ray structures of RNA contained indications of the importance of conformational dynamics ... However, no one could have anticipated the existence of new genetic circuits that are based on RNA conformational switches, or that the ‘acrobatic’ nature of a biopolymer [that is, RNA] that consists of only four chemically similar nucleotides would be at the centre of a complex macromolecular structure such as the ribosome. (Dethoff et al. 2012*)

A code, or living speech? Today, fifteen years after Hayes wrote his retrospective, references to codes still assault us from all sides, even though the meaning of what’s being described has nothing to do with code in any conventional sense. What we really see in cellular activity is significant form and transformation. And no doubt this is what biologists are half-painfully, half-excitedly working toward when they speak incongruously (with synonymous codons and much else in mind) about additional “codes” somehow being overlaid upon, or overlapping, the standard genetic code, both in DNA and RNA.

The number of these novel codes has been mushrooming of late. Taking the case of DNA, there are codes, or, as is sometimes said, DNA-embedded “signals”, influencing DNA methylation, nucleosome positioning, the orientation of DNA on nucleosomes, the bendability of DNA, the separability of the two strands of the double helix, the distribution of chromosomes in the cell nucleus, the electrical characteristics of local regions of the chromosome, the binding of transcription factors to DNA, the organization of DNA into topological and other domains at different scales, the formation of open or closed chromatin, RNA splicing in its many aspects, the folding of both RNA and protein, the localization of RNA within the cell, and so on virtually without end.

And everything affects what given stretches of DNA mean — including how that once all-important (but now, we can see, simplistically conceived) genetic code is actually “read” and given meaning within its hugely complex context⁵.

It’s fine to speak of DNA “signals”, but the frequent interpretation of these as constituting various codes, or as one-way directives by which DNA controls the organism, makes no sense at all. DNA does not bark out orders. The DNA-embedded signals are completely interwoven with, and dependent upon, the cellular processes that must decide what to do with the encountered features of DNA. And such decisions always depend on context. It is no arbitrary metaphor to say that the resulting engagement is more sculptural than digital.

Whether we speak of many computer-like codes or just one, we badly misrepresent the organism. Can an integrated circuit function in part by changing the physical form of one or more of its transistors, and does this cause neighboring transistors to morph into new shapes? Is the overall performance of the electronics a function of widespread physical acrobatics rippling across and re-shaping the entire, billion-element circuit? Not at all. But this is what we see when we look at the physically gesturing chromosome, and something similar occurs with RNA and protein molecules.

What should we make of a code that has (as we heard above) all kinds of other information written over it? We might begin by tossing out computationally compromised terms such as “code” and “information”. I suggest this framework as one pathway for understanding: Instead of code, think of the regular and fairly reliable grammatical structure we can identify in proper speech. Then there are the many-dimensioned meanings — I would not say “that are overlaid upon the grammar”, but rather “from which we can abstract a more or less precise grammar”. The meanings must first be given in all their interpenetrating layers of denotation and connotation, or else there is no grammar to abstract. You can get from the meanings to the grammar, but you can never get from an empty grammatical structure to meanings. And no more can you get from an abstract genetic code to the significance of organic processes.

In other words, rather than taking the human software engineer as our model, I am suggesting it would be far better, and far closer to the richness of our lived experience, to imagine the human speaker. We can try to abstract from the living stream of speech certain dictionary-defined and unambiguous elements — and, within limits, the always incomplete effort serves useful purposes. But real speech shows how artificial is any attempt to carry that work of abstraction very far.

It was not I, but some of the greatest biologists, who first pressed the analogy between gene expression and conscious human expression. They did so by appealing to the thought constructs of the programmer. But what tells us we should restrict the analogy to a particularly abstruse, cramped, and historically late-developing capacity of human expression? You might think that our earlier and more general capacities for expression would be closer to the universal language of life.

Basic human speech is, in fact, a thick tapestry formed from many simultaneous meanings, connotations, and contextual influences, one participating in the other. The physical gesturings of sound emanating from the speaker’s throat are infinitely and subtly expressive, much as the gesturings of our chromosomes, which in turn belong to the essential language of gene expression. And while the idea of numerous overlaid codes of radically different sorts, or codes embodied in continually metamorphosing substance, is difficult to make any sense of, speech gives us exactly that sort of fullness.

With this in mind, we will find that the living, sculpting, forming and reforming activities at the molecular level of our cells speak for themselves. This remains true, moreover, even when we look at those aspects of the genetic code — such as “equivalent” or “synonymous” codons — that were not supposed to make any difference at all.

Notes

1. I say “approximately” because as soon as you pay attention to the actual chemistry, let alone the biology, of these letters, you see that many sorts of distinction need to be made. These are always, in one way or another, distinctions of form, and this article has to do with a few examples of such form. As a gentle reminder of the kind of thing we are dealing with in molecules, you may wish to look at the sketch provided by the twentieth-century cell biologist, Paul Weiss. It is also worth remembering that various chemical modifications such as DNA methylation, as well as the infinitely varying structure of chromatin, all contribute to giving the individual DNA “letter” its particular significance in its immediate context.

2. I have previously, if briefly, noted the shortcomings in these statements by Dawkins. See also below.

3. You can browse a brief summary of the various significances of mRNA folding in my collection of research notes, How the Organism Decides What to Make of Its Genes. Among other things, the folding of an RNA molecule can affect its stability and prospects for degradation.

4. For example, one of the most exciting areas of research, just now coming into focus, has to do, first, with transfer RNA (tRNA) abundances in relation to the biased use of synonymous codons, and, second, with post-translational modifications of those tRNAs. These factors appear capable of modifying the organism’s responses in tissue-specific or condition-specific ways (Novoa and de Pouplana 2012*).

5. The same multilayered system of “codes” or “signals” (or, in the quote below, embedded “punctuation”) is, of course, found in RNA. Here’s what Shabalina et al., in their paper on synonymous nucleotides, have to say:

Synonymous nucleotide positions are essential for the maintenance and function of diverse regulatory signals located in the protein coding regions [of mRNA]. There are several levels of punctuation complexity and biological signals encoded by mRNAs. A prominent punctuation signal is periodic pattern of RNA secondary structure, which provides for a more ordered and stable structure of transcripts in the protein-coding regions and may also support maintenance of the reading frame during translation. This basic pattern is overlaid by stable conserved RNA secondary structure elements that may cause translation pausing or stalling. The functional significance of synonymous positions for the maintenance of local stable RNA structures, which are crucial for protein regulation of expression, is well recognized, especially at the initiation of translation. These stable conserved folding elements, the second class of mRNA punctuation elements, could affect translation and, ultimately, the protein structure and function, whereas higher-order RNA structures may directly define protein folding, especially at domain junctions. ... The third class of RNA punctuation signals are sites of intermolecular interactions providing, for example, regulation of translation ..., splicing sites, and microRNA target sites. Ribosome pausing or stalling, caused by the secondary structures, of messenger RNA or mRNA hybridization to rRNA, can affect a variety of co-translational processes, including protein folding and targeting. Direct base-pairing of mRNAs to rRNA clinger sites within ribosomes may function as upregulating and downregulating elements, providing an additional mechanism of translational control. Most of these diverse RNA punctuation signals exist in both prokaryotes and eukaryotes, and enrich regulation of the translation efficiency and protein folding.

The extraordinary complexity of transcriptomes that underpins the structural and functional diversity of mammalian proteomes is created by alternative splicing and transcription with the use of distinct types of RNA splicing and regulatory control elements. Synonymous codon positions allow further diversification of intra- and intermolecular mRNA hybridization affinity, creating previously unrecognized patterns of RNA punctuation and hidden language of mRNA–microRNA cross-talk, characteristic for the higher eukaryotes and responsible for the regulation of the biological complexity, tissue-specific and condition-specific expression. (Shabalina et al. 2013*)


Sources: Dawkins, Richard (1996). The Blind Watchmaker, second edition. New York: W. W. Norton. First edition published in 1986.

Dethoff, Elizabeth A., Jeetender Chugh, Anthony M. Mustoe and Hashim M. Al-Hashimi (2012). “Functional Complexity and Regulation through RNA Dynamics”, Nature vol. 482 (Feb. 16), pp. 322-30. doi:10.1038/nature10885

Gu, Wanjun, Xiaofei Wang, Chuanying Zhai et al. (2013). “Biological Basis of miRNA Action When Their Targets Are Located in Human Protein Coding Region”, PLoS ONE vol. 8, no. 5 (e63403). doi:10.1371/journal.pone.0063403

Hayes, Brian (1998). “The Invention of the Genetic Code”, American Scientist vol. 86, no. 1 (Jan. – Feb.), pp. 8-14.

Katsnelson, Alla (2011). “Breaking the Silence”, Nature Medicine vol. 17, no. 12 (Dec.), pp. 1536-8. doi:10.1038/nm1211-1536

Kudla, Grzegorz, Andrew W. Murray, David Tollervey and Joshua B. Plotkin (2009). “Coding-sequence Determinants of Gene Expression in Escherichia coli”, Science vol. 324 (April 10), pp. 255-8. doi:10.1126/science.1170160

Novoa, Eva Maria and Lluís Ribas de Pouplana (2012). “Speeding with Control: Codon Usage, tRNAs, and Ribosomes”, Trends in Genetics vol. 28, no. 11 (Nov.), pp. 574-81. doi:10.1016/j.tig.2012.07.006

Rothman, Stephen (2002). Lessons from the Living Cell: The Limits of Reductionism. New York: McGraw Hill.

Shabalina, Svetlana A., Nikolay A. Spiridonov and Anna Kashina (2013). “Sounds of Silence: Synonymous Nucleotides as a Key to Biological Regulation and Complexity”, Nucleic Acids Research vol. 41, no. 4, pp. 2073-94. doi:10.1093/nar/gks1205

Subramaniam, Arvind R., Tao Pan and Philippe Cluzel (2013). “Environmental Perturbations Lift the Degeneracy of the Genetic Code to Regulate Protein Levels in Bacteria”, PNAS vol. 110, no. 6 (Feb. 5), pp. 2419-24. doi:10.1073/pnas.1211077110

Further information: On the notion of biological codes, see Getting Over the Code Delusion: Biology’s Awakening and also Logic, DNA, and Poetry.

Regarding the flexible potentials of protein form, I dealt with so-called “intrinsically disordered proteins” in “Are Disordered Proteins Really Disordered?”

For a more realistic picture of the countless factors playing a role in gene expression, skim through the collection of notes I’ve called How the Organism Decides What to Make of Its Genes.

This document: https://bwo.life/mqual/genome_9_synonyms.htm

Steve Talbott :: Look What’s Happened to Genetic Synonyms!