This glossary is currently incomplete, and will be expanded in the future. Those who choose to dive into it now, however, may find that it serves excellently as a supplementary text for self-learning on some topics. For example, you could start with a basic term such as “DNA” or “double helix” or “chromatin” and then just follow the links according to your own needs and interests. Readers are invited to submit suggestions for improvement.
acetylation. Attachment of an acetyl group to a molecule, which is then said to be “acetylated”. Certain enzymes (known as acetyltransferases) can do this. See also histone modifications. Deacetylation is the corresponding removal of an acetyl group, accomplished in the case of histones by histone deacetylases.
acetyl group. A small chemical group with the formula -COCH3. See acetylation.
acetyltransferase. An enzyme that can attach an acetyl group to another molecule. In an epigenetic context, the term generally refers to the attachment of an acetyl group to a histone (by a histone acetyltransferase).
actin. A very common protein that forms filaments. It forms part of the cellular “skeleton” (cytoskeleton) in the cytoplasm of all cells, and also plays a major role in muscle contraction. It has been found to be present in the cell nucleus as well, and to be required for certain chromosomal movements.
activator. A protein transcription factor with a positive effect on gene expression. (Compare repressor.) An activator may work in conjunction with one or more co-activators and various other factors. With or without the help of co-factors, the activator commonly plays a role in bringing RNA polymerase (the transcribing enzyme) to the gene promoter. DNA sequences bound by activators include enhancers and promoters. Note: the terms “activator” and “activate” can be used more generally to refer to any factor or process that helps to bring a gene to expression.
adaptation. This term is afflicted with a boatload of controversy, which we will ignore. Here’s the Wikipedia definition, offered in the context of evolution: “An adaptation in biology is a trait with a current functional role in the life history of an organism that is maintained and evolved by means of natural selection. An adaptation refers to both the current state of being adapted and to the dynamic evolutionary process that leads to the adaptation. Adaptations contribute to the fitness and survival of individuals”. The verbal form, to adapt means to acquire an adaptation. But outside the evolutionary context, one can also speak of adaptations by individual organisms — for example, to their local environment. This is closer to the common, non-evolutionary meaning of “adaptation”. Organisms possess a remarkable ability to adapt to all kinds of internal and external conditions they have never encountered before while maintaining their own particular character or way of being.
adenine. See nucleotide base.
algorithm. A step-by-step, effective procedure for calculation or automated reasoning. Sometimes roughly compared to a recipe. Adjective: algorithmic.
allele. Human genes occur in pairs, one on each chromosome of a chromosome pair. The paired genes are called “alleles” and can have a differing form and significance, as when (in the case of some flowers) the allele on one chromosome tends to produce, say, a red petal color and the other allele tends to produce a white petal color. (The actual color will typically depend on the relative effects of the two alleles, among many other things, both physiological and environmental.)
allostery. The binding of a molecule to an enzyme or other protein at one site, whereby the functioning of a different site on the protein is altered, usually because the bound molecule changes the conformation of the protein. Adjective: allosteric.
alternative splicing. Precursor mRNA can be spliced in different ways, so that a single gene may lead, via differently spliced RNAs, to different proteins. It is thought that a high percentage of human genes are subject to the alternative splicing of their RNA transcripts, and the transcripts of some genes can be spliced in hundreds of different ways. See also RNA splicing and trans-splicing.
amino acid. Amino acids are, among other things, constituent elements of protein. There are twenty different kinds of amino acids in human protein, and any number of amino acid molecules — up to many thousands — are arranged in sequence to form the main body of a protein.
aneuploidy. The occurrence of a particular chromosome in an unusual number — for example, one or three copies per cell, when the usual number (as in humans) is two. To be distinguished from polyploidy, which refers to the complete set of chromosomes in a cell. Adjective: aneuploid.
antigen. A substance that stimulates the generation of antibodies as part of an immune response.
antisense strand. (It is best to read the definition of sense strand first.) The antisense strand is the strand of the two-stranded double helix from which protein-coding RNA is generally transcribed. The codons of the RNA (which, because it is protein-coding, is messenger RNA, or mRNA) are canonical elements of the genetic code. But because of the principle of base pair complementarity, the codons of the antisense strand from which the mRNA was transcribed do not directly represent the genetic code. They are complements of the code; hence the term “antisense”. Compare the different meaning of “antisense” in antisense transcription.
antisense transcription. Transcription from the strand of the double helix that generally does not yield protein-coding RNA. While the usage can be confusing, this means that antisense transcription occurs from the sense strand of the double helix. Because of the principle of complementarity in transcription, the RNA produced from the sense strand will not directly embody the genetic code and therefore will not (at least as a general rule) be able to yield protein through translation. Recall also that the two strands of the double helix have a directional aspect, and are “pointed” in opposite directions. Therefore transcription of the two strands must run in opposite directions as well. Antisense transcription occurs along the sense strand in a direction opposite to the familiar protein-coding gene expression from the antisense strand. In sum, sense transcription occurs from the antisense strand, as the result of which (in the case of protein-coding genes) a messenger RNA is produced (via complementation) whose codons are sense rather than antisense, and therefore can be translated into protein. Antisense transcription occurs from the sense strand, as the result of which noncoding RNA is produced (via complementation) whose codons are antisense rather than sense and therefore cannot be translated into protein.
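The relationship between the two strands and their transcripts can be sketched in a few lines of Python. This is only an illustration: the nine-base sequence and the function names reverse_complement and transcribe are made up for the example and do not come from any library. Strands are written in the conventional 5' to 3' direction, which is why the complement is also reversed.

```python
# Illustrative sketch only: the sequence and function names are hypothetical.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the opposite strand of the double helix, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand))

def transcribe(template: str) -> str:
    """Return the RNA (5'->3') transcribed from a DNA template strand (5'->3').

    The RNA is complementary to the template and runs in the opposite
    direction; uracil (U) takes the place of thymine (T).
    """
    return reverse_complement(template).replace("T", "U")

sense = "ATGGCCTAA"                    # made-up sense strand
antisense = reverse_complement(sense)  # the other strand of the helix

sense_transcript = transcribe(antisense)   # "sense transcription"
antisense_transcript = transcribe(sense)   # "antisense transcription"

print(sense_transcript)      # AUGGCCUAA: the sense strand's letters, with U for T
print(antisense_transcript)  # UUAGGCCAU: the antisense strand's letters, with U for T
```

Note that each transcript carries the letters of the strand it was not transcribed from, which is the source of the confusing terminology.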
assortative mating. Nonrandom choice of mates in sexual reproduction. In particular, an organism might tend to favor a mate either like itself, or different from itself. In the former case, variation within the larger population will be increased, whereas in the latter case it will be decreased.
ATP. Adenosine triphosphate, a molecule playing a central role in the storage and transfer of energy within the cell. It is used, for example, by ATP-dependent chromatin remodeling complexes, which apply energy derived from ATP to the restructuring of nucleosomes.
basal transcription factor. See general transcription factor.
base. See nucleotide base.
base pair. Two bonded nucleotide bases joined to opposite strands of the DNA double helix. These paired bases form the “rungs” on the spiraling DNA “ladder”, and the bases in each pair are nearly always complementary to each other. See also nucleotide base.
base pair complementarity. The four DNA nucleotide bases are cytosine (indicated by the letter “C”), guanine (G), adenine (A), and thymine (T). It happens that these four bases normally form base pairs in only two ways: cytosine pairs with guanine and adenine pairs with thymine (C-G and A-T). The members of each pair are said to be “complementary” to each other. This means that a complete, double-stranded DNA molecule can be faithfully replicated by separating the strands and, to each one, adding a second strand while making sure that the nucleotide bases of the second strand are complementary to those of the first strand. (This is, in general, how the cell replicates its DNA prior to dividing.) Actually, this picture is a simplification, since there are other constituents of DNA besides the nucleotide bases. But because the nucleotide bases are thought of as containing the essential genetic code, one can picture the complementarity of bases as a means of preserving the fidelity of the code. RNA, while typically (but by no means always) single-stranded, also preserves this code: in the formation of an RNA molecule from the template of a DNA strand, the nucleotide bases of the forming RNA are added sequentially in the same complementary fashion as when a DNA strand is replicated. In RNA, however, the base uracil takes the place of thymine in DNA.
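The pairing rules can be written out as a small lookup table. The following Python sketch is illustrative only: the eight-base sequence is made up, the name partner_strand is hypothetical, and strand direction is deliberately ignored, so each base is simply paired with the base opposite it.

```python
# Illustrative sketch only; the sequence is made up and strand direction is ignored.
DNA_PAIR = {"C": "G", "G": "C", "A": "T", "T": "A"}
RNA_PAIR = {"C": "G", "G": "C", "A": "U", "T": "A"}  # DNA template base -> RNA base

def partner_strand(strand: str, pairing: dict) -> str:
    """Pair each base of a strand with its complement, position by position."""
    return "".join(pairing[base] for base in strand)

original = "CGATTAGC"
new_partner = partner_strand(original, DNA_PAIR)   # strand added during replication
restored = partner_strand(new_partner, DNA_PAIR)   # complement of the complement
rna = partner_strand(original, RNA_PAIR)           # RNA formed on the same template

print(new_partner)           # GCTAATCG
print(restored == original)  # True: complementing twice recovers the sequence
print(rna)                   # GCUAAUCG (uracil in place of thymine)
```

Because complementing twice returns the original sequence, either strand suffices to reconstruct the other, which is the sense in which complementarity preserves the code.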
bind. To attach chemically; form a chemical bond with. See also binding site.
binding site. Typically refers to the particular sequence of nucleotide bases on a DNA or RNA molecule that a protein or RNA molecule can “target” and bind to — that is, can attach to. The affinity of a protein for such a binding site is given by the folded shape, distribution of electrical charges, and perhaps other characteristics of the protein molecule. The binding affinity of an RNA molecule for another RNA or DNA is a matter of sequence base pair complementarity.
bit. In information theory, a single bit represents the amount of information we gain when we learn the actual state of a device that can be in either of two equally likely states. For example, when we flip a coin and discover whether the result is “heads” or “tails”, we have gained one bit of information. Putting it differently: our uncertainty about two equally possible outcomes is resolved. Since (simplifying a little) DNA can have any one of four possible nucleotide bases at a given position, we gain two bits of information by learning the actual base (or “letter”, as many put it) at that position. It is also often said that the letter contains two bits of information. In any case, “two bits” is only roughly correct because, in the case of nucleotide bases, the choices are not equally likely to any high precision. Worse, the “same” letter is often not really the same, as when it can be either methylated or non-methylated.
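The figures above follow from Shannon's entropy formula, which can be checked with a minimal Python sketch. The function name entropy_bits is hypothetical, and the unequal base frequencies are invented purely to show the effect; they are not measured values.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

coin = entropy_bits([0.5, 0.5])                    # two equally likely outcomes
uniform_bases = entropy_bits([0.25] * 4)           # four equally likely bases
skewed_bases = entropy_bits([0.3, 0.3, 0.2, 0.2])  # hypothetical unequal frequencies

print(coin)           # 1.0
print(uniform_bases)  # 2.0
print(skewed_bases)   # a little less than 2
```

When the four bases are not equally likely, learning the base at a position yields slightly less than two bits on average, which is why “two bits” is only roughly correct.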
blastocyst. The early stage of embryonic development when the morula has been transformed into a ball with a “hollow”, fluid-filled center. The term “blastula” rather than “blastocyst” is used for non-mammalian vertebrates.
cell membrane. The membrane that separates the interior of a cell from its environment. The separation, however, is never anything near absolute; traffic of many sorts continually moves across the membrane in both directions. (Plants have different bounding structures known as “cell walls”.) See also nuclear envelope.
cellular lineage. Any succession of cells in a body produced by a series of cell divisions. The succession is a “thread” consisting only of single cells. That is, at each cell division, only one of the daughter cells is considered part of any given lineage; the other daughter cell becomes part of a separate lineage. In other words, there is a forking of lineages at each cell division. Compare evolutionary lineage.
chaperone. A molecule that assists in the folding or assembly of other molecules or complexes without itself becoming a part of the end product.
character. In the context of evolution and development, a “character” refers to a more or less discrete trait (feature) of an organism at any level of observation, from the molecular to the visible. Not only physical processes and aspects of form, but also behaviors qualify as characters.
chimeric gene. A gene formed from the fusion of two or more genes. Sometimes these are called “fusion genes”, with the term “chimeric gene” referring to the joining of parts of different genes.
chromatid. One of two copies of a duplicated chromosome, before the copies are separated and distributed between daughter cells in cell division.
chromatin. The complex of DNA, proteins, and RNA that constitutes chromosomes. The histones that form nucleosome “core particles” are the most abundant proteins in chromatin, but many other proteins (transcription factors, activators, repressors, chromatin remodeling complexes, and other sorts of architectural proteins) play a role. Many of these proteins transiently associate with, and dissociate from, chromatin, which is highly dynamic in form and structure.
chromatin remodeling. The architectural re-structuring of chromatin. This re-structuring can take a number of forms: compaction or opening-up of the chromatin fiber; sliding nucleosomes along the DNA; making histone modifications; contributing to the assembly or disassembly of nucleosomes; and loosening or tightening the binding of DNA to nucleosome core particles. All of these changes play a substantial role in gene regulation.
chromosomal crossover. A process during meiosis by which the two chromosomes of a chromosome pair exchange parts of themselves.
chromosome. A long, continuous length of DNA “packaged” by means of various RNAs, many different proteins (including histones), and other molecules. The chromosome contains many chemical sequences known (with ill definition) as “genes”. Humans have 46 chromosomes, which come in 23 pairs - one member of each pair being inherited maternally and one paternally. The “same” genes generally occur in both members of a pair; any two such corresponding genes are known as alleles of a single gene.
chromosome territory. A particular region of the cell nucleus characteristically occupied by a chromosome in a given tissue type at a given stage of development and under a particular set of conditions. The spatial organization of chromosome territories within a nucleus has a bearing on gene regulation.
circadian. Of or relating to an approximate 24-hour rhythm.
cleavage. The human egg cell is massive in relation to the size of its nucleus. After fertilization, the zygote begins to undergo cell division, producing two daughter cells which then also divide, and so on. This is not the usual sort of cell division, however; after a division, the daughter cells do not grow. The entire developing spherical clump of cells remains about the same size as the original egg. This type of cell division without growth is known as “cleavage”, and it continues for about four rounds of division (resulting in approximately 16 cells total), at which point a more normal ratio of nuclear size to cell size is obtained in all the cells. Then the switch from cleavage to normal cell division occurs.
co-activator. A protein or protein complex that, like an activator, encourages expression of a gene. But whereas the activator, like all transcription factors, recognizes and binds to a specific DNA sequence, a co-activator is not sequence-specific. Rather than binding directly to DNA, it binds to the activator. It may thereby aid, for example, in recruiting RNA polymerase to a gene promoter. See also co-repressor.
codon. “Words” of the genetic code consisting of three successive nucleotide bases, or “letters”. Each codon of the protein-coding portion of a gene is supposed to map (via a messenger RNA intermediary) to one amino acid in the protein coded for by the gene. See also synonymous codon.
competing endogenous RNAs (ceRNAs). RNAs, particularly mRNAs and long noncoding RNAs, that mutually regulate each other by virtue of their competition for binding by microRNAs. When different RNAs have binding sites for the same microRNAs, and when, say, higher expression of one of these RNAs allows it to “soak up” many of the microRNAs, then the other RNAs that might have been bound by these microRNAs are not bound and therefore may produce more protein. (microRNAs tend to repress translation of mRNAs.) The situation, however, is dynamic: the “sponge” RNAs will tend to be degraded, so that their population declines, which will of course change the “competitive” relationships. And when there are many different RNAs participating in a network mediated by various microRNAs, the complexity can quickly overwhelm our powers of analysis.
complementarity. See base pair complementarity.
condensed chromatin. See the more technical term, heterochromatin.
conservation (evolutionary). In an evolutionary context, “conservation” typically refers to the preservation of the same DNA sequences (whether exactly or only approximately) across multiple related species. The more remote the common ancestor of the various species preserving a given sequence, the more “highly conserved” the sequence is said to be. One can speak in an analogous way of the evolutionary conservation of RNA or protein structures, or of molecular functions. Conservation is often taken to signify an adaptive role for the thing conserved. That is, changes subverting this role tend to be weeded out by natural selection; hence the conservation. (However, the reality turns out to be much more complex than this.)
convergent evolution. The independent acquisition of the same or a similar trait by two separate, and perhaps only very distantly related, evolutionary lineages. A common example of such a trait is the “camera-like” eye of cephalopods (octopus, squid), which was acquired independently of the similar eye in mammals.
co-repressor. A protein or protein complex that, like a repressor, discourages expression of a gene. But whereas the repressor, like all transcription factors, recognizes and binds to a specific DNA sequence, a co-repressor is not sequence-specific. Rather than binding directly to DNA, it binds to (and is often said to be recruited by) the repressor. It may thereby aid, for example, in blocking access of RNA polymerase to a gene promoter. See also co-activator.
crossover. See chromosomal crossover.
cytosine. See nucleotide base.
cytoskeleton. A dynamic network of protein filaments and tubules in the cytoplasm of a cell. The network gives the cell reasonably stable structure while also contributing crucially, via the network’s continual restructuring, to cell movement and to the transport of materials within the cell. As with nearly everything else in the cell, the cytoskeleton even plays a role in gene regulation.
development. In a cellular context, “development” refers most narrowly to the process by which originally undifferentiated cells (for example, stem cells) progressively become more specialized or differentiated through cell division. In a broader sense, the term refers to all the processes of growth and maturation in an organism. In this latter sense, we speak of an organism’s ontogeny.
differentiation. The movement from less specialized cellular forms to more specialized ones. We can also speak of “organ differentiation”, referring to the way that organs, with their specialized cell types, develop from earlier stages of the organism lacking those specializations. These developments may occur without any changes in the genome (that is, in the genetic sequence of the cells’ DNA). Understanding of differentiation therefore requires a reckoning with epigenetic processes.
digital. A sequence such as that of a DNA or RNA molecule is said to be digital if it consists of discrete, repeatable elements that can be discriminated from each other unambiguously. After the discovery of the structure of the double helix, the genome was widely taken to consist essentially of digital information carried by a linear arrangement of “letters” (nucleotide bases) — information which should be specifiable with perfect, yes-or-no definiteness.
diploid. Possessing two sets of chromosomes — that is, possessing a pair of each type of chromosome, with one member of the pair inherited maternally and one inherited paternally. In mammals, all cells except the gametes are normally diploid. Compare haploid.
DNA. Deoxyribonucleic acid, a molecule that figures centrally in protein production. Constituting part of the material of chromosomes, it is commonly double-stranded in the form of a double helix. Connecting the two strands are base pairs consisting of nucleotide bases. Here you will find a conventional animated stick figure of DNA; it schematically represents a few isolated features abstracted from whatever reality the actual material chromosome presents in the cell.
DNA breathing (1). The rhythmic unwrapping and rewrapping of DNA around nucleosomal histones. This takes place at the DNA entry and exit sites — that is, where the DNA meets or leaves the histone core particle. The breathing takes place rapidly, on the order of milliseconds. Not to be confused with DNA breathing (2).
DNA breathing (2). The dynamic opening and closing of “bubbles” between the two strands of the DNA double helix. That is, for a certain length the two strands of the double helix become disconnected, and then later they reconnect. This is thought to be important for, among other things, the initiation of transcription, because RNA polymerase can only begin transcribing once the double helix has begun to be “unzipped”.
DNA methylation. The attachment of a methyl chemical group to particular nucleotide bases (usually cytosine) of the DNA molecule. Methylation is recognized by various regulatory factors and therefore plays a major role in gene regulation. In general, DNA methylation tends to have a repressive effect on gene expression, but this generality is qualified by many subtleties.
DNA replication. The process by which both strands of a double-stranded DNA molecule serve as templates for strand reproduction. The result is two double-stranded DNA molecules, each containing one strand from the original molecule and one newly synthesized strand.
double helix. The form commonly taken by DNA (and also by double-stranded RNA). Speaking very generally, it’s the form you get when you take two cords and twist them together, so that each one spirals around the other.
double-stranded RNA (dsRNA). RNA that, like normal DNA, has two complementary strands joined by nucleotide base pairs. dsRNA can be brought into cells by viruses, and it can also be produced natively. This can happen, for example, when a length of RNA happens to contain two adjacent, complementary, and probably rather short sequences of nucleotide bases. That is, when the RNA folds sharply (into a hairpin shape) at the point between the two sequences, it brings a series of complementary bases together, allowing them to form base pairs that hold the two strands together.
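The hairpin condition described above (two nearby sequences that become complementary once the RNA folds back on itself) can be tested with a short Python sketch. The sequences and the name can_form_stem are hypothetical, and only the standard A-U and G-C pairs are considered, which is a simplification.

```python
# Illustrative sketch only: made-up sequences, standard A-U and G-C pairs only.
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}

def can_form_stem(left: str, right: str) -> bool:
    """True if `right`, read backwards, is base-for-base complementary to `left`.

    After a hairpin fold the two arms run in opposite directions, so the
    first base of `left` ends up facing the last base of `right`.
    """
    if len(left) != len(right):
        return False
    return all(RNA_COMPLEMENT[a] == b for a, b in zip(left, reversed(right)))

left_arm, loop, right_arm = "GGCAU", "UUCG", "AUGCC"
rna = left_arm + loop + right_arm  # single strand: arm, short loop, arm

print(can_form_stem(left_arm, right_arm))  # True
print(can_form_stem(left_arm, left_arm))   # False
```

The short loop between the two arms is the point at which the strand folds sharply; the paired arms then form the double-stranded stretch.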
downstream. See upstream/downstream.
endosymbiosis. The now widely accepted theory of endosymbiosis holds that various organelles of cells in multicellular organisms originated through permanent incorporation into cells of other, once free-living, single-celled organisms, particularly bacteria.
epigenetic inheritance. Depending on context this can refer either to inheritance between generations of an organism, or between cell generations within an organism. In both cases the reference is to inherited traits that are mediated, not by the DNA sequence, but by epigenetic processes or conditions. Thus, something in the parents’ activity or environment may lead to epigenetic changes in their cells — and particularly in their germ cells — that are passed on to their offspring, producing traits in the offspring that cannot be accounted for by the parents’ DNA or any mutation of it.
epigenetics. Literally, that which is “added to” genetics. The term is most commonly taken to refer to heritable changes in gene expression that do not result from changes in actual DNA sequences. (“Heritable” here can refer not only to inheritance between parents and offspring, but also between parent and daughter cells.) Construed narrowly, epigenetic features include certain modifiable aspects of chromosomes — in particular, DNA methylation, covalent histone modifications, and patterns of DNA accessibility. (Regarding the latter, see euchromatin and heterochromatin.) More broadly, epigenetics also includes such things as gene regulation by small RNAs (for example, small, interfering RNAs and microRNAs), alternative RNA splicing, and various aspects of translation regulation. Since there is no boundary between these “layers” of regulation and still more encompassing layers, we could take “epigenetic” to refer to “all the structures and processes of the cell bearing on gene expression”. (It would be hard to find such structures or processes that are, as a matter of absolute principle, not heritable by daughter cells.) Nearly all the transformations involved in cellular differentiation are related to epigenetic processes. See also epigenome.
epigenome-wide association studies (EWAS). Studies designed to identify epigenomic features associated with a particular complex trait, such as most diseases. The results relate to statistical features of populations, and do not effectively predict the disease risk, for example, of any given individual either within or outside the studied population. One of the hopes driving epigenome-wide association studies, however, is that they will eventually yield a set of epigenomic features that can help explain the genesis and heritability of the examined traits. See also genome-wide association studies (GWAS).
epistasis. Very generally: interaction between different genes, or between genes and other genomic elements, that alters their roles in biological processes. However, different definitions of epistasis are employed in different disciplines. In population genetics, for example, a purely statistical definition of the term doesn’t imply anything in particular about actual biochemical interaction between genes. In reality, it is impossible to cleanly separate gene-gene interactions from all the broader cellular processes in which genes, along with countless other factors, are involved.
eukaryote. An organism whose cells contain a nucleus and, usually, other membrane-bound organelles. Eukaryotes thus include all multicellular organisms as well as protozoa and some other unicellular organisms. Adjective: eukaryotic. Compare prokaryote.
evolutionary lineage. Simplistically: A linear sequence of species running from an ancestor species to a descendant species via a series of speciation events. The situation is complicated by various factors, among them: (1) there is not always a clean separation of species in different “lineages”, and various sorts of hybridization (interbreeding between species) can occur; (2) horizontal gene transfer is widely known to occur; and (3) endosymbiosis is also thought to have occurred. So evolutionary lineages are, in general, not really linear. Compare cellular lineage.
EWAS. See epigenome-wide association studies.
exon. A segment of the DNA sequence of a gene; more specifically: a segment whose corresponding segment in the gene’s RNA transcript is retained until translation rather than deleted as part of the RNA splicing process. Or, in the case of noncoding DNA and its RNA transcripts: exons are those segments retained in the final functional form of the RNA. “Exon” can refer either to the DNA sequence or the corresponding transcribed RNA sequence. Compare intron.
expression. The production of RNA using a DNA sequence as a template. The DNA sequence is then said to have been “expressed” or “transcribed”. The DNA sequence may represent a protein-coding gene (see also gene expression), or else it may be noncoding, in which case the expressed RNA is not translated into protein, but may have any of countless regulatory functions within the cell. Also, proteins may be said to be “expressed” from RNAs, and this in turn can be thought of as a step in the expression of protein from DNA. In the broadest sense, everything that affects the RNA or protein resulting from transcription and/or translation (for example, alternative RNA splicing, and RNA or protein degradative processes) may be said to bear on the expression of DNA sequences.
gamete. A haploid reproductive cell — an egg cell in the female or sperm in the male.
gene. Sorry, but you won’t pin me down on this one. “A gene is anything a competent biologist has chosen to call a gene” (philosopher of science Philip Kitcher, 1992). “Our knowledge of the structure and function of the genetic material has outgrown the terminology traditionally used to describe it. It is arguable that the old term gene, essential at an earlier stage of the analysis, is no longer useful, except as a handy and versatile expression, the meaning of which is determined by the context” (geneticist Peter Portin, 1993). For a brief overview of the history of the concept of the gene, see this article by biologist Craig Holdrege. You may also wish to take a brief look at the most recent inscriptions added to the gravestone of the traditional concept of the gene.
gene expression. Most commonly this term is applied to protein-coding genes, where it refers to the production of messenger RNA (mRNA) using a DNA gene sequence as a template. The gene is then said to have been “expressed” or “transcribed”. The mRNA may (after various sorts of processing, such as splicing) be translated into a protein. Expression also has a more general meaning.
gene pool. The set of all genes in a particular population. The population under consideration is usually a reproductively isolated collection of members of a particular species.
general transcription factor. A protein that is like a transcription factor but without being specific to particular genes; rather it enables the actual process of transcription as such, regardless of the (protein-coding) genes being transcribed. Many of these factors are part of the pre-initiation complex. However, more recent research is showing that the word “general” is a misnomer; these factors can be more or less specific, playing different roles with different genes, or different classes of genes. General transcription factors are also known as “basal transcription factors”.
gene regulation. The management (by the cell and organism as a whole) of gene expression. This involves gene activation, gene silencing, the timing and extent of gene transcription, the “editing” of the resultant transcripts, the regulation of translation, and so on: everything that affects what the cell ultimately makes of the gene. Particular regions of DNA that participate in regulation (for example, by being targets for transcription factors) are known as “regulatory regions” or “regulatory sites”. Gene regulation is sometimes loosely referred to as “transcription regulation”, although transcription is only one of the stages at which regulation occurs.
gene replication. See DNA replication.
gene silencing. Blocking of the processes that lead from a gene to its possible protein end products. This blocking can occur at many different points. Most generally: it can involve the prevention of gene transcription (“transcriptional silencing”), or the modification or destruction of the gene’s mRNA transcript so as to prevent translation (“post-transcriptional silencing”). More particularly, gene silencing can refer to the action of a DNA sequence known as a silencer.
genetics. The study of genes — and, more broadly, DNA. It is often considered (erroneously) to be equivalent to the study of inheritance or of the sole, or at least primary, heritable substance.
genetic code. This term has many meanings both legitimate and illegitimate. Most basically, it refers to the sequence of nucleotide bases, or “letters”, in DNA and RNA, and to the way that successive groups of three such bases in a protein-coding gene or protein-coding RNA can (with various complications) correspond to the successive amino acids making up a protein. The gene and RNA are then said to “code for” that protein. Each three-letter group of a coding sequence is a codon. While the codes of a gene and its RNA are complementary to each other, they still effectively correspond to the same amino acid sequence, given their separate places in the overall process. See base pair complementarity.
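The correspondence between codons and amino acids can be illustrated with a minimal Python sketch. The table below is only a small excerpt of the standard genetic code (which has 64 codons), the mRNA sequence is made up, and the function name translate is hypothetical. The sketch also ignores the “various complications” mentioned above.

```python
# Small excerpt of the standard genetic code (mRNA codon -> amino acid).
# The full table has 64 entries; only a few are listed for illustration.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCC": "Ala",
    "AAA": "Lys",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}

def translate(mrna: str) -> list:
    """Read an mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

print(translate("AUGGCCUUUAAAUAA"))  # ['Met', 'Ala', 'Phe', 'Lys']
```

Each three-letter group maps to one amino acid, and the stop codon ends the sequence without contributing an amino acid of its own.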
genetic drift. "Random" genetic change that becomes established in a population despite having no particular adaptive value.
genetic recombination. The exchange of material between chromosomes, of which chromosomal crossover is one example. This occurs most regularly during meiosis, where it is known as “meiotic recombination”. But it can also occur in somatic cells.
genome-wide association studies (GWAS). Studies designed to identify genome sequence features associated with a particular complex trait, such as most diseases. The results relate to statistical features of populations, and do not effectively predict the disease risk, for example, of any given individual either within or outside the studied population. One of the hopes driving association studies, however, is that they will eventually yield a set of genome features that can be demonstrated to explain (more or less fully) the heritability of the examined traits. See also epigenome-wide association studies (EWAS).
genotype. Refers to the genetic constitution of a cell or organism, considered as being of a certain type or as corresponding to a certain character. Either the genome as a whole can be in view, or a particular gene (allele), or combinations of genes. Often contrasted with phenotype. Adjective: genotypic.
guanine. See nucleotide base.
GWAS. See genome-wide association studies.
haploid. In organisms normally possessing paired sets of chromosomes, “haploid” cells possess only a single set of unpaired chromosomes. In such organisms gametes are normally haploid. Compare diploid.
helical axis. If you imagine the two strands of a double helix wrapped around a wire core, this wire would represent the helical axis.
heritability. Any given trait within a population will tend to vary from one individual to another. The amount of this variation that can be traced to genetic differences (under various controversial assumptions and conditions, and often contrasted with the amount of variation traceable to environmental differences) is considered to be the heritability of the trait. This concept of heritability, relating as it does to populations rather than individuals, is a highly technical and statistical one. It should not be confused, for example, with the way we think of a child’s inheritance from its parents.
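As a rough numerical sketch of the idea, the so-called broad-sense heritability is commonly written as the ratio of genetic variance to total phenotypic variance. The code below assumes the simplest possible model, in which total variance is just genetic plus environmental variance with no interaction or covariance between them (one of the controversial assumptions alluded to above). The function name and the numbers are hypothetical.

```python
def broad_sense_heritability(genetic_variance, environmental_variance):
    """Fraction of trait variance in a population attributed to genetic differences.

    Assumes total variance = genetic + environmental, with no interaction terms.
    """
    total = genetic_variance + environmental_variance
    if total <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return genetic_variance / total

# Hypothetical variances, for illustration only.
print(broad_sense_heritability(3.0, 1.0))
```

Note that the result describes a population under particular conditions; changing the environmental variance changes the ratio even if nothing about the genes changes.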
heterochromatin. Chromatin in its more tightly packed, less accessible, and less actively transcribed state, often containing fewer genes. Nucleosomes and various chromatin-associated proteins play a major role in the compact structuring of heterochromatin. Compare euchromatin.
heterozygous. An organism is said to be heterozygous with respect to a particular gene if the two alleles of the gene are different, as when a pea plant has one allele for a white flower color and one allele for violet-colored flowers. (The actual trait in such cases depends upon the dominance relations between the two alleles.) Compare homozygous.
histone. A family of simple proteins, abundant in the cell nucleus and constituting a substantial part of the (mostly) protein/RNA/DNA complex known as chromatin, the physical substance of chromosomes. A group of eight histones (two each of four different kinds) makes up the histone core particle of a nucleosome. Linker histones also participate in chromatin.
histone code. The code presumed to be found in the collection of histone modifications. The idea is this: for any given nucleosome core particle there are many possible (covalent) modifications of its constituent histones, leading to countless possible combinations of such modifications. It could be, then, that for each distinct combination (or, at least, for many such combinations) there is a specific gene-regulatory implication. For example, a particular combination might result in the binding of a specific chromatin remodeling complex. This mapping from specific combinations of histone modifications to specific effects would be the “code”. However, the idea that these modifications not only have regulatory significance but have it in a fixed, precise, and context-independent fashion has been increasingly discredited.
histone core particle. A collection of histones making up the protein core (often likened to a spool, but in this case an extraordinarily complex, irregular spool) around which DNA wraps. The resultant DNA-protein complex is called a nucleosome. The “standard” histone core particle contains two each of four histones: H2A, H2B, H3, and H4. However, variant histones play an important role in gene regulation, and each type of histone can undergo many modifications that bear on gene expression.
histone modification. Often referred to as “histone post-translational modification”, because the changes occur after the translation that produces the histone protein. A histone modification consists of the addition of any one of several chemical groups to, or its removal from, an individual amino acid of a histone, especially a histone belonging to a nucleosome. The modified amino acid might be on either the histone tail or the main body of the histone. Depending on the chemical group involved, the modification is called methylation (addition of a methyl group), acetylation (addition of an acetyl group), phosphorylation, ubiquitination, sumoylation, and so on. These modifications can dramatically affect the electrical and other properties of nucleosomes, and they play a major role in gene regulation.
homeostasis. The process by which an organism or cell or any other organic entity dynamically maintains a properly functional, approximately stable state in the face of disturbances. For example, a warm-blooded animal tends to maintain a constant internal temperature through a wide range of adjustments, despite changes in the temperature of its environment. (This includes its inner environment, as when it is exercising its muscles and generating large amounts of heat.) The maintenance of homeostasis shows the organism to be capable of a kind of directed or goal-oriented activity.
homozygous. An organism is said to be homozygous with respect to a particular gene if the two alleles of the gene are essentially the same, as when a pea plant has two alleles specifying a white color for flowers. Compare heterozygous.
hormone. A substance produced in particular cells (for example, in a gland) that can travel to other parts of the body and (often in very small quantities) influence those other parts. The hormone, which may be recognized by receptor molecules, is often said to carry a signal.
horizontal gene transfer. Transfer of genetic material “laterally” between organisms (including organisms of different species) rather than “vertically” from parents to offspring. This occurs often in one-celled organisms and, for example, helps to account for antibiotic resistance in bacteria. (Resistant bacteria pass their genes to nonresistant bacteria.) But it also can occur with more complex, multicellular organisms, where it is exemplified by genetic material acquired from (or given to) viruses and bacteria. Horizontal gene transfer, together with endosymbiosis, tends to make the traditional evolutionary “tree” look more like a reticulated (web-like) structure.
induced pluripotency. The pluripotent state of a cell that has been artificially transformed from a non-pluripotent condition. The transformation can (with more or less exactitude) be effected (“induced”) by inserting certain genes associated with pluripotency, or by adding certain transcription factors to the cell, or by a growing number of other means.
information. In common usage — and a great deal of scientific usage — “information” has definitions all over the map, and is often not defined at all, but left extremely vague. Usage varies from particular mathematical constructions to its being a broad synonym for “meaning”. But a consensus technical usage in molecular biology (as well as computation) might run something like this: information is a quantity where a single bit represents the amount of information we gain when we learn the actual state of a device that can be in either of two equally likely states. For example, when we flip a coin and discover whether the result is “heads” or “tails”, we have gained one bit of information. Putting it differently: our uncertainty about two equally possible outcomes is resolved. Since (simplifying a little) DNA can have any one of four possible nucleotide bases at a given position, we gain two bits of information by learning the actual base (or “letter”, as many put it) at that position. It is also often said that the letter contains two bits of information. In any case, “two bits” is only roughly correct because, in the case of nucleotide bases, the choices are not equally likely to any high precision. All this aside, most references to “information” in the biological literature denote or strongly connote something like “the specific significance or meaning of an activity or structure for the organism”.
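The "bits" in the technical usage above can be computed with the standard Shannon entropy formula (the entry does not name it, so treat its use here as an assumption about what the consensus usage refers to). The following minimal sketch shows the coin, the idealized four-base case, and why "two bits" is only approximate when the bases are not equally likely. The skewed base frequencies are invented for illustration.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

coin = [0.5, 0.5]                        # two equally likely states
uniform_bases = [0.25, 0.25, 0.25, 0.25] # four equally likely bases
skewed_bases = [0.3, 0.2, 0.2, 0.3]      # hypothetical unequal base frequencies

print(entropy_bits(coin))           # one bit for the coin flip
print(entropy_bits(uniform_bases))  # two bits per base in the idealized case
print(entropy_bits(skewed_bases))   # somewhat less than two bits
```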
inheritance of acquired characteristics (Lamarckism). The idea that traits, or characters, can in some literal sense be passed from an organism to its offspring, not only as those traits are supposedly determined in a fixed way by genes, but also as they are altered by the activity of the parent organism during its life. The classic example for ridiculing this notion is the giraffe’s neck: no matter how much a giraffe stretches its neck during its lifetime in order to browse on higher leaves, this will not affect the inherited neck length of the giraffe’s offspring. But researchers today are exploring a rapidly increasing number of cases where acquired characteristics are passed on to offspring quite independently of genetic inheritance. This inheritance is often achieved by epigenetic means. “Lamarckism” refers to Jean-Baptiste Pierre Antoine de Monet, Chevalier de la Marck (1744-1829), who argued for the inheritance of acquired characteristics.
initiator (Inr). One of the components of a gene promoter. In the absence of the TATA box (or in conjunction with it or with other promoter elements) Inr can provide a base for the assembly of the pre-initiation complex.
insulator. A DNA sequence that acts as a kind of boundary element, blocking the effects of certain regulatory elements. In particular, an insulator can block the role of an enhancer, or, more broadly, it can prevent the spread of highly condensed chromatin into neighboring regions, where the condensed chromatin might have the effect of suppressing gene expression. Insulators help make possible the independent regulation of nearby chromosome regions.
intron. A segment of the DNA sequence of a gene; more specifically: a segment whose corresponding segment in the gene’s RNA transcript is deleted from the transcript before translation. The deletion occurs as a result of the RNA splicing process. Or, in the case of noncoding DNA and its RNA transcripts: introns are those segments deleted before the final functional form of the RNA is achieved. “Intron” can refer either to the DNA sequence or to the corresponding transcribed RNA sequence. Compare exon.
in vitro. “In glass” — that is, in an artificial environment such as a test tube or laboratory dish, as opposed to in vivo.
in vivo. In a living context — more specifically, in the living cell or organism, as opposed to in vitro.
ionizing radiation. Radiation consisting of high-energy particles that can strip electrons from atoms, thereby changing their chemical reactivity, which in turn can cause biological damage.
jumping gene. Another name for a transposon.
Lamarckism. See inheritance of acquired characteristics.
LCR. See locus control region.
lineage, cellular. See cellular lineage.
lineage, evolutionary. See evolutionary lineage.
linker histone. A histone (most often the histone known as “H1”) that binds to and locks the entering and exiting DNA to the histone core particle of a nucleosome, thereby stabilizing the nucleosome and conducing to the formation of more regular arrays of compact chromatin.
locus control region (LCR). A DNA sequence that helps to regulate a cluster of related genes. These genes may be both nearby and far away on the same chromosome, or even on different chromosomes. The LCR plays a role in organizing the chromatin sections containing the genes and coordinating their expression.
long noncoding RNA (lncRNA). A non-protein-coding RNA that, by rather arbitrary definition, is greater than 200 nucleotide bases in length. lncRNAs come in a great variety: they may originate from DNA sequences that overlap protein-coding genes, they may be products of antisense transcription, they may be intronic or intergenic transcripts, or they may arise from pseudogenes or retrotransposons. Their functionality was long doubted, but we are now seeing a rapidly increasing number of lncRNAs with demonstrated functions. As one researcher put it, “They regulate every process under the sun”.
major groove. If you wrap two uniform cords around each other in the manner of a double helix, you can trace with your finger either of two grooves between the cords. Each groove spirals around with the cords. However, in the case of the DNA double helix, because of the bulky and not perfectly symmetrical nucleotide bases (letters of the genetic code) connecting the two cords like rungs of a ladder, you would find one groove to be wider than the other. That is, the distance in moving from one cord to the other (passing around the bulky material in the middle) would be greater when traversing one groove than when traversing the other. The wider of the two grooves is the major groove, and the narrower one is the minor groove.
MAR (matrix attachment region). A DNA sequence particularly well suited to serve as an anchoring site for tethering DNA to the nuclear matrix. The constellation of many such tetherings contributes to the looping structure of chromosomes.
mark. Geneticists commonly refer to the attached chemical groups resulting from DNA methylation or histone modification as “marks”. Using the word verbally, one can say, for example, that an enzyme “marks” DNA with methyl groups, while another enzyme removes such marks — that is, removes the methyl groups.
meiosis. A complex, multistage process in the reproductive organs by which certain cells of the germline (following an earlier DNA replication) divide twice, yielding four cells, each with half the number of chromosomes possessed by the original (pre-meiotic) cell. In animals, these resulting haploid cells are called gametes. Compare mitosis. Adjective: meiotic.
meiotic recombination. See genetic recombination.
membrane. A membrane is a thin, film-like structure that separates two fluids. It acts as a selective barrier, allowing some particles or chemicals to pass through, but not others (Wikipedia). Animal cells are bounded by a membrane (technically, a lipid bilayer) known as the “plasma” or “cell” membrane, and various organelles within the cell have their own bounding membranes.
membranome. A rather vague term referring to the collection of biological membranes in a cell or organism, particularly with reference to their informational role, which is to say (less tendentiously): with reference to their biological significance and functioning. “Membranome” may include in its connotations something like “digital code” (à la “genome”), but that is presumably only for a certain feel-good effect.
Mendel’s Laws. The first law (the Law of Segregation) states that gametes receive only one member of each parental chromosome pair. So a single gamete contains only one allele of any particular gene. The alleles segregate. The second law, the Law of Independent Assortment (also known as the Law of Inheritance), states that distribution of alleles to gametes occurs independently for each gene. That is, if gene X has alleles x′ and x″, and if gene Y has alleles y′ and y″, then the following combinations in gametes are possible: x′y′, x′y″, x″y′, and x″y″. This second law, as it happens, is not true in general; it is valid only where genes are not linked, as they often are (depending on meiotic recombination, among other things) when they reside on the same chromosome. When linked, the alleles of two genes will be passed along to gametes together rather than independently.
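The combinations allowed by independent assortment amount to taking one allele from each gene in every possible way. A minimal sketch, in which the labels "x1", "x2", "y1", "y2" are stand-ins for the two alleles of each gene and the function name is hypothetical:

```python
from itertools import product

def gamete_combinations(*genes):
    """List every way of taking exactly one allele from each unlinked gene."""
    return ["".join(combo) for combo in product(*genes)]

gene_x = ["x1", "x2"]  # stand-ins for the two alleles of gene X
gene_y = ["y1", "y2"]  # stand-ins for the two alleles of gene Y

print(gamete_combinations(gene_x, gene_y))
```

With n unlinked genes of two alleles each, this yields 2 to the power n combinations. The sketch says nothing about linked genes, where the combinations are not all equally available.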
mesenchyme. Loosely organized embryonic cells, or tissues, that typically develop into tissues of the blood and lymph systems, and also connective tissues such as bone and cartilage.
metabolism. A rather broad and variously used term referring in general to the breakdown and building up of substances in the body. For example, the constituents of food may be broken down, while proteins and nucleic acids are built up. The adjective is “metabolic”.
metabolite. A molecule that is produced during, or as a result of, metabolic processes. Usually the term refers to smaller molecules — not, for example, to proteins.
metabolome. The complete set of small-molecule metabolites in a cell or organism.
methylation. Attachment of a methyl group to a molecule, which is then said to be “methylated”. Certain enzymes (known as “methylases” or methyltransferases) can do this. See also histone modifications and DNA methylation. “Demethylation” is the corresponding removal of a methyl group, accomplished by demethylases.
methyl group. A small chemical group with the formula, -CH3. See methylation.
methyltransferase (or methylase). An enzyme that can attach a methyl group to another molecule. In an epigenetic context, the term generally refers to the attachment of a methyl group to DNA (by a DNA methyltransferase) or to a histone (by a histone methyltransferase).
microbiome (or microbiota). The community of all microorganisms that share our bodies with us. They have, over the past two decades, been discovered to play decisive roles in human well-being. And some, of course, are, or can become, pathogenic. Microbiomes are also spoken of in other contexts — for example, the microbiome of the mouse, or the microbiome of the oceans.
micro-RNA (miRNA). A small RNA, 21-23 nucleotide bases in length. Like the small RNAs involved in RNA interference, miRNAs are derived from double-stranded RNA (although not double-stranded RNAs that originate from viruses). And they, too, become associated with a protein complex known as a “RISC”, in cooperation with which they (more or less) prevent translation of messenger RNA (mRNA) molecules containing sequences complementary to their own. Preventing translation can be achieved by degrading the mRNA directly, or by blocking its translation by one means or another. One difference between miRNA and the siRNA involved in RNA interference is that the complementarity between the miRNA and the target mRNA need not be exact, so that a single miRNA molecule can target many different kinds of mRNA molecule. This effectively silences, or at least reduces the expression of, the genes producing those mRNAs. There are at least several hundred different miRNAs in humans.
mitochondrion. A membrane-bound organelle found in eukaryotes. The number of mitochondria in a given cell may vary from one to thousands, depending in part on the organism and the cell type. The organelle “packages” energy in adenosine triphosphate molecules for use in the many cellular processes requiring energy. But, as is the case with every part of the cell, mitochondria participate in many other processes, such as those involving growth, cell differentiation, the cell cycle, and cell death, as well as signaling.
mitosis. The complex process by which a cell, following a stage of DNA replication, divides, resulting in two diploid daughter cells — that is, cells with the same number of chromosomes as the parent cell. The DNA sequences in parent and daughters are generally (although never precisely, especially if you take into account such things as DNA methylation) identical. Compare meiosis. Adjective: mitotic.
morphogenesis. The physical structuring, or shaping, of an organism or part of an organism. This occurs most dramatically during the growth of the embryo as cells and tissues differentiate.
morula. The developing embryo, shortly after fertilization, when it consists of a spherical clump of cells produced by embryonic cleavage.
motif. A frequently occurring sequence of nucleotide bases in DNA (or RNA) that has functional significance, as when regulatory proteins “look” for it and bind to it. (The term “structural motif” is applied to the protein structures that recognize the nucleotide sequences.)
mRNA (messenger RNA). See messenger RNA (mRNA).
multipotent. Capable of developing into two or more closely related types of cell. For example, blood stem cells can develop into red cells, white cells, and platelets. Compare totipotent and pluripotent.
mutagenesis. The generation of mutations.
mutagenic. Tending to produce mutations.
mutation. A change in the DNA sequence of nucleotide bases — which is to say (in the usual terminology), a change in the genetic code. An organism containing a mutation is (when the mutation is what one has in view) said to be a “mutant”, and a substance tending to cause mutations is a “mutagen”. Also, a gene that has suffered a change is said to be “mutated”.
myosin. A contractile protein, or, rather, a family of proteins. It is the most common protein found in muscles, working together with actin to produce muscle contraction. Myosin consumes energy in driving movements along actin filaments.
natural selection. Conventionally, abstractly (and rather vacuously) defined as the process by which three conditions are supposed to produce evolutionary change: (1) phenotypic variation, (2) differential reproductive success resulting from that variation, and (3) reasonably consistent principles of inheritance of that variation, principles that are at least partially independent of environmental effects.
noncoding. DNA or RNA that does not code for a protein is said to be "noncoding". Noncoding DNA can have many regulatory functions and can even be transcribed into RNA, but the resultant RNA is also noncoding (will not be translated into a protein) and likewise can have many regulatory functions. Compare protein-coding gene and protein-coding RNA.
nonsense-mediated mRNA decay (NMD). One of several complex pathways by which messenger RNA (mRNA) molecules are degraded. It particularly relates to mRNAs that have a premature stop codon (that is, a “nonsense” codon).
nuclear envelope. The membrane that encloses the cell nucleus, separating the genetic material and other nuclear contents from the rest of the cell. However, there is intimate communication across the envelope, and numerous "nuclear pores" offer passage between the nucleus and the larger cellular environment.
nuclear lamina. A fibrous network, together with associated proteins, located in the periphery of the cell nucleus, at the inner face of the nuclear envelope. At any given time some chromosome sites can be attached to the nuclear lamina, a situation that tends to correlate with reduced gene expression.
nuclear matrix. A poorly characterized and highly dynamic structural skeleton giving organizational structure to the cell nucleus.
nuclear pore. A narrow channel formed through the nuclear envelope by several hundred protein molecules. The channel allows carefully regulated molecular traffic between the cell nucleus and the cytoplasm. The pore is by no means a static structure, and some of its molecules also move through the nucleus performing other functions. The pore and its constituents play roles in gene expression.
nuclear receptor. A special class of receptor that is “receptive” to the influence of certain hormones and other molecules. As the name implies, nuclear receptors reside in the nucleus, where they can bind directly to DNA. When stimulated by an appropriate hormone or other molecule, the nuclear receptor may either activate or repress the expression of a selected set of genes. Nuclear receptors are therefore transcription factors.
nucleolus. A microscopically visible region, or “compartment”, within the cell nucleus. It contains a concentration of proteins and DNA loci required for the transcription of appropriate DNA sequences into ribosomal RNA (rRNA). Like other nuclear compartments, or “bodies”, the nucleolus is not membrane-bound.
nucleoplasm. The highly viscous contents that, in the cell nucleus, correspond to the cytoplasm of the cellular region outside the nucleus. It is the medium in which the great macromolecules — preeminently the chromosomes — reside.
nucleosome. A group of (usually) eight histone proteins that together form a kind of "spool" (histone core particle) around which DNA is commonly wrapped somewhat less than two turns. (The length of DNA wrapped around a "standard" nucleosome is usually given as 147 base pairs. But there are many variations upon this standard length.) There are millions of nucleosomes in the human genome, and they are key elements in the compaction, or condensation, of DNA. Nucleosomes are a focus of many different aspects of gene regulation.
nucleosome-free region. A stretch of DNA that is free of nucleosomes, perhaps because they have been disassembled and removed, or else have shifted their position as a result of the nucleosome core particle sliding along the DNA.
nucleosome sliding. The process by which DNA slides around a nucleosome spool. The effect is to displace the spool linearly along the DNA. As a result, some DNA sequences that were wrapped around the nucleosome (and therefore less accessible to regulatory factors) are exposed as free or naked DNA, while other sequences, previously free, are bound to the nucleosome.
nucleotide base. A class of nitrogen-containing chemical groups that are constituents of DNA and RNA. The four main bases in DNA are adenine, guanine, cytosine, and thymine (A, G, C, and T, respectively - "letters" of the genetic code). In RNA, uracil (U) stands in the place of thymine. These bases combine in restricted ways to form complementary base pairs. This complementation is central to DNA replication and gene expression because of the way it allows the strands of DNA to be used as templates for replication or for production of RNA that preserves the sequential information employed by the cell in protein production.
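The template role of complementary pairing can be illustrated with a short sketch. It relies on the standard pairings (A with T, G with C, and U standing in for T in RNA) and on the fact that paired strands run in opposite directions. The function names and the example sequence are hypothetical.

```python
# Standard DNA base pairings.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand):
    """Return the base-paired partner strand, read in the opposite direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand))

def transcribe_from_template(template):
    """Return the RNA made from a DNA template: complementary, with U for T."""
    return complementary_strand(template).replace("T", "U")

strand = "ATGC"  # made-up sequence, for illustration only
partner = complementary_strand(strand)
print(partner)
print(complementary_strand(partner))      # pairing twice recovers the original
print(transcribe_from_template(partner))  # RNA matching the original, with U for T
```

Because the partner of a partner is the original strand, either strand carries enough sequence to rebuild the other, which is what makes templated replication possible.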
nucleus. See cell nucleus.
oncogenic. Tending to cause cancerous tumors.
ontogeny. The development of an organism across its entire lifespan, from zygote to death. Adjective: ontogenetic. “Ontogenetics” is the study of ontogeny. Compare phylogeny, and see also development.
open chromatin. See the more technical term, euchromatin.
open reading frame. See reading frame.
organelle. A subunit of a cell, generally with a surrounding membrane separating it (to a degree) from other cellular contents. Perhaps the most prominent organelle in the cells of multicellular (and various other) organisms is the nucleus.
parasitic genetic element. A genomic sequence that is thought to be essentially alien to the organism, "using" the organism to advance its own survival, or, rather, the survival of other elements of its kind.
phenotype. An organism’s observable traits, considered as a whole or in part. It can include everything from biochemical characteristics to form and behavior. Often contrasted with genotype. Adjective: phenotypic.
phosphate group. A small chemical group with the formula, PO4. See phosphorylation.
phosphorylation. Attachment of a phosphate group to a molecule, which is then said to be "phosphorylated". Certain enzymes (known as kinases or phosphotransferases) can do this. See also histone modifications. Dephosphorylation is the corresponding removal of a phosphate group, accomplished by phosphatases.
phylogeny. The evolutionary history of natural groups of organisms. Adjective: phylogenetic. “Phylogenetics” is the study of phylogeny. Compare ontogeny.
pluripotent. Capable of developing into a considerable range of different cell types. For example, embryonic stem cells can transform themselves into many, but not all, tissue types during fetal development. Compare totipotent and multipotent. The corresponding noun is “pluripotency”. See also induced pluripotency.
polyadenylation. The addition of a series of adenine bases to the end of a messenger RNA molecule, as part of the processing the molecule undergoes in order to become a mature mRNA. The adenine bases are sometimes referred to as a “poly(A) tail”. The presence and length of the tail (number of adenine bases) play an important role in mRNA regulation and therefore in protein regulation.
polymer. A large molecule (macromolecule) composed of multiple repeated units which are identical or similar.
polyploidy. The occurrence of one or more additional complete sets of chromosomes in a cell, compared to the normal number. In humans the normal number is two, so that three or four sets of chromosomes per cell would be examples of polyploidy. To be distinguished from aneuploidy, which refers not to complete sets, but to particular chromosomes. Adjective: polyploid.
population genetics. “Population genetics is a field of biology that studies the genetic composition of biological populations, and the changes in genetic composition that result from the operation of various factors, including natural selection. Population geneticists pursue their goals by developing abstract mathematical models of gene frequency dynamics, trying to extract conclusions from those models about the likely patterns of genetic variation in actual populations, and testing the conclusions against empirical data” (Stanford Encyclopedia of Philosophy).
prokaryote. Organism whose cells lack a nucleus and other membrane-bound organelles. They are mostly, but not solely, unicellular organisms, and include bacteria and archaea. Adjective: prokaryotic. Compare eukaryote.
promoter. A regulatory DNA sequence, usually close to, and upstream from, the gene or genes it regulates. It serves as a binding site for transcription factors and for the protein complexes that initiate gene transcription, and it serves to identify the start site for transcription.
protein-coding gene. A DNA sequence that can lead, via transcription and translation, to production of one or many different proteins. The sequence is said (stretching the truth) to code for the protein that eventuates from it. This stretches the truth because many other factors, such as RNA editing and RNA splicing, play a role in determining the actual protein that results.
protein-coding RNA. RNA capable of producing protein via translation. This RNA, called "messenger RNA" (mRNA), is derived from protein-coding genes and may subsequently be altered by processes such as RNA editing and alternative splicing before being translated into protein.
protein structure. Several levels of protein structure are spoken of. “Primary structure” refers simply to the linear series of amino acids constituting the protein. This is a rather abstract notion of structure, since we’re not likely to see such a linear chain. “Secondary structure” refers to local folding, which yields forms referred to (very schematically) by such names as “alpha helix” and “beta sheet”. These structures in turn fold upon each other to form a larger mass — a “tertiary structure” — often a so-called “globule”. And, in multi-subunit proteins, various tertiary structures (either similar or distinct in composition) may be linked together in a “quaternary structure”.
proteome. The total collection of proteins in a cell, tissue, or organism at a particular time or under a particular set of conditions.
pseudogene. A DNA sequence usually related to a “normal” gene (perhaps via duplication), but — according to a perhaps rather too old definition — with one or more mutations, or a loss of associated regulatory DNA, preventing either its transcription or its translation into a protein. Pseudogenes were long thought to be nonfuctional. However, they are now known to serve diverse functions, including the generation of small, interfering RNAs and a role as decoys for microRNAs that would otherwise target the corresponding “normal” mRNAs.
reading frame. A way of looking at, or (if you are a transcription enzyme or ribosome) interacting with, a region of protein-coding DNA or RNA so that codons can be read “correctly”. Since each codon consists of three “letters” (nucleotide bases), it is necessary to know how any series of such bases should be divided into threes. There are three different ways this can be done (starting, say, with any of three successive bases), only one of which is the way that leads to the correct production of a protein. The correct way of looking is the reading frame. An open reading frame is a complete, protein-specifying reading frame, with a start codon, a series of codons corresponding to amino acids, and a stop codon.
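The three ways of dividing a sequence into threes can be tried out in a few lines of Python. This is only a minimal sketch: the function names and the sample sequence are invented for illustration, the sequence is read as DNA letters, and the search assumes the usual start codon (ATG) and stop codons (TAA, TAG, TGA). It looks only at the first ATG in each frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_into_codons(seq, frame):
    """Divide seq into complete triplets, starting at offset frame (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def find_open_reading_frame(seq):
    """Return (frame, codons) for the first start-to-stop run found, else None."""
    for frame in range(3):
        codons = split_into_codons(seq, frame)
        if "ATG" not in codons:
            continue
        start = codons.index("ATG")
        # Scan forward from the start codon until a stop codon closes the frame.
        for end in range(start + 1, len(codons)):
            if codons[end] in STOP_CODONS:
                return frame, codons[start:end + 1]
    return None

sample = "CCATGGCTTGACC"  # made-up sequence, for illustration only
for frame in range(3):
    print(frame, split_into_codons(sample, frame))
print(find_open_reading_frame(sample))
```

In this sample only the third way of dividing the letters (offset 2) yields a start codon followed by a stop codon. The other two offsets give triplets that do not form an open reading frame.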
RBP. See RNA-binding protein.
receptor. A protein (residing in cytoplasm or embedded in a cell membrane) to which a signaling molecule, such as a hormone, can attach. The result is typically a change in conformation of the protein, which in turn may lead to changes, sometimes dramatic, in the protein’s surrounding milieu. Compare nuclear receptor.
recombination. See genetic recombination.
regulatory factor. See gene regulation.
regulatory region. See gene regulation.
repetitive DNA. DNA sequences, whether long or short, that occur repeatedly in the genome. They may occur in immediate succession, or with other sequences interspersed between them, and may also occur in inverted ("turned around end for end") form. Published figures vary, but repetitive sequences (including transposons) are often said to constitute somewhere between 40% and 50% of the human genome.
replication. See DNA replication.
repressor. A protein transcription factor with a negative effect on gene expression. (Compare activator.) A repressor may work in conjunction with one or more co-repressors or other factors. With or without co-factors, the repressor commonly blocks access to the gene promoter by RNA polymerase (the transcribing enzyme). Note: the terms “repressor” and “repress” can be used more generally to refer to any factor or process that helps to prevent expression of a gene.
retrotransposon. A type of transposon whose duplication and insertion in the genome is mediated by RNA. For example, certain RNA sequences originally transcribed from DNA are reverse transcribed back into DNA and inserted (often rapidly and in high numbers) into the existing DNA sequence. This is one of the methods by which the organism remodels its own genome.
ribonome. The entire collection of RNA molecules in the cell and organism at any one moment, along with the diverse proteins that associate with them.
ribonucleoprotein. Any of a vast variety of molecular complexes formed from RNA and protein molecules. In other words, a ribonucleoprotein is a nucleoprotein formed from RNA rather than DNA. The ribosome, one of the most elaborate molecular complexes in the organism, is an example of a ribonucleoprotein.
ribosomal RNA (rRNA). See under RNA.
RISC (RNA-induced silencing complex). A protein complex that plays a central part in RNA interference. The complex consists of several proteins together with a small interfering RNA (siRNA). The complex locates mRNA molecules containing sequences complementary to the siRNA, after which a protein in the complex cleaves the mRNA or otherwise damages it so as to prevent translation.
RNA. Ribonucleic acid. Like DNA, it contains a series of nucleotide bases (often thought of as "letters" of the genetic code). However, in RNA the uracil (U) base, or "letter", occurs instead of the thymine (T) of DNA. RNA is classically thought of as existing in three primary forms, each generated by one of the RNA polymerases: (1) mRNA (messenger RNA), produced from a protein-coding gene-template, preserves the gene’s code and is an intermediary between the gene and a corresponding protein. mRNA is normally single-stranded. (2) rRNA (ribosomal RNA) forms part of the structure of the molecular complex called a ribosome (of which there can be millions in a single cell), which in turn interacts with (“translates”) mRNA to produce protein. (3) tRNA (transfer RNA) brings to this process of protein production the actual amino acids corresponding to the successive codons of the mRNA being translated, adding each one to the growing protein molecule. More recently, a great variety of RNA types, both small and large, both protein-coding and noncoding, have been discovered. They play a major role in many epigenetic processes.
RNA cleavage. The "cutting up" of RNA molecules into smaller pieces. Those pieces may be protein-coding or regulatory, and the latter include both large regulatory RNAs and small regulatory RNAs such as micro-RNAs and small interfering RNAs.
RNA editing. The process by which particular nucleotide bases ("letters") are inserted, deleted, or altered in not-yet-fully-processed RNAs (“precursor RNAs”).
RNA gene. A gene that is transcribed to produce RNA that is not protein-coding. Such RNAs can perform regulatory roles in the cell.
RNA interference (RNAi). Regulation of gene expression — and especially the silencing of genes — by processes involving small RNA molecules about 21-25 nucleotide bases long. This RNA is known as “small interfering RNA” or siRNA. (More recent literature sometimes includes the activity of microRNAs under the heading of “small interfering RNA”.) A protein complex incorporating an siRNA and called a RISC locates mRNA molecules with a sequence complementary to that of the siRNA and proceeds to cleave the mRNA or otherwise prevent it from being translated. This is known as "post-transcriptional silencing" because it effectively silences genes (preventing the production of protein from them) only after the genes have been transcribed. However, more and more other roles are being discovered for siRNAs — for example, in DNA methylation, in chromatin remodeling, and in the activation of genes (“small RNA-induced gene activation”).
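What it means for an mRNA to contain "a sequence complementary to that of the siRNA" can be illustrated with a simple string search. The sketch below is a deliberate simplification: it demands perfect base pairing along the whole guide strand, which is an assumption made for clarity and not a claim about how a RISC actually recognizes its targets. All names and sequences are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(seq):
    """Return the RNA strand that would pair with seq, written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))

def find_target_sites(mrna, guide):
    """Return start positions in mrna that pair perfectly with the guide strand."""
    target = reverse_complement_rna(guide)
    width = len(target)
    return [i for i in range(len(mrna) - width + 1) if mrna[i:i + width] == target]

# A made-up 21-base target site embedded in a made-up mRNA.
site = "GCUAGCUAGGAUCCAUGCAUG"
guide = reverse_complement_rna(site)  # the siRNA strand retained in the complex
mrna = "AUGGC" + site + "UUUAA"

print(len(guide), find_target_sites(mrna, guide))
```

An mRNA lacking the complementary stretch returns an empty list, which in this toy picture means it would escape silencing.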
RNA polymerase. The enzyme (protein) that transcribes DNA (protein-coding genes, but also various noncoding sequences) into RNA. In humans, different RNA polymerases (I, II, and III) transcribe different sorts of DNA sequences. Actually, however, these molecules work only as part of continually changing molecular complexes, and are modified even during their passage along a specific DNA sequence.
RNA splicing. The process by which introns are removed from a pre-mRNA transcript and the remaining exons are joined together. Splicing occurs in the cell nucleus, preliminary to translation of the transcript (in the cytoplasm) or, in the case of noncoding transcripts, preliminary to the achievement of the functional RNA end-product. The nature of the splicing will determine what protein variant, or “isoform”, a protein-coding RNA actually leads to. The splicing is usually carried out by a large RNA-protein complex known as a “spliceosome”. See also alternative splicing and trans-splicing.
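As a rough illustration of removing introns and joining exons, consider the following Python sketch. The pre-mRNA sequence and the exon coordinates are invented, and the function name is hypothetical. Real splicing is carried out by the spliceosome and depends on far more than positions in a string.

```python
def splice(pre_mrna, exons):
    """Join the listed exon segments; each exon is a (start, end) index pair, end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Made-up transcript: three exons separated by two introns.
# The introns here begin with GU and end with AG, as introns commonly do.
exon1, intron1 = "AUGGCU", "GUAAGUUUUCAG"
exon2, intron2 = "GAACCA", "GUGAGUCCUUAG"
exon3 = "UGGUAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

all_exons = [(0, 6), (18, 24), (36, 42)]
skip_middle = [(0, 6), (36, 42)]  # leaves out the second exon

print(splice(pre_mrna, all_exons))
print(splice(pre_mrna, skip_middle))
```

The same precursor yields two different mature sequences depending on which exons are retained, which is the sense in which the nature of the splicing determines the resulting isoform.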
rRNA (ribosomal RNA). See under RNA.
segregation. In mitosis: The process by which, following DNA replication, chromosome pairs are properly distributed between the two diploid daughter cells of a cell division. In meiosis: The process by which, following DNA replication, the chromosomes of each chromosome pair are allocated (via two cell divisions) to different haploid gametes. By this means the two alleles of a given gene are separated, with each gamete containing one or the other.
sense strand. The strand of the two-stranded double helix that generally cannot be transcribed into protein-coding RNAs. It is called the “sense” strand because, if you look at the codons of its protein-coding regions, you will find that they directly embody the genetic code, whereas the corresponding nucleotide bases on the opposite (antisense) strand (from which mRNAs are generally produced) do not directly represent the genetic code. This “reversal” of expectation relates to the fact that the act of transcribing a gene into an mRNA works by the principle of complementarity: only when, by transcription, a codon of the gene is given its complementary form in an mRNA can it be read off as the standard genetic code for an amino acid. But it has essentially the same complementary form on the sense strand of DNA, so it can be read off from (“makes sense”) there, too. It will help to compare all this to the definition for antisense strand. Also see antisense transcription.
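The relation among the sense strand, the antisense strand, and the mRNA can be checked with a short Python sketch. The sequence is made up and the function names are hypothetical. All strands are written 5' to 3', which is why each complement is also reversed.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def antisense_of(sense):
    """Return the antisense DNA strand that pairs with the sense strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(sense))

def transcribe(template):
    """Return the mRNA built by complementarity from a DNA template strand."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(template))

sense = "ATGGCTTGA"                # made-up coding region
antisense = antisense_of(sense)    # the strand actually used as the template
mrna = transcribe(antisense)

print(antisense)
print(mrna)
print(mrna == sense.replace("T", "U"))
```

The mRNA produced from the antisense template matches the sense strand letter for letter, apart from U standing in place of T. That is why the codons can be read off the sense strand as well.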
sequence. A contiguous group of nucleotide bases ("letters") in a DNA or RNA molecule. Particular sequences may be significant in many different respects. For example: (1) they can define specific locations recognizable by transcription factors and other regulatory molecules; (2) they can influence the structure and stability of the double helix and the associated chromatin; (3) they can influence the positioning of nucleosomes and in general affect the chromatin structure; (4) they can code for proteins; and (5) they can code for regulatory RNAs. "Sequence" can also refer to the linear chain of amino acids constituting the main structure of a protein. Such protein sequences correlate (more or less) with DNA and RNA sequences by means of the genetic code.
signal. A molecule, commonly a protein, that initiates a signaling process and therefore is often spoken of as having a meaning (or carrying a message) related to the outcome of the process. Outcomes include such things as cell migration, cell multiplication, cell death, change in cell shape, and gene expression or repression.
signaling. A broad term referring to various aspects of complex molecular communication within the cell and organism. Signaling pathways are coherent sequences of molecular interactions by which an initial interaction — say, the binding of a cell membrane receptor by a hormone signal — leads to a more or less defined result, or group of results, "downstream". One result, for example, might be the activation of a set of genes. Signaling pathways often branch, leading to an amplification or diversification of consequences in what is known as a signaling cascade.
signaling cascade. See signaling.
signaling pathway. See signaling.
signal transduction. The passage of a signal from one context to another, or its transformation from one molecular form to another. For example, a signal is transduced when the original signaling molecule — say, a hormone from a distant part of the body — binds to a receptor embedded in a cell membrane, causing the release of a different molecule or set of molecules in the cell interior, or cytoplasm.
small interfering RNA (siRNA). A small RNA 21-25 nucleotide bases in length that plays a key role in RNA interference. siRNAs are derived from double-stranded RNA molecules, which are often brought into cells by viruses. The double-stranded RNA is cleaved into small lengths, and a product of this cleavage is assimilated to a RISC protein complex, at which time the two strands of the RNA are separated and one is discarded. See further under RNA interference.
soma. A general term for all the somatic cells of the body.
somatic cells. All the cells of the body outside the germline. Somatic tissues are tissues made up of somatic cells. Such cells are normally diploid, in contrast to the haploid gametes. Somatic cells are sometimes referred to generally as “soma”.
spliceosome. A dynamic molecular complex consisting primarily of a few small, noncoding RNAs and over 300 proteins. It is instrumental in performing the intricate, extraordinarily sensitive task of RNA splicing.
stem cell. A more or less undifferentiated (nonspecialized) cell capable of dividing indefinitely as a stem cell, or else of differentiating into more specialized cell types. Embryonic stem cells are primitive stem cells in the embryo capable of differentiating into most of the cell types of the body. Adult stem cells, normally found in adult tissues, can differentiate into at least a few different cell types. And induced pluripotent stem cells result from the reversion of a differentiated cell to a pluripotent form. This last is usually used with respect to humanly engineered cases, but many reversions can also occur naturally.
supercoil. If you twist two strands around a linear wire core, you will have a double helix that coils, or spirals, around an axis represented by the wire. (The wire is invoked here only to identify the axis of the double helix.) If now you twist that whole arrangement so that the axis coils on itself, you have what is called a supercoil. Further, there are two directions in which you can perform this second level of twisting. One is "with" the original twist of the double helix (which yields positive supercoiling), and the other is against this original twist (negative supercoiling). If the ends of the two strands are fastened together so that they are not free to slide around each other, then negative supercoiling will tend to force the strands apart, or "open" them up, while positive supercoiling will have the opposite effect. (It’s best to try this with real cords!)
TATA box. A DNA sequence having these nucleotide bases ("letters") as its core: TATAAA. The TATA box is one of the several elements contained in gene promoters. Whereas it was once thought to be a more or less canonical element of promoters, it is now believed to be present in less than 25 percent of human promoters. Recognition of the TATA box by the TATA-binding protein is an initial step in the formation of the pre-initiation complex.
thymine. See nucleotide base.
topoisomerase. An enzyme (protein) that cuts the strands of a DNA molecule and then reconnects the strands. The effect may be to release the tension of supercoiling or to untangle knots. Some topoisomerases cut just one of the strands of the double helix, allow it to wind or unwind around the other strand, and then reconnect the severed ends. Other topoisomerases cut both strands, pass a loop of the chromosome through the gap thus created, and then seal the gap again.
trait. See character.
transcribing enzyme. See RNA polymerase.
transcript. The RNA molecule that is the product of gene transcription. Transcripts begin as "primary" or "precursor" transcripts, which then can be spliced, edited, or otherwise transformed before (in the case of many RNAs) being translated into a protein.
transcription. The process by which an RNA polymerase (in cooperation with many other cellular elements) uses a DNA gene template to form an RNA molecule such as a messenger RNA (mRNA). The gene is said to have been "transcribed", and the RNA is a "transcript".
transcription factor. A protein that binds directly to a recognized DNA sequence, thereby playing a role in gene regulation. Transcription factors called activators may increase a gene’s expression, while repressors may decrease expression.
transcription factory. A locale within the nucleus within which regions of one or more chromosomes and various transcription-associated factors are thought to co-localize, facilitating the transcription of a particular set of genes.
transcriptome. The collection of all RNA transcripts (and sometimes their amounts) contained in a single cell, or collection of cells, or tissue, organ, organism, or species — or even a collection of species, as when one speaks of the transcriptome of all the microorganisms in the human gut.
transduction. In relation to signaling: the transformation of a signal, as when the binding of a hormone to a receptor at the cell membrane results in formation of a protein complex in the cytoplasm, which in turn carries out some function in the cell. In this case, the hormone "signal" is said to be transduced into the protein complex. This might be just one step in a multi-step signaling pathway.
transgene. A gene (or genomic segment), artificially constructed or taken from one organism, and inserted into the genome of a different organism. The term can also be used to refer to a transfer of this sort that is naturally achieved. A transgenic organism is one that has been subjected to such an insertion.
translation. The production of a protein from mRNA. This protein is often said to be "coded for" by the gene from which the mRNA was transcribed, but it is well known that diverse activities of the cell can result in any one of many different proteins (up to thousands) being produced from a particular gene, or DNA sequence. The elaborate molecular complex most centrally involved in translation is the ribosome.
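The bare codon-by-codon reading that the phrase "coded for" has in mind can be sketched in Python as follows. This is a toy under stated assumptions: the codon table is only a small excerpt of the standard genetic code, the mRNA is invented, and starting at the first AUG is a simplification. None of the cellular activities that make the actual outcome variable are represented.

```python
# Excerpt of the standard genetic code; None marks a stop codon.
# A codon missing from this partial table would raise a KeyError.
CODON_TABLE = {
    "AUG": "Met", "GCU": "Ala", "GAA": "Glu", "UGG": "Trp",
    "UAA": None, "UAG": None, "UGA": None,
}

def translate(mrna):
    """Read mrna three bases at a time from its first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    amino_acids = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue is None:  # stop codon: the chain is complete
            break
        amino_acids.append(residue)
    return amino_acids

print(translate("CCAUGGCUGAAUGGUAACC"))  # made-up mRNA
```

Each three-letter codon is looked up in turn and its amino acid added to the growing chain, much as the entry on tRNA describes.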
trans-splicing. The splicing together of entirely different gene transcripts to form a translation-ready mRNA. The genes may reside on the same or on different chromosomes. See also RNA splicing and alternative splicing.
transposon. A DNA sequence that can move within the genome. It may be cut from one place and moved to another — a recontextualization that may have profound functional effects — or it may be copied and inserted elsewhere, in which case it adds to the total content of the genome. By virtue of their copy-and-paste role, transposons can figure greatly in the creation of repetitive DNA. See also retrotransposon.
tRNA (transfer RNA). See under RNA.
tumor suppressor gene. A gene from which a protein is derived that helps to protect a cell against cancer. The protein may do this, for example, by preventing or damping cell division (cells that are becoming cancerous tend to divide without proper restraint) or by promoting cell death in the event of DNA damage.
upstream/downstream. DNA consists of the two strands of a double helix. The orientation of the chemical constituents of these strands gives a directionality to the strands and enables one to distinguish the two ends, which are referred to as the 5’ and the 3’ ends. The two strands of a double helix are oriented oppositely, so that the 5’ end of one strand is adjacent to the 3’ end of the other strand. Gene transcription typically proceeds from the 5’ end of the gene (that is, from the end of the gene closer to the 5’ end of the chromosome) toward the 3’ end. The stretches of DNA lying beyond the gene and toward the 5’ end of the chromosome are said to be "upstream" from the gene, while the stretches lying toward the 3’ end of the chromosome are "downstream", that is, in the direction of usual transcription. Promoters lie adjacent to their genes on the upstream side, where transcription begins. "Upstream" and "downstream" are also used with non-technical meanings and without particular reference to DNA, as when one refers to chemical reactions "downstream" from some initiating reaction.
uracil. See nucleotide base.
variant histone. A "non-standard" form of any one of the four different types of histone making up a nucleosome core particle. For example, the H2A.Z histone can substitute for the canonical H2A, with the effect of destabilizing the core particle and making it more susceptible to sliding. There are also variant linker histones.
Steve Talbott :: Glossary for “Evolution As It Was Meant To Be”