Foresight in Genome Evolution

Lynn Helena Caporale. American Scientist. Volume 91, Issue 3. May/Jun 2003.

When Charles Darwin and Alfred Russel Wallace remarked on the great variation among the individual members of so many species around the world, from pigeons to beetles, they could only imagine what the source of such variation might be. They had certainly never heard of the double-helical structure of DNA or the sequence of a genome. Clearly, a great deal has come to light about the basis of heredity since their lifetimes. When evolutionary theory incorporated the notions of genes and mutation, biologists assumed that mutations are completely random because, as University of Utah biologists W. Joe Dickinson and Jon Seger recently wrote, natural selection “lacks foresight, and no one has described a plausible way to provide it.” The process of evolution came to be described in terms of “random mutation” followed by natural selection. In this context, “random” can mean randomly distributed across a DNA sequence, random with respect to whether the change might be constructive or destructive, or both.

If the challenges that confronted genomes were always unprecedented and erratic, it would be hard to disagree with the statement that selection must lack foresight and mutations must be random with respect to their potential effect on survival. But certain types of challenges confront organisms over and over again. For example, our immune system is engaged in an arms race in which it has to identify and dispose of pathogens while, in parallel, successful pathogens evolve to hide from our immune system. For challenges that recur, selection can favor mutation strategies that are better than random. This is not the same as knowing the precise genetic change that will overcome the challenge at hand, but mutation strategies that are better than random can provide a survival advantage compared to completely random mutation.

The probability of any given change in genome sequence, that is, of each mutation, reflects myriad biochemical properties encoded in that genome, from the accuracy of the proteins that copy and repair DNA to the availability of each of the DNA bases: adenine, thymine, guanine and cytosine (known as A, T, G and C). At a more local level, variations in the physical and chemical properties of a particular stretch of DNA sequence can have profound effects on the accuracy of the enzymes that rush past to copy it, at a rate of 80 to 500 bases each second, and on the systems that repair mismatched helices and DNA damage. Hence, the probability of distinct types of mutation varies intrinsically along a DNA sequence.

One striking example of how sequence context affects which mutations are likely involves the virus T4, which infects bacteria. A one-letter insertion, a deletion and two substitutions in a specific stretch of the viral genome may appear to be random, but these mutations occur much more frequently than other changes. Close inspection reveals that the strands in that stretch have a sequence that is almost a palindrome (a symmetrical sequence that reads the same from left to right on one DNA strand as from right to left on the complementary strand), and so each strand of the helix can loop out to pair with itself. The frequent mutations apparently result when the sequence forms a loop that is “corrected” to a more exact palindrome.
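To make “almost a palindrome” concrete, here is a minimal Python sketch. It assumes that a DNA palindrome means a stretch equal to its own reverse complement, treats any unpaired central base as the loop at the tip of the hairpin, and uses made-up sequences rather than the actual T4 stretch. The `correct_to_palindrome` helper is hypothetical: it copies one arm onto the other as a crude stand-in for the “correction” described above.

```python
# Watson-Crick complement for each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read in the same 5'-to-3' convention."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_mismatches(seq: str) -> int:
    """Count mispaired positions if the sequence folds back on itself.

    The left half should pair with the right half; for odd lengths the
    central base is treated as an unpaired loop and ignored.
    """
    half = len(seq) // 2
    left, right = seq[:half], seq[len(seq) - half:]
    return sum(a != b for a, b in zip(left, reverse_complement(right)))


def correct_to_palindrome(seq: str) -> str:
    """Hypothetical 'correction': rebuild the right arm as the complement of the left arm."""
    half = len(seq) // 2
    left, middle = seq[:half], seq[half:len(seq) - half]
    return left + middle + reverse_complement(left)


quasi = "GAATCC"                      # made-up near-palindrome with one mispaired position
print(stem_mismatches(quasi))         # 1
print(correct_to_palindrome(quasi))   # GAATTC, a perfect palindrome
print(stem_mismatches("GAATTC"))      # 0
```

Which arm serves as the template in a real cell is not modeled; the sketch only shows why a near-palindrome is a predictable target for change.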

As our databases fill with the sequences of more and more genomes, the new “molecular” naturalists increasingly will discover variations among biochemical mechanisms that affect mutation, much as Darwin and Wallace found diversity among beaks and wings. Much like intrinsic variations in anatomical structures, intrinsic variations in the probability, type and location of genetic changes are subject to the pressure of natural selection, through the ways they affect the survival of their descendants. Darwin wrote, “[W]hy should we doubt that variations in any way useful to beings … would be preserved, accumulated, and inherited?” And so we can ask, why should we doubt that variations in the probability of particular types of mutations that may be useful to beings would thus be preserved, accumulated, and inherited?

A Genome’s Implicit Range

Some mutations occur with a probability that is orders of magnitude higher than that of other mutations and in fact can fairly be called predictable: given a routine combination of time and population size, such mutations almost certainly will turn up. It is common to refer to one particular sequence of DNA as an organism’s genome, and we expect that progeny of this organism normally inherit the same sequence, except when mutation intervenes. Yet within populations of only thousands of bacteria, certain mutations are predictable. For example, repeats such as CCCCCC … or AGTCAGTCAGTCATGC … increase and decrease in length with notable frequency as the two strands of the double helix slip and misalign when the DNA is being copied or repaired.

In other words, among a population of bacteria that can trace their inheritance back to a single individual, a range of lengths of each mutable repeat in the genome will inevitably arise. Such mutations not only are predictable but are also reversible: reversible because, as these repeats continue to slide back and forth, the parental length will reappear regularly among the population of descendants. Therefore, the genome encodes more than a specific repeat length that can mutate. It actually encodes a specific repeat length explicitly and a range of repeat lengths implicitly. To say it another way, a range of lengths is an inherited property of the genome.
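The claim that a single starting sequence implicitly encodes a range of repeat lengths can be explored with a toy simulation. This minimal Python sketch assumes a fixed population size, random sampling of parents each generation, and a hypothetical slip probability per replication that lengthens or shortens the repeat by one unit with equal chance; none of these parameters come from measurements.

```python
import random
from collections import Counter


def simulate_repeat_lengths(start_len: int, generations: int, pop_size: int,
                            slip_rate: float, seed: int = 0) -> Counter:
    """Track repeat-unit counts in a clonal population descended from one cell.

    slip_rate: hypothetical per-replication chance that the repeat gains or
    loses one unit (strand slippage); all other changes are ignored.
    """
    rng = random.Random(seed)
    population = [start_len] * pop_size          # every cell starts identical
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            length = rng.choice(population)      # inherit a random parent's length
            if rng.random() < slip_rate:
                length = max(1, length + rng.choice((-1, 1)))
            offspring.append(length)
        population = offspring
    return Counter(population)


counts = simulate_repeat_lengths(start_len=10, generations=50, pop_size=1000, slip_rate=0.05)
for length in sorted(counts):
    print(length, counts[length])
```

In runs like this one, several lengths typically coexist and the parental length of 10 remains common, because slips in both directions keep returning lineages to it.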

Changes in the lengths of repeats are more than a biochemical curiosity; they have biological consequences. They can affect how often a gene is transcribed or even shift the reading frame within a gene, which is translated three letters at a time. Frequent slips disrupt a gene’s reading frame, damaging and occasionally resurrecting the function of the gene; a population of descendants will include both individuals with and individuals without active forms of the slippery gene. In Neisseria meningitidis, a species of bacteria that we carry in our throats and that can cause meningitis, the length of a string of Gs in a region promoting expression of the outer membrane protein porA affects the amount of the protein produced. Spacers of 11, 10 or 9 Gs lead, respectively, to high, medium or no detectable levels of expression.
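Why a slip of one letter matters while a slip of three does not follows from reading a gene in three-letter codons. In this minimal Python sketch the gene (a start codon, a run of Cs, then a short downstream stretch ending in the stop codon TAA) is entirely hypothetical; it simply shows that changing the run by a multiple of three leaves every downstream codon intact, while other changes scramble them.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive three-letter codons, dropping any leftover bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gene_with_repeat(n_c: int) -> str:
    """Hypothetical gene: start codon, a run of n_c Cs, then a fixed downstream stretch."""
    return "ATG" + "C" * n_c + "GATTACAGGTAA"


def stays_in_frame(parental_len: int, new_len: int) -> bool:
    """Downstream codons are unchanged only if the length change is a multiple of 3."""
    return (new_len - parental_len) % 3 == 0


for n in (6, 7, 8, 9):
    print(n, stays_in_frame(6, n), codons(gene_with_repeat(n))[-4:])
```

Only the 6-C and 9-C versions end with the stop codon TAA in frame; the 7-C and 8-C versions read the same downstream letters in a different frame.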

The length of a repeat can also affect how sensitive a gene is to being turned on and off by specific molecules in its environment. In the case of one gene in the bacterium Escherichia coli, as a nearby string of Ts shortens in length from seven to three, the gene becomes less and less sensitive to one of the molecules that usually regulates its activity. Because their frequent changes in length tend to adjust gene activity, such repeats have been described as genomic “tuning knobs.”

Natural selection has acted both on the location of slippery repeats in genomes and on their propensity for slipping. In the bacteria Haemophilus influenzae and N. meningitidis, genes associated with slippery repeats such as CAATCAATCAATCAAT … or CTCTTCTCTTCTCTT … include those that are involved in evading hosts’ immune systems and in sticking to our various tissues; bacteria that can vary these properties quickly are likely to have a survival advantage and so be more “fit.” The amount of diversity some species can generate through different combinations of repeat lengths is impressive; one genome survey of Neisseria suggests that there may be nearly 100 genes that have the potential to vary this way.
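Survey figures like this rest on scanning genomes for runs of repeated units. A minimal Python sketch of such a scan for one known unit, using a regular expression, is below. The three-copy threshold and the example sequence are assumptions for illustration; a genome-wide survey would also need to handle imperfect copies and units that are not known in advance, which this sketch does not.

```python
import re


def find_tandem_repeats(seq: str, unit: str, min_copies: int = 3) -> list[tuple[int, int]]:
    """Return (start, copies) for each run of at least min_copies back-to-back units."""
    # e.g. unit="CAAT", min_copies=3 gives the pattern (?:CAAT){3,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(seq)]


# Hypothetical stretch: four CAAT copies, a spacer, then two more copies.
seq = "GG" + "CAAT" * 4 + "TTGA" + "CAAT" * 2 + "C"
print(find_tandem_repeats(seq, "CAAT"))                 # [(2, 4)]
print(find_tandem_repeats(seq, "CAAT", min_copies=2))   # [(2, 4), (22, 2)]
```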

Natural selection can affect not only how frequently the repeats of a given gene change in length but also the likelihood of mutation across the whole genome. The overall probability of slips can change when the proteins involved in copying DNA mutate, affecting the future probability of specific types of mutations at myriad places throughout the genome for generations. Another, more complex reversible mutation is the graphically named “flip-flop” system, in which a segment of DNA is cut out, inverted and pasted back into the helix. The orientation of this invertible piece controls whether the adjacent gene will be on or off.
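Inverting a segment of double-stranded DNA amounts to replacing it with its reverse complement. The minimal Python sketch below models a flip-flop switch under a hypothetical rule: the adjacent gene is on only when a marker sequence on the invertible piece (standing in for whatever signal there drives the gene) reads forward. The marker, the coordinates and the sequences are made up, and the machinery that recognizes the segment’s boundaries is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MARKER = "TTGACA"  # hypothetical signal that must read forward for the gene to be on


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(genome: str, start: int, end: int) -> str:
    """Cut out genome[start:end], flip it end over end, and paste it back."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


def gene_is_on(genome: str, start: int, end: int) -> bool:
    """Toy rule: the downstream gene is expressed only if MARKER reads forward in the segment."""
    return MARKER in genome[start:end]


genome = "CCCC" + "AA" + MARKER + "GG" + "ATGAAA"   # segment occupies positions 4..13
start, end = 4, 14
for _ in range(3):
    print(gene_is_on(genome, start, end), genome)    # True, then False, then True
    genome = invert_segment(genome, start, end)
```

Two inversions restore the original sequence exactly, which is what makes this kind of switch reversible.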

The range of genomes encoded implicitly through the many possible combinations of repeat lengths and flip-flops extends the range of conditions in which a population can live; because of this flexibility, descendants are not committed to evolve along a path that only the circumstances of the moment may favor. Thus, although a genome encodes a single sequence, it can encode the ability to generate a predictable diversity of genomes among its descendants, extending their potential range. Its progeny inherit multiple sequences, one explicitly and others implicitly.
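A rough count shows how large such an implicit range can be. Assume, purely for illustration, that each of the nearly 100 variable Neisseria genes mentioned above can be switched independently between just two states, on and off; since many repeats can take more than two lengths, this assumption understates the total.

```latex
N = \prod_{i=1}^{n} s_i, \qquad n = 100,\; s_i = 2 \;\Rightarrow\; N = 2^{100} \approx 1.27 \times 10^{30}
```

Here s_i is the number of states gene i can take. Even under this two-state assumption, the number of combinations vastly exceeds the number of cells in any one population.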

Ramping Up Evolution

When we think about natural selection, we think about, for example, selecting for a change in the genome that makes a bird’s beak a little better at cracking seeds or enables a starving bacterium to digest an available sugar or to destroy a new antibiotic. But the potential reach of natural selection includes individual mutations that have genome-wide effects on the likelihood and type of subsequent genetic changes; for example, mutations in a DNA polymerase, an enzyme that copies DNA, can affect the tendency of repeats to shrink and grow.

Some genetic changes can be selected because they make it easier to add new information to a genome, over and over, whatever that information may encode. For example, bacteria have a selective advantage when they are able to share information with one another on mobile blocks of DNA, which may encode antibiotic resistance or the ability to take up and use new food sources. This ability is like building a bacterial Internet, in which genomes gain access to information that has evolved in other genomes. The enzymes and DNA recognition sites necessary to transfer genetic information between bacteria emerge under what has been termed “second order” selection, for they provide a selective advantage to generations of descendants that have access to myriad other traits on which selection can act more directly.

This movement of large blocks of DNA plays a role in bacterial evolution that can rival or even exceed that of what we usually think of as mutations: changes in a single A, T, G or C along a strand of DNA. When a pathogenic strain of E. coli isolated from hamburger that sickened people in Michigan was compared with a harmless laboratory strain, there were 75,168 individual differences among their As, Ts, Gs and Cs, but the DNA that had come into the genome “sideways” through transfer of large blocks of DNA added up to 1.34 million base pairs that were unique to the pathogenic strain and that included information responsible for making people sick. Compared with bacteria that can survive starvation or new antibiotics only if they’ve hit a lucky change in their individual As, Ts, Gs and Cs, bacteria that have access to information that has evolved in other genomes have a clear selective advantage under stressful circumstances.
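Put as a ratio (a rough comparison that counts each single-letter difference as one base pair), the transferred DNA in the E. coli comparison outweighs the point differences by more than an order of magnitude:

```latex
\frac{1.34 \times 10^{6}\ \text{bp}}{75{,}168\ \text{bp}} \approx 17.8
```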

Pathogens and Hosts

Pathogens that are spread by mosquitoes and ticks have to float around in our blood so that they can be accessible to sipping insects that carry them to new “hosts.” But blood can be a dangerous place for a pathogen, the equivalent of standing in the full glare of the immune system’s searchlight. Because it takes the immune system a little time to get organized to recognize and attack a new pathogen, the microbe can hide by changing its coat regularly. Just when the immune system begins to attack invaders with, say, “blue” coats, some in the group have switched their coats to “yellow” and so survive. One pathogen that can exchange patches on the surface of its coat in this way is the spirochete (screw-shaped bacterium) that causes Lyme disease, Borrelia burgdorferi. The change in the surface protein does not rely on purely random mutation. Most changes distributed randomly in the genome would damage perfectly useful genes, whereas it is specifically changes in the coat protein that protect the spirochetes from the immune system. Natural selection has favored the evolution of biochemical machinery that recognizes portions of the genome coding for exposed regions in the surface protein and alters them. This changeable area is bracketed by conserved sequences (an exact repeat of 17 bases, TGAGGGGGCTATTAAGG) that apparently mark the zone subject to change.
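The idea that a conserved repeat marks the boundaries of the changeable zone can be sketched in Python. The 17-base repeat is the one quoted above; everything else (the toy coat gene, the patch sequences and the helper names) is hypothetical, and the sketch only locates and swaps the bracketed zone rather than modeling how the spirochete’s machinery actually chooses new patches.

```python
FLANK = "TGAGGGGGCTATTAAGG"  # 17-base conserved repeat quoted in the text


def zone_bounds(seq: str, flank: str = FLANK) -> tuple[int, int]:
    """Return (start, end) of the stretch between the first two copies of the flank."""
    first = seq.find(flank)
    if first == -1:
        raise ValueError("no flanking repeat found")
    start = first + len(flank)
    end = seq.find(flank, start)
    if end == -1:
        raise ValueError("only one flanking repeat found")
    return start, end


def swap_patch(seq: str, new_patch: str, flank: str = FLANK) -> str:
    """Replace only the bracketed zone; the flanks and the rest of the gene are untouched."""
    start, end = zone_bounds(seq, flank)
    return seq[:start] + new_patch + seq[end:]


coat_gene = "ATG" + FLANK + "CCCAAAGGG" + FLANK + "TAA"   # hypothetical toy gene
start, end = zone_bounds(coat_gene)
print(coat_gene[start:end])                  # CCCAAAGGG
print(swap_patch(coat_gene, "GGGTTTCCC"))
```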

The evolution of specialized machinery to abet adaptation is not confined to microorganisms. Animals such as cone snails, scorpions and snakes evidently also use focused variation to generate new components of their venom.

People, too, can change blocks of DNA around in a strategic way, for example when we generate a diverse collection of antibodies to fight the unknowable range of pathogens that may intrude on our bodies. In the genomes we inherit from our parents, our antibody genes are encoded in unassembled pieces, as if our parents went to sleep before putting together a holiday gift. The antibody-making kit we inherit includes a selection of pieces of DNA, called variable regions, each of which encodes the ability to bind to something different, such as a portion of a particular pathogen. We also inherit specific tags that mark these variable regions in our genome as well as enzymes that recognize the tags. In cells that later give rise to antibody-producing cells, these enzymes cut the variable regions out of storage, one at a time, and paste them into another location in the DNA where they can be actively expressed, as part of our antibody molecules, when our bodies encounter a pathogen that the particular variable region recognizes.
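The “kit” described above can be caricatured in a few lines of Python. Everything here is a hypothetical stand-in: the tag sequence, the three stored segments, and the fixed leader and constant pieces that flank the slot where a chosen segment is expressed. The sketch shows only the logic of tag-guided cut-and-paste, in which each cell ends up expressing one stored segment, not the real recombination machinery.

```python
import random

TAG = "GGGTTT"        # hypothetical tag marking the boundaries of stored segments
LEADER = "ATG"        # hypothetical sequence upstream of the expression slot
CONSTANT = "TGCTGA"   # hypothetical sequence downstream of the expression slot


def build_storage(segments: list[str]) -> str:
    """Lay out stored variable segments, each bracketed by tags."""
    return TAG + TAG.join(segments) + TAG


def stored_segments(storage: str) -> list[str]:
    """Recognize the tags and return the segments between them."""
    return [piece for piece in storage.split(TAG) if piece]


def rearrange(storage: str, rng: random.Random) -> tuple[str, str]:
    """Cut one segment out of storage and paste it into the expression slot."""
    segments = stored_segments(storage)
    i = rng.randrange(len(segments))
    remaining = segments[:i] + segments[i + 1:]
    return build_storage(remaining), LEADER + segments[i] + CONSTANT


storage = build_storage(["ACGAAC", "TCCATC", "AAGCAA"])
for cell in range(3):   # each cell performs its own independent rearrangement
    new_storage, expressed = rearrange(storage, random.Random(cell))
    print(cell, expressed)
```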

When the pathogen-binding variable regions are relocated, they are inserted into a location in the genome that has an interesting feature: DNA placed here experiences a higher mutation rate than DNA at other spots in our genome. Even within the relocated DNA, mutations are not distributed randomly. The higher chance of mutation is focused on the very parts of the pathogen-binding region that code for the cavity involved in holding onto a pathogen. In other words, a mechanism has evolved that focuses mutations at hotspots where such variation can generate diverse antibodies.
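The difference between uniformly scattered and focused mutation is easy to quantify in a toy model. This minimal Python sketch assumes a hypothetical 100-position region in which positions 30 to 39 stand for the stretch that codes for the binding cavity, and gives those positions a rate 20 times higher than the rest; the numbers are illustrative, not measured.

```python
import random

LENGTH = 100
BINDING = range(30, 40)   # hypothetical positions coding for the binding cavity
HOTSPOT_BOOST = 20        # hypothetical relative mutation rate at those positions


def sample_mutations(weights: list[float], n: int, seed: int = 0) -> list[int]:
    """Draw n mutation positions, each with probability proportional to its weight."""
    rng = random.Random(seed)
    return rng.choices(range(LENGTH), weights=weights, k=n)


def fraction_in_binding(positions: list[int]) -> float:
    """Share of sampled mutations that land in the binding stretch."""
    return sum(p in BINDING for p in positions) / len(positions)


uniform = [1.0] * LENGTH
focused = [HOTSPOT_BOOST if i in BINDING else 1.0 for i in range(LENGTH)]

print(fraction_in_binding(sample_mutations(uniform, 10_000)))   # close to 10 / 100 = 0.10
print(fraction_in_binding(sample_mutations(focused, 10_000)))   # close to 200 / 290, about 0.69
```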

It is possible to change the location of the mutation hotspots experimentally by altering the DNA sequence of the variable regions. Such experiments show that the tendency to have high mutation rates in a particular region is embedded in the genome in two ways: Some contexts have a high mutation rate, and specific hotspot sequences tend to be mutated when placed in that context.
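What a “hotspot sequence” looks like can be illustrated with a motif scan. In the immunology literature a frequently cited hotspot for this kind of targeted mutation is WRCY, where W stands for A or T, R for A or G and Y for C or T, along with its reverse complement RGYW read on the same strand. The minimal Python sketch below, run on a made-up sequence, lists every position where either motif starts; as the experiments described here indicate, whether such a motif is actually mutated also depends on its context.

```python
import re

# Lookaheads let overlapping motif occurrences all be reported.
WRCY = re.compile(r"(?=[AT][AG]C[CT])")
RGYW = re.compile(r"(?=[AG]G[CT][AT])")


def hotspot_starts(seq: str) -> dict[str, list[int]]:
    """Start positions of WRCY and RGYW motifs in seq (one strand only)."""
    return {
        "WRCY": [m.start() for m in WRCY.finditer(seq)],
        "RGYW": [m.start() for m in RGYW.finditer(seq)],
    }


print(hotspot_starts("GGTACCGGAGCTTT"))   # {'WRCY': [2, 8], 'RGYW': [0, 8]}
```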

The infrastructure that creates the great variation in our immune system emerged through second-order selection: that is, not just by selecting for each antibody one by one, but rather by selecting for the ability to generate a whole repertoire of antibodies, with variable regions that have the potential to bind to and protect us against new pathogens that our ancestors never encountered. Our immune systems must have such an ability, because the pathogens are evolving too.

Not all the antibodies that the immune system generates will protect us, so in that sense our antibodies contain a great deal of random variation. Yet their diversity relies on mutations that are not randomly distributed throughout the genes encoding the antibodies. In evolution only some variations may be favored by natural selection, but that does not mean that we can assume that variations are all generated by completely random processes.

Biologists are only beginning to understand the biochemistry of how our immune system focuses mutations in useful places. An enzyme that scans variable regions in their relocated homes may recognize and damage specific hotspot sequences in the DNA. The damaged sequences may then be “repaired” in a sloppy way, resulting in targeted mutations. This sloppy repair process draws on mechanisms that the genome uses regularly to repair other DNA sequences that are somehow damaged, for example because of chemical effects or exposure to radiation.

In fact, the rate of mutation in a genome is affected by the interaction of many biochemical activities. As Evelyn Witkin, formerly of Rutgers University, put it, “[T]he prevailing notion [used to be] … that mutations were instantaneous events … [T]he mutagen went ‘zap!’ and that was that.” However, the location and probability of distinct mutations depend both on the genome’s sequence and on the cell’s specific biochemical environment, including the full complement of genes expressed, at the time genetic changes take place.

Genomes vary a great deal in their sensitivity to specific types of damage. Some genomes survive in conditions that seem to be inescapably damaging. For example, the bacterium Deinococcus radiodurans was first discovered living in an irradiated can of meat. The stability of the DNA of organisms living in hot springs also depends on mechanisms that repair and protect their DNA; otherwise, they would risk multiple mutations per gene every generation. But even outside what seem to be challenging environments, the mutation rate of any genome would be high without mechanisms that repair spontaneous damage, such as when bases fall off or lose pieces at body temperature.

If the ability to promote variation can evolve at sites ranging from those that encode bacterial coats to those encoding portions of our immune system, such facilitated adaptation is likely to be important in other locations that still await discovery.

Theme and Variations

Sometimes the best way for an organism to evolve a new ability is to make a copy of a gene that already is in the genome and to tinker with the duplicate. Starting with a gene that already does something useful is like starting close to the finish line. Families of genes that are related to one another have evolved this way, by copying and tinkering, thereby directing the capabilities already evolved in one gene toward a new target. For example, if a gene encodes the ability to detect one color of light, a gene that encodes the ability to detect a different color of light could evolve through small variations in a copy of the original light-detector gene. Some gene families have as many as 1,000 members; the process of duplication and alteration is an efficient route to useful new genes.

Does the “tinkering” that generates new members of gene families involve altogether random changes, or is it possible that, much like the mutation hotspots in antibodies’ pathogen-binding regions, selection favors hotspots for change within particular portions of the duplicated genes? Perhaps the parts of a duplicated gene that determine its specific target are more likely to change, whereas the function shared by members of the gene family may be shielded from mutation. Consistent with this idea, the rates of distinct types of genetic changes are in fact unevenly distributed across the mammalian genome, but the link between the uneven distribution of mutations and its biological sources and effects remains to be investigated.
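The possibility raised here can be made concrete with a toy simulation, illustrating a hypothesis rather than reporting a result. This minimal Python sketch copies a random 300-letter gene, treats the first 200 positions as a hypothetical shared core and the last 100 as a hypothetical specificity region, and lets the copy mutate with a per-position rate 20 times higher in the specificity region. The region boundaries, rates and number of generations are all assumptions.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rates: list[float], rng: random.Random) -> str:
    """One generation: each position changes to a different base with its own probability."""
    out = []
    for base, rate in zip(seq, rates):
        if rng.random() < rate:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def divergence(a: str, b: str, positions: range) -> float:
    """Fraction of the given positions at which two sequences differ."""
    return sum(a[i] != b[i] for i in positions) / len(positions)


def simulate_duplicate(generations: int = 50, seed: int = 42) -> tuple[float, float]:
    rng = random.Random(seed)
    original = "".join(rng.choice(BASES) for _ in range(300))
    rates = [0.001] * 200 + [0.02] * 100     # hypothetical: core shielded, specificity region hot
    copy = original
    for _ in range(generations):
        copy = mutate(copy, rates, rng)
    return divergence(original, copy, range(200)), divergence(original, copy, range(200, 300))


core, specificity = simulate_duplicate()
print(f"core diverged {core:.0%}, specificity region diverged {specificity:.0%}")
```

With these settings the copy keeps its core largely intact while roughly half of its specificity region drifts away from the original.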

Another interesting facet of genomes is that there is more than one way to encode almost every amino acid. The genetic code is called degenerate because there are 64 possible three-letter units of DNA available to code for only 20 amino acids. This redundancy allows additional information to evolve in a DNA sequence along with the information specifying an amino acid sequence. For example, in the spirochete that causes Lyme disease, a unit of five amino acids (glutamate-glycine-alanine-isoleucine-lysine) is repeated in coat proteins; an enzyme recognizes the DNA that encodes this repeated unit in order to vary patches that allow the Lyme bacterium to change its coat. Because of the degeneracy of the genetic code, the five amino acids can be encoded, theoretically, in nearly 200 different ways, yet only one of these 200 choices is used. If the DNA sequence changes to one of the nearly 200 others, the amino acids may not change, but the DNA patch-varying machinery can no longer be directed to the right spot in the genome where it can act to change the spirochete’s coat.
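The “nearly 200” figure follows directly from the standard genetic code. This short Python sketch builds the standard codon table and multiplies together the number of codons available for each amino acid in the repeated unit (glutamate, glycine, alanine, isoleucine, lysine, written EGAIK in one-letter code). It counts possible encodings and says nothing about which one the spirochete uses.

```python
from itertools import product
from math import prod

# Standard genetic code: amino acids for codons in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(aa: str) -> list[str]:
    """All codons that specify the given one-letter amino acid ('*' = stop)."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(synonymous_codons(aa)) for aa in peptide)


print({aa: len(synonymous_codons(aa)) for aa in "EGAIK"})  # {'E': 2, 'G': 4, 'A': 4, 'I': 3, 'K': 2}
print(count_encodings("EGAIK"))                            # 2 * 4 * 4 * 3 * 2 = 192
```

Of these 192 encodings, only one is the sequence that the patch-varying machinery recognizes in the account given here; the other 191 would specify the same five amino acids.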

Foresight Emerges via Experience

The number of distinct ways any genome might mutate randomly is vast. But the pressures of natural selection, generation after generation, can increase the chance that individuals with a fortuitous tendency to make certain biologically useful mutations survive. In other words, natural selection can act on variations in the probability of different types of mutations in different places in the genome much as it can act on variation in beaks and wings. For example, biochemical systems that enable the movement of intact pieces of DNA between bacteria have been selected as more useful than random changes in DNA sequence. Genomes that intrinsically tend to make changes that turn out to be more adaptive tend to have more surviving progeny, generation after generation, than randomly mutating genomes.
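A toy model shows how selection could favor a tendency to switch when the same challenge recurs. This minimal Python sketch follows a lineage whose cells are in an ON or OFF state (as a slippery repeat might set), in an environment that alternates every 10 generations between favoring ON and favoring OFF. Matching cells double; mismatched cells are mostly cleared. The growth factors, the period and the switching rates are hypothetical, chosen only to illustrate switching in a fluctuating environment.

```python
import math


def log10_growth(switch_rate: float, generations: int = 200, period: int = 10,
                 match: float = 2.0, mismatch: float = 0.1) -> float:
    """log10 of final lineage size (starting from one unit) under an alternating environment.

    switch_rate: hypothetical per-generation chance that a cell flips between ON and OFF.
    """
    on, off = 1.0, 0.0            # the lineage starts entirely ON
    log_size = 0.0
    for t in range(generations):
        favors_on = (t // period) % 2 == 0
        # Selection: cells matching the current environment multiply, the others are culled.
        if favors_on:
            on, off = on * match, off * mismatch
        else:
            on, off = on * mismatch, off * match
        # Switching: a fixed fraction of each class flips state.
        on, off = (on * (1 - switch_rate) + off * switch_rate,
                   off * (1 - switch_rate) + on * switch_rate)
        total = on + off
        log_size += math.log10(total)
        on, off = on / total, off / total   # renormalize to avoid overflow
    return log_size


for rate in (0.0, 0.05, 0.5):
    print(rate, round(log10_growth(rate), 1))
```

With these made-up numbers, the non-switching lineage shrinks by many orders of magnitude, a lineage that switches 5 percent of the time grows steadily, and one that flips a coin every generation grows far more slowly than the 5 percent switcher.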

I want to emphasize that saying that mutation, through natural selection, can become nonrandom is not the same as claiming that a genome will know that if it replaces a particular A with a G, it will be able to digest a specific sugar. Certainly, what I have said does not imply that mutations are never random with respect to function, or that all mutations are helpful.

Still, some genomes have evolved that encode not just one “explicit” sequence, but rather a cloud of what I have called implicit genomes, which gives their progeny access to a combinatorial assortment of properties in a way that is reversible across generations. This endows descendants with the ability to tolerate a range of environmental conditions that is wider than the range of the explicit genome. Further, genomes can gain access to additional, intact information through mechanisms that move DNA within and between genomes.

Genomes actively generate diversity, for example through the cutting and pasting of chromosomes during the formation of mature sperm and eggs, for there is an advantage to diversity in itself. The community will be more likely to survive the sweep of a new pathogen if individuals are different from one another. Similarly, if the food supply or the environment changes, diversity can protect the population against the new pressures.

Genomes cannot predict the future any more than we can, but genomes, like us, learn from what has happened in the past about what is likely to happen in the future. Mechanisms that diversify and stabilize the genome themselves feel the pressure of natural selection. The ability to anticipate recurring challenges is in fact a major challenge of evolution.

From the variation in bacterial surface proteins to the vertebrate immune response, it is clear that a great deal of genetic change is generated in a way that is better than random in its potential effect on survival. Indeed, as I have argued, some potentially useful mutations are so probable that they can be viewed as being encoded implicitly in the genome. As we examine our genomes and those of our fellow creatures, I anticipate that evolutionary theory will evolve to include the understanding that under selective pressure, the probability of different classes of mutation can change, with consequences for survival.