Aug 3, 2023 · 6 min read

Directed Evolution: Going from Millions of Years to a Matter of Days

Updated: Aug 4, 2023

Evolution, in general, operates on a truly massive time scale. For example, approximately 100 million years passed between the time the first fish swam during the Cambrian period and the appearance of the first jawed fish in the Devonian. And though the slow evolution of complex forms of life such as ourselves from the first combinations of amino acids into small peptides is an awe-inspiring achievement, nature took more than four billion years to accomplish it.

That unimaginably long journey has been made possible by two things: the ability to store the amino acid sequences of proteins in a form that can be passed on to progeny, and the fact that small changes in a sequence can produce significant changes in a protein’s function. For example, the FOXP2 gene has been shown to be critical to the language abilities of humans. However, the differences between the amino acid sequences of the proteins produced in humans, other great apes, and mice are surprisingly small. Setting aside a change that has only been observed in a family with a high rate of communication deficiencies, the human FOXP2 protein differs from its orthologue in mice by just three amino acids. In fact, the chimpanzee, gorilla, and rhesus macaque FOXP2 proteins are all identical to each other and carry only one difference from the mouse protein and two differences from the human protein. Yet the differences in communication capabilities between mice, great apes, and humans are obviously profound.

While the Earth provides a dynamic environment for life, large-scale changes tend to unfold over millions of years. Evolution, the mechanism by which life responds to environmental changes both large and small, also operates, on average, on this scale, which is known as Deep Time. This is no wonder, as evolution is constrained to act through minute random changes within the sequences of the proteins that make life happen. In general, the natural evolution of a protein involves a few mutations in the base pairs, known as Single Nucleotide Polymorphisms (SNPs), out of the several thousand that make up the code for just a small protein. Couple the observed mutation rate during DNA replication for mitosis, roughly 1×10⁻⁹ per base pair, with the fact that most mutations have no effect on the function of the protein, and the reasons for both the time scale of functional gain and the stability of the genome become readily apparent.
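
To get a feel for how rare these changes are, here is a back-of-the-envelope sketch in Python. It assumes the 1×10⁻⁹ figure is a per-base-pair, per-replication rate and uses a hypothetical gene of 3,000 base pairs; both are illustrative assumptions rather than measurements of any particular gene.

```python
# Rough arithmetic only; both constants below are illustrative assumptions.
MUTATION_RATE = 1e-9     # assumed: mutations per base pair per replication
GENE_LENGTH_BP = 3_000   # hypothetical gene of "several thousand" base pairs


def expected_mutations(rate: float, length_bp: int, replications: int = 1) -> float:
    """Expected number of point mutations a gene picks up over some replications."""
    return rate * length_bp * replications


def replications_per_mutation(rate: float, length_bp: int) -> float:
    """Average number of replications before the gene picks up one mutation."""
    return 1.0 / (rate * length_bp)


per_copy = expected_mutations(MUTATION_RATE, GENE_LENGTH_BP)
waiting = replications_per_mutation(MUTATION_RATE, GENE_LENGTH_BP)

print(f"Expected mutations per replication: {per_copy:.1e}")
print(f"Average replications per mutation:  {waiting:,.0f}")
```

Under these assumptions a single gene copy changes only once in several hundred thousand replications, and that is before asking whether the change does anything at all.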

Now imagine the evolution of a new protein function not over millions of years, but rather in just a few days. A new function emerging not in response to an environmental change challenging life, but rather in response to a functional goal set by a human standing in a lab. This is not the start of a science fiction writer’s prophetic tale. This is Directed Evolution, a technique that is now commonly used to enhance existing proteins and even create new ones in labs all over the world.

Fundamental to genetics and evolution is the fact that each gene, in general, contains the code for the sequence of amino acids that make up a single protein. The reason SNPs within the code of a given protein generally have no effect on the function of that protein stems from what is termed degeneracy within the genetic code. Degeneracy means that, while each amino acid is specified by a three-nucleotide sequence known as a codon, more than one codon can specify the same amino acid (though no codon specifies more than one amino acid). For example, the amino acid arginine is indicated by six different codons: CGU, CGC, CGA, CGG, AGA, and AGG.
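
Degeneracy is easy to see in a few lines of Python. The sketch below builds the standard genetic code as a lookup table, lists the codons for arginine, and sorts the nine possible single-base changes to one codon into those that leave the amino acid alone and those that change it. The function names are my own, chosen for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list[str]:
    """All codons that specify the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def single_base_changes(codon: str) -> dict[str, list[str]]:
    """Sort the nine one-base variants of a codon into silent and non-silent."""
    result = {"silent": [], "changed": []}
    for pos, base in product(range(3), BASES):
        if base == codon[pos]:
            continue  # not a change
        variant = codon[:pos] + base + codon[pos + 1:]
        same = CODON_TABLE[variant] == CODON_TABLE[codon]
        result["silent" if same else "changed"].append(variant)
    return result


arginine_codons = codons_for("R")          # "R" is the one-letter code for arginine
cgu_variants = single_base_changes("CGU")

print("Arginine codons:", arginine_codons)
print("Silent changes to CGU:", cgu_variants["silent"])
print("Amino-acid-changing changes to CGU:", cgu_variants["changed"])
```

For CGU, the silent changes are the ones in the third position, which is where most of the code's redundancy sits.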

We can describe rather well the process of protein synthesis, from the transcription of the code onto messenger RNA to its “translation,” where ribosomes string together amino acids as specified. Past this point, however, our ability to predict the structure and function of a protein from its code has only recently risen above nil. While I stated earlier that a SNP will most often have no effect on a protein’s function, if it results in an amino acid change within the resultant protein, the effect can be significant and is often deleterious.

This is where Directed Evolution takes center stage, as it gives us the ability to change the function of proteins that exist in nature, and even to create totally new proteins that do not, despite this lack of understanding. The possibilities for advancement, not only in medicine but also in industrial applications such as biofuels, are endless. Endless may seem a bit of an overstatement. Consider, however, that if you made one example of every possible sequence of even a small protein 100 amino acids long, you would have made more proteins than there are atoms in the entire universe.
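
The arithmetic behind that claim is short enough to check directly. This sketch assumes the 20 standard amino acids and takes 10^80 as a commonly quoted order-of-magnitude estimate for the number of atoms in the observable universe; that estimate is an assumption, not a figure from this post.

```python
AMINO_ACID_COUNT = 20        # the standard amino acids
PROTEIN_LENGTH = 100         # the "small protein" from the text
ATOMS_IN_UNIVERSE = 10**80   # assumption: rough order-of-magnitude estimate

# Each of the 100 positions can hold any of 20 amino acids.
possible_sequences = AMINO_ACID_COUNT ** PROTEIN_LENGTH

digits = len(str(possible_sequences))
# Orders of magnitude by which the sequence count exceeds the atom estimate.
excess_orders = len(str(possible_sequences // ATOMS_IN_UNIVERSE)) - 1

print(f"Possible sequences: a {digits}-digit number (about 10^{digits - 1})")
print(f"That exceeds the atom estimate by about {excess_orders} orders of magnitude")
```

Python integers have arbitrary precision, so the full number is computed exactly rather than approximated.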

Fortunately, we don’t have to work with those kinds of numbers, since John Maynard Smith showed that functional sequences are actually clustered when proteins of the same length are arranged to minimize the distance between similar sequences. This is necessarily so since, as Maynard Smith realized, evolution operates primarily on single point changes, so life probably wouldn’t have evolved if this were not the case (at least not on time scales that would fit within the age of the universe).

Philip Romero and Frances Arnold at Caltech write that though nature has been searching for billions of years for amino acid sequences that allow life to respond optimally to its environment, still only an infinitesimally small fraction of those sequences has been explored. Even allowing for function clustering, this leaves a playing field so large that it is hard to fathom what we could achieve with new sequences. Life could only respond with changes to an existing sequence. In Directed Evolution, however, we select the starting sequence, and instead of a change in environment defining the goal, we do. It has become a common scene in bio labs to find a researcher who takes a protein and decides that instead of binding to molecule A, she wants it to bind to molecule B. Then, following natural evolution’s methods, she introduces a few pseudo-random SNPs into the original protein’s sequence again and again to produce a number of sequence variants. These variants are then examined to see which moved closer to her goal, that is, which became more “fit” to perform the task. The most fit is then selected as the parent protein for the next round of variant generation, and the process iterates. Amazingly, most of the time it takes only 5 iterations to find the most fit sequence and end up with a brand-new protein, often in just a few days. Now that’s a productivity gain.
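
The mutate, screen, select loop described above can be sketched as a toy simulation. Everything here is a simplifying assumption for illustration: changes are made directly to amino acids rather than to nucleotides, the starting sequence and the "ideal binder" are made up, and the fitness function is a stand-in for a real lab assay (in a real experiment the ideal sequence is unknown, which is the whole point of screening).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(parent: str, region: range, n_changes: int, rng: random.Random) -> str:
    """Copy the parent with n_changes random substitutions inside the target region."""
    residues = list(parent)
    for pos in rng.sample(list(region), n_changes):
        # Pick any amino acid other than the one already at this position.
        residues[pos] = rng.choice(AMINO_ACIDS.replace(residues[pos], ""))
    return "".join(residues)


def fitness(variant: str, target: str) -> int:
    """Toy stand-in for a lab assay: positions matching a hypothetical ideal binder."""
    return sum(a == b for a, b in zip(variant, target))


def directed_evolution(parent, target, region, rounds=5, library_size=200, seed=0):
    """Run rounds of variant generation, screening, and selection of the fittest."""
    rng = random.Random(seed)
    history = [fitness(parent, target)]
    for _ in range(rounds):
        library = [mutate(parent, region, 2, rng) for _ in range(library_size)]
        best = max(library, key=lambda v: fitness(v, target))
        # The best variant becomes the next parent only if it beats the current one.
        if fitness(best, target) > fitness(parent, target):
            parent = best
        history.append(fitness(parent, target))
    return parent, history


start = "MKTAYIAKQRQISFVKSHFSRQ"                    # made-up starting protein
region = range(8, 16)                               # made-up "binding" segment
target = start[:8] + "WWHHDDEE" + start[16:]        # made-up ideal for molecule B
evolved, history = directed_evolution(start, target, region)

print("Start:  ", start)
print("Evolved:", evolved)
print("Fitness by round:", history)
```

Whether five rounds are enough in this toy depends on the library size and the random seed, so treat the numbers it prints as an illustration of the loop rather than a model of a real experiment.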

I say pseudo-random because the SNPs may be random, but they are only made in a specific range of the sequence: one that is known to code for the part of the protein responsible for the function being evolved. One of the most interesting things I’ve learned about proteins is that they are actually modular. That is, there are discrete functional units within proteins, called domains or moieties, that carry out specific actions such as binding to another molecule or cleaving another protein at a specific location. Amazingly, these domains can be “inserted” into other proteins, thus adding that function to a protein that does not have it in nature. The resultant protein is referred to as a chimera, and it is created by splicing the DNA sequences of interest from the source genes together to create a new one. The new gene is then inserted into the DNA of a bacterium, usually E. coli, and the bacterium then produces the protein through the normal mechanism described earlier. When the bacteria have proliferated to the point that enough of them contain the new protein, they are lysed, spilling their cytoplasm into the solution in which they live. That solution is purified and refined until it contains only the new protein and voila, you have a vial of your very own new protein.

These practices are now commonplace in biology labs all over the world. The proteins scientists are creating have been used in many applications, from biofuels to medicine. The intersection of chimeric protein design and directed evolution may well end up being the revolution that gives us the ability to end cancer, viral and bacterial infections, and virtually any other pathogenic source of human suffering.
