Another interesting paper from @arcinstitute. This one combines protein language models with directed evolution to rapidly engineer proteins.
Here is how it works, step-by-step:
1. Select the protein you want to engineer. Give its sequence to four different protein language models, which together "score" the likely fitness of each amino acid mutation. Get a ranked list of 50-100 mutations that these models *predict* might improve function. (This step is done entirely on the computer.)
2. Take the top 15 predicted "hits" and synthesize every pairwise combination of them. For example, if the models predicted that mutations at positions #100, #120, #135, and so on would each boost activity, you'd make one protein carrying #100 + #120, another carrying #100 + #135, and so on, until every pair of these "double" mutations is covered. With the top 15 hits, that's only 105 proteins to synthesize (15 × 14 / 2).
3. Make and test all the double mutants in the wet lab, measuring the activity of each one. For example, if you wanted to engineer a protein to be "brighter," you would put each double mutant in a microplate well and measure its brightness directly. (This step captures epistasis, meaning how mutations interact with each other; it helps the models figure out which combinations are beneficial and which are damaging.) Feed the resulting activity data to a neural network, called MULTI-evolve. The model extrapolates from these data to predict *larger* combinations that might be synergistic, such as sets of 5-7 amino acid swaps.
4. Take the neural network's top three predicted variants with 5, 6, or 7 amino acid mutations. Synthesize these proteins using a new DNA assembly approach, also reported in this paper, called MULTI-assembly. (The gist: you anneal a set of short oligos, each carrying one of the mutations, together in a tube to reconstruct each full gene. This yields correctly built sequences 40-70% of the time.)
5. Finally, express the proteins in cells, measure their activities, and benchmark them against the wild-type protein.
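Steps 1 and 2 boil down to ranking plus a bit of combinatorics. Here's a minimal Python sketch, assuming each mutation's four model scores can simply be averaged (the post doesn't say how the models' scores are combined, and the mutation labels and scores below are made up). It ranks the mutations, keeps the top 15, and enumerates every pair:

```python
from itertools import combinations
from statistics import mean

def rank_mutations(model_scores: dict[str, list[float]]) -> list[str]:
    """Rank mutations by mean score across models, highest (predicted fittest) first.

    Assumes every model's scores are on a comparable scale; real scores may need
    per-model normalization before averaging.
    """
    return sorted(model_scores, key=lambda m: mean(model_scores[m]), reverse=True)

def pairwise_designs(ranked: list[str], top_k: int = 15) -> list[tuple[str, str]]:
    """All double mutants that can be built from the top_k ranked single mutations."""
    return list(combinations(ranked[:top_k], 2))

# Hypothetical scores from four models for 20 made-up mutations (X100Y ... X119Y).
scores = {f"X{100 + i}Y": [20 - i, 19.5 - i, 20.2 - i, 18.9 - i] for i in range(20)}
ranked = rank_mutations(scores)
doubles = pairwise_designs(ranked, top_k=15)
print(len(doubles))  # 105, i.e. 15 * 14 / 2
```

The pair count grows quadratically (n × (n − 1) / 2), which is why capping the list at 15 hits keeps the wet-lab work to about a hundred proteins.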
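The extrapolation in steps 3 and 4 is done by the MULTI-evolve neural network, whose internals aren't described here. As a much simpler stand-in, here's a classic pairwise-epistasis model in Python. It assumes single-mutant activities are also available (log scale, wild type = 0), defines each pair's epistasis as how far the double mutant deviates from the sum of its two singles, and scores every 5-, 6-, or 7-mutation combination as singles plus pairwise terms. This is a sketch, not the paper's model, and all the data are invented:

```python
from itertools import combinations

def pairwise_epistasis(singles, doubles):
    """e_ij = y_ij - y_i - y_j for each measured pair (log scale, wild type = 0)."""
    return {pair: y - sum(singles[m] for m in pair) for pair, y in doubles.items()}

def predict(combo, singles, epi):
    """Second-order estimate: sum of single effects plus every pairwise epistasis term."""
    additive = sum(singles[m] for m in combo)
    interactions = sum(epi[frozenset(p)] for p in combinations(combo, 2))
    return additive + interactions

def top_combos(singles, epi, k, n_best=3):
    """Score every k-mutation combination and return the n_best highest-scoring ones."""
    candidates = combinations(sorted(singles), k)
    return sorted(candidates, key=lambda c: predict(c, singles, epi), reverse=True)[:n_best]

# Invented log2 fold-changes vs wild type for eight mutations, m1..m8.
singles = {f"m{i}": 0.1 * i for i in range(1, 9)}
doubles = {frozenset(p): singles[p[0]] + singles[p[1]] for p in combinations(singles, 2)}
doubles[frozenset({"m1", "m2"})] += 1.0  # made-up synergy between m1 and m2
doubles[frozenset({"m7", "m8"})] -= 1.0  # made-up clash between m7 and m8
epi = pairwise_epistasis(singles, doubles)

for k in (5, 6, 7):
    print(k, top_combos(singles, epi, k))
```

Note how the made-up synergy pulls the weak m1 and m2 into every top pick, while the clash keeps m7 and m8 apart. This toy ignores interactions among three or more mutations at once; it only uses single and pairwise terms.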
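Since MULTI-assembly gets the sequence right 40-70% of the time, some clones have to be checked before moving on. A back-of-the-envelope Python sketch, assuming each picked clone is independently correct with that probability (my assumption, not a figure from the paper), for how many clones to screen to find at least one correct one with 95% confidence:

```python
import math

def clones_needed(p_correct: float, confidence: float = 0.95) -> int:
    """Smallest n with 1 - (1 - p)^n >= confidence, treating clones as independent draws."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_correct))

for p in (0.4, 0.7):
    print(p, clones_needed(p))
```

Under this assumption, that's about 6 clones per construct at the low end of the range and 3 at the high end.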
The researchers applied this method to several proteins. For one, dCasRx (a CRISPR protein that targets RNA instead of DNA), they created a variant with 9.8-fold higher activity and validated it across three different human genes.
You can also optimize proteins for two different properties at the same time. For example, the authors engineered an antibody targeting CD122 for both binding affinity AND expression yield in cells.
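The post doesn't say how the two objectives were traded off. One common way to compare variants on two properties is to keep only the Pareto-optimal ones: variants that no other variant matches or beats on both properties at once. A minimal Python sketch with made-up variant names and scores (higher is better for both):

```python
def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if a is at least as good as b on both properties and not identical to it."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def pareto_front(variants: dict[str, tuple[float, float]]) -> list[str]:
    """Names of variants that no other variant dominates on (affinity, expression)."""
    return [name for name, s in variants.items()
            if not any(dominates(t, s) for t in variants.values())]

# Hypothetical (affinity, expression) scores relative to the starting antibody.
variants = {"v1": (2.0, 1.0), "v2": (1.5, 1.5), "v3": (1.0, 0.8), "v4": (0.5, 2.0)}
print(pareto_front(variants))  # v3 drops out: v2 beats it on both properties
```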
TL;DR: This is a new way to speed up directed evolution. Instead of using random mutations to search through a huge "biological space" (remember that a single protein with just 100 amino acids has 20^100 possible sequences), these researchers use AI models to navigate that search space much more quickly. Can we use the same approach to make entire gene circuits?
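To put 20^100 in perspective (20 amino acid choices at each of 100 positions), Python's exact integers make the arithmetic easy:

```python
search_space = 20 ** 100         # every possible 100-residue sequence over 20 amino acids
print(len(str(search_space)))    # number of decimal digits: 131
print(f"{search_space:.2e}")     # scientific notation: 1.27e+130
```

That's roughly 10^130 sequences, compared with the 105 double mutants tested in step 2.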