Background
Darwinian evolution is an enormously powerful process that has driven biology towards astonishing diversity, complexity and beauty. An efficient way to harness this power would greatly accelerate the engineering of novel biomolecules for human benefit and fundamentally advance synthetic biology.
Towards this vision, our team developed a comprehensive evolution toolbox comprising standardized parts, protocols, interactive models and AI-based software, all assembled into a unique workflow that tightly interconnects in vivo and in silico directed evolution.
Our project builds on the phage-assisted continuous evolution (PACE) method (Esvelt et al., Nature, 2011), which couples the survival of rapidly evolving M13 bacteriophages carrying a gene of interest to directed selection within E. coli hosts in a customized bioreactor. Despite its powerful concept, the application of PACE has so far been hampered by its need for a continuous culture and flow setup, which demands costly and highly sensitive hardware and strongly limits the method's scope.
Our PREDCEL (phage-related discontinuous evolution) toolbox
To reduce the complexity and increase the flexibility of the PACE method, we created PREDCEL, a simple, low-cost in vivo evolution protocol that requires no specialized equipment. In PREDCEL, the phage gene pool is simply propagated batch-wise on highly mutagenic E. coli selection strains grown in standard flasks.
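A minimal sketch can illustrate why simple batch-wise passaging still selects for function. It assumes, purely for illustration, that during one passage each variant's titer grows by a factor of `max_amplification` times its relative `fitness`, and that a fixed fraction of the phage supernatant (`transfer_fraction`) seeds the next flask. These parameter names and values are hypothetical and are not taken from the PREDCEL protocol; mutagenesis is left out so that only the selection step is visible.

```python
def passage(titers, fitness, max_amplification=1000.0, transfer_fraction=0.01):
    """One hypothetical batch passage: amplify each variant, then transfer a fraction."""
    return {name: titer * max_amplification * fitness[name] * transfer_fraction
            for name, titer in titers.items()}


def run_passages(titers, fitness, n_passages, **kwargs):
    """Return the frequency of every variant after each passage."""
    history = []
    for _ in range(n_passages):
        titers = passage(titers, fitness, **kwargs)
        total = sum(titers.values())
        history.append({name: t / total for name, t in titers.items()})
    return history


# Hypothetical starting pool: a rare improved variant among parental phages.
start = {"parental": 1e6, "improved": 1e2}
relative_fitness = {"parental": 0.1, "improved": 1.0}
for i, freqs in enumerate(run_passages(start, relative_fitness, 6), start=1):
    print(f"passage {i}: improved variant frequency = {freqs['improved']:.4f}")
```

Because the improved variant is amplified ten times more efficiently per passage in this toy setting, its frequency rises from about 0.01% to roughly 99% within six passages; real enrichment depends on the actual fitness differences and transfer volumes.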
Our self-contained PREDCEL toolbox comprises: (1) a Golden Gate cloning standard for the simple production of geneIII-deficient M13 phages carrying any gene of interest; (2) an accessory plasmid construction kit for selecting the phage gene pool via conditional geneIII complementation in E. coli; and (3) an optogenetic selection stringency modulator (Fig. 1).
Figure 1: Increase of Phage Propagation under Blue Light Using the pBLind-geneIII Cassette. Phage titers of SP Opto EL222 and a non-binding variant propagated on AP_light in the dark and under blue light irradiation were determined by plaque assays after four passages. Host cell cultures infected with SP Opto EL222 and cultured under blue light showed a more than 3-fold higher phage titer than the culture kept in the dark (left). Infection with phages carrying the non-binding EL222 variant showed no significant difference in phage titer (right); for this variant, the titer was even slightly decreased upon light irradiation. The respective plaque assays are shown below the bar chart.
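Titers like those in Fig. 1 are derived from plaque counts with the standard relation titer (pfu/mL) = plaques / (dilution × plated volume). The sketch below applies that relation and computes a light/dark fold change; the plaque counts, dilution and plated volume are invented placeholders, not the measured data.

```python
def titer_pfu_per_ml(plaques, dilution, volume_ml):
    """Phage titer from a plaque assay.

    plaques   -- number of plaques counted on the plate
    dilution  -- dilution of the plated sample, e.g. 1e-7 for a 10^-7 dilution
    volume_ml -- volume of the diluted sample that was plated, in mL
    """
    return plaques / (dilution * volume_ml)


# Hypothetical counts for illustration only (not the data shown in Fig. 1).
dark = titer_pfu_per_ml(plaques=42, dilution=1e-7, volume_ml=0.01)
light = titer_pfu_per_ml(plaques=150, dilution=1e-7, volume_ml=0.01)
print(f"dark: {dark:.2e} pfu/mL, blue light: {light:.2e} pfu/mL")
print(f"fold change (light/dark): {light / dark:.2f}")
```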
Toolbox Characterization and Integrated Modeling
To lay a thorough foundation for in vivo evolution with PREDCEL, we applied bottom-up engineering and first characterized the individual components of our system in isolation. Using T7 polymerase as a simple and modular platform, we established the required protocols for (i) cloning and generating transgene-encoding phages, (ii) propagating and selecting them on custom-made E. coli selection strains, and (iii) accelerated in vivo mutagenesis to expedite the evolutionary process. ODE-based modeling and corresponding computer simulations were used to quantitatively investigate and optimize the parameters of our experimental system, such as phage propagation times (Fig. 2), mutagenesis-controlling inducer/inhibitor concentrations and medium consumption.
Figure 2: Basic logarithmic plot of phage and E. coli titers with 100% wildtype fitness. The blue lines correspond to the different E. coli populations. Exponential growth of E. coli and a constant fitness of 100% of the wildtype, identical for all phages, were assumed. Ten minutes after infection, infected E. coli start producing phages, which corresponds to a stagnation of infected E. coli and an increase in phage concentration.
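The kind of model behind Fig. 2 can be illustrated with a minimal, simplified sketch. It assumes exponential growth of uninfected E. coli, mass-action infection, a fixed 10-minute delay before infected cells start releasing phages (as stated in the Fig. 2 caption), no division of infected cells, and a constant fitness that scales phage release. All parameter names and values (`growth_rate`, `adsorption`, `release_rate`, initial titers) are hypothetical illustrations rather than the team's fitted parameters, and the delay is handled with a fixed-step Euler scheme instead of a dedicated delay-differential-equation solver.

```python
from collections import deque


def simulate(minutes=180.0, dt=0.1, growth_rate=0.02, adsorption=1e-10,
             release_rate=2.0, fitness=1.0, delay_min=10.0,
             e_coli0=1e8, phage0=1e6):
    """Toy phage/E. coli model integrated with a fixed-step Euler scheme.

    Time is in minutes and all populations are per mL. Returns a list of
    (t, uninfected, latent, producing, phage) tuples, one per time step.
    """
    steps = int(round(minutes / dt))
    lag = int(round(delay_min / dt))
    if lag < 1:
        raise ValueError("delay_min must span at least one time step")
    # Cells infected during each of the last `lag` steps, oldest first.
    pending = deque([0.0] * lag)
    uninfected, latent, producing, phage = e_coli0, 0.0, 0.0, phage0
    history = []
    for i in range(steps):
        # Mass-action infection, capped so that no population turns negative.
        new_infections = min(adsorption * uninfected * phage * dt, uninfected, phage)
        matured = pending.popleft()          # cells infected `delay_min` minutes ago
        pending.append(new_infections)

        uninfected += growth_rate * uninfected * dt - new_infections
        latent += new_infections - matured   # infected but not yet producing
        producing += matured                 # infected cells are assumed not to divide
        # Release scales with fitness (1.0 = wildtype); each infection uses up one phage.
        phage += fitness * release_rate * producing * dt - new_infections
        history.append(((i + 1) * dt, uninfected, latent, producing, phage))
    return history


t, uninfected, latent, producing, phage = simulate()[-1]
print(f"t = {t:.0f} min: uninfected {uninfected:.2e}, producing {producing:.2e}, phage {phage:.2e} per mL")
```

Lowering `fitness` below 1.0 slows phage accumulation, which is the lever the modeling uses to explore how variant fitness affects propagation.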
Apart from such easily controlled parameters, our models also suggested that the fitness of the initial (parental) phage gene pool with respect to the function to be evolved strongly affects both the robustness and the speed of PREDCEL-mediated evolution. We quickly realized that evolving truly novel functions on a given protein would be very challenging with any in vivo directed evolution method, including PREDCEL and PACE, because such methods always depend on an initial activity to build upon.
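A back-of-the-envelope calculation shows why the parental fitness matters so much in a discontinuous format. Assume, hypothetically, that in each passage a pool with relative fitness f amplifies at most A-fold and is then diluted D-fold into the next flask. The titer then changes by a factor A·f/D per passage, so the pool persists only if f ≥ D/A and otherwise washes out geometrically. The function names and numbers below are illustrative assumptions, not part of the team's model.

```python
import math


def titer_after(n_passages, p0, fitness, max_amplification, dilution):
    """Titer after n passages when each passage multiplies it by A * f / D."""
    return p0 * (max_amplification * fitness / dilution) ** n_passages


def passages_to_washout(p0, fitness, max_amplification, dilution, detection_limit=1.0):
    """Passages until the titer first falls below detection_limit (None if it never does)."""
    factor = max_amplification * fitness / dilution
    if factor >= 1.0:
        return None  # A * f >= D: the pool is maintained or grows
    if p0 < detection_limit:
        return 0
    if factor <= 0.0:
        return 1  # a pool that cannot propagate at all is gone after one passage
    # Smallest integer n with p0 * factor**n < detection_limit.
    return math.floor(math.log(detection_limit / p0) / math.log(factor)) + 1


# Hypothetical numbers: 100-fold maximal amplification and 100-fold dilution per passage.
for f in (1.0, 0.5, 0.2):
    print(f, passages_to_washout(p0=1e10, fitness=f, max_amplification=100, dilution=100))
```

With these made-up numbers, a pool at half the maximal fitness drops below one phage per mL after 34 passages, whereas a fully fit pool is maintained. In a real experiment, mutation and selection change f from passage to passage, which this calculation ignores.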
Meet AiGEM, our Artificial Intelligence for Genetic Evolution Mimicking
To address these profound limitations and enable true evolutionary jumps from no or minimal activity towards high activity, our team created AiGEM, the Artificial Intelligence for Genetic Evolution Mimicking software. AiGEM substantially reduces the time and cost required for directed evolution of functional proteins by pre-optimizing the parental gene pool for a selected function in silico. The heart of AiGEM is DeeProtein, a deep neural network trained on ~10 million protein sequences that infers sequence-function relationships from raw sequence data with high accuracy. To validate DeeProtein's predictive power, we generated a set of ~30 beta-lactamase mutants and found that their observed catalytic efficiencies, approximated by the minimum inhibitory concentration (MIC) of carbenicillin, were reflected in the corresponding DeeProtein activity scores (Fig. 3).
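Deep networks that learn from raw sequence data typically receive each protein as a numeric matrix. The sketch below shows one common choice, one-hot encoding over the 20 canonical amino acids padded to a fixed length. Whether DeeProtein uses exactly this encoding, alphabet order or maximum length is an assumption here, and the function name `one_hot` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues (column order is an assumption)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_len=1000):
    """Encode a protein sequence as a max_len x 20 matrix of 0/1 values.

    Rows beyond the end of the sequence stay all-zero (padding); unknown
    residues such as 'X' are also encoded as all-zero rows.
    """
    if len(sequence) > max_len:
        raise ValueError(f"sequence longer than max_len={max_len}")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence.upper()):
        col = INDEX.get(residue)
        if col is not None:
            matrix[pos][col] = 1
    return matrix


# Arbitrary example peptide, padded to 30 positions.
encoded = one_hot("MKTAYIAKQR", max_len=30)
print(len(encoded), len(encoded[0]))  # prints: 30 20
```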
Figure 3: The DeeProtein classification score of screened β-lactamase variants correlates with the MIC of carbenicillin. Black dots show the average DeeProtein classification score of the samples in each MIC bin; the red line is the fitted linear model. Variants assigned a high classification score tend to tolerate higher carbenicillin concentrations, whereas variants with a low MIC are assigned a low classification score.
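The analysis described for Fig. 3 (averaging scores per MIC bin and fitting a line) can be sketched with ordinary least squares. The MIC bins and scores below are invented placeholders used only to exercise the code; they are not the measured β-lactamase data, and the team's actual fitting procedure may differ.

```python
from collections import defaultdict
from statistics import mean


def bin_means(mics, scores):
    """Average the classification scores of all variants that share an MIC bin."""
    groups = defaultdict(list)
    for mic, score in zip(mics, scores):
        groups[mic].append(score)
    return sorted((mic, mean(vals)) for mic, vals in groups.items())


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; also returns Pearson r."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


# Placeholder data: MIC bin (e.g. log2 of the carbenicillin concentration) and score per variant.
mics = [1, 1, 2, 2, 3, 3, 4, 4]
scores = [0.20, 0.30, 0.35, 0.45, 0.55, 0.65, 0.70, 0.80]
binned = bin_means(mics, scores)
intercept, slope, r = linear_fit([b[0] for b in binned], [b[1] for b in binned])
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

On Python 3.10 or later, `statistics.linear_regression` and `statistics.correlation` compute the same slope, intercept and Pearson r.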
Figure 4: Comparison Between the Wildtype Proteins and GUS_T509L. Our assay demonstrated that wildtype GUS has no activity on a GAL substrate, whereas the mutant predicted by GAIA exhibits high enzymatic activity on the GAL substrate.
Then, by interfacing DeeProtein with our genetic algorithm GAIA, we established, for the first time, a fully closed in silico evolution cycle that is driven by an intelligent network and can pre-optimize proteins for a selected activity. In other words, AiGEM can fast-forward directed evolution by means of intelligent computing. To demonstrate its predictive power, we used AiGEM for the in silico evolution of novel beta-galactosidases from human beta-glucuronidase parental sequences (Fig. 4).
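A closed in silico loop of this kind can be pictured as a standard genetic algorithm whose fitness function is a trained model's score. The sketch below is a generic genetic algorithm, not GAIA's actual code: `toy_score` is a hypothetical stand-in for a DeeProtein activity score (here simply the identity to an arbitrary target motif), and the population size, mutation rate, truncation selection and single-point crossover are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq, target="MKTAYIAKQR"):
    """Hypothetical stand-in for a classifier score: fraction of positions matching a target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rate, rng):
    """Replace each residue by a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def evolve(parent, score=toy_score, generations=50, pop_size=40, n_keep=8,
           rate=0.05, seed=0):
    """Generic genetic algorithm with truncation selection and elitism."""
    rng = random.Random(seed)
    population = [mutate(parent, rate, rng) for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[:n_keep]          # elitism: the best sequences survive unchanged
        best_history.append(score(survivors[0]))
        children = []
        while len(children) < pop_size - n_keep:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(parent))  # single-point crossover
            children.append(mutate(a[:cut] + b[cut:], rate, rng))
        population = survivors + children
    population.sort(key=score, reverse=True)
    return population[0], best_history


best, history = evolve("AAAAAAAAAA")
print(best, history[-1])
```

Because the top sequences are carried over unchanged, the best score can never decrease between generations; swapping `toy_score` for a real model's prediction is what turns such a loop into model-guided pre-optimization.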
Our Applications
Having laid out and validated our concept of interfaced in vivo and in silico directed evolution, we finally set out to explore its potential for applications in basic research and biochemical production in more depth.
Improving selected protein-protein interactions
Engineering selected protein-protein interactions is a major goal in synthetic biology and is used, e.g., for the construction of split-reporter-based biosensors or improved antigens in the context of vaccine development. Using split T7 polymerase as an example, we studied whether we could improve a protein-protein interaction (in this case, the auto-reassembly of the two split fragments) with PACE and PREDCEL. After only 3 days of evolution, we obtained numerous recurrent split T7 mutants in the gene pool. Remarkably, some mutations were positioned right at the prospective interface of the two T7 fragments, hinting at the successful evolution of improved split T7 variants.
Figure 5: Mutational pattern of the evolved split T7 variant. After three days of in vivo evolution with PACE, a plaque assay was performed and the split T7 insert of five individual phage clones was analyzed by Sanger sequencing. We observed a recurrent mutation (T877P) in three of the five clones, suggesting an evolutionary advantage (i.e., increased fitness) of the corresponding split T7 mutant compared with its non-mutated counterpart.
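A rough null-model calculation helps explain why a mutation seen in three of five clones points to selection. If, hypothetically, each clone independently carried one specific substitution with probability p purely by chance, the probability of seeing it in at least k of n clones is a binomial tail. The per-site probability below is an illustrative assumption, not a measured mutation rate.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical: chance that a single clone carries one specific substitution without selection.
p_site = 1e-3
print(f"P(same mutation in >= 3 of 5 clones by chance) = {prob_at_least(3, 5, p_site):.2e}")
```

This treats clones as independent draws; clones picked from one evolving population share ancestry, so the number is best read as intuition rather than a formal test.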
Engineering novel enzymes for organosilicon production
Finally, we investigated the potential of our evolution toolbox for developing novel enzymes for biochemical production. As we learned in discussions with experts in the field, organosilicons are a class of molecules of high relevance for industry, with great potential for drug development. Although biological systems do not employ carbon-silicon chemistry in nature, promiscuous enzymes such as cytochromes exist that are, in principle, capable of catalyzing carbon-silicon bond formation, albeit with low efficiency.
We developed a complete PREDCEL workflow for optimizing cytochromes for a specific purpose by redirecting their catalytic activity towards a desired but naturally unfavored reaction product. The workflow employs riboswitches to detect the enzymatic reaction product and select the corresponding variants. We designed these riboswitches fully computationally with our MAWS (Making Aptamers Without Selex) 2.0 software, an improved version of the MAWS algorithm originally introduced by iGEM Team Heidelberg 2015. We show that the in silico predicted riboswitches can specifically detect selected organosilicon compounds.
Figure 6: Light emission of the NanoLuc reaction for different riboswitch activators and concentrations. Addition of compound (3) resulted in increased enzyme activity (two bars on the left) compared with the compound (1) reaction.
Figure 7: Sequencing results of eight plaques after six iterations of the PREDCEL workflow with MP4, illustrating recurrent mutations in the CYP1A2 gene. Recurrent mutations with an amino acid exchange are shown in red and those without an amino acid exchange in orange; single mutations with an amino acid exchange are shown in yellow and those without in blue.
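The color coding in Fig. 7 boils down to two questions per base substitution: does it recur across clones, and does it change the encoded amino acid? A minimal sketch, assuming in-frame coding sequences of equal length without indels and using the standard genetic code; the helper names are hypothetical and the toy sequences are not CYP1A2 data.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids listed in TCAG x TCAG x TCAG codon order ("*" = stop).
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA_STRING)}

# Color scheme as described in the Fig. 7 caption.
FIG7_COLORS = {("recurrent", "amino acid exchange"): "red", ("recurrent", "silent"): "orange",
               ("single", "amino acid exchange"): "yellow", ("single", "silent"): "blue"}


def substitutions(reference, clone):
    """Yield (position, ref_base, alt_base, silent) for every base substitution.

    Positions are 1-based; `silent` is True if the affected codon still encodes
    the same amino acid.
    """
    for i, (ref, alt) in enumerate(zip(reference, clone)):
        if ref != alt:
            start = i - i % 3
            ref_aa = CODON_TABLE[reference[start:start + 3]]
            alt_aa = CODON_TABLE[clone[start:start + 3]]
            yield i + 1, ref, alt, ref_aa == alt_aa


def classify(reference, clones):
    """Label each substitution as recurrent/single and silent/amino acid exchange."""
    counts = Counter()
    silent = {}
    for clone in clones:
        for pos, ref, alt, is_silent in substitutions(reference, clone):
            counts[(pos, ref, alt)] += 1
            silent[(pos, ref, alt)] = is_silent  # last clone wins if codon contexts differ
    return {
        key: ("recurrent" if n > 1 else "single",
              "silent" if silent[key] else "amino acid exchange",
              n)
        for key, n in counts.items()
    }


# Toy example (not CYP1A2 data): the reference encodes M-A-K.
reference = "ATGGCTAAA"
clones = ["ATGGCCAAA", "ATGGCCAGA", "ATGACTAAA"]
for key, label in sorted(classify(reference, clones).items()):
    print(key, label, FIG7_COLORS[label[:2]])
```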
Figure 8: Gas chromatogram of the reaction of educts (2) and (5) to product (4). The peak at a retention time of 9.2 minutes indicates product formation.
Figure 9: Mass spectrum showing the fragmentation of product (4), ethyl 2-(dimethyl(phenyl)silyl)propanoate. The intact product corresponds to a mass of 236 Da.
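The 236 Da stated for product (4) can be checked from its molecular formula. Ethyl 2-(dimethyl(phenyl)silyl)propanoate is C13H20O2Si (phenyl, two silicon-bound methyl groups, the propanoate backbone and the ethyl ester). The sketch sums the monoisotopic masses of the most abundant isotopes (12C, 1H, 16O, 28Si); the small formula parser is a hypothetical helper that only understands element symbols followed by optional counts.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "Si": 27.97692653}


def parse_formula(formula):
    """Parse a simple formula such as 'C13H20O2Si' into {'C': 13, 'H': 20, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum of monoisotopic atomic masses for the given formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


mass = monoisotopic_mass("C13H20O2Si")
print(f"{mass:.4f} Da (nominal {round(mass)} Da)")
```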
Integrated Human Practices
The most powerful technology can only have a positive impact on humanity if it is widely accepted, safe, and applied responsibly with the aim of making our world a better place to live. No single person or expert group is smart enough to precisely foresee the positive and negative impacts of a developing technology. We therefore consider integrated human practices particularly important for foundational advance projects like ours, which aim to develop technologies with the potential to shape our future and the future of our children.
From the very beginning, we therefore openly discussed the concepts and ideas described above with experts from different fields and reached out to the broad public to listen to their hopes and concerns. The extensive feedback we received pushed us to address three major recurring issues: (i) making our technology safe, (ii) stimulating its responsible use and (iii) applying it to address urgent human needs. (i) To safeguard in vivo evolution experiments, we created SafetyNet, which is part of our AiGEM (Artificial Intelligence for Genetic Evolution Mimicking) software. SafetyNet checks any user-supplied input sequence for “sleeping” hazardous potential, so that we can strongly decrease the risk of unintentionally evolving hazardous proteins. (ii) We integrated a questionnaire (“Ready-to-PREDCEL?”) into our evolution toolbox guide to stimulate the responsible use of our technology. (iii) In the wet lab, we chose projects with the highest ecological and medical potential and focused the application of our toolbox on engineering enzymes for the eco-friendly synthesis of organosilicons, e.g. as novel pharmaceuticals.