Difference between revisions of "Team:Heidelberg/Design"


Revision as of 03:15, 2 November 2017

Our design
Interfacing In Vivo and In Silico Directed Evolution As Novel Engineering Paradigm
Our Project Design as animated videos
Video 1:
Background and general project design
Video 2:
Details on in vivo directed evolution with PACE and PREDCEL

Background

“Harnessing evolution for the development of novel proteins and biomolecules for human benefit” - This naïve vision stood at the beginning of this year’s iGEM project. But how should we do that? While doing our literature research, we came across an exciting method named PACE (phage-assisted continuous evolution). This method was invented by Kevin Esvelt and David Liu at Harvard University in 2011 (Esvelt et al, Nature, 2011). The idea behind this method is rather simple: mimic natural evolution in a bioreactor, accelerate it and direct it towards a specific goal. More precisely, PACE implements a fully closed, extremely fast evolution cycle comprising replication, mutation and selection to enable in vivo directed evolution of proteins. In brief, the protein to be evolved is encoded by an M13 bacteriophage and transferred from one E. coli host cell to the next by means of phage replication and propagation. Importantly, the E. coli hosts express mutator genes (Badran et al, Nature Communications, 2015) to increase the phage mutation rate during replication, thereby creating a highly diverse gene pool. Finally, synthetic circuits in the E. coli host cell couple phage propagation efficiency to the fitness (i.e. function) of the protein to be evolved by controlling the expression of an essential phage gene. Details on the PACE background can be found on the PACE subpage. By chance, we had the opportunity to meet Kevin Esvelt, the PACE co-inventor, who gave a talk in Heidelberg on June 1st, 2017, i.e. before we started our wet lab work. Kevin was kind enough to visit our team and discuss his PACE method in detail. Despite its power for directed protein evolution and its conceptual beauty, we identified two major limitations of the current PACE setup:
  1. It requires a complex, custom-made bioreactor (Figure 1) including a sophisticated flow control setup, which is challenging to assemble and difficult to run robustly due to multiple failure points (potential phage contamination; risk of phage washout if flow rates are too high; host cell biofilm formation interfering with proper phage evolution).
  2. PACE is almost entirely limited to improving an already existing protein function. To evolve truly novel functions, evolutionary stepping stones are required, which so far can only be created by slowly adapting the selection pressure towards the activity to be evolved (Pu, Zinkus-Boltz & Dickinson, Nat. Chem. Biol., 2017). In practical terms, this means that evolving novel functions requires an extremely complex experimental setup comprising multiple, fine-tuned synthetic selection circuits applied in sequential order over many days. In most cases, evolving a novel functionality from scratch remains practically impossible to implement in PACE.
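
The washout risk named in the first limitation can be made concrete with a deliberately simplified model. This is a sketch under stated assumptions, not a description of the actual PACE apparatus: we assume a well-mixed vessel, a constant dilution rate D (vessel volumes exchanged per hour) and a constant net per-capita phage replication rate r; the symbols P, r and D are introduced here for illustration only.

```latex
% P(t): phage concentration in the vessel (assumed well mixed)
% r:    net per-capita phage replication rate (assumed constant)
% D:    dilution rate set by the flow (assumed constant)
\frac{dP}{dt} = (r - D)\,P
\quad\Longrightarrow\quad
P(t) = P_0 \, e^{(r - D)\,t}
% If D > r, the exponent is negative and P(t) decays towards zero: washout.
% If D < r, the phage population can persist in the vessel.
```

Under these assumptions, the phage population is lost whenever the flow removes phages faster than they replicate, which is why the flow rate has to be controlled carefully.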

As a result of its complexity and various failure points, only three groups worldwide have successfully established PACE to date. Of note, two of these groups are headed by the PACE inventors Kevin Esvelt and David Liu themselves, while the third group is run by Prof. Liu’s former postdoc Bryan Dickinson. In the course of our iGEM project, we were in contact with all three of these groups. We are extremely grateful for their advice and for sharing their constructs with us, both of which were extremely important for us to get going.
Figure 1: Bioreactor and flow setup required to perform PACE
Of note, in iGEM, only a single team has accepted the challenge and tried to set up PACE, namely the TU Dresden 2015 team. The team constructed a first PACE bioreactor prototype and tested the flow controls. However, due to lack of time, they could not run real PACE experiments. The team concluded: “All in all, the initial experiments helped us to grasp the idea of how such experiments can look like”.

Motivation, Initial Design & Engineering Principles

The goal of our work was to make in vivo directed evolution of proteins faster, easier and more robust, and to expand its application scope. The primary, novel application we envisioned was the in vivo directed evolution of improved and/or novel enzymes, which would be of enormous benefit for diverse industries (e.g. chemical/pharmaceutical production, biomaterial production etc.), but remains particularly challenging with conventional directed evolution or rational engineering strategies. To reach this ambitious goal, we had to leave existing paradigms behind and rethink the concept of directed evolution. As a result, we came up with a novel engineering concept: interfacing in vivo and in silico directed evolution by coupling PACE to deep learning of protein sequence-function relationships (Figure 2). The idea behind this concept is to fast-forward directed evolution using an intelligent algorithm. More precisely, our algorithm should enable us to skip the otherwise required evolutionary stepping stones during directed in vivo evolution by pre-evolving proteins with novel, desired target functions in silico.
Figure 2: Interfacing in silico and in vivo directed evolution as novel engineering paradigm.
As a side note: as you will see below, we eventually replaced the complex PACE-mediated evolution with the much simpler PREDCEL protocol, which we developed later on. But first things first.

The engineering approach our team used to reach our goal was to first divide the in silico and in vivo evolution problems into their basic components, work them out independently and validate them thoroughly before finally combining them into a united workflow. We quickly realized that the general components required for the in vivo and in silico directed evolution systems are conceptually identical: evolution always consists of a closed cycle running through three major steps: (i) reproduction, (ii) mutation and subsequent (iii) selection of sequence variants. The corresponding in vivo evolution counterparts in PACE are well described and hence easy to identify:
  1. Reproduction is achieved by propagating M13 bacteriophages on E. coli host cells.
  2. Mutation is achieved during phage replication by expression of mutagenic genes.
  3. Selection is induced using synthetic circuits that couple the expression of geneIII (an essential M13 gene) to the function (“fitness”) of the phage-encoded transgene to be evolved.

All three steps were tested individually by (i) propagating geneIII-deficient M13 phages on E. coli host cells in a PACE apparatus in the absence of selection pressure and mutagenesis; (ii) overexpressing mutagenic genes in E. coli and investigating the spontaneous formation of resistant colonies due to the mutagenesis-induced genetic drift; and (iii) comparing the propagation of phages with known transgene “fitness” on corresponding synthetic circuits.

Figure 3: Overview of the PREDCEL protocol
Finally, all three steps were successfully united into a complete PACE cycle to evolve improved split T7 polymerase variants. The corresponding in silico evolution cycle counterparts were practically nonexistent and had to be invented from scratch. To this end, we designed a software suite named AiGEM, for Artificial Intelligence for Genetic Evolution Mimicking. AiGEM’s core functionalities precisely correspond to the individual steps of a typical directed evolution cycle (see AiGEM website for details on the implementation):
  1. We mutate parental sequences using GAIA, the Genetic Artificial intelligence algorithm, which introduces random errors in a parental protein sequence/sequence pool; note that errors can also be limited to selected sequence windows.
  2. We select sequences using DeeProtein, a deep neural network created and trained on 10 million protein sequences representing more than 800 protein classes. DeeProtein is able to robustly infer sequence-function relationships from raw sequence data. GAIA, in turn, converts the DeeProtein analysis into a genetic fitness score and applies a threshold to select the surviving sequence variants.
  3. Reproduction is then simply achieved by starting a new computational cycle with the evolved sequence variants from the previous in silico evolution round.

Importantly, as part of our integrated human practices activities, we added a fourth step to our in silico evolution software, which does not have a counterpart in the in vivo directed evolution world:
  4. We created SafetyNet, a DeeProtein-based web application to infer “sleeping” hazardous functions in any parental (and evolved) sequence. Thus, SafetyNet safeguards the directed evolution process by monitoring for and helping to avoid the unintentional evolution of hazardous or dangerous protein sequences.
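
The mutate, select and reproduce cycle described above can be illustrated with a minimal Python sketch. It is not the AiGEM, GAIA or DeeProtein implementation: all function names and parameters (`mutate`, `evolve`, `rate`, `pool_size` and so on) are hypothetical, and `toy_score` is a trivial stand-in for a learned fitness score, used only so that the example runs on its own.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutate(seq, rate, rng, window=None):
    """Randomly substitute residues; optionally only inside window [start, end).

    A substitution may redraw the residue that is already present.
    """
    start, end = window if window is not None else (0, len(seq))
    residues = list(seq)
    for i in range(start, end):
        if rng.random() < rate:
            residues[i] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def evolve(parents, score_fn, threshold, rounds,
           offspring=10, rate=0.05, pool_size=20, window=None, seed=0):
    """Run a simple mutate -> select -> reproduce loop (hypothetical sketch)."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        # Mutation: every sequence in the pool yields several variants.
        variants = [mutate(p, rate, rng, window)
                    for p in pool for _ in range(offspring)]
        # Selection: score each variant, keep those at or above the
        # threshold, best first, capped at pool_size.
        scored = sorted(((score_fn(v), v) for v in variants), reverse=True)
        survivors = [v for s, v in scored if s >= threshold][:pool_size]
        # Reproduction: survivors seed the next round; if nothing
        # survives, the previous pool is reused unchanged.
        if survivors:
            pool = survivors
    return pool


def toy_score(seq):
    """Stand-in fitness (assumption): fraction of lysine (K) residues."""
    return seq.count("K") / len(seq)


parent = "A" * 20
final_pool = evolve([parent], toy_score, threshold=0.05, rounds=5)
```

The `window` argument mirrors the option of limiting errors to selected sequence windows, and the `pool_size` cap is a design choice of this sketch that keeps the pool from growing with every round.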

To validate our in silico evolution software, we designed two complementary wet lab experiments, both of which were successfully performed.
  1. We aimed at demonstrating (and eventually successfully showed) that DeeProtein can infer the impact of GAIA-induced mutations on protein function (i.e. the degree of activity). To this end, we designed ~30 beta-lactamase mutants using GAIA and correlated the DeeProtein-based “activity” scores with the in vivo measured minimum inhibitory antibiotic concentration (MIC).
  2. We aimed at demonstrating (and eventually successfully showed) that we can evolve novel functions in silico, practically from scratch. To this end, we used GAIA to transfer beta-galactosidase activity onto a beta-glucuronidase parental sequence by in silico evolution.

Improved, finalized Design – Learn, Evolve, PREDCEL!

While setting up the PACE apparatus and protocols, we ran into a series of recurrent problems, most importantly phage washout (i.e. complete phage loss after a few hours of continuous evolution) and phage contamination (due to the complex flow setup, contaminations are difficult to avoid). We also realized that the PACE apparatus was very static, strongly limiting its application scope. For the evolution of chemically inducible proteins, for instance, one would ideally want to instantly alternate between evolution in the presence and absence of the chemical inducer, as well as between the corresponding selection strains. This is impossible in a continuous flow setup. Therefore, we set out to create a simpler and more flexible PACE alternative, which can be quickly implemented by any trained biologist without the need for special equipment or knowledge. Inspired by recent publications on phage-mediated selection of gene libraries (Brödel et al, Nature Communications, 2016; Brödel et al, Nature Protocols, 2017), we created a simple protocol named PREDCEL (for phage-related discontinuous evolution), which uses simple batch-wise, manual transfer of the evolving phage gene pool (Figure 3). In essence, PREDCEL reduces the entire complexity of PACE to simple, standard laboratory procedures, while gaining full flexibility to swap conditions (strains, inducers etc.) between individual rounds of evolution. We also provide an optogenetic tool for simple adaptation of the selection pressure during PREDCEL runs.

To validate our PREDCEL method, we performed directed evolution of split T7 polymerase towards improved auto-reassembly (i.e. protein interaction of the split domains). Having laid this solid foundation, we were then able to outline a simple, fully generalizable workflow for the directed evolution of enzymes and successfully tested it by re-directing the catalytic activity of a promiscuous cytochrome, Cyp1A2, towards a naturally unfavored product. Key to this workflow is MAWS 2.0, software originally introduced by iGEM Team Heidelberg 2015, which enables the in silico design of aptamers and corresponding riboswitches (Figure 4). We demonstrate the functionality of the improved MAWS 2.0 software by the successful design and validation of a riboswitch detecting an organosilicon product that we synthesized in vitro using an engineered cytochrome c. These riboswitches are then used for detecting the desired reaction product and mediate selection pressure during evolution by controlling the expression of M13 geneIII, which is essential for M13 phage reproduction (Figure 4).
Figure 4: A fully generalizable Workflow for Engineering of improved and novel enzymes using AiGEM-mediated in silico evolution interfaced with PREDCEL-mediated in vivo evolution.


In summary, we implemented a unique in vivo – in silico evolution cycle that strongly accelerates the directed evolution of proteins for human benefit. Our standardized PREDCEL protocol, PREDCEL parts collection, online Toolbox guide and accompanying RFC deliver our evolution toolbox to the end user. Our finalized project design is summarized in Figure 5 (a fully interactive version of this project overview figure is available under project overview). Our Results Page guides you through all subprojects carried out to establish, validate and apply our evolutionary toolbox. Taken together, we provide a foundational advance by introducing an in vivo and in silico evolution interface as a novel engineering paradigm to synthetic biology.