Difference between revisions of "Team:Heidelberg/Design"

Line 2: Line 2:
 
{{Heidelberg/navbar}}
 
{{Heidelberg/navbar}}
 
<html>
 
<html>
 +
<style>
 +
.innerlink {
 +
color: #fbb74b !important;
 +
font-weight: 900 !important;
 +
}
 +
 +
  .innerlink:hover {
 +
text-decoration: underline !important;
 +
}
 +
</style>
 
<div style="background-color: white !important; padding: 0 !important; padding-top: 100px !important; margin: 0 !important;">
 
<div style="background-color: white !important; padding: 0 !important; padding-top: 100px !important; margin: 0 !important;">
 
   <div class="page-title" style="background-color: white !important;">
 
   <div class="page-title" style="background-color: white !important;">
Line 51: Line 61:
 
synthetic circuits in the E. coli host cell couple phage propagation efficiency to the fitness (i.e. function)
 
synthetic circuits in the E. coli host cell couple phage propagation efficiency to the fitness (i.e. function)
 
of the protein to be evolved, by controlling the expression of an essential phage gene. Details on the
 
of the protein to be evolved, by controlling the expression of an essential phage gene. Details on the
PACE background can be found on the PACE subpage.
+
PACE background can be found on the <a href="https://2017.igem.org/Team:Heidelberg/Pace" class="innerlink">PACE subpage</a>.
 
By chance, we had the opportunity to meet Kevin Esvelt, the PACE co-inventor, who gave a talk in
 
By chance, we had the opportunity to meet Kevin Esvelt, the PACE co-inventor, who gave a talk in
Heidelberg on June 1 st 2017, i.e. before we started our wet lab work. Kevin was kind enough to visit our
+
Heidelberg on June 1 st 2017, i.e. before we started our wet lab work. Kevin was kind enough to <a href="https://2017.igem.org/Team:Heidelberg/Human_Practices" class="innerlink">PACE subpage</a> visit our
team and (link to human practices) discuss his PACE method in detail. Besides its power for directed
+
team</a> and discuss his PACE method in detail. Besides its power for directed
 
protein evolution and conceptual beauty, we identified two major limitations of the current PACE setup:
 
protein evolution and conceptual beauty, we identified two major limitations of the current PACE setup:
 
Video 1: Background and general
 
Video 1: Background and general
Line 61: Line 71:
 
Video 2: Details on in vivo directed
 
Video 2: Details on in vivo directed
 
evolution with PACE and PREDCEL
 
evolution with PACE and PREDCEL
 
+
<br>
 
<ol class="content">
 
<ol class="content">
 
<li>It requires a complex, custom-made bioreactor (Figure 1) including sophisticated flow control
 
<li>It requires a complex, custom-made bioreactor (Figure 1) including sophisticated flow control
Line 75: Line 85:
 
functionality from scratch remains a practically impossible task to be implemented in PACE.</li>         
 
functionality from scratch remains a practically impossible task to be implemented in PACE.</li>         
 
</ol>
 
</ol>
 
+
<br>
 
As result its complexity and various failure points, only three groups worldwide have successfully
 
As result its complexity and various failure points, only three groups worldwide have successfully
 
established PACE until today. Of note, two of these groups are headed by the PACE inventors Kevin
 
established PACE until today. Of note, two of these groups are headed by the PACE inventors Kevin
Line 87: Line 97:
 
</div>
 
</div>
  
Of note, in iGEM, only a single team has accepted the challenge and tried to set up PACE, namely the TU
+
Of note, in iGEM, only a single team has accepted the challenge and tried to set up PACE, namely the <a href="https://2017.igem.org/Team:TU_Dresden" class="innerlink">TU Dresden 2015 team</a>. The team thereby constructed a first PACE bioreactor prototype
Dresden 2015 team (link to their wiki). The team thereby constructed a first PACE bioreactor prototype
+
 
and tested the flow controls. However, due to the lack of time, they could not run real PACE
 
and tested the flow controls. However, due to the lack of time, they could not run real PACE
  
 
experiments. The team concluded: “All in all, the initial experiments helped us to grasp the idea of how
 
experiments. The team concluded: “All in all, the initial experiments helped us to grasp the idea of how
such experiments can look like” (link to https://2015.igem.org/Team:TU_Dresden/Project/Conclusions)
+
such experiments can look like”  
 
<h2>Motivation, Initial Design &amp; Engineering Principles</h2>
 
<h2>Motivation, Initial Design &amp; Engineering Principles</h2>
 
The goal of our work was to make in vivo directed evolution of proteins faster, easier, more robust and
 
The goal of our work was to make in vivo directed evolution of proteins faster, easier, more robust and
Line 138: Line 147:
 
<br>
 
<br>
 
<br>
 
<br>
 +
<div class="content" style="padding-top: 20px; padding-bottom: 20px;padding-left: 20px; font-weight: 700 !important; text-align: center !important;width:40%;height:auto;float: right;">
 +
<img src="https://static.igem.org/mediawiki/2017/e/e5/T--Heidelberg--2017_PREDCEL_procedure_turk.jpg">
 +
Figure 3: Overview of the PREDCEL protocol
 +
</div>
 
Finally, we all three steps were successfully united into a completed PACE cycle to evolve improved split
 
Finally, we all three steps were successfully united into a completed PACE cycle to evolve improved split
 
T7 polymerase variants.
 
T7 polymerase variants.
Line 146: Line 159:
 
correspond to individual steps of a typical directed evolution cycle (see AiGEM website for details on the
 
correspond to individual steps of a typical directed evolution cycle (see AiGEM website for details on the
 
implementation):
 
implementation):
(i) We mutate parental sequences using GAIA, the Genetic Artificial intelligence algorithm,
+
<br>
 +
<ol class="content">
 +
<li>
 +
We mutate parental sequences using GAIA, the Genetic Artificial intelligence algorithm,
 
which introducing random errors in a parental protein sequence/sequence pool; note, that
 
which introducing random errors in a parental protein sequence/sequence pool; note, that
errors can also be limited to selected sequences windows
+
errors can also be limited to selected sequences windows </li>
(ii) We select sequences using DeeProtein, a deep neuronal network created and trained on 10
+
<li> We select sequences using DeeProtein, a deep neuronal network created and trained on 10
 
million protein sequences representing more than 800 protein classes. DeeProtein is able to
 
million protein sequences representing more than 800 protein classes. DeeProtein is able to
 
robustly infer sequence-function relationships from raw sequence data. GAIA, in turn,
 
robustly infer sequence-function relationships from raw sequence data. GAIA, in turn,
 
converts the DeeProtein analysis into a genetic fitness score and applies a threshold to
 
converts the DeeProtein analysis into a genetic fitness score and applies a threshold to
select the surviving sequence variants
+
select the surviving sequence variants </li>
(iii) Reproduction is then simply achieved by starting a new computational cycle with the
+
<li> Reproduction is then simply achieved by starting a new computational cycle with the
evolved sequence variants from the previous in silico evolution round.
+
evolved sequence variants from the previous in silico evolution round.</li>
 
+
<br>
 
Importantly, as part of our integrated human practices activities, we added a fourth step to our in silico
 
Importantly, as part of our integrated human practices activities, we added a fourth step to our in silico
 
evolution software, which does not have a counterpart in the in vivo directed evolution world:
 
evolution software, which does not have a counterpart in the in vivo directed evolution world:
  
(iv) We created SafetyNet, a DeeProtein based web application to infer “sleeping” hazardous
+
<li>We created SafetyNet, a DeeProtein based web application to infer “sleeping” hazardous
 
any parental (and evolved) sequence. Thus, SafetyNet safeguard the directed evolution
 
any parental (and evolved) sequence. Thus, SafetyNet safeguard the directed evolution
 
process and monitors and avoids the unintentional evolution of hazardous or dangerous
 
process and monitors and avoids the unintentional evolution of hazardous or dangerous
protein sequences
+
protein sequences</li>
 
+
</ol>
 +
<br>
 
To validate our in silico evolution software, we designed two complementary wet lab experiments (link
 
To validate our in silico evolution software, we designed two complementary wet lab experiments (link
 
software validation), both of which were successfully performed.
 
software validation), both of which were successfully performed.
1) We aimed at demonstrating (and eventually successfully showed), that DeeProtein can infer the
+
<ol class="content">
 +
<li> We aimed at demonstrating (and eventually successfully showed), that DeeProtein can infer the
 
impact of GAIA-induced mutations on protein function (i.e. the degree of activity). To this end,
 
impact of GAIA-induced mutations on protein function (i.e. the degree of activity). To this end,
 
we designed ~30 beta-lactamase (Link software validation) mutants using GAIA and correlated
 
we designed ~30 beta-lactamase (Link software validation) mutants using GAIA and correlated
 
the DeeProtein-based “activity” scores with the in vivo measured minimum inhibitory antibiotics
 
the DeeProtein-based “activity” scores with the in vivo measured minimum inhibitory antibiotics
concentration (MIC).
+
concentration (MIC).</li>
2) We aimed at demonstrating (and eventually successfully showed) that we can evolve novel
+
<li> We aimed at demonstrating (and eventually successfully showed) that we can evolve novel
 
functions in silico, practically from scratch. To this end, we used GAIA transfer beta-
 
functions in silico, practically from scratch. To this end, we used GAIA transfer beta-
 
galactosidase activity onto a beta-glucoronidase (link software validation) parental sequence by
 
galactosidase activity onto a beta-glucoronidase (link software validation) parental sequence by
in silico evolution.
+
in silico evolution.</li>
 +
</ol>
 
<h2>Improved, finalized Design – Learn, Evolve, PREDCEL!</h2>
 
<h2>Improved, finalized Design – Learn, Evolve, PREDCEL!</h2>
<div class="content" style="padding-top: 20px; padding-bottom: 20px;padding-left: 20px; font-weight: 700 !important; text-align: center !important;width:40%;height:auto;float: right;">
+
 
<img src="https://static.igem.org/mediawiki/2017/e/e5/T--Heidelberg--2017_PREDCEL_procedure_turk.jpg">
+
Figure 3: Overview of the PREDCEL protocol
+
</div>
+
 
While setting up the PACE apparatus and protocols, we ran into a serious of recurrent problems, most
 
While setting up the PACE apparatus and protocols, we ran into a serious of recurrent problems, most
 
importantly phage washout (i.e. complete phage loss after few hours of continuous evolution) and
 
importantly phage washout (i.e. complete phage loss after few hours of continuous evolution) and

Revision as of 15:25, 1 November 2017

Our design
Interfacing In Vivo and In Silico Directed Evolution as a Novel Engineering Paradigm
Video 1:
Background and general project setup
Video 2:
Details on in vivo directed evolution with PACE and PREDCEL

Background

“Harnessing evolution for the development of novel proteins and biomolecules for human benefit”: this naïve vision stood at the beginning of this year’s iGEM project. But how should we do that? During our literature research, we came across an exciting method named PACE, phage-assisted continuous evolution. This method was invented by Kevin Esvelt and David Liu at Harvard University in 2011 (Esvelt et al., Nature, 2011). The idea behind this method is rather simple: mimic natural evolution in a bioreactor, accelerate it and direct it towards a specific goal. More precisely, PACE implements a fully closed, extremely fast evolution cycle consisting of replication, mutation and selection to enable in vivo directed evolution of proteins. In brief, the protein to be evolved is encoded by an M13 bacteriophage and transferred from one E. coli host cell to the next by means of phage replication and propagation. Importantly, the E. coli hosts express mutator genes (Badran et al., Nature Communications, 2015) to increase the phage mutation rate during replication, thereby creating a highly diverse gene pool. Finally, synthetic circuits in the E. coli host cell couple phage propagation efficiency to the fitness (i.e. function) of the protein to be evolved, by controlling the expression of an essential phage gene. Details on the PACE background can be found on the PACE subpage. By chance, we had the opportunity to meet Kevin Esvelt, the PACE co-inventor, who gave a talk in Heidelberg on June 1st, 2017, i.e. before we started our wet lab work. Kevin was kind enough to visit our team and discuss his PACE method in detail. Besides its power for directed protein evolution and its conceptual beauty, we identified two major limitations of the current PACE setup:
  1. It requires a complex, custom-made bioreactor (Figure 1) including a sophisticated flow control setup, which is challenging to assemble and difficult to run robustly due to multiple failure points (potential phage contamination; risk of phage washout if flow rates are too high; host cell biofilm formation interfering with proper phage evolution).
  2. PACE is almost entirely limited to improving an already existing protein function. To evolve truly novel functions, evolutionary stepping stones are required, which so far can be created only by slowly adapting the selection pressure towards the activity to be evolved (Pu, Zinkus-Boltz & Dickinson, Nat. Chem. Biol., 2017). In practical terms, this means that evolving novel functions requires an extremely complex experimental setup comprising multiple, fine-tuned synthetic selection circuits applied in sequential order over many days. In most cases, evolving a novel functionality from scratch remains practically impossible in PACE.

As a result of its complexity and various failure points, only three groups worldwide have successfully established PACE to date. Of note, two of these groups are headed by the PACE inventors Kevin Esvelt and David Liu themselves, while the third group is run by Prof. Liu’s former postdoc Bryan Dickinson. In the course of our iGEM project, we were in contact with all three of these groups. We are extremely grateful for their advice and for sharing their constructs with us, both of which were extremely important for getting us started.
Figure 1: Bioreactor and flow setup required to perform PACE
Of note, in iGEM, only a single team has accepted the challenge and tried to set up PACE, namely the TU Dresden 2015 team. The team constructed a first PACE bioreactor prototype and tested the flow controls. However, due to lack of time, they could not run real PACE experiments. The team concluded: “All in all, the initial experiments helped us to grasp the idea of how such experiments can look like”.

Motivation, Initial Design & Engineering Principles

The goal of our work was to make in vivo directed evolution of proteins faster, easier and more robust, and to expand its application scope. The primary, novel application we envisioned was the in vivo directed evolution of improved and/or novel enzymes, which would be of enormous benefit for diverse industries (e.g. chemical/pharmaceutical production, biomaterial production etc.), but remains particularly challenging with conventional directed evolution or rational engineering strategies. To reach this ambitious goal, we had to leave existing paradigms behind and rethink the concept of directed evolution. As a result, we came up with a novel engineering concept: interfacing in vivo and in silico directed evolution by coupling PACE to deep learning of protein sequence-function relationships (Figure 2). The idea behind this concept is to fast-forward directed evolution using an intelligent algorithm. More precisely, our algorithm should enable us to skip the otherwise required evolutionary stepping stones during directed in vivo evolution by pre-evolving proteins with novel, desired target functions in silico.
Figure 2: Interfacing in silico and in vivo directed evolution as a novel engineering paradigm.


The engineering approach our team used to reach our goal was to first divide the in silico and in vivo evolution problems into their basic components, work them out independently and validate them thoroughly before finally combining them into a united workflow. We quickly realized that the general components required for the in vivo and in silico directed evolution systems are conceptually identical: evolution always consists of a closed cycle running through three major steps: (i) reproduction, (ii) mutation and subsequent (iii) selection of sequence variants. The corresponding in vivo evolution counterparts in PACE are well described and hence easy to identify: (i) Reproduction is achieved by propagating M13 bacteriophages on E. coli host cells. (ii) Mutation is achieved during phage replication by expression of mutagenic genes. (iii) Selection is induced using synthetic circuits that couple the expression of geneIII (an essential M13 gene) to the function (“fitness”) of the phage-encoded transgene to be evolved. All three steps were tested individually by (i) propagating geneIII-deficient M13 phages on E. coli host cells in a PACE apparatus in the absence of selection pressure and mutagenesis; (ii) overexpressing mutagenic genes in E. coli and investigating the spontaneous formation of resistant colonies (see the MP testing experiment) due to the mutagenesis-induced genetic drift; and (iii) comparing the propagation of phages with known transgene “fitness” on corresponding synthetic circuits (see the phage propagation experiment).

Figure 3: Overview of the PREDCEL protocol
Finally, all three steps were successfully united into a complete PACE cycle to evolve improved split T7 polymerase variants. The corresponding in silico evolution cycle counterparts practically did not exist and had to be invented from scratch. To this end, we designed a software suite named AiGEM, for Artificial Intelligence for Genetic Evolution Mimicking. AiGEM’s core functionalities correspond precisely to the individual steps of a typical directed evolution cycle (see the AiGEM website for details on the implementation):
  1. We mutate parental sequences using GAIA, the Genetic Artificial Intelligence Algorithm, which introduces random errors into a parental protein sequence or sequence pool; note that errors can also be limited to selected sequence windows.
  2. We select sequences using DeeProtein, a deep neural network created and trained on 10 million protein sequences representing more than 800 protein classes. DeeProtein is able to robustly infer sequence-function relationships from raw sequence data. GAIA, in turn, converts the DeeProtein analysis into a genetic fitness score and applies a threshold to select the surviving sequence variants.
  3. Reproduction is then simply achieved by starting a new computational cycle with the evolved sequence variants from the previous in silico evolution round.

Importantly, as part of our integrated human practices activities, we added a fourth step to our in silico evolution software, which does not have a counterpart in the in vivo directed evolution world:
  4. We created SafetyNet, a DeeProtein-based web application to infer “sleeping” hazards in any parental (and evolved) sequence. Thus, SafetyNet safeguards the directed evolution process: it monitors and avoids the unintentional evolution of hazardous or dangerous protein sequences.
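The mutate, select and reproduce cycle described in the list above can be illustrated with a minimal Python sketch. This is not the AiGEM, GAIA or DeeProtein code: all function names and parameters here are hypothetical, and `toy_score` (the fraction of positions matching a made-up target sequence) merely stands in for the neural-network-derived fitness score, which is not reproduced here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rate, rng, window=None):
    """Introduce random substitutions; optionally only inside window=(start, end)."""
    start, end = window if window is not None else (0, len(seq))
    residues = list(seq)
    for i in range(start, end):
        if rng.random() < rate:
            residues[i] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def toy_score(seq, target):
    """Placeholder fitness (assumption): fraction of positions identical to target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def evolve(parents, score, rounds, offspring_per_parent, rate,
           threshold, pool_size, rng, window=None):
    """Run several rounds of mutation, threshold selection and reproduction."""
    pool = list(parents)
    for _ in range(rounds):
        # Mutation: every sequence in the pool yields several mutated offspring.
        offspring = [mutate(p, rate, rng, window)
                     for p in pool for _ in range(offspring_per_parent)]
        # Parents stay in the candidate set; duplicates are removed in order.
        candidates = list(dict.fromkeys(pool + offspring))
        # Selection: keep only candidates whose score reaches the threshold.
        survivors = [s for s in candidates if score(s) >= threshold]
        # Reproduction: the best survivors seed the next round.
        if survivors:
            pool = sorted(survivors, key=score, reverse=True)[:pool_size]
    return pool


if __name__ == "__main__":
    target = "MKTAYIAKQR"   # made-up target sequence
    parent = "MKAAAAAAAA"   # made-up parental sequence
    final = evolve([parent], lambda s: toy_score(s, target), rounds=20,
                   offspring_per_parent=20, rate=0.1, threshold=0.3,
                   pool_size=5, rng=random.Random(1))
    print(final[0], toy_score(final[0], target))
```

Because the parents remain in the candidate set, the best score in the pool cannot decrease from one round to the next; this is a design choice of the sketch, not a documented property of GAIA.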

To validate our in silico evolution software, we designed two complementary wet lab experiments (see software validation), both of which were successfully performed.
  1. We aimed at demonstrating (and eventually successfully showed) that DeeProtein can infer the impact of GAIA-induced mutations on protein function (i.e. the degree of activity). To this end, we designed ~30 beta-lactamase mutants (see software validation) using GAIA and correlated the DeeProtein-based “activity” scores with the minimum inhibitory antibiotic concentration (MIC) measured in vivo.
  2. We aimed at demonstrating (and eventually successfully showed) that we can evolve novel functions in silico, practically from scratch. To this end, we used GAIA to transfer beta-galactosidase activity onto a beta-glucuronidase parental sequence (see software validation) by in silico evolution.
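As an illustration of how predicted “activity” scores might be compared with measured MIC values, the following sketch computes a Spearman rank correlation in plain Python. The text above does not state which correlation measure was used, so the choice of Spearman is an assumption, and the numbers in the usage example are invented placeholders, not experimental data.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    predicted_scores = [0.91, 0.40, 0.75, 0.10, 0.62]  # invented placeholder values
    measured_mic = [256, 32, 128, 4, 128]              # invented placeholder values
    print(spearman(predicted_scores, measured_mic))
```

A rank-based measure is convenient here because MIC values are typically read from a dilution series and therefore contain many ties and non-linear steps.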

Improved, finalized Design – Learn, Evolve, PREDCEL!

While setting up the PACE apparatus and protocols, we ran into a series of recurrent problems, most importantly phage washout (i.e. complete phage loss after a few hours of continuous evolution) and phage contamination (due to the complex flow setup, contaminations are difficult to avoid). We also realized that the PACE apparatus was very static, strongly limiting its application scope. For the evolution of chemically inducible proteins, for instance, one would ideally want to alternate instantly between evolution in the presence and absence of the chemical inducer, as well as between the corresponding selection strains. This is impossible in a continuous flow setup. Therefore, we set out to create a simpler and more flexible PACE alternative, which can be quickly implemented by any trained biologist without the need for special equipment or knowledge. Inspired by a recent publication on phage-mediated selection of gene libraries (Brödel et al., Nature Communications, 2016; Brödel et al., Nature Protocols, 2017), we created a simple protocol named PREDCEL (for phage-related discontinuous evolution), which uses simple batch-wise, manual transfer of the evolving phage gene pool (Figure 3). In essence, PREDCEL reduces the entire complexity of PACE to simple, standard laboratory procedures, while gaining full flexibility to easily swap conditions (strains, inducers etc.) between individual rounds of evolution. We also provide an optogenetic tool for simple adaptation of the selection pressure (see Optogenetics) during PREDCEL runs.
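Why phage washout occurs in a continuous flow setup can be illustrated with a deliberately simplified model: if phages replicate exponentially at a net rate r while the vessel is diluted at a rate D, the titer follows P(t) = P0 · exp((r − D) · t), so the population is lost whenever D exceeds r. The sketch below assumes exactly this toy model; the function names and all numbers are hypothetical and are not taken from our experiments.

```python
import math


def phage_titer(initial_titer, growth_rate, dilution_rate, hours):
    """Titer after `hours` under exponential growth and continuous dilution.

    Rates are per hour. This is a toy model (assumption): it ignores host
    cell availability, selection strength and mutagenesis.
    """
    return initial_titer * math.exp((growth_rate - dilution_rate) * hours)


def washes_out(growth_rate, dilution_rate):
    """In this model the phage pool is lost when dilution outpaces growth."""
    return dilution_rate > growth_rate


if __name__ == "__main__":
    # Hypothetical example: weakly propagating phages in a fast-flowing vessel.
    for hours in (0, 2, 4, 6):
        print(hours, phage_titer(1e8, growth_rate=1.0, dilution_rate=2.0, hours=hours))
    print(washes_out(growth_rate=1.0, dilution_rate=2.0))
```

In a batch-wise protocol there is no continuous outflow, so a weakly propagating phage pool is not continuously removed between manual transfers.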

To validate our PREDCEL method, we performed directed evolution of split T7 polymerase towards improved auto-reassembly (i.e. protein interaction of the split domains). Having laid the required, solid foundation, we were then able to outline a simple, fully generalizable workflow for the directed evolution of enzymes and successfully tested it by redirecting the catalytic activity of a promiscuous cytochrome, Cyp1A2, towards a naturally unfavored product (see Cytochrome engineering). Key to this workflow is MAWS 2.0, a software originally introduced by iGEM Team Heidelberg 2015, enabling the in silico design of aptamers and corresponding riboswitches (Figure 4). We demonstrate the functionality of the improved MAWS 2.0 software by the successful design and validation of a riboswitch detecting an organosilicon product (see Organosilicons) synthesized by us in vitro using an engineered cytochrome c. These riboswitches are then used to detect the desired reaction product and mediate selection pressure during evolution by controlling the expression of M13 geneIII, which is essential for M13 phage reproduction (Figure 4).
Figure 4: A fully generalizable workflow for engineering improved and novel enzymes using AiGEM-mediated in silico evolution interfaced with PREDCEL-mediated in vivo evolution.


In summary, we implemented a unique in vivo and in silico evolution cycle that greatly accelerates the directed evolution of proteins for human benefit. Our standardized PREDCEL protocol, PREDCEL parts collection, online Toolbox guide and accompanying RFC deliver our evolution toolbox to the end user. Our finalized project design is summarized in Figure 5 (a fully interactive version of this project overview figure is available under project overview). Our Results Page guides you through all subprojects carried out to establish, validate and apply our evolutionary toolbox. Taken together, we provide a foundational advance by introducing an in vivo and in silico evolution interface as a novel engineering paradigm to synthetic biology.