Revision as of 00:27, 1 November 2017
Software Validation
From AiGEM to the bench and back
Evolution of \(\beta\)-Lactamases
Motivation
The deep neural network DeeProtein is the heart of our AiGEM software suite. DeeProtein was trained on about 8 million sequences from the UniProt database to learn the complex relation between protein sequence and function. It assigns a sequence to one or more of 886 gene ontology (GO) term classes from the molecular function GO graph. Because GO terms are labels for protein function, we hypothesize that protein activity can be assessed from the sequence-to-function representation learned by DeeProtein. To test this hypothesis, we set out to comprehensively validate the DeeProtein classification score. We applied GAIA, interfaced with a DeeProtein variant trained specifically on \(\beta\)-lactamases, to predict a set of \(\beta\)-lactamase sequences that meets the following criterion: the set should contain a broad range of variants with DeeProtein classification scores both higher and lower than that of the wildtype.
As a measure of enzyme activity, we determined the minimal inhibitory concentration (MIC) of carbenicillin for each candidate in the set. We demonstrate a correlation between the MIC of carbenicillin and the average DeeProtein classification score of the screened candidates. Furthermore, we improve the performance of our deep neural network by incorporating the generated wet-lab data into its training process.
Experimental Design - Software
We ran GAIA separately on each mutative window for up to 1000 generations in single mutation mode: the mutation rate was limited to one amino acid substitution per generation, starting from one initial mutation. Additionally, we ran GAIA in double mutation mode over the combined windows for the same number of generations.
For each generation, the top five candidates were saved and added to the pool of suggested sequences. We then selected a subset of this pool for wet-lab validation, aiming to cover a broad activity spectrum. We therefore selected both variants scoring higher and variants scoring lower than the wildtype.
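A minimal Python sketch of the single mutation mode described above follows. It is our simplified reading, not GAIA's implementation: the population size, the elitist ranking of parents and children together, the 0-based end-exclusive window coordinates, the `toy_score` function (a hypothetical stand-in for the \(\beta\)-lactamase DeeProtein model) and all function names are assumptions. Double mutation mode is not shown.

```python
import random
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a DeeProtein class probability (NOT the real
    model): simply the fraction of tryptophan residues in the sequence."""
    return seq.count("W") / len(seq)


def mutate_in_window(seq: str, window: Tuple[int, int], rng: random.Random) -> str:
    """Return seq with exactly one substitution at a random position inside
    window = (start, end), 0-based and end-exclusive."""
    pos = rng.randrange(window[0], window[1])
    new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
    return seq[:pos] + new_aa + seq[pos + 1:]


def evolve_window(
    wildtype: str,
    window: Tuple[int, int],
    score: Callable[[str], float],
    generations: int = 1000,
    population: int = 20,
    top_k: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """One substitution per child per generation; the top_k candidates of
    every generation are added to the pool of suggested sequences."""
    rng = random.Random(seed)
    # Every starting candidate carries one initial mutation.
    parents = [mutate_in_window(wildtype, window, rng) for _ in range(population)]
    pool: Dict[str, float] = {}
    for _ in range(generations):
        children = [mutate_in_window(rng.choice(parents), window, rng)
                    for _ in range(population)]
        # Deduplicate while keeping a deterministic order, then rank by score.
        candidates = list(dict.fromkeys(parents + children))
        ranked = sorted(candidates, key=score, reverse=True)
        for cand in ranked[:top_k]:
            pool[cand] = score(cand)
        parents = ranked[:population]  # elitist: the best survive
    return pool


def split_by_wildtype(pool: Dict[str, float], wt_score: float) -> Tuple[List[str], List[str]]:
    """Partition the pool into candidates scoring above and at/below the wildtype."""
    above = sorted(s for s, v in pool.items() if v > wt_score)
    below = sorted(s for s, v in pool.items() if v <= wt_score)
    return above, below
```

For a toy sequence `wt`, `evolve_window(wt, (5, 10), toy_score, generations=50)` returns a pool of at most 250 candidates that differ from `wt` only at positions 5 to 9; `split_by_wildtype` then yields the higher- and lower-scoring groups from which a subset like Table 1 could be picked.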
Table 1: Selected single and double mutants proposed by GAIA.
Mutation | Fragment | DeeProtein classification score |
---|---|---|
G236L | F | 0.56584185 |
G216C_P217F | E | 0.56683183 |
L218D | E | 0.56554008 |
L167G_P164L | D | 0.56678349 |
P164G | D | 0.56673431 |
P164H | D | 0.56748658 |
D129L_N130I | C | 0.56734717 |
D129L | C | 0.56697899 |
E102F | B | 0.56683093 |
Y103D | B | 0.56666845 |
K71W | A | 0.56605697 |
M67G | A | 0.56762165 |
M67G_F70G | A | 0.56777865 |
K71W_G236L | A+F | 0.56503928 |
M67G_P164G | A+D | 0.5675621 |
M67G_P164H | A+D | 0.56818455 |
E102F_P164H | B+D | 0.56749099 |
E102F_P164G | B+D | 0.56676418 |
E102F_G236L | B+F | 0.5658384 |
E102F_M67G | B+A | 0.56763196 |
D129L_P164G | C+D | 0.56688529 |
D129L_M67G | C+A | 0.56781197 |
D129L_L218L | C+E | 0.56570756 |
L218D_P164G | E+D | 0.56537378 |
L218D_K71W | E+A | 0.56467879 |
WT | None | 0.56685823 |
Of the pool of proposed \(\beta\)-lactamase sequences, we selected 9 single and 16 double mutants for wet-lab validation.
Results - Software
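The MIC-score calculation and binning used in this section can be sketched in Python as follows. This is an illustrative reconstruction, not the team's analysis script: it assumes the OD600 readings of each candidate are ordered by increasing carbenicillin concentration, treats an OD600 of exactly 0.8 as growth (the text leaves that edge case open), and uses an ordinary least-squares line as the linear model; all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

OD_THRESHOLD = 0.8  # relative OD600 cut-off used in the text


def mic_score(od600_by_concentration: Sequence[float],
              threshold: float = OD_THRESHOLD) -> int:
    """Number of consecutive growing data points (OD600 >= threshold), read in
    order of increasing carbenicillin concentration, before the first
    inhibited point (OD600 < threshold)."""
    score = 0
    for od in od600_by_concentration:
        if od < threshold:  # inhibited (0): stop counting
            break
        score += 1  # growing (1)
    return score


def mean_score_per_bin(mic_scores: Sequence[int],
                       deeprotein_scores: Sequence[float]) -> Dict[int, float]:
    """Average the DeeProtein classification scores within each MIC-score bin."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for mic, score in zip(mic_scores, deeprotein_scores):
        bins[mic].append(score)
    return {mic: mean(values) for mic, values in sorted(bins.items())}


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

With the illustrative readings `[1.0, 0.95, 0.85, 0.6, 0.9]`, `mic_score` returns 3: counting stops at the first value below 0.8, and any later recovery is ignored. The bin means returned by `mean_score_per_bin` can then be passed to `fit_line` to obtain the linear model.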
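We calculated a MIC-score for every candidate from the measured data points. First, we thresholded the relative OD600 values at 0.8: because the OD600 was measured in a plate reader, it can only serve as a relative measure of growth. We therefore classified any OD600 below 0.8 as inhibited (0) and any higher value as growing (1). From these thresholded values, the MIC-score is the number of consecutive data points showing growth, counted up to the first carbenicillin concentration at which the OD600 fell below 0.8. Next, we averaged the DeeProtein classification scores within each MIC-score bin and plotted the mean DeeProtein score against the MIC of carbenicillin.
Figure 2: The DeeProtein classification score for screened \(\beta\)-lactamase variants correlates with the MIC of carbenicillin. The average DeeProtein classification scores of the samples in each MIC-score bin are depicted as black dots; the red line is the fitted linear model. Samples assigned a high classification score tend to sustain higher carbenicillin concentrations, whereas variants with a low MIC are assigned a low classification score.
Reprogramming of \(\beta\)-Glucuronidase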
Motivation
The main objective of the AiGEM software suite is to improve directed evolution experiments. As protein sequence space is vast and cannot be explored exhaustively by brute-force or random-walk methods, a directed evolution tool needs to reduce its combinatorial complexity. With DeeProtein, we learned the complex relation between protein sequence and protein function, and thus the properties of the thin manifold of functional protein sequences. To harness this learned representation in a generative approach, we developed our directed evolution tool GAIA (Genetically Artificially Intelligent Algorithm). GAIA uses the pre-trained DeeProtein models as its scoring function to assess the class probabilities of the target protein functions during the evolution process. We hypothesize that introducing amino acid substitutions into the entry sequence so as to maximize the class probability of the goal function shifts the protein's function towards the goal GO term.
To demonstrate the capabilities of GAIA, we set out to reprogram the E. coli \(\beta\)-glucuronidase (GUS) towards \(\beta\)-galactosidase (GAL) activity in silico. Sequences were predicted by GAIA, and their enzyme kinetics were subsequently assessed in the wet lab.
Experimental Setup - Software
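We prepared our experiments by performing equilibration molecular dynamics simulations on the wildtype GUS and on a known variant described by Matsumura et al. Based on these simulations, we determined three mutative windows on the GUS sequence. Limiting mutations to defined sequence windows was necessary to facilitate cloning of the mutants in the wet lab. The GUS sequence with its defined mutative windows was then submitted to GAIA with the objective of maximizing the \(\beta\)-galactosidase activity GO term (GO:0004565). GAIA was run for 1000 generations, and the top five candidates of every generation were added to the candidate library. From this library, we picked five candidates for wet-lab testing.
Equilibration Molecular Dynamics Simulations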
To assess the effects of the introduced mutations on the \(\beta\)-glucuronidase structure, we performed equilibration molecular dynamics (MD) simulations on the wildtype and on the variant introduced by Matsumura et al. The apo structure was protonated at pH 6.5 using the Modeller class of the OpenMM library, and the protein was then solvated in a cubic 10 nm box with the TIP3P water model. Finally, the system charge was neutralized with sodium and chloride ions.
Table 2: Calculated RMSD values after equilibration for the wildtype \(\beta\)-glucuronidase and the Matsumura variant.
Structure | RMSD (Å) |
---|---|
Wildtype (PDB 3LPG) | 1.412 |
Matsumura mutant | 1.728 |
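For reference, the RMSD values in Table 2 presumably follow the standard definition below. The atom selection and the reference structure are not stated in this section, so the following symbols are assumptions: \(\mathbf{x}_i\) are the positions of the \(N\) selected atoms after equilibration, \(\mathbf{y}_i\) the corresponding reference positions (for example the starting structure), and \(R\), \(\mathbf{t}\) the rotation and translation of the optimal superposition.

\[
\mathrm{RMSD}(\mathbf{x}, \mathbf{y}) = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
\]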