Revision as of 14:36, 31 October 2017
Software Validation
From GAIA to the bench and back
Evolution of \(\beta\)-Lactamases
Motivation
The deep neural network DeeProtein is the heart of our AiGEM software suite. DeeProtein was trained on ~8 million sequences from the UniProt database to capture the complex sequence-to-function relation in proteins. It assigns a sequence to one or more of 886 classes, each corresponding to a gene ontology (GO) term of the molecular function GO graph. As gene ontology terms are labels for protein functionality, we hypothesize that it is possible to assess protein activity based on the representation of the sequence-to-function relation learned by DeeProtein. To back this claim we set out to validate the DeeProtein classification score comprehensively. We applied GAIA, interfaced with a DeeProtein variant specifically trained on beta-Lactamases, to predict a set of beta-Lactamase sequences matching the following criterion: the set should contain a broad range of variants with higher and lower DeeProtein classification scores compared to the wildtype.
As a measure of enzyme activity we determined the minimum inhibitory concentration (MIC) of carbenicillin for each candidate in the set. We demonstrate a correlation between the MIC of carbenicillin and the average DeeProtein classification score of the screened candidates. Further, we improve the performance of our deep neural network by incorporating the generated wet lab data into our training process.
Experimental Design - Software
We ran GAIA separately for each mutative window for up to 1000 generations in single mutation mode: the mutation rate was limited to 1 amino acid substitution per generation, with 1 initial mutation. Additionally, we ran GAIA in double mutation mode over the combined frames for the same number of generations.
For each generation the top five suggested candidates were saved and added to the pool of suggested sequences. Subsequently we selected a subset of the proposed sequence pool for wet lab validation, with the aim of covering a broad activity spectrum. We therefore selected both variants scoring higher than the wildtype and variants scoring lower than the wildtype.
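The bookkeeping described above (a fixed number of substitutions per candidate, restricted to a mutative window, with the top five candidates of every generation added to a pool) can be illustrated with a short Python sketch. It is not GAIA itself: GAIA's evolution strategy is replaced here by plain random sampling from the wildtype, and `toy_score`, `candidates_per_generation` and all other names are hypothetical stand-ins (in particular, `toy_score` takes the place of the DeeProtein classification score).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(sequence):
    """Hypothetical stand-in for the DeeProtein classification score."""
    return sequence.count("G") / len(sequence)


def mutate(wildtype, positions, n_substitutions, rng):
    """Substitute n_substitutions randomly chosen allowed positions of the wildtype."""
    residues = list(wildtype)
    for pos in rng.sample(positions, n_substitutions):
        # Never "substitute" a residue with itself.
        residues[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != wildtype[pos]])
    return "".join(residues)


def propose(wildtype, positions, score_fn, n_substitutions=1,
            generations=1000, candidates_per_generation=50, top_k=5, seed=0):
    """Collect the top_k scoring candidates of every generation in a pool.

    Assumption: candidates are sampled at random from the wildtype, which is a
    simplification and not GAIA's actual search procedure.
    """
    rng = random.Random(seed)
    pool = {}
    for _ in range(generations):
        candidates = {
            mutate(wildtype, positions, n_substitutions, rng)
            for _ in range(candidates_per_generation)
        }
        # Sort by score, breaking ties by sequence so the ranking is reproducible.
        ranked = sorted(candidates, key=lambda s: (score_fn(s), s), reverse=True)
        for sequence in ranked[:top_k]:
            pool[sequence] = score_fn(sequence)
    return pool
```

Single mutation mode corresponds to `n_substitutions=1` with the index list of one window; double mutation mode over the combined frames corresponds to `n_substitutions=2` with the concatenated index lists of several windows.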
Table 1: Selected single and double mutants proposed by GAIA. From the pool of proposed beta-Lactamase sequences we selected 9 single and 16 double mutants for wet lab validation.
Mutation | Fragment | Classification score |
---|---|---|
G236L | F | 0.56584185 |
G216C_P217F | E | 0.56683183 |
L218D | E | 0.56554008 |
L167G_P164L | D | 0.56678349 |
P164G | D | 0.56673431 |
P164H | D | 0.56748658 |
D129L_N130I | C | 0.56734717 |
D129L | C | 0.56697899 |
E102F | B | 0.56683093 |
Y103D | B | 0.56666845 |
K71W | A | 0.56605697 |
M67G | A | 0.56762165 |
M67G_F70G | A | 0.56777865 |
K71W_G236L | A+F | 0.56503928 |
M67G_P164G | A+D | 0.5675621 |
M67G_P164H | A+D | 0.56818455 |
E102F_P164H | B+D | 0.56749099 |
E102F_P164G | B+D | 0.56676418 |
E102F_G236L | B+F | 0.5658384 |
E102F_M67G | B+A | 0.56763196 |
D129L_P164G | C+D | 0.56688529 |
D129L_M67G | C+A | 0.56781197 |
D129L_L218L | C+E | 0.56570756 |
L218D_P164G | E+D | 0.56537378 |
L218D_K71W | E+A | 0.56467879 |
WT | None | 0.56685823 |
Results - Software
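The MIC-score used in this section is a simple thresholding rule: relative OD600 readings are binarized at 0.8 and the growing wells are counted until the first inhibited one. A minimal Python sketch of that rule follows. It assumes that the readings of each candidate are ordered by increasing carbenicillin concentration, that a reading of exactly 0.8 counts as growing, and that DeeProtein scores are averaged per distinct MIC-score; the function names are our own and purely illustrative.

```python
from collections import defaultdict
from statistics import mean


def mic_score(od_values, threshold=0.8):
    """Count consecutive growing wells before the first inhibited one.

    od_values: relative OD600 readings, ordered by increasing
    carbenicillin concentration (assumption).
    """
    score = 0
    for od in od_values:
        if od < threshold:  # inhibited (0): stop counting
            break
        score += 1          # growing (1)
    return score


def mean_score_per_mic(candidates):
    """Average the classification scores of all candidates sharing a MIC-score.

    candidates: iterable of (classification_score, od_values) pairs.
    """
    groups = defaultdict(list)
    for classification_score, od_values in candidates:
        groups[mic_score(od_values)].append(classification_score)
    return {mic: mean(scores) for mic, scores in sorted(groups.items())}
```

Because counting stops at the first reading below the threshold, a well that appears to grow again at a higher concentration does not raise the MIC-score.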
We calculated a MIC-score from the measured data points for all candidates by first applying a threshold of 0.8 to the measured relative OD600 values. As the OD600 was measured in a plate reader, it can only be seen as a relative measure of growth. Thus we consider any OD600 below 0.8 as inhibited (0) and higher ODs as growing (1). From the thresholded values we then calculated the MIC-score as the number of consecutive observed data points up to the first carbenicillin concentration at which the OD fell below 0.8. Next, the DeeProtein scores were averaged within the respective MIC-score intervals and the mean DeeProtein score was plotted against the MIC of carbenicillin.
Reprogramming of \(\beta\)-Glucuronidase
Motivation
The main objective of the AiGEM software suite is the improvement of directed evolution experiments. As the protein space is tremendous in size and impossible to explore with brute force or random walk methods, a directed evolution tool needs to reduce the combinatorial complexity of the protein space. With DeeProtein we learned the complex relation between protein sequence and protein function, and thus the properties of the thin manifold of functional protein sequences. To harness this learned representation in a generative approach we developed our directed evolution tool GAIA (Genetically Artificially Intelligent Algorithm). GAIA deploys the pretrained DeeProtein models as a scoring function to estimate the class probability of a given protein function during the evolution process. We hypothesize that maximizing the class probability of the goal protein function by introducing amino acid substitutions into the entry sequence shifts its function towards the goal term.
To demonstrate the capabilities of GAIA we set out to reprogram the E. coli beta-glucuronidase (GUS) towards beta-galactosidase (GAL) activity in silico. Sequences were predicted by GAIA and subsequently the enzyme kinetics were assessed in the wet lab.
Experimental Setup - Software
We prepared our experiments by performing equilibration molecular dynamics simulations on the wildtype and a known variant.
Equilibration Molecular Dynamics Simulations
In order to assess the effects of the introduced mutations on the beta-Glucuronidase structure, we performed equilibration molecular dynamics (MD) simulations on the wildtype and the mutated variant introduced by Matsumura et al.
The apo structure was protonated at pH 6.5 using the Modeller class of the OpenMM library, then the protein was solvated in a cubic box of 10 nm with the TIP3P water model. The system was then charge-neutralized with sodium and chloride ions.
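The preparation steps above map onto a few calls of OpenMM's `Modeller` class; the following sketch shows one way to express them. Several details are assumptions that are not stated in the text: the force field files (`amber14-all.xml`, `amber14/tip3p.xml`), the reading of "10 nm" as the box edge length, the current `openmm` package name (older releases use `simtk.openmm`), and the hypothetical function name `prepare_solvated_system`.

```python
PH = 6.5            # protonation pH stated in the text
BOX_EDGE_NM = 10.0  # cubic box; edge length is our reading of "10 nm"


def prepare_solvated_system(pdb_path):
    """Protonate, solvate and neutralize an apo structure with OpenMM's Modeller."""
    # OpenMM is imported lazily so the module can be loaded without it installed.
    from openmm import Vec3, unit
    from openmm.app import ForceField, Modeller, PDBFile

    pdb = PDBFile(pdb_path)
    # Assumption: the force field used for the simulations is not given in the text.
    forcefield = ForceField("amber14-all.xml", "amber14/tip3p.xml")

    modeller = Modeller(pdb.topology, pdb.positions)
    modeller.addHydrogens(forcefield, pH=PH)
    modeller.addSolvent(
        forcefield,
        model="tip3p",
        boxSize=Vec3(BOX_EDGE_NM, BOX_EDGE_NM, BOX_EDGE_NM) * unit.nanometer,
        positiveIon="Na+",
        negativeIon="Cl-",
        neutralize=True,  # add counter-ions until the net charge is zero
    )
    return modeller
```

A raw PDB entry usually needs cleaning (missing atoms, ligands, alternate locations) before `addHydrogens` succeeds; that step is omitted here.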
Table 2: Calculated RMSD values after equilibration for the wildtype beta-Glucuronidase and the Matsumura variant.
Structure | RMSD (Å) |
---|---|
Wildtype (3lpg) | 1.412 |
Matsumura mutant | 1.728 |
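For reference, the root-mean-square deviation is conventionally defined as below. The atom selection and the reference structure behind the values in Table 2 are not stated here; the formula assumes \(N\) selected atoms with positions \(\mathbf{r}_i\) in the equilibrated structure and \(\mathbf{r}_i^{\mathrm{ref}}\) in the reference structure after optimal superposition.

\[ \mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left\lVert \mathbf{r}_i - \mathbf{r}_i^{\mathrm{ref}} \right\rVert^{2}} \]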