Difference between revisions of "Team:Heidelberg/Validation"


Revision as of 13:04, 1 November 2017


Software Validation
From AiGEM to the bench and back
Deep learning is an extremely powerful method for representation learning. With the AiGEM (Artificial intelligence for Genetic Evolution Mimicking) suite we set out to harness this power in the context of proteins. By capturing the complex relation of protein sequence to protein function, we intend to shift the directed evolution process in silico. Our software tool GAIA (Genetic Artificially Intelligent Algorithm) is the evolving component, interfaced with the deep neural network DeeProtein. To validate both parts of the AiGEM suite, we first demonstrate that the DeeProtein classification score of in silico evolved \(\beta\)-lactamases correlates with the minimum inhibitory concentration of carbenicillin. Second, we reprogram a \(\beta\)-glucuronidase into a galactosidase in silico and assess the enzyme kinetics in the wet lab.

Evolution of \(\beta\)-Lactamases

Motivation

The deep neural network DeeProtein is the heart of our AiGEM software suite. DeeProtein was trained on about 8 million sequences from the UniProt database to grasp the complex sequence-to-function relation in proteins. It can assign a sequence to several of 886 classes of gene-ontology (GO) terms from the molecular function GO graph at once. As gene ontology terms are labels for protein functionality, we hypothesize that protein activity can be assessed based on the representation of the sequence-to-function relation learned by DeeProtein. To back this claim, we set out for a comprehensive validation of the DeeProtein classification score. We applied GAIA, interfaced with a DeeProtein variant specifically trained on \(\beta\)-lactamases, to predict a set of \(\beta\)-lactamase sequences matching the following criterion: the set should contain a broad range of variants with higher and lower DeeProtein classification scores compared to the wildtype.
As a measure of enzyme activity, we report the minimal inhibitory concentration (MIC) of carbenicillin for each candidate in the set. We demonstrate a correlation between the MIC of carbenicillin and the average DeeProtein classification score of the screened candidates. Furthermore, we improve the performance of our deep neural network by incorporating the generated wet-lab data into our training process.

Experimental Design - Software

Definition of Mutation Sites

Figure [/]: Defined mutation sites on the beta-Lactamase sequence
The plot depicts the OD600 of the ß-lactamase mutants that could survive at 1280 µg/ml in the first assay, under more stringent conditions. The majority of these ß-lactamases have a slightly weaker activity than the wildtype, but we could determine five variants with a higher enzymatic activity.

GAIA preferences and Variant selection

We ran GAIA separately for each mutative window for up to 1000 generations in a single mutation mode: The rate of mutation was limited to one amino acid substitution per generation with one initial mutation. Additionally, we ran GAIA in double mutation mode over the combined frames for the same number of generations.
For each generation, the top five suggested candidates were saved and added to the pool of suggested sequences. Subsequently, we selected a subset of the proposed sequence pool for wet lab validation under the premise of covering a broad activity spectrum. Thus, we selected variants scoring higher than the wildtype and mutants scoring lower than the wildtype. A table displaying the scores for each mutation is depicted at the end of the page.

Experimental Design - Wet-lab

Construct Design

To create a plasmid that is compatible with other BioBrick parts, we cloned the ß-lactamase cassette into the pSB1C3 backbone. The ampicillin resistance gene was obtained from pSB1A3 and introduced into the BioBrick backbone via Golden Gate cloning. The BsaI restriction site located inside the ß-lactamase coding sequence had previously been removed to facilitate fast and easy cloning. The second antibiotic resistance cassette, derived from the pSB1C3 backbone, is important because it enabled us to select for correct constructs even when the ß-lactamase was inactive or its activity was heavily reduced. The mutations we cloned were located within six different windows. The variants were cloned by Golden Gate assembly: the different parts of the plasmid were PCR-amplified with the mutations placed in primer extensions, and the mutated plasmid was subsequently assembled. In this way, we generated single, double and triple mutants of the gene.

Determination of the Minimal Inhibitory Concentration

The activity of an antibiotic resistance enzyme can be determined via the minimal inhibitory concentration (MIC). The MIC is the lowest concentration of an inhibiting agent at which a microorganism, in our case E. coli, no longer grows. At the MIC50, for instance, the growth of a culture is reduced by fifty percent. Typical methods to determine MICs are broth or agar dilutions. The principle is as simple as it is effective: the microorganism is cultured in liquid medium or on plates with different concentrations of the inhibiting agent JM10291. With this method, the concentrations that are tolerated by the organism and those that inhibit growth can be determined. However, such an assay gives no information about the kind of toxicity, e.g. whether the chemical is cytotoxic or cytostatic.
In our case, we want to evaluate differences in the activity of a well-characterized protein. Consequently, such a dilution method is ideal. It is easy to set up, very robust and gives exactly the information that is needed for the characterization.
To start the assay, colonies were picked into LB medium with 100 µg/ml chloramphenicol but without carbenicillin and were grown to the stationary phase for 20 h. Then, deep well plates with 500 µl LB broth per well were prepared. The medium was supplemented with 100 µg/ml chloramphenicol, and for each mutant, eight wells with different carbenicillin concentrations ranging from 10 to 1280 µg/ml were prepared. Each well was inoculated with 2 µl of the preculture. The new cultures were incubated for another 8 h. Finally, the OD600 of each well was measured. With this measurement we were able to assess the growth ability of each variant.

Results

Results - Wet-lab

The MICs of 37 cloned variants, a positive control carrying the wildtype ß-lactamase and a negative control without ampicillin resistance were tested at carbenicillin concentrations between 10 µg/ml and 1280 µg/ml. For the OD600 measurements, a value of 0.08 was set as the cutoff between bacteria that were resistant to a given concentration and those that were not. The negative control did not grow under any condition; thus, the lowest tested concentration of 10 µg/ml was already above the MIC of E. coli lacking the resistance gene. 13 of the 37 tested variants did not show any significant ß-lactamase activity and did not grow at any carbenicillin concentration. As expected, the majority, eleven of thirteen, were double and triple mutants, whereas only two single mutations led to the loss of enzyme activity. Interestingly, these two single mutations, D129L and K71W, appear in 10 of the 13 catalytically dead mutants, which underlines their importance for enzyme activity. Furthermore, it is noteworthy that one double mutant, D129L/L218D, has a MIC above 640 µg/ml. This suggests that L218D can somehow compensate for the negative effect of D129L.
Eleven variants had a MIC within the range tested in our assay. This shows that enzyme activity can be modulated gradually with the help of our software. Another 11 mutants, as well as the wildtype ß-lactamase, grew under all conditions. Interestingly, only about half of the variants with high ß-lactamase activity (6 of 11) were single mutants, whereas the remaining five variants carried two or three mutations. Within this set, the mutations M67G and P164G/H are very prominent. These exchanges do not have drastic effects on enzyme activity; however, they also appear in variants with weak or no activity.
As a result, we performed a second test for these candidates at higher antibiotic concentrations. The three highest carbenicillin concentrations of the first data set were included, and the range was extended to a maximum concentration of 19.2 mg/ml. While several candidates turned out to have a weaker activity than the wildtype protein, five of them showed improved properties. The two best candidates could even grow at 19.2 mg/ml. These variants each carried a single point mutation, E102F and M67G, respectively. These most beneficial mutations also appear in many of our better candidates, which underlines their functional relevance.
Figure [/]: Growth Behaviour of ß-Lactamase Mutants under Different Carbenicillin Concentrations
The plot shows the OD600 of the different ß-lactamase mutants at different carbenicillin concentrations. The pool of mutants is very heterogeneous. Some proteins are not active at all, some show different grades of activity, and 11 mutants, as well as the wildtype enzyme, can grow in all conditions.
Figure [/]: Growth Behaviour of ß-Lactamase Mutants under Elevated Carbenicillin Concentrations
The plot depicts the OD600 of the ß-lactamase mutants that could survive at 1280 µg/ml in the first assay, under more stringent conditions. The majority of these ß-lactamases have a slightly weaker activity than the wildtype, but we could determine five variants with a higher enzymatic activity.

Results - Software

We calculated a MIC-score from the measured data points for all candidates by first applying a threshold of 0.08 to the measured OD600 values. As the OD600 was measured in a plate reader, it can only be seen as a relative measure of growth. Thus, we consider any OD600 below 0.08 as inhibited (0) and any higher OD600 as growing (1). From the thresholded values, we then calculated the MIC-score as the number of consecutive data points observed before the first carbenicillin concentration at which the OD600 fell below 0.08. Next, the DeeProtein scores were averaged within the respective MIC-score intervals, and the mean DeeProtein score was plotted against the MIC of carbenicillin for all double mutants. The average DeeProtein score correlates with the MIC with a correlation coefficient of 0.6, as displayed in Figure [/].
Figure 2: The DeeProtein classification score for screened \(\beta\)-lactamase variants correlates with the MIC of Carbenicillin.
The average DeeProtein classification scores assigned to samples in the MIC-score bins are depicted as black dots. The red line is the fitted linear model. Samples assigned a high classification score tend to sustain higher carbenicillin concentrations, whereas a low classification score is assigned to variants with a low MIC.
To improve the predictive power of DeeProtein, we subsequently incorporated the collected data into our training set, with the aim of better capturing the effect of single amino acid substitutions on enzyme activity.
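The MIC-score and the correlation described in this section can be sketched in a few lines. The function names are illustrative, and the two-fold dilution series is an assumption (the text only states eight concentrations from 10 to 1280 µg/ml); the OD600 and score values in the example are invented, not measured data.

```python
import math

# Assumed two-fold series; only the range and the number of wells are given.
CONCENTRATIONS_UG_PER_ML = [10, 20, 40, 80, 160, 320, 640, 1280]


def mic_score(od_values, cutoff=0.08):
    """Count consecutive wells (ascending concentration) with OD600 >= cutoff."""
    score = 0
    for od in od_values:
        if od < cutoff:
            break
        score += 1
    return score


def mean_score_per_bin(mic_scores, model_scores):
    """Average the classification scores within each MIC-score bin."""
    bins = {}
    for mic, s in zip(mic_scores, model_scores):
        bins.setdefault(mic, []).append(s)
    return {mic: sum(v) / len(v) for mic, v in sorted(bins.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented example: growth up to 40 ug/ml, inhibited from 80 ug/ml onwards.
example_od = [0.52, 0.47, 0.31, 0.05, 0.20, 0.01, 0.00, 0.00]
example_mic_score = mic_score(example_od)
```

Note that a well above the cutoff at a higher concentration (0.20 in the example) does not count once growth has been interrupted, which matches the "consecutive data points" definition.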

Discussion

Reprogramming of \(\beta\)-Glucuronidase

Motivation

The main objective of the AiGEM software suite is the improvement of directed evolution experiments. As the protein space is tremendous in size and impossible to explore with brute-force or random-walk methods, a directed evolution tool needs to reduce the combinatorial complexity of the protein space. Using DeeProtein, we learned the complex relation between protein sequence and protein function, and thus the properties of the thin manifold of functional protein sequences. To harness this learned representation in a generative approach, we developed our directed evolution tool GAIA (Genetically Artificially Intelligent Algorithm). GAIA deploys the pre-trained DeeProtein models as a scoring function to assess the class probability of a certain protein function during the evolution process. We hypothesize that maximizing the class probability of the goal protein function by introducing amino acid substitutions into the entry sequence shifts its function towards the goal term.
To demonstrate the capabilities of GAIA, we set out to reprogram the E. coli \(\beta\)-glucuronidase (GUS) towards \(\beta\)-galactosidase (GAL) activity in silico. Sequences were predicted by GAIA, and the enzyme kinetics were subsequently assessed in the wet lab.

Experimental Setup - Software

Based on equilibration molecular dynamics simulations of the wildtype GUS and the variant introduced by Matsumura et al. matsumura2001vitro, we determined three mutative windows on the GUS sequence. Limiting the mutations to certain sequence windows was necessary to facilitate the cloning of the mutants in the wet lab. Subsequently, the GUS with its defined mutative regions was submitted to GAIA with the objective of maximizing the \(\beta\)-galactosidase-activity GO term (GO:0004565). GAIA was run for 1000 generations, and the top five candidates of every generation were added to the candidate library. From this library, we picked five candidates to test in the wet lab.

Equilibration Molecular Dynamics Simulations

In order to assess the effects of the introduced mutations on the \(\beta\)-glucuronidase structure, we performed equilibration molecular dynamics (MD) simulations on the wildtype and the mutated variant introduced by Matsumura et al. matsumura2001vitro. All simulations were performed in OpenMM eastman2010openmm with the amber99 force field pearlman1995amber. The B-chain of the \(\beta\)-glucuronidase (PDB code: 3lpg) served as the wildtype and as the basis for in silico mutagenesis. First, all selenomethionines in the sequence were corrected to methionines and the ligand was excluded. Subsequently, the mutations D508G, T509A, S557P, N566S and K568T were introduced in PyMOL delano2002pymol to obtain the described GUS variant matsumura2001vitro.
The apo structure was protonated at pH 6.5 using the Modeller class of the OpenMM library, then the protein was solvated in a cubic box of 10 nm with the TIP3P water model. The charge of the system was then neutralized with sodium and chloride ions.
Figure 2: Introduced mutations have little effect on the protein folding in equilibration MD
Superimposed structures for the wildtype \(\beta\)-glucuronidase (A) (PDB-code: 3lpg) and the mutated version (B) introduced by Matsumura matsumura2001vitro. In both depictions the grey structure is the first frame of the equilibration trajectory and the blue structure is the protein after equilibration MD. The catalytic residues are colored yellow, the symbolic ligand (was not included in MD simulations) in green. Matsumura mutations are colored magenta. The grey and blue structures vary just slightly, thus the introduced mutations did not affect the overall protein folding.
Subsequently, the system was heated from 0 to 300 K in 100 K intervals. Each interval was NPT-simulated for 10,000 steps with a step size of 2 fs. After the heating was completed, the system was NPT-simulated for another 10,000,000 steps (20 ns) for equilibration. The last frame of the resulting trajectory was then aligned to the input structure (figure [/]) and the global root-mean-square deviation of atomic positions (RMSD) was calculated (table 2).

Table 2: Calculated RMSD values after equilibration for the wildtype \(\beta\)-glucuronidase and the Matsumura variant.

Structure RMSD (Å)
Wildtype (3lpg) 1.412
Matsumura mutant 1.728
As the RMSD values are comparable, the mutations suggested by Matsumura matsumura2001vitro have no substantial effect on the protein fold. Thus, we consider the regions in which these mutations were introduced to be mutable.
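For reference, the global RMSD reported in Table 2 is the square root of the mean squared distance between corresponding atoms. The sketch below assumes that both coordinate sets are already superimposed and hold the same atoms in the same order; it does not perform the alignment step, and the coordinates in the example are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two pre-aligned lists of (x, y, z) coordinates.

    Both lists must contain the same atoms in the same order; the result
    has the same length unit as the input (e.g. Angstrom).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented two-atom example: each atom is displaced by exactly 1 unit.
first_frame = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
last_frame = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
example_rmsd = rmsd(first_frame, last_frame)
```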

Definition of Mutation Sites

To facilitate the cloning process in the wet lab and accelerate production, we limited the mutagenesis of the glucuronidase sequence to three patches (Figure [/]). The defined patches are located around the active site, with fragments A and C partly forming the enzymatic pocket. All positions mutated by Matsumura are contained in the defined patches.
Figure [/]: Patches defined for mutagenesis on the \(\beta\)-glucuronidase sequence.
Displayed is the wildtype \(\beta\)-glucuronidase; the catalytic residues are colored in yellow. The sequence patches defined for mutagenesis are colored in green (Fragment A, 351-371), blue (Fragment B, 506-512) and magenta (Fragment C, 548-568). All patches are located close to the active site, where fragments A and C partly form the pocket.

Variant Selection

As our resources were limited, it was not feasible for us to synthesize all variants suggested by GAIA. Thus, we had to carefully select the variants to be assessed in the wet lab. Matsumura et al. matsumura2001vitro reported a set of single amino acid substitutions and a reprogrammed GUS with 5 amino acid substitutions. In order to compare the predictions made by GAIA with the variants reported by Matsumura, we incorporated single amino acid mutations at positions matching those of the Matsumura variants. For two of these positions, D508 and T509, we additionally assembled the exact variants reported by Matsumura et al. for direct comparison to the predictions made by GAIA.
For the sequence prediction, we ran GAIA over 2000 generations and constrained the mutation sites to the determined sequence areas. The number of mutations per sequence was limited to 10, with an initial mutation rate of 10. To facilitate convergence, the mutation rate decayed linearly by 1 mutation every 300 generations, and back-mutations were allowed throughout the whole run.
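The decay schedule can be written down as a one-line function. This is a sketch of the stated schedule only; the function name and the lower bound of one mutation per generation are assumptions (with the stated numbers the bound is never reached within 2000 generations).

```python
def mutation_rate(generation, initial=10, decay_every=300, floor=1):
    """Mutations per generation: start at `initial`, lose 1 every `decay_every`."""
    return max(floor, initial - generation // decay_every)


# Rate at the start of each 300-generation block of a 2000-generation run.
schedule = {g: mutation_rate(g) for g in range(0, 2000, 300)}
```

Under these assumptions the run starts with 10 mutations per generation and ends with 4.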
Next, the predicted variants were manually curated and investigated for steric conflicts. The single amino acid substitutions were selected with respect to the overall mutation frequencies at these positions. The resulting set of variants is listed in table 3.

Table 3: Set of synthesized GUS variants.

Variant Fragment
D508V B
D508G B
T509L B
T509A B
K568G C
N556S C
V355M, F357D, N358L, G364L, D508L, T509A, F551G, D553F, F554E, G565L A+B+C
We synthesized a set of 6 GUS variants. To allow a comparison with the in silico evolution by GAIA, the set comprises single amino acid mutations at the positions reported by Matsumura et al. matsumura2001vitro. These positions are D508, T509, K568 and N556. For two of these positions, D508 and T509, we additionally assembled the exact variants reported by Matsumura et al. for direct comparison to the predictions made by GAIA.

Experimental Design - Wet-lab

Cloning of different GUS Variants

An expression plasmid of the wildtype GUS was obtained from Addgene. In this plasmid, GUS is expressed under a lac promoter, and a His-tag is attached to the C-terminus of the protein. First, the wildtype GAL was cloned into the same plasmid. Second, different mutants of the glucuronidase were created. With this method, seven different mutants were generated.

Expression and Purification

After cloning, one colony of each transformation was inoculated as a preculture and grown overnight at 37°C and 220 rpm. The main cultures were inoculated with the respective volume of the overnight cultures to a final density of 0.01 at the starting point. The inducer IPTG was added once the cultures had reached an OD600 of 0.6 - 0.8, and the cultures were then incubated for 20 hours at 25°C and 220 rpm. Cells were centrifuged and lysed as described in the protocol(ref). Lysates were loaded onto the Ni-NTA cartridge and affinity-purified through several washing steps. Purified proteins were obtained after elution with 500 mM imidazole solution followed by dialysis. The protein concentration was determined using both a spectrophotometer and a Bradford assay.

Activity Assay

To determine the activity of the different mutants, an assay with p-nitrophenyl-galactopyranoside (PNPG) as substrate was set up. PNPG is a well-suited substrate for GAL. The enzyme cleaves the p-nitrophenyl group from the galactopyranoside. The yellow color of the product can be measured at 420 nm. The enzyme was added to different concentrations of the substrate, ranging from 0.047 mM to 3 mM final concentration, and the reaction was followed over time for 45 minutes at 21 °C.
From the data generated in this assay, the kinetic parameters according to Michaelis-Menten were calculated via nonlinear regression.
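As an illustration of such a fit, the sketch below estimates \(K_M\) and \(V_{max}\) of the Michaelis-Menten equation \(v = V_{max} S / (K_M + S)\) by least squares. It is not the procedure used for the data on this page: the fitting strategy (closed-form \(V_{max}\) for a given \(K_M\), combined with a golden-section search over \(\log_{10} K_M\)) is a swapped-in, dependency-free technique, the two-fold substrate series is an assumption, and the rates are synthetic.

```python
import math


def fit_michaelis_menten(substrate, rates, log_km_low=-4.0, log_km_high=2.0,
                         iterations=100):
    """Least-squares fit of v = Vmax * S / (Km + S); returns (Km, Vmax).

    For a fixed Km the optimal Vmax has a closed form, so only Km is
    searched (golden-section search on log10(Km), assuming one minimum).
    """
    def best_vmax(km):
        x = [s / (km + s) for s in substrate]
        return sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)

    def sse(log_km):
        km = 10.0 ** log_km
        vmax = best_vmax(km)
        return sum((v - vmax * s / (km + s)) ** 2
                   for s, v in zip(substrate, rates))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log_km_low, log_km_high
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    km = 10.0 ** ((a + b) / 2.0)
    return km, best_vmax(km)


# Synthetic, noise-free example (Km = 0.35 mM, Vmax = 1.0 in arbitrary units).
substrate_mM = [3.0 / 2 ** i for i in range(6, -1, -1)]  # ~0.047 ... 3 mM
rates = [1.0 * s / (0.35 + s) for s in substrate_mM]
km_fit, vmax_fit = fit_michaelis_menten(substrate_mM, rates)
```

Dividing the fitted \(V_{max}\) by the enzyme concentration then gives \(K_{cat}\).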

Results

Results - Wet-lab

Once the enzymes were cloned and purified, kinetic assays were performed to determine their activity. We tested the mutants as well as both wildtype enzymes, GUS and GAL, as controls. As expected, GUS showed no enzyme activity at all; no significant product formation could be measured. Among the inactive proteins were two mutants that were previously published by Matsumura et al. matsumura2001vitro. As the published activity increase was only 1.5-2 fold and our enzyme concentration was relatively low, it is no surprise that we could not determine these activities. For the wildtype GAL we determined a Km value of 0.3466 mM and a Kcat value of 49.38 1/s. The Km is higher and the Kcat is lower than similar values in the literature (see Juers et al. JUERSETAL..2003), which is mainly due to the reaction conditions, such as assay buffer and temperature. The kinetic parameters that were determined are listed in table 4.

Table 4: Kinetic Key Values of wildtype GAL (GAL_wt) and the best GUS mutant T509L

Protein Km[mM] Kcat[1/s] Kcat/Km[1/(M*min)]
GAL_wt 0.3466 49.38 8.5E6
GUS_T509L 0.1834 25.02 8.1E6
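The third column of Table 4 can be checked against the first two: the listed values are consistent with the ratio \(K_{cat}/K_M\) converted from 1/(mM*s) to 1/(M*min). The short calculation below uses only the numbers from the table; the function name is illustrative.

```python
def catalytic_efficiency_per_M_min(kcat_per_s, km_mM):
    """Return Kcat/Km in 1/(M*min) from Kcat in 1/s and Km in mM."""
    km_M = km_mM * 1e-3          # mM -> M
    return kcat_per_s / km_M * 60.0  # 1/(M*s) -> 1/(M*min)


gal_wt = catalytic_efficiency_per_M_min(49.38, 0.3466)     # about 8.55E6
gus_t509l = catalytic_efficiency_per_M_min(25.02, 0.1834)  # about 8.19E6
```

Both values agree with the table to its stated precision, apart from rounding in the last digit of the GUS_T509L entry.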
Figure [/]: Colorimetric Assay for the Determination of GUS Mutant Activity
The photo was taken after 45 min reaction time. One can clearly see that the wildtype GUS as well as most of the mutants show no activity. GUS_T509L, in contrast, shows an intense yellow color.
Figure [/]: Time Course Measurement of Galactosidase Activity
The product concentration is plotted against the substrate concentration for various time points after reaction initiation. The time points are given in seconds. Both enzymes, the wildtype GAL and GUS_T509L, show strong enzymatic activity on PNPG.
Figure [/]: Determination of Catalytic Constants
For the last time point that was measured, 2640 s, product concentration is plotted versus substrate concentration. Km and Kcat were determined via nonlinear regression. They indicate a high overall activity of GUS_T509L.
Figure [/]: Comparison Between the Wildtype Proteins and GUS_T509L
Our assay demonstrated that the wildtype GUS has no activity on a GAL substrate. The mutant predicted by GAIA, however, exhibits extraordinary enzymatic activity on the GAL substrate.

Discussion

With the help of our AiGEM software suite we successfully reprogrammed a \(\beta\)-glucuronidase (GUS) variant towards \(\beta\)-galactosidase (GAL) activity. We ran equilibration molecular dynamics simulations on the wildtype and the variants reported by Matsumura et al. matsumura2001vitro. The RMSD values of the simulated structures compared to the respective input structure suggest that the amino acid exchanges introduced by Matsumura do not entail structural changes. Thus, we deliberately defined the mutative windows for GAIA with a constant central amino acid to facilitate the cloning procedure. As the definition of mutative windows reduces the degrees of freedom for GAIA, the outputs are biased by this constraint. Despite this constraint, the GAIA scores converged within 2000 generations, albeit with lower overall scores than for open mutation.
The library suggested by GAIA was subsequently curated manually in order to select single-substitution variants. The selection was based on the mutation frequencies over all 2000 generations and an additional inspection in PyMOL.
The single amino acid substitution T509L was suggested by GAIA with the highest frequency at that position (see figure [/]) and was selected based on this substitution frequency. The mutant T509L displayed a significantly higher activity than all other assessed variants, including the variant T509A reported by Matsumura matsumura2001vitro. We determined the kinetics of this variant and compared them to those of the wildtype GAL. The mutant had a \(K_{cat}\) of \(25.02\ \frac{1}{s}\) and a \(K_{M}\) of \(0.1834\ \mathrm{mM}\). This activity is not only significantly higher than that of the tested wildtype GUS, which lacked any activity, but also close to the activity of the wildtype GAL.
Under our experimental conditions, none of the variants reported by Matsumura displayed any significant activity. The reasons for this observation could lie in the different experimental conditions, which differ in buffer, temperature and substrate concentration. Further, Matsumura reported a maximum increase of activity of 2 log-scales compared to the wildtype. We are confident that this slight activity increase would not be measurable under our experimental conditions.

Appendix

Table 1: Selected single and double mutants proposed by GAIA. From the pool of proposed \(\beta\)-lactamase sequences we selected 9 single and 16 double mutants for wet-lab validation.

Mutation Fragment Classification_score
G236L F 0.56584185
G216C_P217F E 0.56683183
L218D E 0.56554008
L167G_P164L D 0.56678349
P164G D 0.56673431
P164H D 0.56748658
D129L_N130I C 0.56734717
D129L C 0.56697899
E102F B 0.56683093
Y103D B 0.56666845
K71W A 0.56605697
M67G A 0.56762165
M67G_F70G A 0.56777865
K71W_G236L A+F 0.56503928
M67G_P164G A+D 0.5675621
M67G_P164H A+D 0.56818455
E102F_P164H B+D 0.56749099
E102F_P164G B+D 0.56676418
E102F_G236L B+F 0.5658384
E102F_M67G B+A 0.56763196
D129L_P164G C+D 0.56688529
D129L_M67G C+A 0.56781197
D129L_L218D C+E 0.56570756
L218D_P164G E+D 0.56537378
L218D_K71W E+A 0.56467879
WT None 0.56685823

References