This year our project is the introduction of acrylic synthetic routes in Escherichia coli or
Saccharomyces cerevisiae to produce acrylic acid.
Primitive metabolic path map in E.Coli
We have a rational new design and transformation of the core enzyme ceaS2, at the same time,
we also want to be optimized to improve the acrylic acid production in the metabolic flow.
We know that for Escherichia coli, the carbon flow rate of its original glycerol metabolic
pathway may not be sufficient, and if the new glycerol metabolic pathway can be used to increase
the carbon flow of DHAP or G3P, the substrate of the core enzyme ceaS2 can be increased Concentration
to increase acrylic acid production.
Therefore, through the literature review, we found two enzymes which can achieve efficient
conversion of glycerol to generate DHAP the same way.
In our new approach, Glycerol dehydrogenase (Gly DH) is capable of efficiently converting
glycerol to 1,3-Dihydroxyacetone (DHA) and then phosphorylates DHA to DHAP via Dihydroxyacetone
kinase (DAK).
New route map in E.Coli
Before the implementation of the formal experiment, we need to model it to analyze the impact
of the introduction of new routes on the original metabolic flow, especially the two intermediates
of DHAP or G3P. Specifically, we care about the following two issues:
1. Has the DHAP or G3P's carbon flow improved after the introduction of new metabolic pathways?
Is it compared to the previous increase in production?
2. The introduction of new pathways after the entire metabolic pathway is stable and robust.
How is it?
In order to answer these two questions, we established a carbon metabolic flow model.
The overall workflow is as follows:
Parameter estimation
There are many parameters to be determined in the model. Most of these kinetic parameters can
be found in the literature or in the database, but at the same time, there are some kinetic
parameters of the enzyme we are looking for. Its organic matter, or the temperature and ph
of the enzyme are different. Therefore, we need to re-estimate this part of the parameters.
In the process data link, we cited the method using the data point weighting of University
of Manchester in year 2016 . The weighting of the samples is as follows:
1. When the sample PH is the same, the sample is weighted by 4 .2 when they are close. 1
when they differ much.
2. When the sample temperature is the same, the pH is the same.
3. When the samples are from the same species , the weight of the sample is 4.When they
are the non-identical species and are the prokaryotes,or the corresponding species mutated
to the corresponding species, the weight is 2. When they are the non-identical species and
are the eukaryotes, the weight is 1.
4. Try to delete the missing data. If there are some essential samples of the temperature
and PH missing, then the corresponding weight is 2.
1.kernel density estimate 2. Gaussian mixed model.
The fourth point reflects our point of view of Bayesian. In the absence of prior knowledge
of the case, we take as much as possible the weight of neutrality.
Based on the points above, we get a new parametric vector after weighting.In our model,
we do the fitting of the probability density according to the two methods of the parameter
vector: 1.kernel density estimatebsp;2. Gaussian mixed model.
The basic workflow of parameter estimation:
The Gaussian mixture model can be approximated to any real probability distribution in theory.
The EM algorithm is used to estimate the parameters required for the model. And we use the
Gaussian mixture model to estimate the probability density of the possible distribution of
parameters.
After making the probability distribution, we select Bin randomly, which meet the conditions
of width = len / 10.And we select the most possible bin based on the CDF, and estimate the
corresponding parameters when the bin reach average value.
Finally, we get the estimated parameter values, as well as the corresponding parameters
of the original PDF. The specific form and parameter values are as follows:
The reaction path of the original pathway is Gly to Gly-3-p and then to DAHP
The original pathway belongs to the reaction of a single channel, and there is a random
bibi reaction and an irreversible Mickey equation reaction. The reaction involves two enzymes
paticipating - glpk and glpD. We assume that the reaction concentration of these two enzymes
is 0.01 mM, assuming that the initial [Gly] concentration is 10 mM, the initial concentration
of ATP 10 mM, The concentration of Gly-3-p 0 mM and the concentration of DHAP 0 mM at the
same time.
In this reaction, we make the following assumptions about our model:
1. The ATP of the E.coil system is given externally completely, assuming that the culture
conditions given externally are sufficient and ATP maintains a stable constant.
2. Assume that the substrate involved in the reaction does not participate in other reactions.
In order to determine the yield of the target product, we chose to observe the efficiency
of the DHAP yield estimation system in view of the lack of basic ceaS2 enzyme data.
We can observe that the DHAP stops growing close to 50 minutes.
Then, we need to test the carbon pathway through the modified pathway, And add a metabolic
pathway enzyme-catalyzed by GlyDH enzyme and DAK in the original path, while the need for
NOX enzyme and CAT enzyme from the role of NAD + supplement, resulting in DHA, and finally
Phosphorylation produces DHAP.
metabolic flow after the transformation of the reaction model. According to the actual situation
of the reaction, we make the following assumptions:
1. In the reaction , due to the process of hydrogen peroxide to the production of O2, that
is, the process of generating acceptor, is faster, we will regard the reaction of NOX enzyme
NADH catalytic as an ordinary Michael's equation, rather than ordered sequence reaction.
2. Random pairs of sequence reactions and ordered sequence reaction equations are identical.
So we substrate which is identified as the [A] substrate depending on the integrity of the
data.
Metabolic pathways after transformation
We can see that the rate of DHAP is faster after changing the metabolic pathway, which means
that the higher the output per unit time after being put in use, the sooner the reaction
is done. Compared to the pre-improved pathway ,the reaction finishes roughly five minutes
ahead.
Sensitive analysis
In the previous pathway study, we noted that the Kcat values of glpK and GlyDH enzymes are
unknown (we do not have a large deviation in the absence of a sample expressed in E.coil).
The Kcat value of the reaction of propanal is the Kcat value of the reaction of Gly and
GlyDH. It is also assumed that the K63 ratio of the two enzymes is 2: 1.
We often use Kcat / Km to describe the catalytic efficiency of different enzymes for the
same substrate, and as a result of our experiments, GlyDH exhibited higher catalytic efficiency
than glpK. Thus we assume that the GlyDH enzyme and the glpK enzyme satisfy the following
relationship:
GlyDH enzyme activity and the Km value for gly and the Km value of the glpk enzyme to gly
have been known in previous studies. Next we adjust the alpha coefficient to study the effect
of different ratios on the overall metabolic flow.
KATA Sensitivity Test before Modification
KATA Sensitivity Test after Modification
We show the highest ratio (1000) and the lowest ratio (0.001) in yellow and blue lines respectively.
The pre-transformation pathway is most sensitive to the change of Kcat in glpK enzyme, and
the metabolic pathway of the target substrate is transformed with the change of α Rate is
always higher than the pre-transformation pathway, even when the glpK Kcat / Km value is
100 times the GlyDH, the reason may be DAK enzyme catalytic efficiency’s higher than glpD.
As we can see, the previous reaction is dependent on ATP, and in the previous hypothesis,
we make ATP stable in the constant .In order to analyze the ATP concentration changes on
the impact of glycerol conversion, we add a Standard deviation of 0.05, mean of the normal
distribution of variables to disturb the timing of the concentration of ATP in ODE.
For the sake of us to observe the significant results, we assume that the initial concentration
of ATP is 0 and is always greater than zero.
And the change curve of the concentration in 60min is shown below:
Before transformation
After trransformation
We found that the random change of ATP concentration had a significant effect on the pathway
after transformation, and the rate of DHAP synthesis was lower than that before transformation.
But when we adjust the standard deviation of the normal distribution random variable to
0.05, the result is shown below.
Thus, we found that even if ATP had a greater perturbation, the overall level was relatively
high in 0-60 min compared to the previous standard deviation of 0.02. While the transformation
of the metabolic pathway also reflects a more stable curve of change. At this point the concentration
of DHAP is not significantly affected by changes in ATP concentration.
Thus, in actual production, we only need to keep the ATP concentration at a slightly higher
level, not only to ensure the production of the target product, but to increase the stability
of the system as well.
Mutation Design of ceaS2 by using AEMD
Abstract
Engineering for the desired enzyme catalytic properties plays an important role in the biosynthesis
of bulk chemicals and natural products. However, it is a time-consuming task to improve enzyme
catalysis by traditional random mutagenesis. And the utility of rational design based on
protein structure often was limited by the lack of protein structure for target enzymes and
professional backgrounds of bioinformatics.
ceaS2 enzyme is the most important enzyme in our entire acrylic acid synthesis pathway,
but the activity of wild type is not high. So it is exceedingly necessary to modify it on
the basis of the "part" level to improve its catalytic reactivity. We used the AEMD platform
to conduct the mutational design for ceaS2 enzyme in order to figure out a more accurate
scheme of mutation, which can also exert great beneficial impact on the later experiments.
We have totally identified 32 mutational sites, and its point mutation transformation. The
experimental results show that there are 11 sites, where the enzyme activity gets boosted,
after the transformation. Compared to wild type ceaS2 enzyme, the highest activity has increased
by 11 times, whose effect is obviously noticeable. This also demonstrates the ability of
this designing platform.
Introduction
Enzyme engineering has been extensively used to optimize biocatalysts in industrial biotechnology
since most of enzymes in nature prefer to organisms adaptation but not industrial production
(Alvizo, et al., 2014; Ma, et al., 2009; Savile, et al., 2010). Traditionally, optimized
enzymes were obtained by random site-directed or saturated mutagenesis such as Error Prone
PCR, DNA shuffling and so on (Kabumoto, et al., 2009; Qi, et al., 2009; Reetz and Carballeira,
2007; Yep, et al., 2008). Due to the immense possibility of sequence mutation at amino acids
level, it is a time-consuming and low efficiency task to obtain a high efficient biocatalyst
by random mutation.
With the availability of an increasing number of protein structural and biochemical data,
rational design of enzymatic mutation has become more and more popular (Bloom, et al., 2005;
Chica, et al., 2005; Kiss, et al., 2013; Li, et al., 2012; Steiner and Schwab, 2012). Many
strategies have been used to obtain evolutionary information, catalytic sites and substrate
channels by integrating sequence and structural features of enzymes. Previous studies have
developed many effective computational tools for enzyme engineering, such as the enzyme design
software Rosetta (Leaver-Fay, et al., 2011) and stability design software Foldx (Van, et
al., 2011) and so on (Table S2). However, most of them only focus on one feature, like the
thermo-stability based on the known PDB structure, and often request professional backgrounds
in protein structure, biochemistry, bioinformatics and so on.
What is AEMD?
AEMD is a web-based pipeline, which integrates several approaches together for enzyme stability,
selectivity and activity engineering. This pipeline can generate comprehensive reports, which
include the recommended mutation for improving enzyme catalytic property. Specifically, users
can get the recommended mutation only inputting sequence information of target enzymes, which
is very useful in the situation without professional knowledge and the known protein structure,
since AEMD contains a functional module that can automatically predict structure of the target
enzyme based on the known structures in Protein Data Bank (PDB).
AEMD-Web provides a web interface, enabling users to conveniently predict mutants which
could improve the stability, selectivity and activity of enzymes. Users can obtain the suggestion
of mutations for almost all enzyme even without protein structure. In the future, we will
construct a comprehensive enzymatic mutant database and integrate new computing technology,
to improve the efficiency of enzyme engineering in industrial biotechnology.
Fig.1 Workflow of the Stability analysis (A), Selectivity analysis (B) and Activity analysis
(C).The blue color rectangle blocks represent the inputs of sequence or PDB file, and the
output of recommended mutation sites. The green and gray color rectangle blocks represent
the evolution- and energy-based analysis process, respectively. The yellow color diamond
blocks represent the use of other softwares and approaches. The processes were shown in
Supplementary methods in more detail.
Process
This time we utilized AEMD's Stability mode (click here for AEMD user's guide) to screen for
mutational sites that benefit the ceaS2 enzyme activity.
Because of the complexity of enzyme catalysis, it’s difficult to predict point mutation
improving protein activity accurately. How AEMD work?
Firstly,the development team of AEMD recently described a method which is able to identify
desired mutations by analyzing the coevolution information of protein sequences (Liu, et
al., 2016). In the AEMD-web, some point mutations are suggested by this method. Besides,
AEMD’s analysis generated some residues close to active center and transport tunnels which
are recommended to saturated mutation to improve activity (Fig. 1C). For the input of target
protein sequence, AEMD first obtain the PDB file using RosettaCM (Song, et al., 2013). Next,
the substrate of template PDB was mapped into target PDB using the “struct_align” funciton
of Schrodinger software (QikProp, 2015). The spatial location of substrate in target PDB
can help to determine the ligand-binding pocket of target enzyme. If all potential template
PDB had no substrate in the PDB file, AEMD predicted the ligand-binding pocket by a Rosetta
script (gen_apo_grids.linuxgccrelease) (Zanghellini, et al., 2006). After the determination
of ligand-binding pocket, AEMD generated the possible catalytic sites by search local Catalytic
Site Atlas (Furnham, et al., 2014); the residues within 5Å distance from ligands by calculating
the minimum distance between residue and substrate; and the residues located within 3 Å distance
from transport tunnels by CAVER (Chovancova, et al., 2012).(see the Fig.1 (C))
We submitted the amino acid sequence and PDB file of ceaS2 online and got the prediction
result in half an hour.As shown below, you can also
download PDF version.
Result
We first selected the mutations within the 5Å distance of active site, altogether 33 kinds,
and then used point mutation to conduct molecular cloning operation. Next step was to synthesize
the acrylic acid using the whole cell catalysis and determined the acrylic acid yield by
HPLC. The results are as follows:
In these total 33 mutations of mutational sites, there are 11 mutations' acrylic acid yield higher than that of the wild
type, which indicates a higher activity. The highest mutational site F438M presents a yield
11 times the wild type. Therefore, it is valid and tangible for us to implement AEMD to design
the mutational sites!