Difference between revisions of "Team:Uppsala/Model"

Latest revision as of 14:34, 7 December 2017

Modeling

Summary

The three enzymes in our crocin pathway are relatively newly discovered and thus poorly characterized, with no structures available. In addition, we were only able to find enzymatic parameters concerning one of the enzymes, UGTCs2 (1). Via homology modeling we could draw the conclusion that UGTCs2 and CaCCD2 are monomers while CsADH2946 is a tetramer. We could also ensure that the N-terminal was facing outward from the protein and could safely put a His-tag there for purification. Furthermore, the extensive 100 ns MD simulation of our models verified that the models were stable in solvent and that we could further work with the models to estimate the binding affinity between enzyme and substrate. Steered molecular dynamics simulations showed that CsADH2946 is specific to its substrate crocetin dialdehyde and that it has an affinity in the µM range to the substrate. The estimation of K_M from the experimental activity measurements also showed that CsADH2946 has a high affinity towards the substrate crocetin dialdehyde, in agreement with Steered Molecular Dynamics results.

Homology Modeling of the Uncharacterized Enzymes

One of the first things we noticed when looking at our selected enzymes was that no three-dimensional structures were available and the enzymes were poorly characterized in general. To be able to get structural information we turned to homology modeling.

The idea of homology modeling is to construct a 3D-structure by mapping the amino acid sequence onto a template – i.e. a known structure of a related, homologous protein through sequence alignment. For this to work you need a template with a solved 3D-structure. The quality of the model is determined by the alignment with, and, the structure of the template. Thus, ideally, you would like a high-resolution structure with high sequence identity.

The tool we used to generate our homology models was SwissDock /2/, a fully automated protein structure homology-modeling server (2, 3, 4). We plugged in the sequences of our chosen enzymes ((CaCCD2), (CsADH2946) and (UGTCs2)) and started modeling. The templates were chosen based upon quality parameters such as Global Mean Quality Estimation, QMEAN, resolution of the crystal structure and sequence identity (5). After choosing the best templates we obtained the models detailed in figure 1 and the models along with the quality scoring are summarized in table 1. We could make two immediate observations. The CsADH2946 model seemed the most promising one, quality-wise with GMQE close to 1 and higher QMEAN being better. In addition, looking at the N-terminals (the blue ends in figure 1) we could see that they are stretched out, poking outwards from the protein. This was a good indication that we could put a His-tag at this end, with no complications. Another discovery we made was that the second step enzyme CsADH2946 is most likely a tetramer. This information was helpful when purifying the enzyme and going forth with molecular dynamics.

Table 1. The enzymes modeled, which templates were used and quality assessment by means of GMQE and QMEAN. GMQE ranges between 0 and 1, where one is good and zero is bad. As for QMEAN the higher the score, the better(1).

Figure 1. Homology models of the three enzymes selected for the zeaxanthin-crocin pathway. The first and the third enzyme are monomers, while CsADH2946 is a tetramer. The homology models were created using SWISS-MODEL /2/.

Molecular Dynamics in GROMACS

– The Art of Putting Digital Molecules in Digital Boxes of Water

To assess the models and prepare them for further characterization we used GROMACS /1/ to simulate our enzymes in saline water for a total of 100 ns. This lets us assess their stability and obtain new models that are closer to their native conformation which would be the most probable state of the enzymes during the activity measurements in our wet lab.

How We Simulated Our Enzymes

Molecular dynamics is a simulation approach where the potential energies of a molecular system is parameterized using a predefined force-field. A trajectory of the atoms of the system, that is how the atoms move over time, can then be simulated by numerically intergrating Newton's equations of motion.

The MD simulations were performed based on previously defined protocols, which were nicely summarized by Justin Lemkul in a tutorial (6). All the steps until the production MD can be done on practically any system while the production MD will require a supercomputer to reduce time. Next follows a short description of the main steps in the protocols as well as some tips and general comments.

Define Box and Solvate

The protein has to be put in a box containing solvent where the simulation will take place. We chose a cubic box with at least 1 nm between the protein and the box. The simulation is run with periodic boundary conditions, meaning that if the molecule touches the boundary it will appear on the other side of the box.

Add Ions

We neutralized our enzymes and added ions to a concentration of 0.15 M to get closer to physiological conditions (7).

Energy Minimization

In this step the energy of the system is minimized using a steepest descent algorithm. This is done to remove any conformations that are unfavourable in terms of energy. The potential energy of our system was calculated using the CHARMM27 force-field in this and all later steps of our simulations.

Equilibration

Equilibration is done to relax the system into an equilibrated state. It starts of by doing a simulation with constant volume, temperature (310 K) and number of particles. Once the first one is finnished, another simulation with constant temperature (310 K), pressure (1 bar) and a constant amount of particles is run. The system is also equilibrated in the beginning of the production simulation, but by performing a more refined equilibration protocol in advance, simulation time is saved.

Production Molecular Dynamics

Once the system was equilibrated production MD was carried out with constant pressure (1 bar), temperature (310 K) and number of particles. The systems were simulated for 100 ns on a supercomputer with 24 processors running in parallel. The time it takes for the simulation to complete is dependant on the amount of particles present. Because CsADH2946 is a tetramer with the highest amount of atoms in the simulated system, it took the longest time to finnish (approximately one month).

The Resulting Stable Enzyme Models

The result we got from the 100 ns stability simulations in GROMACS show a trajectory of each enzyme wiggling around trying to equilibrate into a low energy state. We created movies of these simulations that can be viewed in figure 2.

Figure 2. These videos are the result of simulating CaCCD2 (leftmost), CsADH2946 (middle) and UGTCs2 (rightmost) for 100 ns in saline water.

The movies all confirmed that our homology models are good in the sense that they are not melting. Along with the movies we received RMSD plots. The RMSD plot describes the distances between the atoms of the initial conformation going into the simulation and the atoms of the different conformational states the enzyme develops over the 100 ns. As can be seen in figure 3, The RMSD curve for each enzyme model converge towards one low, stable value. This shows that the models equilibrate to a stable state with low energy in the solvent and indicates that our homology models are in fact good models to represent our enzymes.

Figure 3. RMSD plot of a 100 ns stability simulation in GROMACS of homology models to the enzymes CaCCD2, CsADH2946 and UGTCs2.

Steered Molecular Dynamics (Pulling) with CsADH2946 and Crocetin Dialdehyde

Since CsADH2946 had great advancements in the lab, has the best homology model score of our enzymes and also has the RMSD which converges fastest at the lowest value showing that it is a stable structure, we decided to focus the rest of our computer power to do Steered Molecular Dynamics (pulling simulations) with this enzyme. A pulling simulation between the protein and a substrate allows us to estimate the dissociation constant K_d which can be compared to K_M. This is very useful to compare to the experimentally estimated K_M for CsADH2946 in order to validate both the simulation and experiments.

Docking

To be able to execute pulling simulations we had to pick the ligands that were going to be pulled in the simulations. The obvious ligand of choice was crocetin dialdehyde, which is converted to crocetin by CsADH2946. To be able to verify that CsADH2946 specifically binds crocetin dialdehyde, another ligand, the common acetaldehyde was chosen to be able to compare the enzyme's specificity. Acetaldehyde was run in a separate simulation to crocetin dialdehyde.

Before the pulling simulations can be started, it is required to select an initial configuration where the ligand is bound to the active site of the enzyme. This position was found by docking the ligands with one subunit from the CsADH2946 tetramer. The location of the active site was found by looking at the template molecule for the homology modeling (PDB entry 4fqf) (8). The active site of this homology template can be found at a loop containing three cysteines (C301, C302 and C303). In our homology model that corresponded to C337, C338 and C339, and docking was therefore performed in the patch of the protein where this loop can be found. The docking of both ligands were done using SwissDock (9, 10), where the structure used for CsADH2946 was an equilibrated conformation 40 ns into the production MD simulation. The best docked structure where the ligand was in close contact with the triple cystein loop was then chosen as starting configuration for the pulling simulation.

Pulling

Steered molecular dynamics (pulling) is a way of estimating the binding affinity between a protein and a ligand. We based our simulations on a previously defined protocol by Justin Lemkul (11). Before starting the simulation we needed to select a trajectory in which the ligands were to be pulled. The choice of trajectory was based on a path that was not obstructed by any residues. After being equilibrated for 200 ps, the pulling simulation was run for 8000 picoseconds pulling the ligands at a velocity of 0.25 nm/ns. The results from the simulation can be seen in the movies below (figure 4) and in figure 5.

Figure 4. Results of the pulling simulations of one subunit of CsADH2946 with crocetin dialdehyde (leftmost) and acetaldehyde (rightmost). The molecules were pulled with a velocity of 0.25 nm/ns.

These indicated that more force is required to be able to pull out crocetin dialdehyde than acetaldehyde from the active site of CsADH2946 as the peaks in figure 5 can be interpreted as energy barriers being crossed. This can also be seen in the movies where crocetin dialdehyde gets stuck in two positions. Thus, the CsADH2946 model demonstrates specificity towards crocetin dialdehyde.

Figure 5: Pulling force applied to crocetin dialdehyde and acetaldehyde from one subunit of our CsADH2946 homology model. For crocetin dialdehyde there are two peaks which indicates energy barriers where more force is required to pull crocetin dialdehyde out. For acetaldehyde there is no clear peak indicating that our model is not able to bind acetaldehyde as strongly as crocetin dialdehyde.

Using the Pulling Simulation to Estimate the Dissociation Constant of Crocetin Dialdehyde

The resulting maximum force applied to crocetin dialdehyde could then be used to roughly calculate the dissociation constant. To do this the following equation was used where 𝜟G is Gibbs free energy, R is the gas constant, T the temperature and K_d the dissociation constant:

\Delta G = -RTln(K_{d})

What we need is 𝜟G to be able to calculate the dissociation constant. To get 𝜟G the maximum force (284.2160 kJ/(mol*nm)) was first converted into pN. The resulting force was 471.9628 pN. 𝜟G was then calculated by extrapolating data from a study where they plotted the maximum force against their experimental 𝜟G values (12). Calculation of K_d using the equation above yielded a value of 4.9321 µM.

Thus we are able to compare the dissociation constant from our molecular dynamics simulations to the value of K_M (~20.7842 µM) we estimated in our Bayesian inference model using our experimental data. Under the equilibrium approximation where substrate and enzyme complex are in an instantaneous equilibrium K_M and K_d are equal. From our experiments and simulations we indeed see that the estimated K_M and K_d are similar considering the fact that the dissociation constant has been calculated from an energy. This further shows that the molecular dynamics simulation is accurate, validating the result that CsADH2946 is specific to crocetin dialdehyde.

Experimental Estimation of K_M for CsADH2946

Since the kinetic parameters for CsADH2946 have never been determined, we wanted to characterize the Michaelis-Menten kinetics of the reaction and estimate K_M of our CsADH2946 using the activity measurement results. The substrate crocetin dialdehyde has low water solubility and will therefore partially precipitate when coming in contact with water. Quantitative kinetic characterization of such a system is especially complicated since the substrate concentration at the beginning of the experiment, S₀, is not known. We have therefore developed a probabilistic method that is capable of estimating Michaelis-Menten model parameters as well as the error associated with them for such systems and we demonstrate its use on our enzyme CsADH2946. The MATLAB-script we created can be downloaded here.

Enzymatic reactions are often modeled with Michaelis-Menten kinetics. In this model, the reaction rate is described by the differential equation:

(1)

where [C] and [S] are the concentrations of product and substrate, respectively. V_max is the maximum production rate of the system at saturated substrate concentration and K_M is the substrate concentration when the reaction rate is half of V_max. Reaction rate is not directly measured in the assay, so we used the experimental product concentrations Ĉ(t) to estimate the Michaelis-Menten parameters. This was done by integrating equation 1 numerically and fitting the integrated C(K_M,V_max,S₀,t) to the experimentally measured product concentrations, Ĉ(t).

The product concentrations, Ĉ(t), were estimated at every timepoint from the experimentally measured absorbance spectra. Fitting was then performed by minimizing the sum of the squared differences (SSD) between the model prediction, C (from integration) and the experimentally measured progress curve, Ĉ(t). These fitting methods are good at finding the parameters that describe the data best but give no information about the uncertainty in the parameters described by the data. Use of methods minimizing the SSD is further complicated when the substrate concentration at the beginning of the experiment, S₀, is not known, introducing further uncertainty in the data and demonstrating the need for good error estimation of the model parameters.

We developed a probabilistic method for estimating the kinetic parameters and the uncertainty in them. The method computes and integrates the probability density P(K_M,V_max,S₀,σ) as a function of all model parameters. We set the notation θ = (K_M,V_max,S₀,σ), where σ is the model noise. In this method we used Bayesian statistical inference to estimate the parameters from the experimental data. In short that means calculating P(θ|Ĉ); the probability of the model parameters, given the experimental data. We estimated this distribution from the experimental data and the mean and variance of each parameter could be calculated as:

(2)

The posterior probability P(θ|Ĉ) was calculated using Bayes theorem:

(3)

where the probability of the data P(Ĉ) is a normalization constant, the prior P(θ) describes your expectations of the model parameters before observing any data and the likelihood P(Ĉ|θ) is the probability of the data given the parameters.

The likelihood is a function of the product concentration C(K_M,V_max,S₀,t) calculated from the Michaelis-Menten model. We modelled the deviation between the experimentally measured concentrations and model product concentration C as independent and identically normally distributed at every time-point. This gave us the likelihood

(4)

where σ is the standard deviation of the normal distribution describing the deviation between the data and the model. With a flat prior (a uniform prior where all model parameters are equally likely before observing any data) the posterior P(θ|Ĉ) can simply be written as

(5)

(6)

where Z is a normalization constant.

To estimate the mean and variance of our four model parameters we calculated the integrals in equation 2 and 3, i.e., numerically integrating the likelihood L(θ) over the four dimensions of the parameters. It is a problem that is very numerically complex if L(θ) is to be evaluated for all values of θ. Therefore, we implemented a version of importance sampling, where we only evaluate L(θ) were it is large enough to contribute to the integral. We start at the maximum likelihood set of parameter θ_max (found by SSD fitting) where L(θ) is maximized and take discrete steps in θ until L(θ) is essentially 0. Therefore we do not have to compute L(θ) for all values of θ, but only at the values that actually contribute to the integral. A visual of this is displayed in figure 6. We saved the values of L(θ) of all these points in four dimensional parameter space and the marginalized distribution for the individual parameters could then be calculated. For example for K_M.

(7)

The mean and error estimate of K_M and the other parameters was then obtained by calculating the mean and standard deviation of the marginalized distribution. A plot of the marginalized distribution for each parameter can be seen in Figure 7.

Figure 6. Illustration of importance sampling of L(θ). Only samples of L(θ) that are large enough to contribute to the integral are evaluated (the blue area). Therefore, the starting value is θ_max where L(θ) is maximized and discretes steps are taken until the values of L(θ) are close to 0 and does not contribute to the integral.

Figure 7. Plot of the marginalized distribution for each parameter estimated. A. V_max B. K_M C. S₀ D. Model noise.

The Resulting Estimated Kinetic Parameters

So, the kinetic parameters for CsADH2946 were estimated using this probabilistic method where the experimental data was fitted to the model. As you can see in Figure 8, we could compute a curve (blue) with the estimated parameters that fit the experimental data. This means that our parameters are good approximations that can be used to describe the actual data. We obtained K_M = 20.7842 µM ± 3.5264 with standard diviation.

Figure 8. Plot of experimental product concentration Ĉ against time (red) and a modelled curve using estimated Michaelis-Menten kinetic parameters to best fit the experimental data (blue). The values of the estimated kinetic parameters obtained by the algorithm are also displayed in the plot. K_M = 20.7842 µM ± 3.5264 with standard diviation.

To evaluate our estimated parameters, we first compared the values obtained by our algorithm with values of the parameters calculated with maximum likelihood. These were similar, which shows that our values are reasonable. To further analyse and validate our algorithm, we ran our algorithm on a simulated a progress curve based on the estimated parameters but with added normally distributed noise. The fitting of the simulated noisy data using our algorithm should return similar parameter values as the initial experimental data generated. As can be seen in figure 9, the parameters from the fitting of the simulated noisy data did return similar parameter values. We also tried to compare our results with parameters estimated from fitting the experimental data in DynaFit. However, DynaFit was unable to return a good model fit, possibly because of our unknown substrate concentration in the beginning of the experiment.

Figure 9. Plot of simulated data based on the estimated parameters with added normally distributed noise (red) and a modelled curve using our algorithm to best fit the simulated noisy data (blue). The parameters from the fitting of the simulated noisy data returns similar parameter values as the parameters returned when fitting the experimental data.

The K_M estimated for CsADH2946 is low, which indicates a high affinity towards the substrate crocin dialdehyde. This is also agrees with the pulling simulation, where the enzyme showed specificity towards this substrate. We compared the results with the binding of acetaldehyde and validated that CsADH2946 has specificity against crocetin dialdehyde since a higher force was required to pull this substrate from the active site. These results greatly support our choice in selecting CsADH2946 as an enzyme candidate in our zeaxanthin-crocin pathway.

References

(1) Moraga AR, Nohales PF, Pérez JAF, Gómez-Gómez L. Glucosylation of the saffron apocarotenoid crocetin by a glucosyltransferase isolated from Crocus sativus stigmas. Planta. 2004 Oct 1;219(6):955–66.

(2) Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research. 2014 Jul 1;42(W1):W252–8.

(3) Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T. The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 2009 Jan 1;37(suppl_1):D387–92.

(4) Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics. 2006 Jan 15;22(2):195–201.

(5) Expasy.org [homepage on the Internet]. Basel: Swiss Institute of Bioinformatics; [cited 30 September 2017] Available from: https://swissmodel.expasy.org/docs/help

(6) Vt.edu [homepage on the Internet]. Virginia: Department of Biochemistry; 2015 [cited 1 October 2017]. Available from: http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/

(7) Search BioNumbers - The Database of Useful Biological Numbers. online: http://bionumbers.hms.harvard.edu/search.aspx?log=y&task=searchbytrmorg&trm=Salt&org=. Accessed October 30, 2017.

(8) Lang BS, Gorren ACF, Oberdorfer G, Wenzl MV, Furdui CM, Poole LB, et al. Vascular Bioactivation of Nitroglycerin by Aldehyde Dehydrogenase-2. J Biol Chem. 2012 Nov 2;287(45):38124–34.

(9) Grosdidier A, Zoete V, Michielin O. SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res. 2011 Jul;39(Web Server issue):W270-277.

(10) Grosdidier A, Zoete V, Michielin O. Fast docking using the CHARMM force field with EADock DSS. J Comput Chem. 2011 Jul 30;32(10):2149–59.

(11) Umbrella Sampling [Internet]. [cited 2017 Oct 29]. Available from: http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/umbrella/index.html

(12) Tam NM, Nguyen MT, Ngo ST. Evaluation of the absolute affinity of neuraminidase inhibitor using steered molecular dynamics simulations. Journal of Molecular Graphics and Modelling. 2017 Oct 1;77(Supplement C):137–42.

Further Reading

/1/ http://www.gromacs.org/

/2/ https://swissmodel.expasy.org/

@@ Line 4: / Line 4: @@
 <html lang="en">
 <head>
+<script type="text/javascript" src="http://latex.codecogs.com/latexit.js"></script>
    <title>Modeling</title>
    <meta charset="utf-8">
@@ Line 26: / Line 27: @@
        padding-top: 3%;
        padding-bottom: 3%;
        text-align: justify;
+line-height:150%;
@@ Line 35: / Line 38: @@
        font-size: 4.5vh;
        padding-bottom: 2%;
+line-height:150%;
 }
@@ Line 43: / Line 48: @@
        color:#4D4D4D;
        text-align: justify;
        color: #ffffff;
+line-height:150%;
      }
+    .b {
+      font-family: "Roboto Slab";
+      font-size: 4vh;
+      color:#4D4D4D;
+      text-align: justify;
+line-height:150%;
+    }
+    .menu {
+      font-family: "Roboto Slab";
+      font-size: 4vh;
+      text-align: justify;
+      background-color:#4D4D4D;
+line-height:150%;
+    }
@@ Line 56: / Line 82: @@
      <img src="https://static.igem.org/mediawiki/2017/d/dd/Modeling_Header.gif" style="width:100%; height: auto">
    </div>
+  <div class="row menu">
+    <div class="btn-group btn-group-justified">
+  <a href="#homology" class="btn b" style="color:#ffffff">Homology Modeling</a>
+  <a href="#dynamics" class="btn b" style="color:#ffffff">Molecular Dynamics</a>
+  <a href="#models" class="btn b" style="color:#ffffff">Enzyme Models</a>
+  <a href="#pulling" class="btn b" style="color:#ffffff">Pulling Simulation</a>
+  <a href="#KM" class="btn b" style="color:#ffffff">K<sub>M</sub> for CsADH2946</a>
+  <a href="#references" class="btn b" style="color:#ffffff">References</a>
+</div>
+</div>
    <div class="row">
-    <div class="col-xs-12" style="height:80px;"></div>
+  <div class="col-xs-10" style="height:5vh"> </div>
-   </div>
+</div>
+  <div class="row">
+    <div class="col-xs-2"></div>
+    <div class="col-xs-8 textbox">
+      <div class="headers" style="padding-right:3%; padding-left:3%;"> Summary </div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The three enzymes in our crocin pathway are relatively newly discovered and thus poorly characterized, with no structures available. In addition, we were only able to find enzymatic parameters concerning one of the enzymes, UGTCs2 (1). Via homology modeling we could draw the conclusion that UGTCs2 and CaCCD2 are monomers while CsADH2946 is a tetramer. We could also ensure that the N-terminal was facing outward from the protein and could safely put a His-tag there for purification. Furthermore, the extensive 100 ns MD simulation of our models verified that the models were stable in solvent and that we could further work with the models to estimate the binding affinity between enzyme and substrate. Steered molecular dynamics simulations showed that CsADH2946 is specific to its substrate crocetin dialdehyde and that it has an affinity in the µM range to the substrate. The estimation of K<sub>M</sub> from the experimental activity measurements also showed that CsADH2946 has a high affinity towards the substrate crocetin dialdehyde, in agreement with Steered Molecular Dynamics results.</div>
+    </div>
+    <div class="col-xs-2"></div>
+   </div>
+  <div class="row">
+  <div class="col-xs-10" style="height:3vh"> </div>
+</div>
+  <div class="row" id="homology">
+  <div class="col-xs-10" style="height:6vh"> </div>
+</div>
    <div class="row">
      <div class="col-xs-2"></div>
      <div class="col-xs-8 textbox">
        <div class="headers" style="padding-right:3%; padding-left:3%;"> Homology Modeling of the Uncharacterized Enzymes </div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">One of the first things we noticed when looking at our selected enzymes was that no three-dimensional structures were available and the enzymes were poorly characterized in general, so if we wanted to get a sense of what they might look like, we would have to figure something out. The obvious thing to do, we felt, was to turn to homology modeling.</div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">One of the first things we noticed when looking at our selected enzymes was that no three-dimensional structures were available and the enzymes were poorly characterized in general. To be able to get structural information we turned to homology modeling. </div>
        <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The idea of homology modeling is to construct a 3D-structure by mapping the amino acid sequence onto a template – i.e. a known structure of a related, homologous protein through sequence alignment. For this to work you need a template with a solved 3D-structure. The quality of the model is determined by the alignment with, and, the structure of the template. Thus, ideally, you would like a high-resolution structure with high sequence identity.</div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">While there are various tools available for homology modeling, such as MODELLER, we choose to use SWISS-MODEL /2/, a fully automated protein structure homology-modeling server (1, 2, 3). We plugged in the sequences of our chosen enzymes (CaCCD2, CsADH2946 and UGTCs2) and started modeling. While the available templates weren’t quite as high-quality as we had hoped, we were confident that they were sufficient to get the job done. We chose our templates and obtained the models detailed in figure 1 and the models along with the quality scoring are summarized in table 1. We could make two immediate observations. The CsADH2946 model seemed the most promising one, quality-wise with GMQE close to 1 and higher QMEAN being better. In addition, looking at the N-terminals (the blue ends in figure 1) we could see that they are stretched out, poking outwards from the protein. This was a good indication that we could put a His-tag at this end, with no complications. Another discovery we made was that the second step enzyme CsADH2946 is most likely a tetramer. This information was helpful when purifying the enzyme and going forth with molecular dynamics.</div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The tool we used to generate our homology models was SwissDock /2/, a fully automated protein structure homology-modeling server (2, 3, 4). We plugged in the sequences of our chosen enzymes ((<a href="https://2017.igem.org/Team:Uppsala/Parts">CaCCD2</a>), (<a href="https://2017.igem.org/Team:Uppsala/Parts">CsADH2946</a>) and (<a href="https://2017.igem.org/Team:Uppsala/Parts">UGTCs2</a>)) and started modeling. The templates were chosen based upon quality parameters such as Global Mean Quality Estimation, QMEAN, resolution of the crystal structure and sequence identity (5). After choosing the best templates we obtained the models detailed in figure 1 and the models along with the quality scoring are summarized in table 1. We could make two immediate observations. The CsADH2946 model seemed the most promising one, quality-wise with GMQE close to 1 and higher QMEAN being better. In addition, looking at the N-terminals (the blue ends in figure 1) we could see that they are stretched out, poking outwards from the protein. This was a good indication that we could put a His-tag at this end, with no complications. Another discovery we made was that the second step enzyme CsADH2946 is most likely a tetramer. This information was helpful when purifying the enzyme and going forth with molecular dynamics.</div>
+      <figure class="figure">
+        <figcaption class="figure-caption figtext" style="text-align:center; padding-top: 3%;padding-left:3%;padding-right:3%"> Table 1. The enzymes modeled, which templates were used and quality assessment by means of GMQE and QMEAN. GMQE ranges between 0 and 1, where one is good and zero is bad. As for QMEAN the higher the score, the better(1).</figcaption>
+        <img src="https://static.igem.org/mediawiki/2017/a/ad/Uppsala-Modeling-Table.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 70%; height: auto;">
+      </figure>
        <figure class="figure">
@@ Line 72: / Line 132: @@
        </figure>
-<!-- This is a test video of the UGT protein. Leaving it here for future reference. /Björn
-<video width="auto" height="50%" controls style="float:right;">
-<source src="https://static.igem.org/mediawiki/2017/9/9b/UGT_test.mp4" type="video/mp4"></source>
-Your browser does ot support video.
-</video>-->
      </div>
@@ Line 87: / Line 139: @@
      <div class="col-xs-12" style="height:80px;"></div>
    </div>
+  <div class="row" id="dynamics">
+  <div class="col-xs-10" style="height:6vh"> </div>
+</div>
    <div class="row">
      <div class="col-xs-2"></div>
      <div class="col-xs-8 textbox">
        <div class="headers" style="padding-right:3%; padding-left:3%;"> Molecular Dynamics in GROMACS </div>
-       <div class="headers" style="font-size:2.7vh;padding-right:3%; padding-left:3%;"> - The Art of Putting Digital Molecules in Digital Boxes of Water </div>
+       <div class="headers" style="font-size:2.7vh;padding-right:3%; padding-left:3%;"> – The Art of Putting Digital Molecules in Digital Boxes of Water </div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">So we constructed homology models of our enzymes. Are they good models? Are they realistic? There are several measurements that can be made on the models to estimate the answers to these questions. One such measurement is Global Model Quality Estimation (GMQE) and QMEAN (4), but the models are still just a guess of what our enzymes actually look like. To asses the models and prepare them for further research we used GROMACS /1/ to simulate our enzymes in saline water for a total of 100 ns. This lets us assess their stability and obtain new models that are closer to their native conformation which would be the most probable state of the enzymes during the activity measurements in our wet lab.</div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">To assess the models and prepare them for further characterization we used GROMACS /1/ to simulate our enzymes in saline water for a total of 100 ns. This lets us assess their stability and obtain new models that are closer to their native conformation which would be the most probable state of the enzymes during the activity measurements in our wet lab.</div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">One of the reasons for obtaining these new models were the high price tags on most of the substrates. One way of minimizing substrate consumption is to start the enzymatic assays at concentrations close to the real KM values. An educated guess would be to use the KM observed in reactions by related enzymes, but since we are the modelling group our contributions are made through advanced modelling done with lots of computer power. Thus, after ensuring that our models are stable and realistic, doing substrate pulling simulations in GROMACS would be our next move in order to estimate the K<sub>M</sub></div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b> How We Simulated Our Enzymes </b></div>
+ <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Molecular dynamics is a simulation approach where the potential energies of a molecular system is parameterized using a predefined force-field. A trajectory of the atoms of the system, that is how the atoms move over time, can then be simulated by numerically intergrating Newton's equations of motion.</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The MD simulations were performed based on previously defined protocols, which were nicely summarized by Justin Lemkul in a tutorial (6). All the steps until the production MD can be done on practically any system while the production MD will require a supercomputer to reduce time. Next follows a short description of the main steps in the protocols as well as some tips and general comments.</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Define Box and Solvate</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The protein has to be put in a box containing solvent where the simulation will take place. We chose a cubic box with at least 1 nm between the protein and the box. The simulation is run with periodic boundary conditions, meaning that if the molecule touches the boundary it will appear on the other side of the box.</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Add Ions</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">We neutralized our enzymes and added ions to a concentration of 0.15 M to get closer to physiological conditions (7).</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Energy Minimization</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">In this step the energy of the system is minimized using a steepest descent algorithm. This is done to remove any conformations that are unfavourable in terms of energy. The potential energy of our system was calculated using the CHARMM27 force-field in this and all later steps of our simulations. </div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Equilibration</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Equilibration is done to relax the system into an equilibrated state. It starts of by doing a simulation with constant volume, temperature (310 K) and number of particles. Once the first one is finnished, another simulation with constant temperature (310 K), pressure (1 bar) and a constant amount of particles is run. The system is also equilibrated in the beginning of the production simulation, but by performing a more refined equilibration protocol in advance, simulation time is saved. </div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Production Molecular Dynamics</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Once the system was equilibrated production MD was carried out with constant pressure (1 bar), temperature (310 K) and number of particles. The systems were simulated  for 100 ns on a supercomputer with 24 processors running in parallel. The time it takes for the simulation to complete is dependant on the amount of particles present. Because CsADH2946 is a tetramer with the highest amount of atoms in the simulated system, it took the longest time to finnish (approximately one month).</div>
      </div>
@@ Line 101: / Line 170: @@
      <div class="col-xs-12" style="height:80px;"></div>
    </div>
+  <div class="row" id="models">
+  <div class="col-xs-10" style="height:6vh"> </div>
+</div>
    <div class="row">
      <div class="col-xs-2"></div>
      <div class="col-xs-8 textbox">
-       <div class="headers" style="padding-right:3%; padding-left:3%;"> How We Simulated Our Enzymes </div>
+       <div class="headers" style="padding-right:3%; padding-left:3%;"> The Resulting Stable Enzyme Models </div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Simulating molecular dynamics is complicated, but luckily for us Justin Lemkul has written an excellent tutorial on how to do almost exactly the simulation we wanted to do (5). We followed that tutorial with only a few changes such as saline concentration, water model and force field parameters to better fit our purpose. All the steps until the production MD can be done on practically any system while the production MD will require a supercomputer to reduce time. Next follows a short description of what deviations we made from the tutorial by Justin as well as some tips and general comments. The headings corresponds to the headings in the tutorial.</div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The result we got from the 100 ns stability simulations in GROMACS show a trajectory of each enzyme wiggling around trying to equilibrate into a low energy state. We created movies of these simulations that can be viewed in figure 2. </div>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Define Box and Solvate</b></div>
+<video width="auto" height="345vh" autoplay loop style="padding-left:3%">
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">So you want to put your protein in a box with a solvent, where the simulation will take place. The tutorial presents the rhombic dodecahedron as an option but we performed the simulation using a cube, with at least 1 nm between the protein and the box edge. A rhombic dodecahedron shape could increases the risk of your molecule crossing the border of the box. This gives visual glitches in the representation if your molecule that needs fixing before you can properly view the molecule. You can check out the “Fix Periodic Boundary Visualisation Issues” file below if you do run into these problems however. </div>
+<source src="https://static.igem.org/mediawiki/2017/1/15/Uppsala-CaCCD2.mp4" type="video/mp4"></source>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">After solvation you can remove water molecules that are overlapping with the model in order to cut down on the number of molecules you need to simulate later. This is however a rather small adjustment which we did not do ourselves.</div>
+Your browser does ot support video.
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Add Ions</b></div>
+</video>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">We neutralized our enzymes and added ions to a concentration of 0.15 M to get closer to physiological conditions (6), instead of just neutralizing the enzyme.</div>
+<video width="auto" height="345vh" autoplay loop>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Energy Minimization</b></div>
+<source src="https://static.igem.org/mediawiki/2017/4/40/Uppsala-CsADH2946.mp4" type="video/mp4"></source>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Here you need a viewer that read .trr to view the nice molecular movie you just made. Alternatively use the trjconv command in GROMACS to combine the trajectory with your structure into a .pdb which most viewers can handle.</div>
+Your browser does ot support video.
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Equilibration</b></div>
+</video>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">This step is not strictly needed since the later production MD step will do essentially the same work but unrestricted and for a longer time. However, you might want to do these steps since it relaxes the structure further to minimize the risk of melting the protein during the unrestricted simulation. (Queueing a whole rack of processors at a supercomputer can take upwards a month, if you don’t have priority time and the queue is long. Having a melted protein as result after that wait is not nice.) The equilibration steps are more demanding than the energy minimization and might need to run overnight if you are using an old system.</div>
+<video width="auto" height="345vh" autoplay loop>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Production Molecular Dynamics</b></div>
+<source src="https://static.igem.org/mediawiki/2017/9/9b/UGT_test.mp4" type="video/mp4"></source>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Now the actual simulation can take place. The guide performs a 1 ns Molecular dynamics (MD) simulation, which is a quite short simulation. We do MD simulations during 100 ns, giving the system enough time to get to converge to its equilibrated form. Have in mind that a bigger protein, such as our CsADH2946 being a tetramer, the simulation will take a lot longer time than it would for a smaller, monomeric protein. Check the results!</div>
+Your browser does ot support video.
+</video>
+<div class="figtext" style="padding-bottom:3%; padding-left:3%;"> Figure 2. These videos are the result of simulating CaCCD2 (leftmost), CsADH2946 (middle) and UGTCs2 (rightmost) for 100 ns in saline water.</div>
+<div style="padding-bottom:3%; padding-right:3%; padding-left:3%;"> The movies all confirmed that our homology models are good in the sense that they are not melting. Along with the movies we received RMSD plots. The RMSD plot describes the distances between the atoms of the initial conformation going into the simulation and the atoms of the different conformational states the enzyme develops over the 100 ns. As can be seen in figure 3, The RMSD curve for each enzyme model converge towards one low, stable value. This shows that the models equilibrate to a stable state with low energy in the solvent and indicates that our homology models are in fact good models to represent our enzymes.</div>
+       <figure class="figure">
+        <img src="https://static.igem.org/mediawiki/2017/f/f0/Uppsala-Modeling-Pic2.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 60%; height: auto; padding-top: 3%;">
+        <figcaption class="figure-caption figtext" style="text-align: center; padding-bottom: 2%; padding-left:20%;padding-right:20%"> Figure 3. RMSD plot of a 100 ns stability simulation in GROMACS of homology models to the enzymes CaCCD2, CsADH2946 and UGTCs2.</figcaption>
+      </figure>
      </div>
      <div class="col-xs-2"></div>
@@ Line 124: / Line 205: @@
      <div class="col-xs-12" style="height:80px;"></div>
    </div>
+  <div class="row" id="pulling">
+  <div class="col-xs-10" style="height:6vh"> </div>
+</div>
    <div class="row">
      <div class="col-xs-2"></div>
      <div class="col-xs-8 textbox">
-       <div class="headers" style="padding-right:3%; padding-left:3%;"> The Resulting Stable Enzyme Models </div>
+       <div class="headers" style="padding-right:3%; padding-left:3%;"> Steered Molecular Dynamics (Pulling) with CsADH2946 and Crocetin Dialdehyde </div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The result we got from the 100 ns stability simulations in GROMACS show a trajectory of each enzyme wiggling around trying to equilibrate into a low energy state. We created movies of these simulations that can be viewed below. The movies all confirmed that our homology models are good in the sense that they are not melting. Following the tutorial we also received RMSD plots of the simulations. The RMSD plot describes the distances between the atoms of the initial conformation going into the simulation and the atoms of the different conformational states the enzyme develops over the 100 ns. As can be seen in figure 2, The RMSD curve for each enzyme model converge towards one low, stable value. This shows that the models equilibrate to a stable state with low energy in the solvent and indicates that our homology models are in fact good models to represent our enzymes.</div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Since CsADH2946 had great advancements in the lab, has the best homology model score of our enzymes and also has the RMSD which converges fastest at the lowest value showing that it is a stable structure, we decided to focus the rest of our computer power to do Steered Molecular Dynamics (pulling simulations) with this enzyme. A pulling simulation between the protein and a substrate allows us to estimate the dissociation constant K<sub>d</sub> which can be compared to K<sub>M</sub>. This is very useful to compare to the experimentally estimated K<sub>M</sub> for CsADH2946 in order to validate both the simulation and experiments.</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Docking</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">To be able to execute pulling simulations we had to pick the ligands that were going to be pulled in the simulations. The obvious ligand of choice was crocetin dialdehyde, which is converted to crocetin by CsADH2946. To be able to verify that CsADH2946 specifically binds crocetin dialdehyde, another ligand, the common acetaldehyde was chosen to be able to compare the enzyme's specificity. Acetaldehyde was run in a separate simulation to crocetin dialdehyde.</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Before the pulling simulations can be started, it is required to select an initial configuration where the ligand is bound to the active site of the enzyme. This position was found by docking the ligands with one subunit from the CsADH2946 tetramer. The location of the active site was found by looking at the template molecule for the homology modeling (PDB entry 4fqf) (8). The active site of this homology template can be found at a loop containing three cysteines (C301, C302 and C303). In our homology model that corresponded to C337, C338 and C339, and docking was therefore performed in the patch of the protein where this loop can be found. The docking of both ligands were done using SwissDock (9, 10), where the structure used for CsADH2946 was an equilibrated conformation 40 ns into the production MD simulation. The best docked structure where the ligand was in close contact with the triple cystein loop was then chosen as starting configuration for the pulling simulation.
+</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Pulling</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Steered molecular dynamics (pulling) is a way of estimating the binding affinity between a protein and a ligand. We based our simulations on a previously defined protocol by Justin Lemkul (11). Before starting the simulation we needed to select a trajectory in which the ligands were to be pulled. The choice of trajectory was based on a path that was not obstructed by any residues. After being equilibrated for 200 ps, the pulling simulation was run for 8000 picoseconds pulling the ligands at a velocity of 0.25 nm/ns. The results from the simulation can be seen in the movies below (figure 4) and in figure 5.</div>
+<video width="auto" height="345vh" autoplay loop style="padding-left:18%">
+<source src="https://static.igem.org/mediawiki/2017/c/ce/Uppsala-Pull_crocetin_dialdehyde.mp4" type="video/mp4"></source>
+Your browser does ot support video.
+</video>
+<video width="auto" height="345vh" autoplay loop>
+<source src="https://static.igem.org/mediawiki/2017/5/58/Uppsala-Pull_Acetaldehyde.mp4" type="video/mp4"></source>
+Your browser does ot support video.
+</video>
+<div class="figtext" style="text-align: center; padding-bottom:2%; padding-left:20%;padding-right:20%"> Figure 4. Results of the pulling simulations of one subunit of CsADH2946 with crocetin dialdehyde (leftmost) and acetaldehyde (rightmost). The molecules were pulled with a velocity of 0.25 nm/ns.</div>
+<div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">These indicated that more force is required to be able to pull out crocetin dialdehyde than acetaldehyde from the active site of CsADH2946 as the peaks in figure 5 can be interpreted as energy barriers being crossed. This can also be seen in the movies where crocetin dialdehyde gets stuck in two positions. Thus, the CsADH2946 model demonstrates specificity towards crocetin dialdehyde.</div>
        <figure class="figure">
-         <img src="https://static.igem.org/mediawiki/2017/f/f0/Uppsala-Modeling-Pic2.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 40%; height: auto; padding-top: 3%;">
+         <img src="https://static.igem.org/mediawiki/2017/3/36/Uppsala-Modeling-Pic3.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 60%; height: auto; padding-top: 3%;">
-         <figcaption class="figure-caption figtext" style="text-align: center; padding-bottom: 2%;"> Figure 2. RMSD plot of a 100 ns stability simulation in GROMACS of homology models to the enzymes CaCCD2, CsADH2946 and UGTCs2.</figcaption>
+         <figcaption class="figure-caption figtext" style="text-align: center; padding-bottom: 2%; padding-left:20%;padding-right:20%"> Figure 5: Pulling force applied to crocetin dialdehyde and acetaldehyde from one subunit of our CsADH2946 homology model. For crocetin dialdehyde there are two peaks which indicates energy barriers where more force is required to pull crocetin dialdehyde out. For acetaldehyde there is no clear peak indicating that our model is not able to bind acetaldehyde as strongly as crocetin dialdehyde.</figcaption>
        </figure>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Using the Pulling Simulation to Estimate the Dissociation Constant of Crocetin Dialdehyde</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The resulting maximum force applied to crocetin dialdehyde could then be used to roughly calculate the dissociation constant. To do this the following equation was used where 𝜟G is Gibbs free energy, R is the gas constant, T the temperature and K<sub>d</sub> the dissociation constant:</div>
+                                                     <div style="text-align:center; background-color: #ffffff; padding-left:30%; padding-right:30%; padding-top:2%; padding-bottom:2%;" lang="latex">
+                                                    \Delta G = -RTln(K_{d})
+                                                     </div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-top:3%;">What we need is 𝜟G to be able to calculate the dissociation constant. To get 𝜟G  the maximum force (284.2160 kJ/(mol*nm)) was first converted into pN. The resulting force was 471.9628 pN. 𝜟G was then calculated by extrapolating data from a study where they plotted the maximum force against their experimental 𝜟G values (12). Calculation of K<sub>d</sub> using the equation above yielded a value of 4.9321 µM. </div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Thus we are able to compare the dissociation constant from our molecular dynamics simulations to the value of K<sub>M</sub> (~20.7842 µM) we estimated in our Bayesian inference model using our experimental data. Under the equilibrium approximation where substrate and enzyme complex are in an instantaneous equilibrium K<sub>M</sub> and K<sub>d</sub> are equal. From our experiments and simulations we indeed see that the estimated K<sub>M</sub> and K<sub>d</sub> are similar considering the fact that the dissociation constant has been calculated from an energy. This further shows that the molecular dynamics simulation is accurate, validating the result that CsADH2946 is specific to crocetin dialdehyde.
      </div>
      <div class="col-xs-2"></div>
    </div>
    <div class="row">
      <div class="col-xs-12" style="height:80px;"></div>
    </div>
+  <div class="row" id="KM">
+  <div class="col-xs-10" style="height:6vh"> </div>
+</div>
    <div class="row">
      <div class="col-xs-2"></div>
      <div class="col-xs-8 textbox">
-       <div class="headers" style="padding-right:3%; padding-left:3%;"> Pulling Simulation with CsADH2946 and Crocetin Dialdehyde </div>
+       <div class="headers" style="padding-right:3%; padding-left:3%;">Experimental Estimation of K<sub>M</sub> for CsADH2946 </div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Since CsADH2946 had great advancements in the lab, has the best homology model score of our enzymes and also has the RMSD which converges fastest at the lowest value showing that it is a stable structure, we decided to focus the rest of our computer power to do pulling simulations with this enzyme. A pulling simulation between the protein and a substrate allows us to estimate the dissociation constant Kd which can be compared to KM. This is very useful to compare to the experimentally estimated KM  for CsADH2946 in order to validate the estimation.</div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Since the kinetic parameters for CsADH2946 have never been determined, we wanted to characterize the Michaelis-Menten kinetics of the reaction and estimate K<sub>M</sub> of our CsADH2946 using the <a href="https://2017.igem.org/Team:Uppsala/CrocinPathway">activity measurement results</a>. The substrate crocetin dialdehyde has low water solubility and will therefore partially precipitate when coming in contact with water. Quantitative kinetic characterization of such a system is especially complicated since the substrate concentration at the beginning of the experiment, S<sub>0</sub>, is not known. We have therefore developed a probabilistic method that is capable of estimating Michaelis-Menten model parameters as well as the error associated with them for such systems and we demonstrate its use on our enzyme CsADH2946. The MATLAB-script we created can be downloaded <a href="https://static.igem.org/mediawiki/2017/b/bf/Uppsala-propabilisticInterferenceNew.zip">here</a>.</div>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Docking</b></div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Enzymatic reactions are often modeled with Michaelis-Menten kinetics. In this model, the reaction rate is described by the differential equation:</div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">To be able to execute pulling simulations we had to pick the ligands that were going to be pulled in the simulations. The obvious ligand of choice was crocetin dialdehyde, which is converted to crocetin by CsADH2946. To be able to verify that CsADH2946 actually binds crocetin dialdehyde, another ligand, the common acetaldehyde was chosen. Acetaldehyde was run in a separate simulation to crocetin dialdehyde.</div>
+<div style="padding-left:35%;">
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Before being able to start the pulling simulations it is required to select an initial position which is done by docking the ligands. That position should be the active site of the enzyme. The location of the active site was found by looking at the template molecule (PDB entry 4fqf) (7). In that article the active site was on a loop containing three cysteines (C301, C302 and C303). In our homology model that corresponded to C337, C338 and C339. Because of (7) we also decided to focus on one of the subunits, which subunit was not important since both 4fqf and our model are homo-tetramers. The docking itself was done using SwissDock (8, 9). For the docking mol2 files for both of the ligands were generated and found through the ligand database SwissDock provided. The pdb file for CsADH2946 was the last frame of 100 ns molecular dynamics simulation. As for parameters, in SwissDock you can provide room coordinates for centering and the size of a box where the server should try to dock the ligand. When the docking was complete one of the seeds was chosen based on the distance to the active site and the ligand coordinates were used for the pulling simulations.
+        <img src="https://static.igem.org/mediawiki/2017/c/cc/Uppsala-Modeling-Pic4.png" class="figure-img img-fluid" style="width: 40%; height: auto; padding-bottom: 3%; padding-right:5%;">(1)
 </div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Pulling</b></div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">where [C] and [S] are the concentrations of product and substrate, respectively. V<sub>max</sub> is the maximum production rate of the system at saturated substrate concentration and K<sub>M</sub> is the substrate concentration when the reaction rate is half of V<sub>max</sub>. Reaction rate is not directly measured in the assay, so we used the experimental product concentrations Ĉ(t) to estimate the Michaelis-Menten parameters. This was done by integrating equation 1 numerically and fitting the integrated C(K<sub>M</sub>,V<sub>max</sub>,S<sub>0</sub>,t) to the experimentally measured product concentrations, Ĉ(t).</div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Once again we used a tutorial by Justin Lemkul, this time on the subject of umbrella sampling (10). However, we only performed the steps until the configuration generation, with slight modifications that were more suitable to our enzyme (the pulling trajectory for example). Before starting the simulation we needed to select a trajectory in which the ligands were to be pulled. The choice of trajectory was based on a path that was not obstructed by any residues. The pulling simulation was run for 8000 picoseconds pulling the ligands 0.25 nm/ns. The results from the simulation can be seen in the movies below and in figure 3 (insert pulling movies for each of the ligands below). These indicated that more force is required to be able to pull out crocetin dialdehyde than acetaldehyde from the active site of CsADH2946 as the peaks in figure 3 can be interpreted as energy barriers being crossed. This can also be seen in the movies where crocetin dialdehyde gets stuck in two positions. Thus, the CsADH2946 model demonstrates specificity towards crocetin dialdehyde.</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The product concentrations, Ĉ(t), were estimated at every timepoint from the experimentally measured absorbance spectra. Fitting was then performed by minimizing the sum of the squared differences (SSD) between the model prediction, C (from integration) and the experimentally measured progress curve, Ĉ(t). These fitting methods are good at finding the parameters that describe the data best but give no information about the uncertainty in the parameters described by the data. Use of methods minimizing the SSD is further complicated when the substrate concentration at the beginning of the experiment, S<sub>0</sub>, is not known, introducing further uncertainty in the data and demonstrating the need for good error estimation of the model parameters.</div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">We developed a probabilistic method for estimating the kinetic parameters and the uncertainty in them. The method computes and integrates the probability density P(K<sub>M</sub>,V<sub>max</sub>,S<sub>0</sub>,σ) as a function of all model parameters. We set the notation θ = (K<sub>M</sub>,V<sub>max</sub>,S<sub>0</sub>,σ), where σ is the model noise. In this method we used Bayesian statistical inference to estimate the parameters from the experimental data. In short that means calculating P(θ|Ĉ); the probability of the model parameters, given the experimental data. We estimated this distribution from the experimental data and the mean and variance of each parameter could be calculated as:</div>
+<div style="padding-left:35%;">
+        <img src="https://static.igem.org/mediawiki/2017/4/4c/Uppsala-Modeling-Pic5.png" class="figure-img img-fluid" style="width: 55%; height: auto; padding-bottom: 3%; padding-right:5%;">(2)
+</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The posterior probability P(θ|Ĉ) was calculated using Bayes theorem:</div>
+<div style="padding-left:35%;">
+        <img src="https://static.igem.org/mediawiki/2017/9/93/Uppsala-Modeling-Pic6.png" class="figure-img img-fluid" style="width: 55%; height: auto; padding-bottom: 3%; padding-right:5%;">(3)
+</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">where the probability of the data P(Ĉ) is a normalization constant, the prior P(θ) describes your expectations of the model parameters before observing any data and the likelihood P(Ĉ|θ) is the probability of the data given the parameters.</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The likelihood is a function of the product concentration  C(K<sub>M</sub>,V<sub>max</sub>,S<sub>0</sub>,t) calculated from the Michaelis-Menten model. We modelled the deviation between the experimentally measured concentrations  and model product concentration C as independent and identically normally distributed at every time-point. This gave us the likelihood</div>
+<div style="padding-left:35%;">
+        <img src="https://static.igem.org/mediawiki/2017/3/30/Uppsala-Modeling-Pic7.png" class="figure-img img-fluid" style="width: 60%; height: auto; padding-bottom: 3%; padding-right:5%;">(4)
+</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">where σ is the standard deviation of the normal distribution describing the deviation between the data and the model. With a flat prior (a uniform prior where all model parameters are equally likely before observing any data) the posterior P(θ|Ĉ) can simply be written as</div>
+<div style="padding-left:35%;">
+        <img src="https://static.igem.org/mediawiki/2017/a/a0/Uppsala-Modeling-Pic8.png" class="figure-img img-fluid" style="width: 35%; height: auto; padding-right:5%;">(5)
+</div>
+<div style="padding-left:35%;">
+        <img src="https://static.igem.org/mediawiki/2017/b/b3/Uppsala-Modeling-Pic9.png" class="figure-img img-fluid" style="width: 35%; height: auto; padding-bottom: 3%; padding-right:5%;">(6)
+</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">where Z is a normalization constant.</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">To estimate the mean and variance of our four model parameters we calculated the integrals in equation 2 and 3, i.e., numerically integrating the likelihood L(θ) over the four dimensions of the parameters. It is a problem that is very numerically complex if L(θ) is to be evaluated for all values of θ. Therefore, we implemented a version of importance sampling, where we only evaluate L(θ) were it is large enough to contribute to the integral. We start at the maximum likelihood set of parameter θ<sub>max</sub> (found by SSD fitting) where L(θ) is maximized and take discrete steps in θ until L(θ) is essentially 0. Therefore we do not have to compute L(θ)  for all values of θ, but only at the values that actually contribute to the integral. A visual of this is displayed in figure 6. We saved the values of L(θ) of all these points in four dimensional parameter space and the marginalized distribution for the individual parameters could then be calculated. For example for K<sub>M</sub>.</div>
+<div style="padding-left:30%;">
+        <img src="https://static.igem.org/mediawiki/2017/5/51/Uppsala-Modeling-Pic10.png" class="figure-img img-fluid" style="width: 60%; height: auto; padding-bottom: 3%; padding-right:3%;">(7)
+</div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The mean and error estimate of K<sub>M</sub> and the other parameters  was then obtained by calculating the mean and standard deviation of the marginalized distribution. A plot of the marginalized distribution for each parameter can be seen in Figure 7. </div>
        <figure class="figure">
-         <img src="https://static.igem.org/mediawiki/2017/3/36/Uppsala-Modeling-Pic3.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 40%; height: auto; padding-top: 3%;">
+         <img src="https://static.igem.org/mediawiki/2017/8/81/Uppsala-Modeling-Pic11.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 60%; height: auto">
-         <figcaption class="figure-caption figtext" style="text-align: center; padding-bottom: 2%;"> Figure 3: Pulling force applied to crocetin dialdehyde and acetaldehyde from one subunit of our CsADH2946 homology model. For crocetin dialdehyde there are two peaks which indicates energy barriers where more force is required to pull crocetin dialdehyde out. For acetaldehyde there is no clear peak indicating that our model is not able to bind acetaldehyde as strongly as crocetin dialdehyde.</figcaption>
+         <figcaption class="figure-caption figtext" style="text-align: center; padding-bottom: 2%; padding-left:20%;padding-right:20%"> Figure 6.  Illustration of importance sampling of L(θ). Only samples of L(θ) that are large enough to contribute to the integral are evaluated (the blue area). Therefore, the starting value is θ<sub>max</sub> where L(θ) is maximized and discretes steps are taken until the values of L(θ) are close to 0 and does not contribute to the integral.</figcaption>
        </figure>
+      <figure class="figure">
+        <img src="https://static.igem.org/mediawiki/2017/9/95/Uppsala-Modeling-Pic12.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 60%; height: auto;">
+        <figcaption class="figure-caption figtext" style="text-align: center; padding-bottom: 2%; padding-left:20%;padding-right:20%"> Figure 7.  Plot of the marginalized distribution for each parameter estimated. A. V<sub>max</sub> B. K<sub>M</sub> C. S<sub>0</sub> D. Model noise. </figcaption>
+      </figure>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>Using the Pulling Simulation to Estimate the Dissociation Constant of Crocetin Dialdehyde</b></div>
-      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The resulting maximum force applied to crocetin dialdehyde could then be used to roughly calculate the dissociation constant. To do this the following equation was used where 𝜟G is Gibbs free energy, R is the gas constant, T the temperature and Kd the dissociation constant:</div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">This equation assumes that the system is in equilibrium. What we need is 𝜟G to be able to calculate the dissociation constant. To get 𝜟G  the maximum force (284.2160 kJ/(mol*nm)) was first converted into pN, where we assumed that we had one mole in our system. The resulting force was 471.9628 pN. 𝜟G was then calculated by extrapolating data from a study where they plotted the so called rupture force against their experimental 𝜟G values (11). Calculation of Kd using equation (1) yielded a value of 4.9321 µM. </div>
+       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%; padding-bottom:1%"><b>The Resulting Estimated Kinetic Parameters</b></div>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">So, the kinetic parameters for CsADH2946 were estimated using this probabilistic method where the experimental data was fitted to the model. As you can see in Figure 8, we could compute a curve (blue) with the estimated parameters that fit the experimental data. This means that our parameters are good approximations that can be used to describe the actual data. We obtained K<sub>M</sub> = 20.7842 µM &#177; 3.5264 with standard diviation.</div>
-       <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">Thus we are able to compare the dissociation constant from our molecular dynamics simulations to the value of KM (~20.7842 µM) we estimated in our Bayesian interference model (link to Km estimation modelling) and draw the conclusion that the estimated KM is approximately four times larger than the calculated Kd. With these values a lot less guesswork would be required when setting up an activity assay, and we therefore hope that other iGEM teams continue our work of characterizing CsADH2946.</div>
+       <figure class="figure">
+        <img src="https://static.igem.org/mediawiki/2017/8/89/Uppsala-Modeling-Pic13.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 60%; height: auto;">
+        <figcaption class="figure-caption figtext" style="text-align: center; padding-bottom: 2%; padding-left:20%;padding-right:20%"> Figure 8. Plot of experimental product concentration Ĉ against time (red) and a modelled curve using estimated Michaelis-Menten kinetic parameters to best fit the experimental data (blue). The values of the estimated kinetic parameters obtained by the algorithm are also displayed in the plot. K<sub>M</sub> = 20.7842 µM &#177; 3.5264 with standard diviation.</figcaption>
+      </figure>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">To evaluate our estimated parameters, we first compared the values obtained by our algorithm with values of the parameters calculated with maximum likelihood. These were similar, which shows that our values are reasonable. To further analyse and validate our algorithm, we ran our algorithm on a simulated a progress curve based on the estimated parameters but with added normally distributed noise. The fitting of the simulated noisy data using our algorithm should return similar parameter values as the initial experimental data generated. As can be seen in figure 9, the parameters from the fitting of the simulated noisy data did return similar parameter values. We also tried to compare our results with parameters estimated from fitting the experimental data in DynaFit. However, DynaFit was unable to return a good model fit, possibly because of our unknown substrate concentration in the beginning of the experiment.</div>
+      <figure class="figure">
+        <img src="https://static.igem.org/mediawiki/2017/7/72/Uppsala-Modeling-Pic14.png" class="figure-img img-fluid" style="display: block; margin: auto; width: 60%; height: auto">
+        <figcaption class="figure-caption figtext" style="text-align: center; padding-bottom: 2%; padding-left:20%;padding-right:20%"> Figure 9. Plot of simulated data based on the estimated parameters with added normally distributed noise (red) and a modelled curve using our algorithm to best fit the simulated noisy data (blue). The parameters from the fitting of the simulated noisy data returns similar parameter values as the parameters returned when fitting the experimental data.</figcaption>
+      </figure>
+      <div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The K<sub>M</sub> estimated for CsADH2946 is low, which indicates a high affinity towards the substrate crocin dialdehyde. This is also agrees with the  <a href="#pulling">pulling simulation</a>, where the enzyme showed specificity towards this substrate. We compared the results with the binding of acetaldehyde and validated that CsADH2946 has specificity against crocetin dialdehyde since a higher force was required to pull this substrate from the active site. These results greatly support our choice in selecting CsADH2946 as an enzyme candidate in our zeaxanthin-crocin pathway.</div>
      </div>
      <div class="col-xs-2"></div>
    </div>
+  <div class="row">
+    <div class="col-xs-12" style="height:80px;"></div>
+  </div>
+  <div class="row" id="references">
+  <div class="col-xs-10" style="height:6vh"> </div>
+</div>
+  <div class="row">
+    <div class="col-xs-2"></div>
+    <div class="col-xs-8 textbox">
+           <div style="padding-right:3%; padding-left:3%; padding-bottom: 1%;"> <b>References </b></div>
+           <div style="padding-right:3%; padding-left:3%; padding-bottom: 2%;"> (1) Moraga AR, Nohales PF, Pérez JAF, Gómez-Gómez L. Glucosylation of the saffron apocarotenoid crocetin by a glucosyltransferase isolated from Crocus sativus stigmas. Planta. 2004 Oct 1;219(6):955–66. </div>
+           <div style="padding-right:3%; padding-left:3%;"> (2) Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research. 2014 Jul 1;42(W1):W252–8.</div>
+           <div style="padding-right:3%; padding-left:3%;"> (3) Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T. The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 2009 Jan 1;37(suppl_1):D387–92.  </div>
+           <div style="padding-right:3%; padding-left:3%;"> (4) Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics. 2006 Jan 15;22(2):195–201.</div>
+           <div style="padding-right:3%; padding-left:3%;"> (5) Expasy.org [homepage on the Internet]. Basel: Swiss Institute of Bioinformatics; [cited 30 September 2017] Available from: <a hef="https://swissmodel.expasy.org/docs/help"> https://swissmodel.expasy.org/docs/help </a></div>
+           <div style="padding-right:3%; padding-left:3%;"> (6) Vt.edu [homepage on the Internet]. Virginia: Department of Biochemistry; 2015 [cited 1 October 2017]. Available from: <a href="http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/"> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/ </a></div>
+           <div style="padding-right:3%; padding-left:3%;"> (7) Search BioNumbers - The Database of Useful Biological Numbers. online: <a href="http://bionumbers.hms.harvard.edu/search.aspx?log=y&task=searchbytrmorg&trm=Salt&org=">http://bionumbers.hms.harvard.edu/search.aspx?log=y&task=searchbytrmorg&trm=Salt&org=</a>. Accessed October 30, 2017.</div>
+           <div style="padding-right:3%; padding-left:3%;"> (8) Lang BS, Gorren ACF, Oberdorfer G, Wenzl MV, Furdui CM, Poole LB, et al. Vascular Bioactivation of Nitroglycerin by Aldehyde Dehydrogenase-2. J Biol Chem. 2012 Nov 2;287(45):38124–34.</div>
+           <div style="padding-right:3%; padding-left:3%;"> (9) Grosdidier A, Zoete V, Michielin O. SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res. 2011 Jul;39(Web Server issue):W270-277.</div>
+           <div style="padding-right:3%; padding-left:3%;"> (10) Grosdidier A, Zoete V, Michielin O. Fast docking using the CHARMM force field with EADock DSS. J Comput Chem. 2011 Jul 30;32(10):2149–59.</div>
+           <div style="padding-right:3%; padding-left:3%;"> (11) Umbrella Sampling [Internet]. [cited 2017 Oct 29]. Available from: <a href="http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/umbrella/index.html"> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/umbrella/index.html</a></div>
+           <div style="padding-right:3%; padding-left:3%; padding-bottom:2%;"> (12) Tam NM, Nguyen MT, Ngo ST. Evaluation of the absolute affinity of neuraminidase inhibitor using steered molecular dynamics simulations. Journal of Molecular Graphics and Modelling. 2017 Oct 1;77(Supplement C):137–42.</div>
+           <div style="padding-bottom: 1%; padding-left:3%;"> <b>Further Reading </b></div>
+           <div style="padding-right:3%; padding-left:3%;" > /1/ <a href="http://www.gromacs.org/">http://www.gromacs.org/</a></div>
+           <div style="padding-right:3%; padding-left:3%; padding-bottom: 2%;"> /2/ <a href="https://swissmodel.expasy.org/">https://swissmodel.expasy.org/</a> </div>
+    </div>
+    <div class="col-xs-2"></div>
+  </div>