Team:Uppsala/Model

Modeling

Homology Modeling of the Uncharacterized Enzymes

One of the first things we noticed when looking at our selected enzymes was that no three-dimensional structures were available and the enzymes were poorly characterized in general. To be able to get structural information we turned to homology modeling.

The idea of homology modeling is to construct a 3D-structure by mapping the amino acid sequence onto a template – i.e. a known structure of a related, homologous protein through sequence alignment. For this to work you need a template with a solved 3D-structure. The quality of the model is determined by the alignment with, and, the structure of the template. Thus, ideally, you would like a high-resolution structure with high sequence identity.

The tool we used to generate our homology models was SwissDock /2/, a fully automated protein structure homology-modeling server (1, 2, 3). We plugged in the sequences of our chosen enzymes (CaCCD2, CsADH2946 and UGTCs2) and started modeling. The templates were chosen based upon quality parameters such as Global Mean Quality Estimation, QMEAN, resolution of the crystal structure and sequence identity (4). After choosing the best templates and obtained the models detailed in figure 1 and the models along with the quality scoring are summarized in table 1. We could make two immediate observations. The CsADH2946 model seemed the most promising one, quality-wise with GMQE close to 1 and higher QMEAN being better. In addition, looking at the N-terminals (the blue ends in figure 1) we could see that they are stretched out, poking outwards from the protein. This was a good indication that we could put a his-tag at this end, with no complications. Another discovery we made was that the second step enzyme CsADH2946 is most likely a tetramer. This information was helpful when purifying the enzyme and going forth with molecular dynamics.

Figure 1. Homology models of the three enzymes selected for the zeaxanthin-crocin pathway. The first and the third enzyme are monomers, while CsADH2946 is a tetramer. The homology models were created using SWISS-MODEL /2/.

Molecular Dynamics in GROMACS

– The Art of Putting Digital Molecules in Digital Boxes of Water

To assess the models and prepare them for further characterization we used GROMACS /1/ to simulate our enzymes in saline water for a total of 100 ns. This lets us assess their stability and obtain new models that are closer to their native conformation which would be the most probable state of the enzymes during the activity measurements in our wet lab.

How We Simulated Our Enzymes

The MD simulations were performed based on previously defined protocols, which were nicely summarized by Justin Lemkul in a tutorial (5). All the steps until the production MD can be done on practically any system while the production MD will require a supercomputer to reduce time. Next follows a short description of the main steps in the tutorial as well as some tips and general comments. The headings corresponds to the headings in the tutorial.

Define Box and Solvate

So you want to put your protein in a box with a solvent, where the simulation will take place. The tutorial presents the rhombic dodecahedron as an option but we performed the simulation using a cube, with at least 1 nm between the protein and the box edge. A rhombic dodecahedron shape could increases the risk of your molecule crossing the border of the box. This gives visual glitches in the representation if your molecule that needs fixing before you can properly view the molecule. You can check out the “Fix Periodic Boundary Visualisation Issues” file below if you do run into these problems however.

After solvation you can remove water molecules that are overlapping with the model in order to cut down on the number of molecules you need to simulate later. This is however a rather small adjustment which we did not do ourselves.

Add Ions

We neutralized our enzymes and added ions to a concentration of 0.15 M to get closer to physiological conditions (6), instead of just neutralizing the enzyme.

Energy Minimization

Here you need a viewer that read .trr to view the nice molecular movie you just made. Alternatively use the trjconv command in GROMACS to combine the trajectory with your structure into a .pdb which most viewers can handle.

Equilibration

This step is not strictly needed since the later production MD step will do essentially the same work but unrestricted and for a longer time. However, you might want to do these steps since it relaxes the structure further to minimize the risk of melting the protein during the unrestricted simulation. (Queueing a whole rack of processors at a supercomputer can take upwards a month, if you don’t have priority time and the queue is long. Having a melted protein as result after that wait is not nice.) The equilibration steps are more demanding than the energy minimization and might need to run overnight if you are using an old system.

Production Molecular Dynamics

Now the actual simulation can take place. The guide performs a 1 ns MD simulation, which is a quite short simulation. We do MD simulations during 100 ns, giving the system enough time to get to converge to its equilibrated form. Have in mind that a bigger protein, such as our CsADH2946 being a tetramer, the simulation will take a lot longer time than it would for a smaller, monomeric protein. Check the results!

The Resulting Stable Enzyme Models

The result we got from the 100 ns stability simulations in GROMACS show a trajectory of each enzyme wiggling around trying to equilibrate into a low energy state. We created movies of these simulations that can be viewed below. The movies all confirmed that our homology models are good in the sense that they are not melting. Along with the movies we received RMSD plots. The RMSD plot describes the distances between the atoms of the initial conformation going into the simulation and the atoms of the different conformational states the enzyme develops over the 100 ns. As can be seen in figure 2, The RMSD curve for each enzyme model converge towards one low, stable value. This shows that the models equilibrate to a stable state with low energy in the solvent and indicates that our homology models are in fact good models to represent our enzymes.

Figure 2. RMSD plot of a 100 ns stability simulation in GROMACS of homology models to the enzymes CaCCD2, CsADH2946 and UGTCs2.

Pulling Simulation with CsADH2946 and Crocetin Dialdehyde

Since CsADH2946 had great advancements in the lab, has the best homology model score of our enzymes and also has the RMSD which converges fastest at the lowest value showing that it is a stable structure, we decided to focus the rest of our computer power to do pulling simulations with this enzyme. A pulling simulation between the protein and a substrate allows us to estimate the dissociation constant K_d which can be compared to K_M. This is very useful to compare to the experimentally estimated K_M for CsADH2946 in order to validate the estimation.

Docking

To be able to execute pulling simulations we had to pick the ligands that were going to be pulled in the simulations. The obvious ligand of choice was crocetin dialdehyde, which is converted to crocetin by CsADH2946. To be able to verify that CsADH2946 actually binds crocetin dialdehyde, another ligand, the common acetaldehyde was chosen. Acetaldehyde was run in a separate simulation to crocetin dialdehyde.

Before being able to start the pulling simulations it is required to select an initial position which is done by docking the ligands. That position should be the active site of the enzyme. The location of the active site was found by looking at the template molecule (PDB entry 4fqf) (7). In that article the active site was on a loop containing three cysteines (C301, C302 and C303). In our homology model that corresponded to C337, C338 and C339. Because of (7) we also decided to focus on one of the subunits. The docking itself was done using SwissDock (8, 9). For the docking mol2 files for both of the ligands were generated and found through the ligand database SwissDock provided. The pdb file for CsADH2946 was the last frame of 100 ns MD simulation. As for parameters, in SwissDock you can provide room coordinates for centering and the size of a box where the server should try to dock the ligand. When the docking was complete one of the seeds was chosen based on the distance to the active site and the ligand coordinates were used for the pulling simulations.

Pulling

Once again we used a tutorial by Justin Lemkul, this time on the subject of umbrella sampling (10). However, we only performed the steps until the configuration generation, with slight modifications that were more suitable to our enzyme (the pulling trajectory for example). Before starting the simulation we needed to select a trajectory in which the ligands were to be pulled. The choice of trajectory was based on a path that was not obstructed by any residues. The pulling simulation was run for 8000 picoseconds pulling the ligands at a velocity of 0.25 nm/ns. The results from the simulation can be seen in the movies below and in figure 3 (insert pulling movies for each of the ligands below). These indicated that more force is required to be able to pull out crocetin dialdehyde than acetaldehyde from the active site of CsADH2946 as the peaks in figure 3 can be interpreted as energy barriers being crossed. This can also be seen in the movies where crocetin dialdehyde gets stuck in two positions. Thus, the CsADH2946 model demonstrates specificity towards crocetin dialdehyde.

Using the Pulling Simulation to Estimate the Dissociation Constant of Crocetin Dialdehyde

The resulting maximum force applied to crocetin dialdehyde could then be used to roughly calculate the dissociation constant. To do this the following equation was used where 𝜟G is Gibbs free energy, R is the gas constant, T the temperature and K_d the dissociation constant:

\Delta G = -RTln(K_{d})

This equation assumes that the system is in equilibrium. What we need is 𝜟G to be able to calculate the dissociation constant. To get 𝜟G the maximum force (284.2160 kJ/(mol*nm)) was first converted into pN, where we assumed that we had one mole in our system. The resulting force was 471.9628 pN. 𝜟G was then calculated by extrapolating data from a study where they plotted the so called rupture force against their experimental 𝜟G values (11). Calculation of Kd using equation (1) yielded a value of 4.9321 µM.

Thus we are able to compare the dissociation constant from our molecular dynamics simulations to the value of KM (~20.7842 µM) we estimated in our Bayesian interference model (link to Km estimation modelling) and draw the conclusion that the estimated KM is approximately four times larger than the calculated Kd. With these values a lot less guesswork would be required when setting up an activity assay, and we therefore hope that other iGEM teams continue our work of characterizing CsADH2946.

Estimation of K_M for CsADH2946

Since the kinetic parameters for CsADH2946 have never been determined, we wanted to characterize the Michaelis-Menten kinetics of the reaction and estimate KM of our CsADH2946 using the activity measurement results(link Step 2 - Crocetin Dialdehyde → Crocetin). The substrate crocetin dialdehyde has low water solubility and will therefore partially precipitate when coming in contact with water. Quantitative kinetic characterization of such a system is especially complicated since the substrate concentration at the beginning of the experiment, S0 , is not known. We have therefore developed a probabilistic method that is capable of estimating Michaelis-Menten model parameters as well as the error associated with them for such systems and we demonstrate its use on our enzyme CsADH2946. The MATLAB-script we created can be downloaded here(Probabilistic inference.zip).

Docking

To be able to execute pulling simulations we had to pick the ligands that were going to be pulled in the simulations. The obvious ligand of choice was crocetin dialdehyde, which is converted to crocetin by CsADH2946. To be able to verify that CsADH2946 actually binds crocetin dialdehyde, another ligand, the common acetaldehyde was chosen. Acetaldehyde was run in a separate simulation to crocetin dialdehyde.

Before being able to start the pulling simulations it is required to select an initial position which is done by docking the ligands. That position should be the active site of the enzyme. The location of the active site was found by looking at the template molecule (PDB entry 4fqf) (7). In that article the active site was on a loop containing three cysteines (C301, C302 and C303). In our homology model that corresponded to C337, C338 and C339. Because of (7) we also decided to focus on one of the subunits, which subunit was not important since both 4fqf and our model are homo-tetramers. The docking itself was done using SwissDock (8, 9). For the docking mol2 files for both of the ligands were generated and found through the ligand database SwissDock provided. The pdb file for CsADH2946 was the last frame of 100 ns molecular dynamics simulation. As for parameters, in SwissDock you can provide room coordinates for centering and the size of a box where the server should try to dock the ligand. When the docking was complete one of the seeds was chosen based on the distance to the active site and the ligand coordinates were used for the pulling simulations.

Pulling

Once again we used a tutorial by Justin Lemkul, this time on the subject of umbrella sampling (10). However, we only performed the steps until the configuration generation, with slight modifications that were more suitable to our enzyme (the pulling trajectory for example). Before starting the simulation we needed to select a trajectory in which the ligands were to be pulled. The choice of trajectory was based on a path that was not obstructed by any residues. The pulling simulation was run for 8000 picoseconds pulling the ligands 0.25 nm/ns. The results from the simulation can be seen in the movies below and in figure 3 (insert pulling movies for each of the ligands below). These indicated that more force is required to be able to pull out crocetin dialdehyde than acetaldehyde from the active site of CsADH2946 as the peaks in figure 3 can be interpreted as energy barriers being crossed. This can also be seen in the movies where crocetin dialdehyde gets stuck in two positions. Thus, the CsADH2946 model demonstrates specificity towards crocetin dialdehyde.

Using the Pulling Simulation to Estimate the Dissociation Constant of Crocetin Dialdehyde

The resulting maximum force applied to crocetin dialdehyde could then be used to roughly calculate the dissociation constant. To do this the following equation was used where 𝜟G is Gibbs free energy, R is the gas constant, T the temperature and Kd the dissociation constant:

This equation assumes that the system is in equilibrium. What we need is 𝜟G to be able to calculate the dissociation constant. To get 𝜟G the maximum force (284.2160 kJ/(mol*nm)) was first converted into pN, where we assumed that we had one mole in our system. The resulting force was 471.9628 pN. 𝜟G was then calculated by extrapolating data from a study where they plotted the so called rupture force against their experimental 𝜟G values (11). Calculation of Kd using equation (1) yielded a value of 4.9321 µM.

Thus we are able to compare the dissociation constant from our molecular dynamics simulations to the value of KM (~20.7842 µM) we estimated in our Bayesian interference model (link to Km estimation modelling) and draw the conclusion that the estimated KM is approximately four times larger than the calculated Kd. With these values a lot less guesswork would be required when setting up an activity assay, and we therefore hope that other iGEM teams continue our work of characterizing CsADH2946.