Homology Modeling of the Uncharacterized Enzymes
One of the first things we noticed when looking at our selected enzymes was that no three-dimensional structures were available and the enzymes were poorly characterized in general. To be able to get structural information we turned to homology modeling.
The idea of homology modeling is to construct a 3D-structure by mapping the amino acid sequence onto a template – i.e. a known structure of a related, homologous protein through sequence alignment. For this to work you need a template with a solved 3D-structure. The quality of the model is determined by the alignment with, and, the structure of the template. Thus, ideally, you would like a high-resolution structure with high sequence identity.
The tool we used to generate our homology models with was SwissDock /2/, a fully automated protein structure homology-modeling server (1, 2, 3). We plugged in the sequences of our chosen enzymes (CaCCD2, CsADH2946 and UGTCs2) and started modeling. The templates were chosen based upon quality parameters such as Global Mean Quality Estimation, QMEAN, resolution of the crystal structure and sequence identity. After choosing the best templates and obtained the models detailed in figure 1 and the models along with the quality scoring are summarized in table 1. We could make two immediate observations. The CsADH2946 model seemed the most promising one, quality-wise with GMQE close to 1 and higher QMEAN being better. In addition, looking at the N-terminals (the blue ends in figure 1) we could see that they are stretched out, poking outwards from the protein. This was a good indication that we could put a his-tag at this end, with no complications. Another discovery we made was that the second step enzyme CsADH2946 is most likely a tetramer. This information was helpful when purifying the enzyme and going forth with molecular dynamics.
Molecular Dynamics in GROMACS
- The Art of Putting Digital Molecules in Digital Boxes of Water
So we constructed homology models of our enzymes. Are they good models? Are they realistic? There are several measurements that can be made on the models to estimate the answers to these questions. One such measurement is GMQE and QMEAN (4), but the models are still just a guess of what our enzymes actually look like. To asses the models and prepare them for further research we used GROMACS /1/ to simulate our enzymes in saline water for a total of 100 ns. This lets us assess their stability and obtain new models that are closer to their native conformation which would be the most probable state of the enzymes during the activity measurements in our wet lab.
One of the reasons for obtaining these new models were the high price tags on most of the substrates. One way of minimizing substrate consumption is to start the enzymatic assays at concentrations close to the real KM values. An educated guess would be to use the KM observed in reactions by related enzymes, but since we are the modelling group our contributions are made through advanced modelling done with lots of computer power. Thus, after ensuring that our models are stable and realistic, doing substrate pulling simulations in GROMACS would be our next move in order to estimate the KM
How We Simulated Our Enzymes
Simulating molecular dynamics is complicated, but luckily for us Justin Lemkul has written an excellent tutorial on how to do almost exactly the simulation we wanted to do (5). We followed that tutorial with only a few changes such as saline concentration, water model and force field parameters to better fit our purpose. All the steps until the production MD can be done on practically any system while the production MD will require a supercomputer to reduce time. Next follows a short description of what deviations we made from the tutorial by Justin as well as some tips and general comments. The headings corresponds to the headings in the tutorial.
Define Box and Solvate
So you want to put your protein in a box with a solvent, where the simulation will take place. The tutorial presents the rhombic dodecahedron as an option but we performed the simulation using a cube, with at least 1 nm between the protein and the box edge. A rhombic dodecahedron shape could increases the risk of your molecule crossing the border of the box. This gives visual glitches in the representation if your molecule that needs fixing before you can properly view the molecule. You can check out the “Fix Periodic Boundary Visualisation Issues” file below if you do run into these problems however.
After solvation you can remove water molecules that are overlapping with the model in order to cut down on the number of molecules you need to simulate later. This is however a rather small adjustment which we did not do ourselves.
Add Ions
We neutralized our enzymes and added ions to a concentration of 0.15 M to get closer to physiological conditions (6), instead of just neutralizing the enzyme.
Energy Minimization
Here you need a viewer that read .trr to view the nice molecular movie you just made. Alternatively use the trjconv command in GROMACS to combine the trajectory with your structure into a .pdb which most viewers can handle.
Equilibration
This step is not strictly needed since the later production MD step will do essentially the same work but unrestricted and for a longer time. However, you might want to do these steps since it relaxes the structure further to minimize the risk of melting the protein during the unrestricted simulation. (Queueing a whole rack of processors at a supercomputer can take upwards a month, if you don’t have priority time and the queue is long. Having a melted protein as result after that wait is not nice.) The equilibration steps are more demanding than the energy minimization and might need to run overnight if you are using an old system.
Production Molecular Dynamics
Now the actual simulation can take place. The guide performs a 1 ns Molecular dynamics (MD) simulation, which is a quite short simulation. We do MD simulations during 100 ns, giving the system enough time to get to converge to its equilibrated form. Have in mind that a bigger protein, such as our CsADH2946 being a tetramer, the simulation will take a lot longer time than it would for a smaller, monomeric protein. Check the results!
The Resulting Stable Enzyme Models
The result we got from the 100 ns stability simulations in GROMACS show a trajectory of each enzyme wiggling around trying to equilibrate into a low energy state. We created movies of these simulations that can be viewed below. The movies all confirmed that our homology models are good in the sense that they are not melting. Following the tutorial we also received RMSD plots of the simulations. The RMSD plot describes the distances between the atoms of the initial conformation going into the simulation and the atoms of the different conformational states the enzyme develops over the 100 ns. As can be seen in figure 2, The RMSD curve for each enzyme model converge towards one low, stable value. This shows that the models equilibrate to a stable state with low energy in the solvent and indicates that our homology models are in fact good models to represent our enzymes.
Pulling Simulation with CsADH2946 and Crocetin Dialdehyde
Since CsADH2946 had great advancements in the lab, has the best homology model score of our enzymes and also has the RMSD which converges fastest at the lowest value showing that it is a stable structure, we decided to focus the rest of our computer power to do pulling simulations with this enzyme. A pulling simulation between the protein and a substrate allows us to estimate the dissociation constant Kd which can be compared to KM. This is very useful to compare to the experimentally estimated KM for CsADH2946 in order to validate the estimation.
Docking
To be able to execute pulling simulations we had to pick the ligands that were going to be pulled in the simulations. The obvious ligand of choice was crocetin dialdehyde, which is converted to crocetin by CsADH2946. To be able to verify that CsADH2946 actually binds crocetin dialdehyde, another ligand, the common acetaldehyde was chosen. Acetaldehyde was run in a separate simulation to crocetin dialdehyde.
Before being able to start the pulling simulations it is required to select an initial position which is done by docking the ligands. That position should be the active site of the enzyme. The location of the active site was found by looking at the template molecule (PDB entry 4fqf) (7). In that article the active site was on a loop containing three cysteines (C301, C302 and C303). In our homology model that corresponded to C337, C338 and C339. Because of (7) we also decided to focus on one of the subunits, which subunit was not important since both 4fqf and our model are homo-tetramers. The docking itself was done using SwissDock (8, 9). For the docking mol2 files for both of the ligands were generated and found through the ligand database SwissDock provided. The pdb file for CsADH2946 was the last frame of 100 ns molecular dynamics simulation. As for parameters, in SwissDock you can provide room coordinates for centering and the size of a box where the server should try to dock the ligand. When the docking was complete one of the seeds was chosen based on the distance to the active site and the ligand coordinates were used for the pulling simulations.
Pulling
Once again we used a tutorial by Justin Lemkul, this time on the subject of umbrella sampling (10). However, we only performed the steps until the configuration generation, with slight modifications that were more suitable to our enzyme (the pulling trajectory for example). Before starting the simulation we needed to select a trajectory in which the ligands were to be pulled. The choice of trajectory was based on a path that was not obstructed by any residues. The pulling simulation was run for 8000 picoseconds pulling the ligands 0.25 nm/ns. The results from the simulation can be seen in the movies below and in figure 3 (insert pulling movies for each of the ligands below). These indicated that more force is required to be able to pull out crocetin dialdehyde than acetaldehyde from the active site of CsADH2946 as the peaks in figure 3 can be interpreted as energy barriers being crossed. This can also be seen in the movies where crocetin dialdehyde gets stuck in two positions. Thus, the CsADH2946 model demonstrates specificity towards crocetin dialdehyde.
Using the Pulling Simulation to Estimate the Dissociation Constant of Crocetin Dialdehyde
The resulting maximum force applied to crocetin dialdehyde could then be used to roughly calculate the dissociation constant. To do this the following equation was used where 𝜟G is Gibbs free energy, R is the gas constant, T the temperature and Kd the dissociation constant:
This equation assumes that the system is in equilibrium. What we need is 𝜟G to be able to calculate the dissociation constant. To get 𝜟G the maximum force (284.2160 kJ/(mol*nm)) was first converted into pN, where we assumed that we had one mole in our system. The resulting force was 471.9628 pN. 𝜟G was then calculated by extrapolating data from a study where they plotted the so called rupture force against their experimental 𝜟G values (11). Calculation of Kd using equation (1) yielded a value of 4.9321 µM.
Thus we are able to compare the dissociation constant from our molecular dynamics simulations to the value of KM (~20.7842 µM) we estimated in our Bayesian interference model (link to Km estimation modelling) and draw the conclusion that the estimated KM is approximately four times larger than the calculated Kd. With these values a lot less guesswork would be required when setting up an activity assay, and we therefore hope that other iGEM teams continue our work of characterizing CsADH2946.