Team:Uppsala/Model

Modeling

Homology Modeling of the Uncharacterized Enzymes

One of the first things we noticed when looking at our selected enzymes was that no three-dimensional structures were available and the enzymes were poorly characterized in general, so if we wanted to get a sense of what they might look like, we would have to figure something out. The obvious thing to do, we felt, was to turn to homology modeling.

The idea of homology modeling is to construct a 3D-structure by mapping the amino acid sequence onto a template – i.e. a known structure of a related, homologous protein through sequence alignment. For this to work you need a template with a solved 3D-structure. The quality of the model is determined by the alignment with, and, the structure of the template. Thus, ideally, you would like a high-resolution structure with high sequence identity.

While there are various tools available for homology modeling, such as MODELLER, we choose to use SWISS-MODEL /2/, a fully automated protein structure homology-modeling server (1, 2, 3). We plugged in the sequences of our chosen enzymes (CaCCD2, CsADH2946 and UGTCs2) and started modeling. While the available templates weren’t quite as high-quality as we had hoped, we were confident that they were sufficient to get the job done. We chose our templates and obtained the models detailed in figure 1 and the models along with the quality scoring are summarized in table 1. We could make two immediate observations. The CsADH2946 model seemed the most promising one, quality-wise with GMQE close to 1 and higher QMEAN being better. In addition, looking at the N-terminals (the blue ends in figure 1) we could see that they are stretched out, poking outwards from the protein. This was a good indication that we could put a His-tag at this end, with no complications. Another discovery we made was that the second step enzyme CsADH2946 is most likely a tetramer. This information was helpful when purifying the enzyme and going forth with molecular dynamics.

Molecular Dynamics in GROMACS

- The Art of Putting Digital Molecules in Digital Boxes of Water

So we constructed homology models of our enzymes. Are they good models? Are they realistic? There are several measurements that can be made on the models to estimate the answers to these questions. One such measurement is Global Model Quality Estimation (GMQE) and QMEAN (4), but the models are still just a guess of what our enzymes actually look like. To asses the models and prepare them for further research we used GROMACS /1/ to simulate our enzymes in saline water for a total of 100 ns. This lets us assess their stability and obtain new models that are closer to their native conformation which would be the most probable state of the enzymes during the activity measurements in our wet lab.

One of the reasons for obtaining these new models were the high price tags on most of the substrates. One way of minimizing substrate consumption is to start the enzymatic assays at concentrations close to the real KM values. An educated guess would be to use the KM observed in reactions by related enzymes, but since we are the modelling group our contributions are made through advanced modelling done with lots of computer power. Thus, after ensuring that our models are stable and realistic, doing substrate pulling simulations in GROMACS would be our next move in order to estimate the K_M

How We Simulated Our Enzymes

Simulating molecular dynamics is complicated, but luckily for us Justin Lemkul has written an excellent tutorial on how to do almost exactly the simulation we wanted to do (5). We followed that tutorial with only a few changes such as saline concentration, water model and force field parameters to better fit our purpose. All the steps until the production MD can be done on practically any system while the production MD will require a supercomputer to reduce time. Next follows a short description of what deviations we made from the tutorial by Justin as well as some tips and general comments. The headings corresponds to the headings in the tutorial.

Define Box and Solvate

So you want to put your protein in a box with a solvent, where the simulation will take place. The tutorial presents the rhombic dodecahedron as an option but we performed the simulation using a cube, with at least 1 nm between the protein and the box edge. A rhombic dodecahedron shape could increases the risk of your molecule crossing the border of the box. This gives visual glitches in the representation if your molecule that needs fixing before you can properly view the molecule. You can check out the “Fix Periodic Boundary Visualisation Issues” file below if you do run into these problems however.

After solvation you can remove water molecules that are overlapping with the model in order to cut down on the number of molecules you need to simulate later. This is however a rather small adjustment which we did not do ourselves.

Add Ions

We neutralized our enzymes and added ions to a concentration of 0.15 M to get closer to physiological conditions (6), instead of just neutralizing the enzyme.

Energy Minimization

Here you need a viewer that read .trr to view the nice molecular movie you just made. Alternatively use the trjconv command in GROMACS to combine the trajectory with your structure into a .pdb which most viewers can handle.

Equilibration

This step is not strictly needed since the later production MD step will do essentially the same work but unrestricted and for a longer time. However, you might want to do these steps since it relaxes the structure further to minimize the risk of melting the protein during the unrestricted simulation. (Queueing a whole rack of processors at a supercomputer can take upwards a month, if you don’t have priority time and the queue is long. Having a melted protein as result after that wait is not nice.) The equilibration steps are more demanding than the energy minimization and might need to run overnight if you are using an old system.

Production Molecular Dynamics

Now the actual simulation can take place. The guide performs a 1 ns Molecular dynamics (MD) simulation, which is a quite short simulation. We do MD simulations during 100 ns, giving the system enough time to get to converge to its equilibrated form. Have in mind that a bigger protein, such as our CsADH2946 being a tetramer, the simulation will take a lot longer time than it would for a smaller, monomeric protein. Check the results!

< div class="headers" style="padding-right:3%; padding-left:3%;"> The Resulting Stable Enzyme Models

< div style="padding-bottom:3%; padding-right:3%; padding-left:3%;">The result we got from the 100 ns stability simulations in GROMACS show a trajectory of each enzyme wiggling around trying to equilibrate into a low energy state. We created movies of these simulations that can be viewed below. The movies all confirmed that our homology models are good in the sense that they are not melting. Following the tutorial we also received RMSD plots of the simulations. The RMSD plot describes the distances between the atoms of the initial conformation going into the simulation and the atoms of the different conformational states the enzyme develops over the 100 ns. As can be seen in figure 2, The RMSD curve for each enzyme model converge towards one low, stable value. This shows that the models equilibrate to a stable state with low energy in the solvent and indicates that our homology models are in fact good models to represent our enzymes.