Difference between revisions of "Team:IONIS-PARIS/Modeling/rnafolding"

Line 172: Line 172:
 
                           <center>
 
                           <center>
 
                             <p><b>We know that it could be hard to launch yourself in these kind of work. To help you with that, we wrote a handbook to introduce you to the RNA simulation. We hope it will help all the teams to get involve with the computational approach.</b></p>
 
                             <p><b>We know that it could be hard to launch yourself in these kind of work. To help you with that, we wrote a handbook to introduce you to the RNA simulation. We hope it will help all the teams to get involve with the computational approach.</b></p>
                           <a class ="cacahuete" role="button" targert="_blank"href="https://static.igem.org/mediawiki/2017/4/4b/Ionis-Paris-2017-Modelling-handbook.pdf">Download Modeling Handbook</a>
+
                           <a class ="cacahuete" role="button" target="_blank"href="https://static.igem.org/mediawiki/2017/4/4b/Ionis-Paris-2017-Modelling-handbook.pdf">Download Modeling Handbook</a>
 
                         </center>
 
                         </center>
 
                           <hr>
 
                           <hr>

Revision as of 17:46, 31 October 2017

Back to top

Structure-function characterization of the cspA 5'UTR through molecular modeling

Overview:

  • We used a computanional approach to study the folding and the conformation of our RNA to understand and bring an explanation of its mechanism.
  • We identified the more stable and realistic structures based on computational analysis
  • In complement to the folding simulation, we used RNA-protein docking to study the interaction with a ribosomal protein.
  • Introduction


    In this project we developed a genetic sequence that allows bacteria to produce amilCP when it undergoes cold shock. To achieve this, we chose to use the cspA promoter that is activated during cold shock in several bacteria such as wild type (WT) Escherichia coli. Also, based on the literature we choose to include the UP element of the cspA promoter, the cspA 5’UTR, and the DownStream Box (DSBox) in our final construction. These temperature sensing sequence motifs are upstream of our reporter protein (amilCP), with the exception of the DSBox (see Fig.1). Our bibliographic review indicates that these components are required to see the conditional expression of amilCP only at low temperature1,2 (under 15°C, approximately).


    Figure 1: Schema of sequence 7 construction.


    We found in the bibliography that the current main hypothesis that could explain the expression of the cold shock protein family is the structure of the mRNA. At 37°C their mRNA is quickly degraded. As the temperature decreases the mRNA assumes a different structure. This leads to the stabilization of the mRNA structure at low temperatures.
    There were no experimental structures solved for the cspA 5’UTR, nor for any close sequence homologs. Therefore, we used a computational method, SimRNA3, in order to generate 3D models of this region of the cspA mRNA.
    Our results suggest that several short distinct sequence motifs classically identified as important for ribosome binding actually participate together in the formation of a secondary structure motif that is recognized in a generally non base-pair specific manner by ribosomal accessory proteins in the formation of the pre-initiation complex at low temperatures.

    Materials and Methods


    Software

    SimRNA models RNA with a coarse grained 3D model subjected to a statistical potential derived from a set of experimentally determined RNA structures, and uses a Monte Carlo method for conformational sampling.
    For our first simulations we decided to try the isotemperature method. It was supposed to show the folding of RNA at a fixed temperature. Unfortunately, initial results were quite variable in their final structures. The simulations begin with RNA in a planar circle conformation. At each iteration of the Monte Carlo procedure, it calculates the lowest energy conformation for the RNA. The problem is that the simulation can get stuck in a local energy minimum, and as the simulation runs it becomes unable to escape it. Thus, we can’t see if there is another local minimum deeper than the one where our simulation was stuck. This is why we decided to switch to the Replica Exchange Monte Carlo (REMC) protocol for our simulations.
    Replica-Exchange Monte Carlo is a technique for enhancing the conformational sampling of systems by Monte Carlo methods. It requires the creation of several replica simulations of the same system, with each replica running at a different simulated temperature. Conformations that are close in energy are exchanged periodically between each replica, allowing the detection and exploration of the principal local minima on the system’s potential energy surface. This corresponds to the discovery of new conformations separated from initial conditions by energy barriers.

    Simulation parameters

    In our simulations, the cspA 5’UTR sequence comes from the registered part BBa_K328001:

    ACGGUUUGACGUACAGACCAUUAAAGCAGUGUAGUAAGGCAAGUCCCUUCAAGAGUUAUCGUUGAUACCCCUCGUAGUGCACAUUCUUUAACGCUUCAAAAUCUGUAAAGCACGCCAUAUCGCCGAAAGGCACACUUAAUUAUUAAAGGUAAUACACUAUGAGUAUGACUGGUAUCGUAAAUGACU

    The REMC simulations were made of eight replicas, running for 16 000 000 steps in total. One frame was recorded every 320 steps, which correspond to a total of 50 000 frames per replica. Each frame is a PDB file of one particular step of the folding process.

    The initial temperature of the annealing process was set to 1.35 and the final temperature to 0.9, (per the published SimRNA benchmark settings), recalling that these values are dimensionless (they are the Lennard Jones reduced temperature unit).

    Once the simulation is done, SimRNA generates one trajectory file per replica. To obtain the representative structures of the simulation, we have done a clustering analysis. SimRNA has a built in clustering routine. We choose to take the top 1% of the lowest energy structures and separated them with a threshold of 18.7 Ångström (per the published clustering guideline of allowing 0.1 Ångström/nucleotide).

    Hardware

    To run our simulation we used a personal gaming computer running Linux 16.04 LTS.

    Hardware Specifications:

    CPU: Intel Core i7 6700K @ 4.00GHz

    RAM: 16,0 Go Dual-Channel DDR4

    Motherboard: ASUSTeK COMPUTER INC. Z170 PRO GAMING (LGA1151)

    Graphics: NVIDIA GeForce GTX 1070 - 8Go



    We know that it could be hard to launch yourself in these kind of work. To help you with that, we wrote a handbook to introduce you to the RNA simulation. We hope it will help all the teams to get involve with the computational approach.

    Download Modeling Handbook

    Results

    We chose to take the top 1% of the lowest energy structures and separated them with a threshold of 18.7 Ångström (per the published clustering guideline of allowing 0.1 Ångström/nucleotide).

    Figure 2: Percentage of structures in each cluster.


    For our simulation we choose to work with the top 6 clusters (out of 25 identified in total) as they represent more than 90% of the clustered structures, as displayed in the clustering log file

    3D representative structures viewer:


  • With the REMC simulation, clustering process, and the use of GROMACS5 to analyze the trajectory file we obtained the 3D model of each lowest energy structure of each cluster and also the energy value.

    Figure 3: representative structures of each cluster with the DS box in yellow.


    Cluster number

    Energy of the lowest energy structures of each cluster in SimRna energy units

    1

    -2043

    2

    -2073

    3

    -2048

    4

    -2046

    5

    -2034

    6

    -2032

    Table 1: Energy value of lowest energy frame of each cluster in SimRNA energy units


    Figure 4: REMC cloud of the cspA 5’UTR simulation. 40 000 data points were plot with GNUplot.

    The REMC Cloud is a graph that allow usto sort RNA structures generated during the simulation according to their energy and their Root-Mean-Square-Deviation (RMSD) from a selected reference structure. This the average distance between the homologous  atoms of different conformations of a molecule.

    This  graph allow us to see the different steps of the simulation with 3 dot groups. Group I corresponds to the first RNA structures generated, as the simulation begin with a circular RNA it take a little bit of iterations to find lower energy structures. Group II corresponds to intermediate structures, we can see that at this point RMSD of the structures have also decreased meaning that the RNA is becoming more folded. Finally, Group III corresponds to final structures, folded into structures of low energy.

    Regarding the energy values of each cluster’s representative structure, they are all in Group III, so this group represents 6 different RNA structures. We can’t characterize the clusters by global energy or RMSD. To go further in the cluster characterisation we have to study the conformations and nucleic acid interactions in each structure.


    Figure 5: Contact map of each cluster.


    The intramolecular contact map, generated with RNAmap2D6 allow us to see which nucleic acids are interacting between each others. It also allow us to rapidly compare different conformations between several RNA structures. Here, it reveals that the top 6 clusters actually describe 3 possible main folding topologies for the sequence.

    Possible Similar Clusters:

    Set 1: clusters 1,3,4,6 (topologically similar to each other, despite some differences in 3D)

    Set 2: cluster 2

    Set 3: cluster 5

    We aim to compare the clusters of the same set and between sets, in order to see the similarities and differences. We use a “model vs model contact map” to achieve that goal.


    Figure 6: Contact map Model vs Model. On each figure, one cluster is at the left bottom side, and the other at the right top side. The common nucleic acid interactions are colored in blue and unique nucleic acid interactions in red.


    We saw that the cluster 1, 3, 4, and 6 have some close similarities, the fact that they are in different cluster can be explained by rigid body movements of subdomains and not topologically different folding. The clusters 2 and 5 are rather unique, compared to each other and to cluster 1.

    So we can consider that there are 3 main topologies, and the secondary structure study will be based on those results.


    The study of the article “The cspA mRNA Is a Thermosensor that Modulates Translation of the Cold-Shock Protein CspA.”4 indicates that ribosomal protein S1 could be the key to cspA translation. Ribosomal protein S1 is crucial for ribosome initiation on many mRNAs, particularly for those with highly structured 5′ regions, and those with no or weak Shine-Dalgarno sequences.  To compare the three differents secondary structures from our sets with the wild-type cspA secondary structure from the article (mapped by RNA footprinting), we have highlighted in green the putative S1 binding domain (residue 134 to 147), in blue the initiation codon (residue 161 to 163), in red the Shine-Dalgarno sequence (residue 148 to 151) and in yellow the DSbox sequence as general information.


    Figure 7: Comparison between Cluster 1 and the article’s figure


    The first similarity between the two structures is the big semi-loop that begin with the first nucleic acid and end with the end of the mRNA sequence where the reference sequences are located. On both structures, the S1 binding site and the Shine-Dalgarno sequences are located on a hairpin. Finally, under the semi-loop, the two structures present a long branch.


    Figure 8: Comparison between Cluster 2 and the article’s figure


    The cluster 2 presents no real similarities to the article’s figure. Indeed, we can’t see this big semi-loop and the DS box is not located on a hairpin as it is on the article’s figure.

    Figure 9: Comparison between Cluster 5 and the article’s figure


    When we first look at the general structure of the two structures we can see a striking resemblance: there is the semi-loop with three hairpins and a long branch at the bottom. But when we look at the reference sequences locations on the cluster 5, it doesn’t correspond to the reference sequences locations on the article’s figure.


    The secondary structure analysis shows that the wild-type cspA has a secondary structure rather similar to our cspA secondary structure from cluster 1. Therefore, the lowest energy structure from the first cluster could be a good docking template for the S1 binding protein. This docking is presented in the next section.

    cspA 5’UTR - S1 domain 3 docking

    Based on our previous cspA 5’UTR results, we wished to study the docking procedure of the csPA 5’UTR on its natural ligand. To do so, we extensively researched the possible ligand on the ribosome7.

    Our research lead us to consider the ribosomal protein subunit 1 (S1), and more specifically its domain 3 (d3).

    Through modeling, we used several modeling servers, step by step, to obtain a trustworthy docking structure. We used the phyre28 server to have a homology model (based on template selection, followed by MAFFT-L-INSI9 (to check the template-target alignment)and I-Tasser10 (to generate the final model, based on previous information). All parameters inputted were defaults, except for manually indicating MAFFT the L-INSI algorythm.

    Figure 10: S1 modeled protein structure with (left) and without (right) Sticks


    On this structure, we observed a specific motif with conserved residues, that we will call “valley” from now onwards. This valley seemed to be an accessible site, being located on the exterior side of S1_d3, while having two strands of highly conserved residues bordering it. As such, we considered it a very likely docking site for our cspA 5’UTR structure.

    To determine whether this was due solely to the modeling process, we compared it to a reference sequence found in the PROTEIN DATABASE (PDB). This sequence, referred as the code 4v9r on the PDB, consisted of S1_d3 bound to single stranded RNA (ssRNA)12. We applied the same steps used for our S1_d3, resulting in the following model.

    Figure 11: S1 reference protein structure with (left) and without (right) Sticks


    On this structure, we also observe the same valley motif with highly conserved residues, reinforcing our previous hypothesis.

    From the previous models, we decided to study more precisely the conservation of residues to determine the best docking site possible and test our valley hypothesis. This was done using the Consurf web server for both S1-d3 and 4v9r13

    Figure 12: S1 protein residue conservation : S1_d3 (left) and 4v9r (right)


    As we suspected, Consurf reaffirmed the existence of the valley presenting many highly conserved residues (darker coloration). This confirmed our previous hypothesis concerning the docking site.

    Now that we had the ligand, the receptor and the likely RNA binding site, we had all we needed to launch an actual docking simulation. To do so, we used the PatchDock server13, setting the receptor as S1_d3, the ligand as our 5’UTR (SimRNA cluster 1 result) and using default settings (4 REMC, Default complex type).

    Contrary to what we initially expected, we did not get on clear docking position (the ssRNA part), but several groups of docking positions, as shown below.

    Figure 13 & 14 : S1 + cspA 5'UTR docking snapshots/results.

    “Many of the resulting poses show the the conserved binding site interacting with the RNA. We are still analyzing these results.”

    Although we obtained several positions, we still notice that many of them correspond to the conserved binding sites interacting with the RNA. We are still analyzing these results to determine the most likely docking structure.

    Conclusion

    The SimRNA approach offers a good way to obtain a pool of structures with Replica Exchange Monte Carlo. The clustering allowed the selection and the discrimination of structures by energy value, and RMSD, and fold topology. We have isolated 3 structures sets from 6 clusters which represent more than 90% of the isolated structures from the clustering process, thanks to the contact maps analysis. Now we work on one defined structure, and the analysis on the docking protein/RNA is in progress.



    References:

    1. Yang, L., Zhou, D., Liu, X., Han, H., Zhan, L., Guo, Z., Zhang, L., Qin, C., Wong, H. and Yang, R. (2009). Cold-induced gene expression profiles of Vibrio parahaemolyticus: a time-course analysis. FEMS Microbiology Letters, 291(1), pp.50-58.
    2. Xia, B., Ke, H., Jiang, W. and Inouye, M. (2001). The Cold Box Stem-loop Proximal to the 5′-End of theEscherichia coli cspAGene Stabilizes Its mRNA at Low Temperature. Journal of Biological Chemistry, 277(8), pp.6005-6011.
    3. Michal J. Boniecki, Grzegorz Lach, Wayne K. Dawson, Konrad Tomala, Pawel Lukasz, Tomasz Soltysinski, Kristian M. Rother, Janusz M. Bujnicki; SimRNA: a coarse-grained method for RNA folding simulations and 3D structure prediction, Nucleic Acids Research, Volume 44, Issue 7, 20 April 2016, Pages e63, doi.org/10.1093/nar/gkv1479
    4. Giuliodori, Anna & Di Pietro, Fabio & Marzi, Stefano & Masquida, Benoît & Wagner, Rolf & Romby, Pascale & Gualerzi, Claudio & L Pon, Cynthia. (2010). The cspA mRNA Is a Thermosensor that Modulates Translation of the Cold-Shock Protein CspA. Molecular cell. 37. 21-33. 10.1016/j.molcel.2009.11.033
    5. GROningen MAchine for Chemical Simulations, gromacs.org
    6. Pietal, M. J., Szostak, N., Rother, K. M., & Bujnicki, J. M. (2012). RNAmap2D–calculation, visualization and analysis of contact and distance maps for RNA and protein-RNA complex structures. BMC bioinformatics, 13(1), 333.
    7. Duval, M., Korepanov, A., Fuchsbauer, O., Fechter, P., Haller, A., Fabbretti, A., … Marzi, S. (2013). Escherichia coli Ribosomal Protein S1 Unfolds Structured mRNAs Onto the Ribosome for Active Translation Initiation. PLoS Biology, 11(12), e1001731, doi.org/10.1371/journal.pbio.1001731
    8. Protein Homology/analogy Recognition Engine V 2.0, Phyre 2
    9. Multiple alignment program for amino acid or nucleotide sequences, mafft.cbrc.jp/alignment/server/
    10. Iterative Threading ASSEmbly Refinement, I-TASSER/
    11. Protein 4v9r
    12. ConSurf Server
    13. PATCHDOCK  Molecular Docking Algorithm Based on Shape Complementarity Principles, PATCHDOCK