Team:Toronto/Protein-Modelling

Protein Modelling

Introduction

LacILOV is a fusion of the helix-loop-helix (HLH) motif from LacI and a Light-Oxygen-Voltage (LOV) sensing domain, LOVII. The HLH motif functions as a DNA-binding domain for the lac operon, and the LOVII domain changes conformation when exposed to blue light. By combining these two domains, LacILOV regulates gene expression through the lac operon in a blue-light-dependent manner.

Rationale

In the lab, we used a photo-reporter assay to measure gene expression under the control of LacILOV. We used E. coli expressing mCherry as a reporter gene under the control of LacILOV. However, significant expression of the reporter gene required 12 hours of stimulation with blue light. Investigating the structural mechanism by which LacILOV binds DNA could provide insights into engineering a LacILOV variant that responds more efficiently. Using state-of-the-art protein modelling tools, we computationally predicted a structure for LacILOV and identified a critical α-helix that appears to stabilize its DNA-binding domain. In the predicted models, deleting specific amino acid residues in this α-helix destabilized the DNA-binding domain. By targeting these residues, we could potentially engineer a more efficient LacILOV in the future.

LacILOV Protein Structure Prediction

To model the structure of LacILOV, we first used PyRosetta, a Python-based interface to the Rosetta protein modelling software that lets users build customized modelling algorithms for predicting protein folding. PyRosetta exposes Rosetta’s sampling and scoring functions, for example for protein structure manipulation and energy calculations, which can be used to run Monte Carlo-based simulations. [2] As a validation step, we wrote a script to predict the structure of the LOVII domain, whose crystal structure has already been solved and deposited in the Protein Data Bank (PDB), the central database of 3D structures of large biological molecules, including proteins (PDB ID: 2V1A). The structures generated by our scripts were not comparable to the published crystal structure. We therefore decided to use a well-established protein structure prediction server, I-TASSER, developed by the Zhang lab. [3][4]
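One common way to quantify whether a predicted model is comparable to a crystal structure is the root-mean-square deviation (RMSD) between matched Cα atoms. The sketch below is an illustration only, not the validation script described above. It assumes the two coordinate lists are already superimposed and paired residue by residue; the function name `ca_rmsd` and all coordinates are hypothetical.

```python
import math


def ca_rmsd(model_coords, reference_coords):
    """RMSD between paired (x, y, z) coordinates, in the units of the input.

    Assumption: both structures are already superimposed and the i-th entry
    of each list refers to the same residue.
    """
    if len(model_coords) != len(reference_coords):
        raise ValueError("coordinate lists must have the same length")
    if not model_coords:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = sum(
        (a - b) ** 2
        for p, q in zip(model_coords, reference_coords)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared_sum / len(model_coords))


# Hypothetical coordinates: the model is shifted by 1 unit along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(ca_rmsd(model, crystal))
```

In practice the two structures would first need to be optimally superimposed (for example with the Kabsch algorithm) before the RMSD is meaningful; that step is left out here to keep the sketch short.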

Figure 1: I-TASSER predicted structure of LacILOV without mutations (left) compared to the crystal structure of the LOVII domain (PDB ID: 2V1A, right). The comparison illustrates the accuracy of I-TASSER in predicting 3D structure.

I-TASSER performs structure prediction in three main steps. In the first step, I-TASSER threads the submitted amino acid sequence through the PDB to identify template proteins with similar folds. Threading works by aligning each amino acid of the submitted sequence to a position in a template structure and assessing how well the sequence fits that template.

In the second step, fragments from the threading-aligned regions are taken from the template structures and assembled into full-length models using Monte Carlo-based simulations. Threading-unaligned regions of the sequence, and sequences for which no template is found, are built by ab initio modelling. Structure assembly is the most time-consuming step of I-TASSER. It is guided by a composite energy function with three terms: a statistical energy term derived from experimentally solved crystal structures, a template-based energy term from the template structures in the PDB, and optional user-specified restraints. Together, these terms give the total energy of a single structure. Thousands of candidate structures are generated from the same submitted sequence and then clustered by structural similarity.

In the third step, the cluster centroids, obtained by averaging the coordinates of all structures in a cluster, undergo a second round of structure assembly to refine local geometry and remove steric clashes. The resulting structures are clustered again, and the lowest-energy structures, as scored by the composite energy function described above, are selected. The final full-atomic models are then built from the selected structures by adding atomic detail and optimizing the hydrogen-bonding network. This completes the structure prediction process.
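The idea of a Monte Carlo search guided by a composite energy function can be illustrated with a toy example. The sketch below is not I-TASSER code: it uses a one-dimensional "conformation", made-up energy terms, and equal weights, all of which are assumptions chosen only to show the Metropolis acceptance rule and how several energy terms are summed into one score.

```python
import math
import random


def composite_energy(statistical, template, restraint=0.0, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three energy terms (the weights are hypothetical)."""
    w_stat, w_tmpl, w_rest = weights
    return w_stat * statistical + w_tmpl * template + w_rest * restraint


def metropolis_accept(old_energy, new_energy, temperature, rng=random):
    """Metropolis rule: always accept downhill moves, sometimes accept uphill ones."""
    delta = new_energy - old_energy
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def monte_carlo_search(energy_fn, start, steps=2000, step_size=0.5,
                       temperature=1.0, seed=0):
    """Toy 1D Monte Carlo search that remembers the lowest-energy state visited."""
    rng = random.Random(seed)
    current, current_e = start, energy_fn(start)
    best, best_e = current, current_e
    for _ in range(steps):
        candidate = current + rng.uniform(-step_size, step_size)
        candidate_e = energy_fn(candidate)
        if metropolis_accept(current_e, candidate_e, temperature, rng):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


def toy_energy(x):
    # Made-up terms: the "statistical" term prefers x = 2, the "template" term x = 3.
    return composite_energy(statistical=(x - 2.0) ** 2, template=(x - 3.0) ** 2)


best_x, best_e = monte_carlo_search(toy_energy, start=10.0)
print(round(best_x, 2), round(best_e, 2))
```

Because uphill moves are occasionally accepted, the search can escape shallow local minima, which is the main reason Monte Carlo sampling is preferred over plain downhill minimization for rugged energy landscapes.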

To evaluate the accuracy of the generated models, I-TASSER calculates a C-score (confidence score) from the structure assembly simulations. The C-score typically ranges from -5 to 2, and a higher C-score signifies a model with higher confidence. A benchmark test has also shown that, for models with an I-TASSER C-score greater than -1.5, residue-level accuracy corresponds to an average error of less than 1.5 Å relative to X-ray crystallography data. The I-TASSER server has also participated in CASP (Critical Assessment of Techniques for Protein Structure Prediction), a community-wide experiment that evaluates current techniques in protein structure prediction. In the past four CASPs, I-TASSER has consistently ranked first in the server section of the competition. [5] Given I-TASSER’s accuracy and performance relative to other techniques, our team decided to use it to predict the structure of LacILOV.

Important Secondary Structures in LacI and LOVII domain

To understand why LacILOV regulates gene expression under the lac operon inefficiently, we examined the structures of its two components: the DNA-binding domain of LacI and the LOVII domain. LacI contributes its DNA-binding region, the helix-loop-helix motif. When the HLH motif recognizes its consensus sequence, it makes sequence-specific contacts with the DNA bases in the major and minor grooves. [1] Otherwise, the HLH motif is intrinsically unstructured. Meanwhile, the “photoswitch” function of the LOVII domain arises from structural changes in the C-terminal Jα helix. Upon illumination with blue light, the Jα helix undergoes a prominent displacement, whereas structural changes in the core domain are insignificant. [6] This light-induced unfolding of the Jα helix is known to contribute to the overall conformation of the LOVII domain.

Although the crystal structures of LacI and the LOVII domain have been solved individually, the structure of our fusion protein, LacILOV, has not yet been elucidated. We therefore began by predicting its structure using I-TASSER. Interestingly, in our best model of LacILOV (the one with the highest C-score), we noticed a novel α-helix spanning the junction of the two components. We speculated that this α-helix increases the affinity of LacILOV for its promoter sequence, so that substantial blue-light exposure is required to release LacILOV from the promoter and initiate transcription of the downstream reporter gene.

To explore the predicted structure of LacILOV, we used Foldit Standalone, which provides an interactive 3D graphical interface for examining biochemical properties of a protein structure such as energy levels, side chains and, importantly, hydrogen-bond interactions. Intended for research use, Foldit Standalone supports manipulation of protein structures through configurable visualizations built on the Rosetta molecular modelling package, and gives access to other Rosetta features such as its energy scoring and sampling functions and support for RosettaScripts. [7]

Results

Using Foldit Standalone, we identified six residues within the α-helix that form hydrogen bonds and thereby contribute to its stability. We then generated a series of single mutations of the selected residues, including substitutions and deletions, and submitted each mutant sequence to I-TASSER to check whether any of these mutations changed the stability of the α-helix and whether they offered insights into the function of LacILOV. The library of mutant LacILOV sequences can be found in Supplementary Materials.
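Generating deletion and substitution mutants from a parent sequence is straightforward to script. The sketch below is a minimal illustration and is not the script in Supplementary Materials. The helper names (`delete_residues`, `substitute_residue`, `to_fasta`) are hypothetical, and the 20-residue sequence is an arbitrary placeholder, not the LacILOV sequence. Residue positions are 1-based, matching labels such as G58.

```python
def delete_residues(sequence, start, end=None):
    """Delete residues start..end (1-based, inclusive) from a protein sequence."""
    end = start if end is None else end
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("residue range is outside the sequence")
    return sequence[:start - 1] + sequence[end:]


def substitute_residue(sequence, position, new_residue):
    """Replace the residue at a 1-based position with a single new residue."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    if len(new_residue) != 1:
        raise ValueError("new_residue must be a single one-letter code")
    return sequence[:position - 1] + new_residue + sequence[position:]


def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record for submission to a prediction server."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines)


# Arbitrary placeholder sequence (NOT LacILOV): G is residue 11, T-T are residues 13-14.
placeholder = "MKTAYIAKQRGSTTLEVHNP"

mutants = {
    "del_G11": delete_residues(placeholder, 11),
    "del_T13-T14": delete_residues(placeholder, 13, 14),
    "G11A": substitute_residue(placeholder, 11, "A"),
}

for name, seq in mutants.items():
    print(to_fasta(name, seq))
```

Note that after a deletion, every downstream residue shifts by the number of residues removed, so positions in a mutant no longer match the numbering of the parent sequence.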

After assessing the different mutants with I-TASSER, we determined that G58 and T60-T61 are key residues for stabilizing the α-helix. When either G58 or T60-T61 was deleted from the LacILOV sequence, I-TASSER predicted a disrupted α-helix. Interestingly, these deletions also resulted in a more disordered DNA-binding domain.
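Helix disruption can also be summarized numerically rather than judged by eye. The sketch below assumes that each model comes with a per-residue secondary-structure string in which 'H' marks helical residues; the strings, the window, and the function names are hypothetical and do not come from our I-TASSER results.

```python
def helix_fraction(ss, start, end):
    """Fraction of residues start..end (1-based, inclusive) labelled 'H'."""
    if not 1 <= start <= end <= len(ss):
        raise ValueError("window is outside the secondary-structure string")
    window = ss[start - 1:end]
    return window.count("H") / len(window)


def longest_helix(ss):
    """Length of the longest uninterrupted run of 'H' labels."""
    longest = current = 0
    for label in ss:
        current = current + 1 if label == "H" else 0
        longest = max(longest, current)
    return longest


# Hypothetical secondary-structure strings (H = helix, C = coil).
wild_type = "CCHHHHHHHHHHCC"
deletion = "CCHHHCCCHHCCC"

print(helix_fraction(wild_type, 3, 12), longest_helix(wild_type))
print(helix_fraction(deletion, 3, 12), longest_helix(deletion))
```

A drop in helix fraction or in the longest helical run within the region of interest would be consistent with a disrupted helix, although the window has to be adjusted for the index shift introduced by a deletion.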

Figure 2.a: I-TASSER predicted structure of LacILOV. Several α-helical segments form at the N-terminus.

Figure 2.b: I-TASSER predicted structure of LacILOV with G58 deletion. The G58 deletion disrupted α-helix formation.

Figure 2.c: I-TASSER predicted structure of LacILOV with T60-T61 deletion. The T60-T61 deletion disrupted α-helix formation.

Collectively, we propose that G58 and T60-T61 contribute to the drastic increase in DNA-binding affinity of LacILOV. Our findings suggest that deleting these key residues could yield a more efficacious LacILOV that is more sensitive to light, and therefore an improved light-inducible gene expression system.

Conclusion

Computational modelling allowed us to identify which structures and residues are implicated in LacILOV’s function, providing a basis for improving and optimizing it. We used I-TASSER and Foldit Standalone to achieve this. I-TASSER allowed us to predict a structure for LacILOV, which we then compared against the crystal structures of LacI and LOVII in the PDB to look for significant changes. After identifying a novel α-helix that emerges where the HLH domain of LacI is fused to LOVII, we used Foldit Standalone to examine LacILOV’s structure directly and determine which residues contribute to the formation of this new α-helix. We found that residues 58 and 60-61 were specifically implicated, as deleting them disrupted the α-helix in the predicted structures. We now have mutated LacILOV sequences that can be tested in assays to see whether they yield a more sensitive LacILOV that is less stable when bound to the promoter region. Overall, our use of computational tools allowed us to design candidate LacILOV variants that may be better optimized for the protein’s critical function in our project’s light-activated CRISPR switch.

Future Directions

In the future, with the knowledge of these important residues, we can modify the current LacILOV sequence to generate variants carrying either the deletion of residue 58 or the deletion of residues 60-61. We can then perform photo-reporter assays on these variants and compare the results with the original LacILOV, both to validate whether these mutations actually improve its light sensitivity and to test whether these residues are critical to LacILOV stability.

Supplementary Materials

The compressed file contains all I-TASSER jobs with a master spreadsheet; all figures, including the computational structures of the original LacILOV and its G58 deletion and T60-T61 deletion mutants; and a Python script for generating LacILOV mutants.

References

  1. Schumacher MA, Choi KY, Zalkin H, Brennan RG. 1994. Crystal structure of LacI member, PurR, bound to DNA: minor groove binding by alpha helices. Science. 266(5186): 763-770. doi.org/10.1126/science.7973627
  2. Chaudhury S, Lyskov S, Gray JJ. 2010. PyRosetta: a script-based interface for implementing molecular modelling algorithms using Rosetta. Bioinformatics. 26(5): 689–691. doi.org/10.1093/bioinformatics/btq007
  3. Zhang Y. 2008. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics. 9: 40. doi.org/10.1186/1471-2105-9-40
  4. Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. 2015. The I-TASSER Suite: protein structure and function prediction. Nature Methods. 12(1): 7-8. doi.org/10.1038/nmeth.3213
  5. Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. 2015. The I-TASSER Suite: protein structure and function prediction. Nature Methods. 12(1): 7-8. doi.org/10.1038/nmeth.3213
  6. Halavaty AS, Moffat K. 2007. N- and C-terminal flanking regions modulate light-induced signal transduction in the LOV2 domain of the blue light sensor phototropin 1 from Avena sativa. Biochemistry. 46(49): 14001-14009. doi.org/10.1021/bi701543e
  7. Kleffner R, Flatten J, Leaver-Fay A, Baker D, Siegel JB, Khatib F, Cooper S. 2017. Foldit Standalone: a video game-derived protein structure manipulation interface using Rosetta. Bioinformatics. 33(17): 2765-2767. doi.org/10.1093/bioinformatics/btx283