As part of our iGEM project, we face the challenge of adapting an aminoacyl-tRNA synthetase (aaRS) to non-canonical amino acids. For this purpose, we carry out several rounds of a positive-negative selection process in the laboratory, as previously described by Schultz and co-workers [Liu et al., 1997].
At the same time, rapid progress in protein and molecular structure analysis has increased the availability of molecular 3D structure data. These data are organized in publicly available databases, which provide a foundation for the modeling and simulation of chemical-biological processes in bioinformatics.
For this reason, the practical laboratory work in our project was supplemented by a theoretical approach, involving modeling, simulation and evaluation on the computer. More specifically, our approach was divided into two subprojects, an evolution subproject and an evaluation subproject.
Liu, David R., et al. "Engineering a tRNA and aminoacyl-tRNA synthetase for the site-specific incorporation of unnatural amino acids into proteins in vivo." Proceedings of the National Academy of Sciences 94.19 (1997): 10092-10097.
Evolution subproject
In the evolution subproject, the aim is to design an aaRS for the new non-canonical amino acid CBT. For this purpose, the binding pocket must be evolved so that the synthetase specifically recognizes the amino acid and efficiently loads it onto the tRNA. As CBT is a large amino acid, we decided to use the tyrosyl-tRNA synthetase of Methanococcus jannaschii as a template. The usual way to do this in the lab is to generate a library with NNK-scheme primers (link to the selection plasmids). An important limitation of this method is that a large number of sequences has to be sampled. Consequently, a large library is needed in order to find a working synthetase. Such extensive libraries are costly, time-consuming to construct, and hard to screen.
A modeling approach is more cost- and time-efficient, and it additionally leads to a better understanding of the function and evolution of the synthetase, as one can examine how the evolution affects the protein structure. Rosetta makes it possible to shrink the library by generating a set of the most probable candidates for a usable synthetase, which makes the library much more manageable. Hence, for our project, we want to find suitable synthetases using Rosetta, build the best candidates in the lab, and evaluate them.
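The scale of the sampling problem can be estimated with a short calculation. The sketch below assumes the standard NNK degeneracy (N = A/C/G/T, K = G/T, hence 32 codons per randomized position) and that all variants are equally likely to be sampled. The numbers of randomized positions are hypothetical examples, not the positions chosen in this project.

```python
import math

NNK_CODONS = 32  # 4 (N) x 4 (N) x 2 (K = G or T) codons per randomized position


def nnk_library_size(positions: int) -> int:
    """Number of distinct DNA sequences when `positions` codons are NNK-randomized."""
    return NNK_CODONS ** positions


def clones_for_coverage(positions: int, coverage: float = 0.95) -> int:
    """Clones to screen so that any given variant is present with probability `coverage`.

    Assumes equal representation of all variants (Poisson sampling).
    """
    return math.ceil(-math.log(1.0 - coverage) * nnk_library_size(positions))


# Hypothetical examples: 3 and 6 randomized positions in the binding pocket.
for n in (3, 6):
    print(n, nnk_library_size(n), clones_for_coverage(n))
```

Under these assumptions, each additional randomized position multiplies the required library size by 32, which is why narrowing the candidates computationally is attractive.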
Evaluation subproject
Alongside the modeling-based evolution subproject, we want to establish the classic positive-negative selection process [Liu et al., 1997] with the Methanococcus jannaschii tyrosyl-tRNA synthetase and the non-canonical amino acid nitrophenylalanine. The core of the evaluation subproject is therefore to compare the tRNA synthetase produced in our laboratory with tRNA synthetases constructed in the past.
Methods
In both subprojects, we used the software suite "Rosetta" (freely available for academic use), which was introduced at the University of Washington by David Baker in 1997 [Simons et al., 1997], initially in the context of protein structure prediction. As determining such structures with NMR or similar methods is very expensive and time-consuming, correctly predicting structures by means of computation holds great potential for future research. Since its release, Rosetta has grown to include numerous modules and is now widely used in research. In our application, we focus on two Rosetta modules, the "Rosetta Ligand Docking Protocol" and the "Rosetta Enzyme Design Protocol".
In the evaluation subproject, the software "Modeller" was used in addition to Rosetta to carry out homology-based structure predictions.
Simons, Kim T., et al. "Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions." Journal of molecular biology 268.1 (1997): 209-225.
Rosetta Ligand Docking
Overview
The Ligand Docking protocol describes a way to assess how well small ligands can bind in a binding pocket of a large protein, such as an enzyme [Meiler et al., 2006]. The Rosetta Ligand Docking protocol was applied in two ways. First, the lab selection was evaluated (link to the selection plasmids), so that we could assess the specificity of our tRNA synthetase and compare the results with the literature. In particular, we have seven aaRS sequences with different mutations that need to be ranked. The aaRS produced in our laboratory is then evaluated on the basis of this ranking.
The second application of ligand docking is to raise the specificity of the aaRS that we create with the Enzyme Design protocol. After creating aaRSs with that protocol, we will evaluate the best results with the Ligand Docking protocol, thereby limiting the number of possible candidates.
Methods
The Rosetta ligand docking protocol is suited to docking small ligands or drugs into an arbitrary binding site. Protein-protein or protein-peptide interactions, in contrast, should be investigated with the protein-protein docking protocol [Gray et al., 2003], which makes it possible, for example, to simulate antigen-antibody interactions.
The Rosetta ligand docking protocol expects specifically prepared inputs. In particular, the amino acid ligand and the protein backbone need to be provided in a standardized format.
The ligand needs to be specified in the MOL, MOL2 or SDF file format. Such a description can be generated automatically from a PDB file, and this conversion usually also adds the hydrogen atoms that are typically missing from the PDB file. The limited availability of PDB files for specific molecules is therefore a major drawback of this route. Alternatively, the ligand can be designed using SMILES or manually using tools such as Avogadro. In the next step, the SDF or MOL(2) file is used to create a conformer ensemble, which is in turn used to generate a Rosetta parameter file. In addition to the specific names of all atoms present in the ligand, this parameter file also stores all bonds between the individual atoms, including the bond angles and bond lengths.
The aaRS structure is available in the PDB under the identifier “1j1u” and can be obtained in the standard PDB file format. Ligand docking requires an additional relaxation step as preprocessing. This has already been performed by Florian Richter, who kindly provided the resulting relaxed 3D structure to us.
To reduce the search time, only the chain containing the binding site (in our case chain ‘A’) is kept, and the original ligand (tyrosine) is deleted.
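The chain selection and ligand removal can be done with any PDB editor. As an illustration, the following minimal sketch does it with plain string handling on the fixed-column PDB format. It assumes that the bound tyrosine is stored as HETATM records with residue name TYR, and the function name is hypothetical.

```python
def strip_to_binding_chain(pdb_text: str, chain: str = "A", ligand: str = "TYR") -> str:
    """Keep the coordinate records of one chain and drop the bound ligand.

    Uses the fixed PDB columns: record name (1-6), residue name (18-20), chain ID (22).
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip headers, remarks, connectivity records, etc.
        if line[21] != chain:
            continue  # other chains are not needed for docking
        if record == "HETATM" and line[17:20] == ligand:
            continue  # remove the original ligand from the binding pocket
        kept.append(line)
    return "\n".join(kept + ["END"]) + "\n"
```

The function takes the full text of the relaxed structure file and returns the text of the reduced structure, which can then be written to a new PDB file.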
The Rosetta ligand docking process is configured in an XML file. Examples of such configurations can be found in []. An essential parameter is the starting position of the ligand in three-dimensional space. We used http://ligasite.org/index.php?intro to obtain the position of the original tyrosine ligand. However, it is sufficient to place the ligand within a radius of 10 Å around the backbone, and the starting position is then chosen at random within that radius.
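If the position of the original ligand is not looked up in a database, it can also be estimated directly from the crystal structure. The sketch below computes the centroid of the bound tyrosine and draws a random starting point around it. It assumes that the ligand is stored as HETATM records with residue name TYR, and it interprets the 10 Å radius as a sphere around that reference point; both are assumptions, and the function names are hypothetical.

```python
import math
import random


def ligand_centroid(pdb_text, ligand="TYR", chain="A"):
    """Mean x/y/z of all HETATM atoms of `ligand` in `chain` (fixed PDB columns)."""
    coords = [
        (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        for line in pdb_text.splitlines()
        if line.startswith("HETATM") and line[17:20] == ligand and line[21] == chain
    ]
    if not coords:
        raise ValueError(f"no HETATM records for {ligand} in chain {chain}")
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


def random_start(center, radius=10.0, rng=None):
    """Uniform random point inside a sphere of `radius` (in Å) around `center`."""
    rng = rng or random.Random()
    while True:  # rejection sampling from the enclosing cube
        offset = [rng.uniform(-radius, radius) for _ in range(3)]
        if math.sqrt(sum(o * o for o in offset)) <= radius:
            return tuple(c + o for c, o in zip(center, offset))
```

The resulting coordinates would then be entered as the starting position in the XML configuration.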
Finally, Rosetta execution is started in a shell, providing additional parameters on the command line or in an ‘options’ file.
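As an illustration of such an options file, the sketch below assembles one for an XML-configured run. It assumes that the protocol is executed through the RosettaScripts application. The flags shown (-in:file:s, -in:file:extra_res_fa, -parser:protocol, -nstruct) are common Rosetta options, but all file names are hypothetical and the exact executable name depends on the local build.

```python
def build_flags(protein_pdb, ligand_params, protocol_xml, nstruct):
    """Return the text of a Rosetta options file for an XML-configured run."""
    lines = [
        f"-in:file:s {protein_pdb}",               # prepared input structure
        f"-in:file:extra_res_fa {ligand_params}",  # Rosetta parameter file of the ligand
        f"-parser:protocol {protocol_xml}",        # XML file configuring the docking run
        f"-nstruct {nstruct}",                     # number of output models to generate
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names; the options file would then be passed on the shell
# as "@flags.txt" to the RosettaScripts executable of the local installation.
flags_text = build_flags("1j1u_relaxed_A.pdb", "LIG.params", "dock.xml", 1000)
print(flags_text)
```

Keeping the parameters in an options file rather than on the command line makes it easier to rerun the same configuration for each aaRS variant.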
Algorithm
The ligand docking algorithm basically consists of the following steps:
1) The starting position is chosen randomly or defined in an .xml file.
2) The placement of the ligand is modified by a random translation of 0.1 Å in each direction and a random rotation of 0.05° around each axis.
3) The rigid-body orientation and side-chain angles of the ligand are optimized using the gradient-based Davidon–Fletcher–Powell algorithm. Afterwards, the energy function is evaluated and the move is accepted or rejected according to the Metropolis Monte Carlo criterion, with acceptance probability
P = min(1, exp(-(E_final - E_start) / kT)).
A move that lowers the energy is therefore always accepted, while a move that raises the energy is accepted only with probability P.
To find the optimal binding position, steps 2 and 3 are repeated 50 times.
This whole protocol is repeated N times. Depending on the size of the ligand, its flexibility (and therefore the size of the ligand conformer ensemble), and the binding site, N lies between 1000 and 5000.
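The structure of this loop can be illustrated with a strongly simplified sketch. It is not the Rosetta implementation: the energy is a hypothetical toy function (squared distance to an assumed pocket center), only translations are perturbed, and the gradient-based minimization of step 3 is omitted. What it does show is the Metropolis acceptance rule, the 50 perturbation cycles, and the N independent repeats.

```python
import math
import random

KT = 1.0  # hypothetical temperature factor, in the units of the toy energy


def toy_energy(pos, pocket=(0.0, 0.0, 0.0)):
    """Stand-in for the Rosetta energy function: squared distance to the pocket."""
    return sum((p - q) ** 2 for p, q in zip(pos, pocket))


def acceptance_probability(e_start, e_final, kt=KT):
    """Metropolis criterion: downhill moves always pass, uphill moves sometimes."""
    if e_final <= e_start:
        return 1.0
    return math.exp(-(e_final - e_start) / kt)


def dock_once(start, rng, cycles=50, step=0.1):
    """One trajectory: `cycles` rounds of random perturbation plus accept/reject."""
    pos, energy = start, toy_energy(start)
    best = (pos, energy)
    for _ in range(cycles):
        trial = tuple(p + rng.gauss(0.0, step) for p in pos)  # random translation
        e_trial = toy_energy(trial)
        if rng.random() < acceptance_probability(energy, e_trial):
            pos, energy = trial, e_trial
            if energy < best[1]:
                best = (pos, energy)
    return best


def dock(start, n_runs, seed=0):
    """Repeat the trajectory `n_runs` times and keep the lowest-energy result."""
    rng = random.Random(seed)
    return min((dock_once(start, rng) for _ in range(n_runs)), key=lambda r: r[1])


best_pos, best_energy = dock(start=(1.0, 1.0, 1.0), n_runs=100)
print(best_pos, best_energy)
```

Accepting some uphill moves lets a trajectory escape shallow local minima, and the independent repeats compensate for the randomness of each individual run.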
The process is summarized in Fig.
Meiler, Jens, and David Baker. "ROSETTALIGAND: Protein–small molecule docking with full side‐chain flexibility." Proteins: Structure, Function, and Bioinformatics 65.3 (2006): 538-548.
Gray, Jeffrey J., et al. "Protein–protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations." Journal of molecular biology 331.1 (2003): 281-299.
Modeller
Overview
Within the scope of the evaluation subproject, 3D structures of the mutated tyrosyl aaRS of Methanococcus jannaschii are required. Unfortunately, no X-ray structure data are available in the literature. Only the sequences of seven aaRSs that were evolved for nitrophenylalanine have been described previously [Peters et al., 2009]. To predict the corresponding structures by homology modeling, we use the software “Modeller”.
Method
TODO: not quite finished yet, hence no text
Peters, Francis B., et al. "Photocleavage of the polypeptide backbone by 2-nitrophenylalanine." Chemistry & biology 16.2 (2009): 148-152.
Rosetta EnzymeDesign
Overview
For our project, we have to create a new aaRS for the non-canonical amino acid CBT. As this amino acid is being synthesized for the first time, no suitable tRNA synthetase is currently available to charge tRNAs with it. We therefore applied the Enzyme Design protocol to redesign the binding site of the synthetase so that it binds CBT effectively and specifically.
Method
The ligand needs to be specified in the MOL, MOL2 or SDF file format. Such a description can be obtained automatically by converting the relevant information from a PDB file, if one is available. This conversion usually also adds the hydrogen atoms that are typically missing from the PDB file. Alternatively, the ligand can be designed using SMILES or manually using tools such as Avogadro. In the next step, the SDF or MOL(2) file is used to create a conformer ensemble, which is in turn used to create a Rosetta parameter file. In addition to the specific names of all atoms present in the ligand, this parameter file also stores all bonds between the individual atoms, including the bond angles and bond lengths.
- cst file: TODO, still pending, hence no text
The enzyme design algorithm
The enzyme design algorithm is summarized in Fig.