Difference between revisions of "Team:Bielefeld-CeBiTec/Model"

 
<div class="bevel tr"></div>
 
<div class="bevel tr"></div>
 
<div class="content">
 
<div class="content">
<h3> Overview </h3>
<h3> Short Summary </h3>
 
<article>
 
<article>
As part of our iGEM project, we face the challenge of adapting the aminoacyl-tRNA synthetase (aaRS) to non-canonical amino acids. For this purpose, we carry out several rounds of a positive-negative selection process in the laboratory, as previously described by Schultz [Liu et al., 2010].
As our project explores the possibilities of an expanded genetic code via unnatural bases and non-canonical amino acids, we set out to complement our lab work by modeling novel aminoacyl-tRNA synthetases (aaRS) for a non-canonical amino acid that we synthesized in the lab. In order to incorporate non-canonical amino acids into proteins during translation, the aaRS has to attach the amino acid to the respective tRNA. Thus, we designed aaRS sequences intended to link our own non-canonical amino acid to a fitting tRNA. As a result, we obtained several candidate aaRS sequences, which we evaluated based on their ROSETTA score and ordered via gene synthesis.
However, due to the rapid development in the field of protein and molecular structure analysis, there has been an increase in the availability of molecular 3D structure data. These data are organized in publicly available databases, which provide a foundation for the modeling and simulation of chemical-biological processes in bioinformatics.
In practice, our modeling consisted of the following steps:
For this reason, the practical laboratory work in our project was supplemented by a theoretical approach, involving modeling, simulation and evaluation on the computer. More specifically, our approach was divided into two subprojects, an evolution subproject and an evaluation subproject.
Liu, David R., et al. "Engineering a tRNA and aminoacyl-tRNA synthetase for the site-specific incorporation of unnatural amino acids into proteins in vivo." Proceedings of the National Academy of Sciences 94.19 (1997): 10092-10097.
 
</article>
 
</article>
<h3>Evolution subproject</h3>
<style>
table {
    font-family: arial, sans-serif;
    border-collapse: collapse;
    width: 100%;
}

td, th {
    border: 1px solid #dddddd;
    text-align: left;
    padding: 8px;
}

tr:nth-child(even) {
    background-color: #dddddd;
}
</style>

<table>
  <tr>
    <th>Step</th>
    <th>Software/Method</th>
    <th>Meaning</th>
  </tr>
  <tr>
    <td>1. Ligand Preparation</td>
    <td>Manually via Avogadro</td>
    <td>Due to the novelty of our amino acid, no information on the ligand is available in databases. Therefore, all information had to be provided manually and then used to generate a conformer ensemble, which contains, for example, all energetically favorable arrangements of the atoms within the molecule.</td>
  </tr>
  <tr>
    <td>2. Scaffold categorization</td>
    <td>ROSETTA protocol</td>
    <td>The scaffold describes the rough layout of the synthetase. We downloaded the scaffold 1j1u, the aaRS of <i>Methanococcus jannaschii</i>, as a template and then relaxed its structure to improve the outcome of the ROSETTA algorithm.</td>
  </tr>
  <tr>
    <td>3. Set simulation constraints</td>
    <td>Manually via ROSETTA</td>
    <td>Constraints on the possible mutations of the synthetase ensure that the generated sequences fit the amino acid. For example, we constrained the distance between certain atoms and their angle to a range that is optimal for hydrogen bonds.</td>
  </tr>
  <tr>
    <td>4. Enzyme Matching</td>
    <td>ROSETTA protocol</td>
    <td>ROSETTA combines the information about the ligand and the constraints to find possible hydrogen-bonding partners and proposes the shape of the scaffold within the set constraints.</td>
  </tr>
  <tr>
    <td>5. Enzyme Design</td>
    <td>ROSETTA protocol</td>
    <td>An algorithm uses the information from the previous step and the information on the ligand to simulate the mutation process and generate sequences for optimized scaffolds, with corresponding scores as measures of fit.</td>
  </tr>
  <tr>
    <td>6. Evaluate results in silico</td>
    <td>Manually</td>
    <td>We evaluate the visual output and the score values and order the sequences with the most promising results via gene synthesis.</td>
  </tr>
  <tr>
    <td>7. Evaluate results in vivo</td>
    <td>Manually</td>
    <td>The synthetases are validated in the lab with the corresponding ncAA via a positive-negative selection system.</td>
  </tr>
</table>
 
<article>
As a result, we obtained several candidate aaRS sequences, which we evaluated based on their ROSETTA score and ordered via gene synthesis.
Figure A describes our modeling project as a whole.
</article>

<h3>Introduction</h3>
                <h4>Overview</h4>
 
<article>
 
<article>
In the evolution subproject, the aim is to design an aaRS for the new non-canonical amino acid CBT. For this purpose, the binding pocket must be evolved so that it effectively charges the tRNA with the amino acid while recognizing that amino acid specifically. As CBT is a large amino acid, we decided to use the tyrosyl-tRNA synthetase of <i>Methanococcus jannaschii</i> as a template. The usual way to do this in the lab is to generate a library with NNK-scheme primers (link to the selection plasmids). An important limitation of this method is that a large number of sequences has to be sampled. Consequently, a large library is needed in order to find a working synthetase. Such extensive libraries are costly, time-consuming to construct, and hard to screen. A modeling approach is more cost- and time-efficient. It also gives a better understanding of the function and evolution of the synthetase, because one can examine how the evolution affects the protein structure. Rosetta makes it possible to shrink the library by generating a set of the most probable candidates for a usable synthetase, which keeps the library much more manageable. Hence, for our project, we want to find suitable synthetases using Rosetta, construct the most promising candidates in the lab, and evaluate them.
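The following minimal Python sketch shows why such libraries become large so quickly. It computes the size of an NNK library and estimates how many clones are needed to include any given variant with 95% probability. The estimate assumes that every codon combination is equally likely to be cloned and uses a Poisson approximation. The numbers of randomized positions in the example are hypothetical and are not taken from our selection plasmids.

```python
import math

# NNK codon: N = A/C/G/T (4 options) at positions 1 and 2, K = G/T (2 options) at position 3
CODONS_PER_NNK = 4 * 4 * 2  # 32 codons, covering all 20 amino acids plus one stop codon (TAG)


def nnk_library_size(positions):
    """Number of distinct DNA sequences when `positions` codons are randomized with NNK."""
    return CODONS_PER_NNK ** positions


def clones_for_coverage(variants, probability=0.95):
    """Clones needed so that a given variant appears at least once with `probability`.

    Assumes all variants are equally likely and uses the Poisson approximation
    P(seen) = 1 - exp(-clones / variants).
    """
    return math.ceil(variants * math.log(1.0 / (1.0 - probability)))


if __name__ == "__main__":
    for k in (1, 3, 6):  # hypothetical numbers of randomized positions
        size = nnk_library_size(k)
        print(f"{k} positions: {size:,} variants, ~{clones_for_coverage(size):,} clones for 95% coverage")
```

The number of required clones grows as 32^k. Each additional randomized position therefore multiplies the screening effort by 32, which is why pre-selecting candidates in silico can pay off.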
As part of our iGEM project, we are faced with the challenge of adapting the aminoacyl-tRNA synthetase to non-canonical amino acids. For this purpose, we modeled possible synthetase candidates in preparation for carrying out a positive-negative selection according to Schultz [] in the laboratory.

Due to the rapid development in the field of protein and molecular structure analysis, there has been an increase in the availability of molecular 3D structure data. These data are organized in publicly available databases which provide a foundation for the modeling and simulation of chemical-biological processes in bioinformatics. Because we synthesized our non-canonical amino acid ourselves, no such comprehensive information is available for it yet. However, information on similarly structured amino acids can potentially serve as a basis for our modeling.

As evaluating an expanded genetic code is a complex task, the practical laboratory work of our project is supplemented by a theoretical approach involving modeling, simulation, and evaluation in silico. Specifically, we aimed at designing an aaRS for the new non-canonical amino acid CBT-ASP. In addition to CBT-ASP, we simulated the evolution process for the non-canonical amino acid NPA (nitrophenylalanine) to validate our modeling procedure. Synthetases for this ncAA are already known, so they can be compared with our in silico results. Our core challenge was to evolve the binding pocket so that it effectively charges the tRNA with the amino acid and thus also recognizes this amino acid specifically.

 
</article>
 
</article>
<h3>Evaluation subproject</h3>
<h4>Method</h4>
 
<article>
 
<article>
In addition to our evolution modeling subproject, we want to establish the classic positive-negative selection process [Liu et al., 1997] with the <i>Methanococcus jannaschii</i> tyrosyl-tRNA synthetase and the non-canonical amino acid nitrophenylalanine. The core of the evaluation subproject is therefore to compare the tRNA synthetase produced in our laboratory with tRNA synthetases constructed in the past.
We used the open-source software "Rosetta" for the main part of our modeling project. Rosetta was introduced at the University of Washington by David Baker in 1997, initially in the context of protein structure prediction. Since then, Rosetta has grown to include numerous modules and is currently widely used in research. In our application, we focus on the Rosetta module called the "Rosetta Enzyme Design Protocol".
</article>
  
<h3>Methods</h3>
              </article>
 
<h3>ROSETTA Enzyme Design</h3>
                <h4>Overview </h4>
 
<article>
 
<article>
In both subprojects, we used the open-source software "Rosetta", which was introduced at the University of Washington by David Baker in 1997 [Simons et al., 1997], initially in the context of protein structure prediction. Determining such structures with NMR or similar methods is very expensive and time-consuming, so correctly predicting structures computationally holds great potential for future research. Since its release, Rosetta has grown to include numerous modules and is currently widely used in research. In our application, we focus on two Rosetta modules, the "Rosetta Ligand Docking Protocol" and the "Rosetta Enzyme Design Protocol".
Since the non-canonical amino acid synthesized in the laboratory is completely novel, there is not yet any tRNA synthetase that can charge the tRNA with it. For this reason, we use the enzyme design protocol to design the binding pocket in a way that yields an effective and specific enzyme. The protocol consists of two main steps: matching and design.
As part of the evaluation subproject, we also used the software "Modeller" in addition to Rosetta to carry out homology structure predictions.
The enzyme design algorithm is summarized in Fig. B.
Liu, David R., et al. "Engineering a tRNA and aminoacyl-tRNA synthetase for the site-specific incorporation of unnatural amino acids into proteins in vivo." Proceedings of the National Academy of Sciences 94.19 (1997): 10092-10097.
Simons, Kim T., et al. "Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions." Journal of molecular biology 268.1 (1997): 209-225.
 
</article>
 
</article>
<div class="figure medium">
<img class="figure image" src="https://static.igem.org/mediawiki/2017/8/82/T--Bielefeld-CeBiTec--CDR-flowchartdesign.png">
<p class="figure subtitle"><b>Figure (2): Flowchart Enzyme Design Protocol</b><br></p>
</div>
        </div>
 
         </div>
 
         </div>
 
<div class="bevel bl"></div>
 
<div class="bevel bl"></div>
 
</div>
 
</div>
  
<div class="contentbox">
 
<div class="bevel tr"></div>
 
<div class="content">
 
<h3> Rosetta Ligand Docking </h3>
 
<h4> Overview </h4>
 
<article>
 
The Ligand Docking protocol provides a way to assess how well small ligands bind in the binding pocket of a large protein, such as an enzyme [Meiler and Baker, 2006]. We applied the Rosetta Ligand Docking protocol in two ways. First, we evaluated the lab selection (link to the selection plasmids), so that we could assess the specificity of our tRNA synthetase and compare the results with the literature. In particular, we have seven aaRS sequences with different mutations that need to be ranked. The aaRS produced in our laboratory is then evaluated on the basis of this ranking.

The second application of ligand docking is to increase the specificity of the aaRSs that we create with the Enzyme Design protocol. After creating aaRSs with that protocol, we will evaluate the best results with the Ligand Docking protocol, thereby limiting the number of possible candidates.
 
</article>
 
<h4> Methods</h4>
 
<article>
 
The Rosetta ligand docking protocol is useful if small ligands or drugs have to be docked into an arbitrary binding site. However, protein-protein or protein-peptide interactions should instead be investigated with the protein-protein docking protocol [Gray et al., 2003], which allows, for example, the simulation of antigen-antibody interactions.

  
The Rosetta ligand docking protocol expects specifically prepared inputs. In particular, the amino acid ligand and the protein backbone need to be provided in a standardized format.
 
The ligand needs to be specified in the MOL, MOL2 or SDF file format. Such a description can be retrieved automatically from a PDB file. Consequently, the limited availability of PDB files for specific molecules is a major drawback. This conversion usually also adds the hydrogen atoms that are typically missing from the PDB file. Alternatively, the ligand can be designed using SMILES or manually using tools such as Avogadro. In the next step, the SDF or MOL(2) file is used to create a conformer ensemble, which is in turn used to generate a Rosetta parameter file. In addition to the specific names of all atoms present in the ligand, this parameter file also stores all bonds between the individual atoms, including the bond angles and bond lengths.
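The following minimal Python sketch shows what information such a ligand description contains. It reads the atom and bond blocks of a V2000 MOL file (fixed-width columns) and computes the bond lengths. It is only an illustration under these assumptions: it ignores charges, properties and the V3000 format, and it is not the converter Rosetta uses to build the parameter file. The water molecule is a made-up example input.

```python
import math

# Hypothetical example input: a water molecule in V2000 MOL format.
WATER = "\n".join([
    "water",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
])


def parse_molfile(text):
    """Return (atoms, bonds) from a V2000 MOL block.

    atoms: list of (element, (x, y, z)); bonds: list of (i, j, order) with 0-based indices.
    """
    lines = text.splitlines()
    counts = lines[3]  # the counts line is the 4th line of a MOL file
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        atoms.append((line[31:34].strip(), xyz))  # element symbol in columns 32-34
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        i, j, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        bonds.append((i - 1, j - 1, order))  # MOL files count atoms from 1
    return atoms, bonds


def bond_lengths(atoms, bonds):
    """Bond lengths in the units of the coordinates (Å for typical MOL files), rounded to 3 decimals."""
    return [
        (atoms[i][0], atoms[j][0], round(math.dist(atoms[i][1], atoms[j][1]), 3))
        for i, j, _ in bonds
    ]


if __name__ == "__main__":
    atoms, bonds = parse_molfile(WATER)
    for a, b, length in bond_lengths(atoms, bonds):
        print(f"{a}-{b}: {length} Å")
```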
 
The aaRS backbone is available in PDB via the identifier “1j1u” and obtained in the standard PDB file format. For the ligand docking, an additional relaxation step is required as a preprocessing step. This has already been performed by Florian Richter, who kindly provided the resulting relaxed 3D structure to us.
 
To reduce the search time, only the chain containing the binding site (in our case chain ‘A’) is kept, and the original ligand (tyrosine) is deleted.
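The following minimal Python sketch shows this preprocessing step. It relies only on the fixed column layout of PDB ATOM/HETATM records: the chain identifier is in column 22 and the residue name in columns 18 to 20. The function name and its defaults are hypothetical. The sketch assumes that the bound tyrosine is stored as a HETATM record with the residue name TYR, which should be checked against the actual 1j1u file. Related records such as CONECT lines are not adjusted.

```python
def filter_pdb(lines, keep_chain="A", drop_ligands=("TYR",)):
    """Keep ATOM/HETATM records of `keep_chain` and drop HETATM residues named in `drop_ligands`.

    All other records (HEADER, TER, END, ...) are passed through unchanged.
    """
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM") and len(line) > 21:
            chain = line[21]               # column 22: chain identifier
            resname = line[17:20].strip()  # columns 18-20: residue name
            if chain != keep_chain:
                continue                   # e.g. the tRNA in chain B
            if record == "HETATM" and resname in drop_ligands:
                continue                   # bound ligand; protein tyrosines are ATOM records
        kept.append(line)
    return kept


# Example (assumes a local copy of the structure named 1j1u.pdb):
# with open("1j1u.pdb") as fh:
#     cleaned = filter_pdb(fh.read().splitlines())
```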
 
The Rosetta ligand docking process is configured in an XML file. Examples of such configurations can be found in []. An essential parameter is the starting position of the ligand in three-dimensional space. For this purpose, we used http://ligasite.org/index.php?intro to obtain the position of the original tyrosine ligand. However, it is sufficient to place the ligand within a radius of 10 Å around the backbone, inside which the starting position is chosen at random.
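The following sketch shows how a random starting position inside such a sphere could be chosen. It only illustrates the idea and assumes that "chosen at random" means uniformly distributed within the sphere. Rosetta performs this step internally, and the function name and the center coordinates in the test are hypothetical.

```python
import math
import random


def random_start_position(center, radius=10.0, rng=random):
    """Return a point drawn uniformly from a sphere of `radius` (Å) around `center`."""
    # Direction: a normalized Gaussian vector is uniformly distributed on the unit sphere.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            break
    # Distance: cube-root scaling makes the point uniform in volume instead of clustered at the center.
    r = radius * rng.random() ** (1.0 / 3.0)
    return tuple(c + r * x / norm for c, x in zip(center, v))
```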
 
Finally, Rosetta execution is started in a shell, providing additional parameters on the command line or in an ‘options’ file.
 
</article>
 
 
 
<h4>Algorithm</h4>
<h4>Matching Step</h4>
 
<article>
 
<article>
The ligand docking algorithm basically consists of the following steps:
 
<ol>

<li>The starting position is chosen randomly or defined in an .xml file.</li>

<li>The placement of the ligand is modified by a random translation of 0.1 Å in each direction and a rotation of 0.05° around each axis.</li>

<li>The rigid-body orientation of the ligand and the side-chain angles are optimized using the gradient-based Davidon–Fletcher–Powell algorithm. Afterwards, the move is evaluated with the Monte Carlo (Metropolis) criterion on the corresponding energy function:

P = min(1, exp(-(E<sub>final</sub> - E<sub>start</sub>)/kT)). The move is always accepted if the energy decreases and otherwise accepted with probability P.</li>

</ol>
 
To find the optimal binding position, steps 2 and 3 are repeated 50 times.

This whole protocol is repeated N times.

Depending on the size of the ligand, its flexibility (and therefore the size of the conformer ensemble), and the binding site, N lies between 1000 and 5000.
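The loop described above can be sketched in Python as follows. This is a toy illustration, not RosettaLigand. Several parts are our own simplifying assumptions: the quadratic energy function, the crude gradient-descent minimizer that stands in for the Davidon–Fletcher–Powell minimizer, the value of kT, the perturbation drawn uniformly up to 0.1 Å, and the omission of rotations and side-chain moves. What the sketch does show is the structure of the protocol: perturb, minimize, Metropolis test, 50 cycles per trajectory, N trajectories, keep the best pose.

```python
import math
import random


def metropolis_accept(e_start, e_final, kT, rng=random):
    """Accept a move with probability P = min(1, exp(-(E_final - E_start) / kT))."""
    delta = e_final - e_start
    if delta <= 0.0:
        return True  # energy decreased (or unchanged): always accept
    return rng.random() < math.exp(-delta / kT)


def toy_energy(pos):
    """Hypothetical stand-in for the Rosetta energy: a single well at the origin."""
    return sum(x * x for x in pos)


def minimize(pos, energy, step=0.1, iters=20, h=1e-4):
    """Crude steepest descent with forward-difference gradients (stand-in for the DFP minimizer)."""
    pos = list(pos)
    for _ in range(iters):
        grad = []
        for i in range(3):
            shifted = pos.copy()
            shifted[i] += h
            grad.append((energy(shifted) - energy(pos)) / h)
        pos = [x - step * g for x, g in zip(pos, grad)]
    return pos


def dock(start, energy=toy_energy, cycles=50, kT=0.6, rng=random):
    """One trajectory: `cycles` rounds of perturb -> minimize -> Metropolis test."""
    current = list(start)
    e_current = energy(current)
    for _ in range(cycles):
        trial = [x + rng.uniform(-0.1, 0.1) for x in current]  # small translation only
        trial = minimize(trial, energy)
        e_trial = energy(trial)
        if metropolis_accept(e_current, e_trial, kT, rng):
            current, e_current = trial, e_trial
    return current, e_current


def run_protocol(n_runs, start, rng=random):
    """Repeat the trajectory `n_runs` times (1000-5000 in the text) and keep the lowest-energy pose."""
    return min((dock(start, rng=rng) for _ in range(n_runs)), key=lambda result: result[1])
```

For a quick test, a small `n_runs` such as 5 is enough. With a real energy function, N would be in the range given in the text.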
 
The process is summarized in Fig.
 
  
The purpose of the matching step is to match scaffold amino acids to the ligand according to specific constraints, which ensure that the result is sensible and feasible. For this, ROSETTA analyzes the structural formula of the non-canonical amino acid and proposes possible hydrogen-bonding partners. </br>
Matching step inputs </br>
For the matching step, the following input files are needed:
<ul>
  <li>a “.params” file specifying information about the ligand</li>
  <li>a “.pdb” file providing a rough scaffold layout</li>
  <li>a “.cst” file defining the interactions (constraints) between ligand and scaffold</li>
  <li>a “.pos” file defining the scaffold positions at which amino acids can be placed</li>
  <li>a “.flags” file to control all inputs</li>
</ul>
These files are necessary, as they describe the ligand and the backbone and specify the parameters of the algorithm.
To read about each file in further detail, please click the technical detail button below: </br>
TECHNICAL DETAILS COMING SOON </br>
Matching step outputs </br>
The output generated in the matching step is the layout of the scaffold together with one or more conformations of the amino acids that enable interaction with the ligand. This information is stored as a “.pdb” file and becomes part of the input for the design step. </br>
Our results for this step </br>
We used the “1j1u” scaffold from the PDB for our matching step. The “1j1u.pdb” file contains the tyrosyl-tRNA synthetase, labeled as “Chain A”, the orthogonal tRNA, labeled as “Chain B”, and the natural ligand tyrosine. For our project, we deleted the natural ligand and “Chain B”, because it was not necessary to change their structure or sequence, and removing them saved computing time and power.
We designed the ligands manually in Avogadro, and for the .cst file, we chose the default matching algorithm for the simulations of both amino acids.
<div class="figure medium">
<img class="figure image" src="https://static.igem.org/mediawiki/2017/thumb/3/37/T--Bielefeld-CeBiTec--CDR-flowchartligand.png/538px-T--Bielefeld-CeBiTec--CDR-flowchartligand.png">
<p class="figure subtitle"><b>Figure (1): Flowchart Ligand Docking Protocol</b><br></p>
</div>

Meiler, Jens, and David Baker. "ROSETTALIGAND: Protein–small molecule docking with full side‐chain flexibility." Proteins: Structure, Function, and Bioinformatics 65.3 (2006): 538-548.
Gray, Jeffrey J., et al. "Protein–protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations." Journal of molecular biology 331.1 (2003): 281-299.
</article>
 
</article>
        </div>
<h4>Design Step </h4>
<div class="bevel bl"></div>
</div>
 
<div class="contentbox">
<div class="bevel tr"></div>
<div class="content">
<h3>Modeller</h3>
<h4>Overview</h4>
<article>
Within the scope of the evaluation subproject, 3D structures of the mutated tyrosyl-tRNA synthetases of <i>Methanococcus jannaschii</i> are required. Unfortunately, no X-ray structure data for these variants are available in the literature. Only the sequences of seven aaRSs evolved for nitrophenylalanines have been described previously [Peters et al., 2009]. In order to predict the corresponding homology structures, we use the software “Modeller”.
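Only the sequences of these variants are available, so a useful first step is to express each one as point substitutions relative to the wild-type sequence. The following minimal Python sketch assumes that the sequences are already aligned and of equal length. The function name and the short example sequences are hypothetical and are not the published aaRS sequences.

```python
from typing import List


def point_mutations(wild_type: str, variant: str, first_residue: int = 1) -> List[str]:
    """List substitutions such as 'Y32L' between two aligned protein sequences of equal length.

    `first_residue` is the residue number of the first character (1 for full-length sequences).
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{wt}{number}{mut}"
        for number, (wt, mut) in enumerate(zip(wild_type, variant), start=first_residue)
        if wt != mut
    ]


if __name__ == "__main__":
    print(point_mutations("MDEFEMIKRN", "MDEFEMLKRN"))  # hypothetical sequences -> ['I7L']
```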
</article>
<h4>Method</h4>
 
<article>
 
<article>
TODO: not quite finished yet, hence no text
 
Peters, Francis B., et al. "Photocleavage of the polypeptide backbone by 2-nitrophenylalanine." Chemistry & biology 16.2 (2009): 148-152.
 
</article>
 
        </div>
 
<div class="bevel bl"></div>
 
</div>
 
  
<div class="contentbox">
The design step applies an algorithm that mutates the binding pocket and its immediate surroundings and repacks the remaining scaffold. Additionally, a badness-of-fit score is generated, which indicates how well the mutated pocket fits the amino acid. For every file from the matching step, a model with a score and a “.pdb” file is generated, from which the sequence can be obtained and the 3D structure can be analyzed. Notably, the structure of the amino acid can be extracted separately.
<div class="bevel tr"></div>
The following section describes the structure of the design step. For further details on each step, click the technical details button. </br>
<div class="content">
1. Optimizing the catalytic interactions </br>
<h3>Rosetta Enzyme Design</h3>
For this first step, the required file can be generated either with the Rosetta defaults or as a manually created “.res” file. For more details, we refer to the Rosetta documentation (link: https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d1/d97/resfiles.html). </br>
<h4>Overview</h4>
[TECHNICAL DETAILS COMING SOON] </br>
<article>
2. Cycles of sequence design and minimization within constraints </br>
For our project, we have to create a new aaRS for the non-canonical amino acid CBT. As this amino acid is synthesized for the first time, there is currently no suitable tRNA synthetase available to charge the tRNAs. Therefore, we applied the Enzyme Design Protocol in order to design the binding site of the synthetase in a way that allows it to form an effective and specific enzyme.
To optimize the structure, we applied an iterative optimization algorithm. This algorithm mutates all scaffold residues that are not part of the catalytic center to alanine, and a brief minimization of the energy function then places the ligand in an optimal position relative to the backbone. </br>
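At the sequence level, this alanine substitution can be sketched as follows. The sketch only does the bookkeeping on a one-letter sequence, whereas Rosetta performs the substitution on the 3D model and then minimizes. The function name, the 1-based position sets and the example sequence are hypothetical.

```python
def alanine_background(sequence, catalytic, positions=None):
    """Mutate residues at `positions` (1-based) to alanine, keeping the catalytic residues.

    With positions=None, every non-catalytic residue is mutated.
    """
    if positions is None:
        positions = range(1, len(sequence) + 1)
    to_mutate = set(positions) - set(catalytic)
    return "".join(
        "A" if number in to_mutate else residue
        for number, residue in enumerate(sequence, start=1)
    )
```

Passing an explicit `positions` set restricts the substitution to a chosen region, for example the residues lining the binding pocket.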
[TECHNICAL DETAILS COMING SOON] </br>
Design step inputs </br>
The following input files are relevant for the design procedure:
<ul>
  <li>the “.pdb” file generated in the matching step</li>
  <li>the “.cst” file for the ligand</li>
  <li>the “.params” file for the ligand and the scaffold</li>
  <li>a “.flags” file to coordinate the inputs</li>
</ul>
For further information on these files, please refer to the matching step inputs above. </br>
Design step outputs </br>
The output of the design step is a “.pdb” file containing the mutated scaffold and a “.score” file.
For every PDB file, one line is written to the score file, which makes it easy to evaluate and compare the resulting structures.
The first score in the file is the total score of the model. It is followed by the number of hydrogen bonds in the protein as a whole and within the constraints, and then by the number of unsatisfied polar atoms in the catalytic residues, in the whole protein, and within the constraints.
See the technical details below for a full overview of the output information. </br>
[TECHNICAL DETAILS COMING SOON] </br>
We chose our synthetases based on a good total score and a good ligand score. We inspected the corresponding PDB files and rated the placement of the ligand in the binding pocket as satisfactory, i.e., the ligand presumably does not clash with residues in its immediate surroundings.
The total scores for CBT are not as good as the scores for NPA. However, the ligand scores are acceptable in both cases. A visual evaluation confirms that the ligand fits into the binding pocket. </br>
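Ranking the designs by these criteria can be automated. The following Python sketch assumes a whitespace-separated score file in which the header and data lines start with "SCORE:", as in typical Rosetta output. The column name ligand_score and the cutoff value are hypothetical placeholders and have to be adapted to the actual columns of the enzyme design score file. Lower Rosetta scores are better, so the sketch sorts in ascending order. The example values are invented and are not our results.

```python
import io

EXAMPLE = """SEQUENCE:
SCORE: total_score ligand_score description
SCORE: -310.5 -8.2 design_0001
SCORE: -320.1 2.5 design_0002
SCORE: -305.0 -9.1 design_0003
"""  # hypothetical values


def read_scores(handle):
    """Parse 'SCORE:' lines into dicts keyed by the column names of the first 'SCORE:' line."""
    header, rows = None, []
    for line in handle:
        fields = line.split()
        if not fields or fields[0] != "SCORE:":
            continue
        if header is None:
            header = fields[1:]
        elif fields[1:] != header:  # skip repeated header lines, keep data lines
            rows.append(dict(zip(header, fields[1:])))
    return rows


def select_designs(rows, total="total_score", ligand="ligand_score", max_ligand=0.0, top=5):
    """Keep designs with ligand score <= max_ligand, then return the `top` lowest total scores."""
    passing = [row for row in rows if float(row[ligand]) <= max_ligand]
    return sorted(passing, key=lambda row: float(row[total]))[:top]


if __name__ == "__main__":
    rows = read_scores(io.StringIO(EXAMPLE))
    for row in select_designs(rows, top=2):
        print(row["description"], row["total_score"], row["ligand_score"])
```

In this example, design_0002 has the best total score but is excluded by the ligand-score filter. This mirrors the idea of requiring both a good total score and a good ligand score.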
Our results for this step </br>
We used this algorithm to simulate the evolution of the tyrosyl-tRNA synthetase with the amino acids nitrophenylalanine (NPA) and CBT-ASP. </br>
NPA simulation: </br>
We created one .cst file block for the nitro group of NPA. Since there are two oxygen atoms in the nitro group, we defined two atom name tags. Several possibilities are plausible, so we defined two possible constraint partners for the hydrogen bonds: the first is asparagine (N) or glutamine (Q), and the second is glycine (G). We set the distance to 2.8 Å, as this is the optimal distance for hydrogen bonds, with a tolerance of 0.5 Å. We set the angles to 120° with a tolerance of 40°, as recommended by Florian Richter during our conversation in Cologne. The torsion angles were set to 180° with a tolerance of 180° and a penalty of 0, so that the torsion angles can rotate completely freely. </br>
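Whether a designed model actually satisfies these values can be checked geometrically. The following Python sketch is only an after-the-fact sanity check and not a reimplementation of the .cst format, since Rosetta evaluates constraints internally with its own atom definitions and scoring. Which atoms define the angle (nitro N, nitro O, partner heavy atom) is our assumption, and the coordinates in the test are hypothetical. The torsions are not checked because they were left free.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at vertex b."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    cos_t = sum(p * q for p, q in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding errors


def meets_hbond_constraint(nitro_o, nitro_n, partner, d0=2.8, d_tol=0.5, a0=120.0, a_tol=40.0):
    """Check a nitro oxygen...partner contact against the NPA constraint values.

    Distance: O...partner heavy atom, 2.8 +/- 0.5 Å.
    Angle: N-O...partner at the oxygen, 120 +/- 40 degrees.
    Torsions are not checked (180 +/- 180 with penalty 0, i.e. free rotation).
    """
    distance = math.dist(nitro_o, partner)
    angle = angle_deg(nitro_n, nitro_o, partner)
    return abs(distance - d0) <= d_tol and abs(angle - a0) <= a_tol


if __name__ == "__main__":
    n, o = (0.0, 0.0, 0.0), (1.2, 0.0, 0.0)                  # hypothetical nitro N and O
    donor = (1.2 + 2.8 * 0.5, 2.8 * math.sqrt(3) / 2, 0.0)  # 2.8 Å from O, 120° N-O...donor
    print(meets_hbond_constraint(o, n, donor))
```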
CBT-ASP simulation: </br>
CBT-ASP can form hydrogen bonds in two ways: a weak hydrogen bond involving the sulfur atom, or a regular hydrogen bond involving the nitrogen (N2) after C-gamma. We wrote three .cst files: one for a possible bond with the sulfur, one for a possible bond with the nitrogen, and one for both bonds. As possible corresponding amino acids, we chose serine, threonine, tyrosine, asparagine, glutamine, and glycine. </br>
[TECHNICAL DETAILS COMING SOON]
<h3> Results </h3>
<h4> Results in silico </h4>
We used this algorithm to simulate the evolution of the tyrosyl-tRNA synthetase with the amino acids nitrophenylalanine and CBT-ASP.
We obtained 13 synthetase sequences for CBT-ASP and 43 sequences for NPA whose binding sites accommodate the ligand well according to the ROSETTA score.
<h4> Results in vivo </h4>
RESULTS COMING SOON
 
</article>
 
</article>
<h4>Method</h4>
 
<article>
 
The ligand needs to be specified in the MOL, MOL2 or SDF file format. Such a description can be obtained automatically by converting the relevant information from a PDB file, if available. This conversion usually also adds the hydrogen atoms that are typically missing from the PDB file. Alternatively, the ligand can be designed using SMILES or manually using tools such as Avogadro. In the next step, the SDF or MOL(2) file is used to create a conformer ensemble, which is then used to create a Rosetta parameter file. In addition to the specific names of all atoms present in the ligand, this parameter file also stores all bonds between the individual atoms, including the bond angles and bond lengths.
 
- cst file TODO: still undecided, hence no text
 
 
The enzyme design algorithm
 
The enzyme design algorithm is summarized in Fig.
 
</article>
 
<div class="figure medium">
 
<img class="figure image" src="https://static.igem.org/mediawiki/2017/8/82/T--Bielefeld-CeBiTec--CDR-flowchartdesign.png">
 
<p class="figure subtitle"><b>Figure (2): Flowchart Enzyme Design Protocol</b><br></p>
 
</div>
 
 
  
 
         </div>
 
         </div>

Revision as of 22:52, 29 October 2017

Modeling

Short Summary

As our project explores possibilities of an expanded genetic code via unnatural bases and non-canonical amino acids, we set out to complement our lab work via modeling of novel amino acyl tRNA synthetases (aaRS) for a non-canonical amino acids we synthetized in the lab. In order to incorporate non-canonical amino acids into proteins via the translational process, the aaRS has to attach the amino acid to the respective tRNA. Thus, we designed aaRS sequences which were meant to link our own non-canonical amino acid to a fitting tRNA. As a result, we obtained a couple of sequences of possible aaRS candidates, which we evaluated, based on a ROSETTA score, and ordered via gene synthesis. In practice, our modeling consisted of the following steps:
Step Software/Method Meaning
1. Ligand Preparation Manually via Avogadro Due to the novelty of our amino acid, no information on the ligand is available in databases. Therefore, all information has to be provided manually and then generate a conformer ensemble, containing for example all energetically useful arrangements of atoms within the molecule.
2. Scaffold categorization ROSETTA protocol The scaffold describes the rough layout of the synthetase. We downloaded the scaffold 1j1u, the aaRS of Methalonococcus janischii as a template, and then relaxed its structure to improve the outcome of the ROSETTA algorithm.
3. Set simulation constrains Manually via ROSETTA Constrains with regards to possible mutations of the synthetase ensure that the generated sequences fit to the amino acid. For example, we constrained the distance between certain atoms and their angle to a range optimal for hydrogen bonds.
4. Enzyme Matching ROSETTA protocol ROSETTA combines information about the ligand and constrains to find possible hydrogen bonding partners and propose the shape of the scaffold within the set constraints.
5. Enzyme Design ROSETTA protocol An algorithm uses the information from the previous step and information on the ligand to simulate the mutation process and generate sequences for optimized scaffolds with corresponding scores as measures of fit.
6. Evaluate results in silico Manually We evaluate the visual output and the score values and order the sequences with the most promising results via gene synthesis.
7. Evaluate results in vivo Manually The synthetases are validated in the lab with the corresponding ncAA via a positive-negative selection system.
As a result, we obtained a couple of sequences of possible aaRS candidates, which we evaluated, based on a ROSETTA score, and ordered via gene synthesis. Figure A describes our modeling project as a whole

Introduction

Overview

As part of our iGEM project, we are faced with the challenge of adapting the tRNA synthetase to non-canonical amino acids. For this purpose, modelled possible candidates for synthetases as a preparation for carrying out a positive-negative selection according to Schulz [] in the laboratory. Due to the rapid development in the field of protein and molecular structure analysis, there has been an increase in the availability of molecular 3D structure data. These data are organized in publicly available databases which provide a foundation for the modeling and simulation of chemical-biological processes in bioinformatics. As our non-canonical amino acid has been synthetized by ourselves, no such comprehensive information is available, yet. However, information of similarly structured amino acids can potentially serve as a basis for our modeling. As evaluating an expanded genetic code is a complex task, the practical laboratory work of our project is supplemented by a theoretical approach, involving modeling, simulation, and evaluation on the computerin silico. Specifically, we focused on simulation to designaimed at designing an aaRS tRNA synthetase for the new non-canonical amino acid CBT-ASP. Additionally to CBT, we also simulated the evolution process for the non-canonical amino acid NPA as a validation of our modeling procedure, altough as synthases for this ncAA are known and thus comparable to our in silico result, we can evaluate our modeling procedure. (Vielleicht hier ein wenig schöner) For this purposeOur core challenge was to evolve, the binding pocket must be evolved in a manner which effectively charges the tRNA with the amino acid, thus also recognizing this amino acid specifically.

Method

We used the open-source software "Rosetta" for the main part of our modeling project, which was introduced at the University of Washington by David Baker in 1997, initially in the context of protein structure prediction. Since then, Rosetta has grown to include numerous modules and is currently widely used in research. In our application, we focus on the Rosetta module called the "Rosetta Enzyme Design Protocol"

ROSETTA Enzyme Design

Overview

Since the non-canonical amino acid synthesized in the laboratory is completely novel, there is no corresponding tRNA synthetase which can load the tRNA, yet. For this reason, we use the enzyme design protocol to design the binding pocket in a way that allows it to form an effective and specific enzyme. The protocol consists of two main steps: matching and designing. The enzyme design algorithm basically is summarized in Fig. B

Figure (2): Flowchart Enzym Design Protocol

Matching Step

The meaning of the matching step is to match the amino acids which constrains to the ligand, following specific constrains which ensure that the result is sensible and feasible. For this, ROSETTA analyzes the structural formula of the non- canonical amino acid and offers the possible hydrogen binding partners.
Matching step inputs
For the matching step, the following input-files are needed:
  • a “.params”-file specifying information about the ligand
  • a “.pdb”-file providing a rough scaffold layout
  • a “.cst”-file to define the bindings between ligand and scaffold
  • a “.pos”-file to define the scaffold positions at which interacting amino acids may be placed
  • a “.flags”-file to control all inputs
These files are necessary because they describe the ligand and the backbone and specify the parameters of the algorithm.
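Because the flags file ties the other inputs together, a minimal version can be generated programmatically. The sketch below assumes the option names `-s`, `-extra_res_fa`, `-match:lig_name`, `-match:geometric_constraint_file` and `-match:scaffold_active_site_residues` as we understand them from the Rosetta matcher documentation; they should be checked against the Rosetta version in use. All file names and the ligand code are hypothetical.

```python
def build_match_flags(scaffold_pdb, ligand_params, ligand_name, cst_file, pos_file):
    """Return the text of a minimal matcher flags file (one option per line).

    The option names are assumptions based on the Rosetta matcher documentation.
    """
    options = [
        f"-s {scaffold_pdb}",                                # scaffold structure (.pdb)
        f"-extra_res_fa {ligand_params}",                    # ligand description (.params)
        f"-match:lig_name {ligand_name}",                    # residue name defined in the .params file
        f"-match:geometric_constraint_file {cst_file}",      # ligand-scaffold constraints (.cst)
        f"-match:scaffold_active_site_residues {pos_file}",  # candidate scaffold positions (.pos)
    ]
    return "\n".join(options) + "\n"


# hypothetical file names for a CBT-ASP run
print(build_match_flags("1j1u_chainA.pdb", "CBT.params", "CBT", "CBT.cst", "1j1u.pos"))
```

Written to a file such as `match.flags`, the options can then be handed to the matcher with Rosetta's `@match.flags` option-file syntax instead of being typed on the command line.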
Matching step outputs
The matching step outputs the layout of the scaffold together with one or more conformations of the amino acids that enable interaction with the ligand. This information is stored as a “.pdb” file and becomes part of the input for the design step.
Our results for this step
We used the “1j1u” scaffold from the PDB for our matching step. The “1j1u.pdb” file contains the tyrosyl-tRNA synthetase, labeled “Chain A”, the orthogonal tRNA, labeled “Chain B”, and the natural ligand tyrosine. For our project, we deleted the natural ligand and chain B, because their structure and sequence did not need to be changed, and removing them saved computing time and power. We designed the ligands manually in Avogadro, and for the .cst file we chose the default matching algorithm for the simulations of both amino acids.
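Removing chain B and the bound ligand can be done in any text editor or structure viewer; as a hedged illustration, the Python sketch below filters PDB-format lines by their fixed columns (record name in columns 1 to 6, residue name in columns 18 to 20, chain identifier in column 22). It assumes that the tRNA is chain B and that the natural ligand is stored as `HETATM` records with the residue name `TYR`; both should be checked in the actual file. The function name is hypothetical, and other hetero groups such as water are kept unchanged.

```python
def strip_pdb(lines, drop_chain="B", drop_ligand="TYR"):
    """Return PDB-format lines without one chain and without one HETATM ligand."""
    coordinate_records = {"ATOM", "HETATM", "ANISOU", "TER"}
    kept = []
    for line in lines:
        record = line[0:6].strip()                    # columns 1-6
        resname = line[17:20].strip()                 # columns 18-20
        chain = line[21] if len(line) > 21 else ""    # column 22
        if record in coordinate_records and chain == drop_chain:
            continue  # remove every record of the tRNA chain
        if record == "HETATM" and resname == drop_ligand:
            continue  # remove the bound natural ligand (protein tyrosines are ATOM records)
        kept.append(line)
    return kept
```

Applied to the lines of `1j1u.pdb`, the result can be written to a new scaffold file. Note that this sketch does not clean up `CONECT` records that refer to removed atoms.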

Design Step

In the design step, an algorithm mutates the binding pocket and its immediate environment and repacks the remaining scaffold. Additionally, a badness-of-fit score is calculated, which indicates how well the mutated pocket fits the amino acid. For every file from the matching step, a model consisting of a score and a “.pdb”-file is generated, from which the designed sequence can be read and the 3D structure can be analyzed. Notably, the structure of the amino acid can be extracted separately. The design step consists of the following stages:
1. Optimizing the catalytic interactions
In this stage, the residues to be designed can be specified either by Rosetta's default settings or in a manually created “.res”-file (resfile). For more details, we refer to the Rosetta documentation (https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d1/d97/resfiles.html).
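A manually created resfile states a default behavior before the `start` line and per-residue commands after it. The sketch below builds such a file with the standard resfile commands `NATAA` (repack only), `ALLAA` (allow all amino acids) and `NATRO` (keep the native residue and rotamer), for example to keep selected residues fixed while redesigning the surrounding pocket. The function name and the residue numbers in the example are hypothetical placeholders, not our actual design positions.

```python
def build_resfile(design_positions, fixed_positions, chain="A", default="NATAA"):
    """Return resfile text: a default command, 'start', then one line per listed residue.

    default          - applied to all unlisted residues (NATAA: repack, keep identity)
    design_positions - residues allowed to mutate to any amino acid (ALLAA)
    fixed_positions  - residues kept with native identity and rotamer (NATRO)
    """
    overlap = set(design_positions) & set(fixed_positions)
    if overlap:
        raise ValueError(f"positions both designable and fixed: {sorted(overlap)}")
    commands = {pos: "ALLAA" for pos in design_positions}
    commands.update({pos: "NATRO" for pos in fixed_positions})
    lines = [default, "start"]
    lines += [f"{pos} {chain} {commands[pos]}" for pos in sorted(commands)]
    return "\n".join(lines) + "\n"


# hypothetical placeholder positions
print(build_resfile(design_positions=[32, 65, 158], fixed_positions=[107]))
```

Listing residues in sorted order is not required by the format but makes the file easier to compare between design rounds.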
2. Cycles of sequence design and minimization within constraints
To optimize the structure, we applied an iterative optimization algorithm. This algorithm mutates all residues of the scaffold that are not part of the catalytic center to alanine, and a short minimization with a simplified energy function then places the ligand in an optimal position relative to the backbone.
Design step inputs
The following input files are relevant for the design procedure:
  • “.pdb”-file generated in the matching step
  • “.cst”-file for the ligand
  • “.params”-file for the ligand and the scaffold
  • “.flags”-file to coordinate the inputs
For further information on these files, please refer to the matching step inputs above.
Design step outputs
The design step outputs a “.pdb”-file containing the mutated scaffold and a “.score”-file. For every PDB file, one line is written to the score file, which makes it easy to evaluate the corresponding structure. The first score in each line is the total score of the model. It is followed by the number of hydrogen bonds in the whole protein and in the constraints, and then by the number of buried unsatisfied polar atoms in the catalytic residues, in the whole protein, and in the constraints.
We selected our synthetases based on good total scores and good ligand scores. We inspected the corresponding PDB files and rated the ligand and the binding pocket as satisfactory, meaning that the ligand presumably does not collide with residues in its immediate environment. The total scores for CBT-ASP are not as good as those for NPA. However, the ligand scores are acceptable in both cases. A visual evaluation confirms that the ligand fits into the binding pocket.
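The score-based part of this selection can be partly automated. The sketch below assumes a whitespace-separated table whose first row contains the column names, optionally prefixed with `SCORE:` as in standard Rosetta score files. The column name `ligand_score`, the function names and any cutoff values are hypothetical placeholders, and lower (more negative) Rosetta scores are treated as better.

```python
def parse_scorefile(text):
    """Parse a whitespace-separated score table (first row = column names) into dicts."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "SCORE:":
            fields = fields[1:]  # strip the record prefix of Rosetta score files
        if header is None:
            header = fields
            continue
        row = {}
        for name, value in zip(header, fields):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # non-numeric columns, e.g. the model name
        rows.append(row)
    return rows


def select_designs(rows, total_cutoff, ligand_cutoff,
                   total_col="total_score", ligand_col="ligand_score"):
    """Keep rows at or below both cutoffs (lower scores are better), best total score first."""
    hits = [r for r in rows
            if r[total_col] <= total_cutoff and r[ligand_col] <= ligand_cutoff]
    return sorted(hits, key=lambda r: (r[total_col], r[ligand_col]))
```

Such a filter only narrows down the candidates; it does not replace the visual inspection of the PDB files described above.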
Our results for this step
We used this algorithm to simulate the evolution of the tyrosyl-tRNA synthetase with the amino acids nitrophenylalanine and CBT-ASP.
NPA simulation:
We created one .cst-file block for the nitro group of NPA. Since the nitro group contains two oxygen atoms, we defined two atom name tags. Because several interaction partners are plausible, we defined two possible constraint partners for the hydrogen bonds: the first is asparagine (N) or glutamine (Q), and the second is glycine (G). We set the distance to 2.8 Å, a typical donor-acceptor distance for hydrogen bonds, with a tolerance of 0.5 Å. We set the angles to 120° with a tolerance of 40°, as recommended by Florian Richter during our talk in Cologne. The torsion angles were set to 180° with a tolerance of 180° and a penalty of 0, so that they can rotate completely freely.
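For illustration, the Python sketch below renders one hydrogen-bond block of a match/enzdes constraint file with the distance, angle and torsion values given above. The field order (ideal value, tolerance, force constant, then a covalent flag for the distance or a periodicity for angles and torsions) reflects our reading of the Rosetta constraint-file documentation and should be verified. The ligand atom names `O1 N1 C4`, the partner atom type `NH2O`, the force constants of 100 and the function name are assumptions, not values from our actual files; how the asparagine/glutamine and glycine alternatives were combined is not reproduced here.

```python
def cst_block(ligand_atoms, ligand_res3, partner_res1, partner_atom_type,
              dist=(2.8, 0.5), angle=(120.0, 40.0), torsion=(180.0, 180.0),
              k_dist=100.0, k_angle=100.0):
    """Return one CST::BEGIN ... CST::END block describing a single hydrogen bond.

    ligand_atoms: three atom names from the ligand .params file; the first atom
    forms the hydrogen bond, the other two define the angle and torsions.
    """
    a1, a2, a3 = ligand_atoms

    def row(label, ideal, tol, k, last):
        # ideal value, tolerance, force constant, then covalent flag or periodicity
        return f"  CONSTRAINT:: {label:>10}: {ideal:7.2f} {tol:6.2f} {k:7.2f} {last}"

    lines = [
        "CST::BEGIN",
        f"  TEMPLATE::   ATOM_MAP: 1 atom_name: {a1} {a2} {a3}",
        f"  TEMPLATE::   ATOM_MAP: 1 residue3: {ligand_res3}",
        "",
        f"  TEMPLATE::   ATOM_MAP: 2 atom_type: {partner_atom_type}",
        f"  TEMPLATE::   ATOM_MAP: 2 residue1: {partner_res1}",
        "",
        row("distanceAB", *dist, k_dist, 0),         # 0: not a covalent bond
        row("angle_A", *angle, k_angle, 360),
        row("angle_B", *angle, k_angle, 360),
        # tolerance 180 with force constant 0: torsions are effectively unrestrained
        row("torsion_A", *torsion, 0.0, 360),
        row("torsion_B", *torsion, 0.0, 360),
        row("torsion_AB", *torsion, 0.0, 360),
        "CST::END",
    ]
    return "\n".join(lines) + "\n"


# hypothetical atom names (they depend on the .params file) and partner atom type
print(cst_block(("O1", "N1", "C4"), "NPA", "NQ", "NH2O"))
```

Generating the block from parameters keeps the distance, angle and torsion settings consistent when several variants have to be written.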
CBT-ASP simulation:
CBT-ASP can form hydrogen bonds in two ways: a weak hydrogen bond via the sulphur atom, and a regular hydrogen bond via the nitrogen (N2) following the C-gamma atom. We wrote three .cst files: one for a possible bond with the sulphur, one for a possible bond with the nitrogen, and one for both bonds. As possible partner amino acids, we chose serine, threonine, tyrosine, asparagine, glutamine, and glycine.
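The three constraint files correspond to all non-empty combinations of the two possible hydrogen-bonding sites. The short sketch below enumerates them; the site labels `S` and `N2`, the function name and the file naming scheme are hypothetical, and only the list of partner residues is taken from the text.

```python
from itertools import combinations


def constraint_variants(sites):
    """Return all non-empty combinations of hydrogen-bonding sites, smallest first."""
    return [combo
            for size in range(1, len(sites) + 1)
            for combo in combinations(sites, size)]


SITES = ("S", "N2")     # hypothetical labels: sulphur atom and nitrogen N2 of CBT-ASP
PARTNERS = "STYNQG"     # one-letter codes: Ser, Thr, Tyr, Asn, Gln, Gly

for combo in constraint_variants(SITES):
    filename = "CBT_" + "_".join(combo) + ".cst"  # hypothetical naming scheme
    print(f"{filename}: {len(combo)} block(s), partners {PARTNERS}")
```

With two sites this yields exactly three variants; a third site would already raise the number to seven, which is worth keeping in mind when adding further interaction sites.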

Results

Results in silico

Using the tyrosyl-tRNA synthetase scaffold, we obtained 13 synthetase sequences for CBT-ASP and 43 sequences for NPA that fit well into the binding site according to the Rosetta score.

Results in vivo

RESULTS COMING SOON