Difference between revisions of "Team:Bielefeld-CeBiTec/Model"

 
<body>
 
<body>
 
<div class="container">
 
<div class="container">
 +
<div id="title" style="background-image: url(https://static.igem.org/mediawiki/2017/9/9c/T--Bielefeld-CeBiTec--title-img-modeling.jpg);">
 +
<img src="https://static.igem.org/mediawiki/2017/9/9c/T--Bielefeld-CeBiTec--title-img-modeling.jpg">
 +
<div id="title-bg">
 +
<div id="title-text">
 +
Modeling
 +
</div>
 +
</div>
 +
</div>
 
<div class="contentbox">
 
<div class="contentbox">
 +
<div class="borderbox">
 
<div class="bevel tr"></div>
 
<div class="bevel tr"></div>
 
<div class="content">
 
<div class="content">
<h1> Modeling </h1>
+
<h3> Organization of our modeling projects </h3>
        </div>
+
<div class="article">
<div class="bevel bl"></div>
+
On this page, we describe our main modeling project, which was integral to our whole project.
</div>
+
  
<div class="container">
+
However, besides this complex modeling, we also developed and applied several straightforward stochastic and statistical models
to support and guide numerous steps of our laboratory work. Some of these modeling projects are briefly described in this box;
for further information, we recommend reading the linked pages. <br>
 +
 
 +
</div>
 +
<h4>Discriminant function model for the iCG prediction: </h4>
 +
<div class="article">
 +
We conducted a discriminant function analysis to classify the nucleotide at a specific position in Oxford Nanopore sequencing reads.
This model is part of our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">iCG</a> software module and enabled the successful detection of unnatural bases.
 +
</div>
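The idea behind a discriminant function can be illustrated with a minimal two-class Fisher linear discriminant. This is only a sketch under stated assumptions: the two per-read features and all toy values are hypothetical and are not taken from the iCG module, which may use other features, more classes, and a different implementation.

```python
from statistics import mean

Vector = tuple[float, float]


def class_mean(samples: list[Vector]) -> Vector:
    """Mean feature vector of one class."""
    return (mean(s[0] for s in samples), mean(s[1] for s in samples))


def scatter(samples: list[Vector], m: Vector) -> tuple[float, float, float, float]:
    """Within-class scatter matrix [[a, b], [b, d]], returned as (a, b, b, d)."""
    a = b = d = 0.0
    for x, y in samples:
        dx, dy = x - m[0], y - m[1]
        a += dx * dx
        b += dx * dy
        d += dy * dy
    return a, b, b, d


def project(w: Vector, v: Vector) -> float:
    """Discriminant score: projection of v onto the direction w."""
    return w[0] * v[0] + w[1] * v[1]


def fit_fisher(class0: list[Vector], class1: list[Vector]) -> tuple[Vector, float]:
    """Fisher's discriminant w = S_w^-1 (m1 - m0) and a midpoint threshold."""
    m0, m1 = class_mean(class0), class_mean(class1)
    a, b, c, d = (p + q for p, q in zip(scatter(class0, m0), scatter(class1, m1)))
    det = a * d - b * c  # the pooled scatter matrix must be non-singular
    dm = (m1[0] - m0[0], m1[1] - m0[1])
    # closed-form inverse of the 2x2 pooled scatter matrix, applied to dm
    w = ((d * dm[0] - b * dm[1]) / det, (-c * dm[0] + a * dm[1]) / det)
    threshold = (project(w, m0) + project(w, m1)) / 2
    return w, threshold


def classify(v: Vector, w: Vector, threshold: float) -> int:
    """Return 1 (second class) if the score exceeds the threshold, else 0."""
    return 1 if project(w, v) > threshold else 0


# Hypothetical toy features per read at the position of interest
canonical = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1)]
unnatural = [(3.0, 1.0), (3.2, 1.2), (2.8, 0.9), (3.1, 1.1)]
w, threshold = fit_fisher(canonical, unnatural)
print(classify((2.9, 1.0), w, threshold))
```

The projection direction maximizes the separation of the class means relative to the within-class spread, which is why a single threshold on the score suffices for two classes.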
 +
<h4>Calculation of the required library size for the selection system: </h4>
 +
<div class="article">
 +
We applied combinatorics and statistics to calculate the optimal library size for the <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Results/translational_system/library_and_selection">tRNA synthetase selection process</a>.
This involved a trade-off between the effort of constructing a very large library and the risk of missing diversity in a library that is too small.
We therefore predicted the optimal library size and validated this prediction experimentally via MiSeq analysis of the diversity
of a subset of this library.
 +
</div>
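The library-size trade-off can be made concrete with a standard sampling argument. The sketch below assumes NNK codons (32 codons per randomized position) and that transformants sample codon combinations uniformly and independently, so the expected coverage is approximately 1 − exp(−L/V) for library size L and theoretical diversity V. The number of randomized positions is a hypothetical example, not the value used for our selection library.

```python
import math

# NNK codon: N (A/C/G/T) x N (A/C/G/T) x K (G/T) = 32 codons per position
CODONS_PER_NNK_POSITION = 4 * 4 * 2


def theoretical_diversity(n_positions: int) -> int:
    """Number of distinct DNA variants for n fully randomized NNK positions."""
    return CODONS_PER_NNK_POSITION ** n_positions


def expected_coverage(library_size: int, diversity: int) -> float:
    """Expected fraction of variants present at least once (Poisson approximation)."""
    return 1.0 - math.exp(-library_size / diversity)


def required_library_size(coverage: float, diversity: int) -> int:
    """Smallest library size whose expected coverage reaches the target fraction."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    return math.ceil(-diversity * math.log(1.0 - coverage))


diversity = theoretical_diversity(3)  # hypothetical: 3 randomized positions
for target in (0.63, 0.95, 0.99):
    print(target, required_library_size(target, diversity))
```

Under these assumptions, oversampling the theoretical diversity about threefold gives roughly 95 % expected coverage, since −ln(0.05) ≈ 3.0.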
 +
<h4>Strength prediction for a transcription signal amplification system <a target="_blank" href="http://parts.igem.org/Part:BBa_K2201373">(BBa_K2201373)</a>: <!-- this part does not exist --> </h4>
 +
<div class="article">
 +
We modeled the mRFP production over time for a conventional mRFP reporter system and visually compared it to our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Composite_Part">enhanced signaling</a> system.
Validation was performed as part of the positive selection process for an adapted tRNA synthetase.
 +
</div>
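A common way to model reporter output over time is a two-stage transcription/translation ODE model. The sketch below is only an assumption about how such a comparison could look: all rate constants are hypothetical placeholders, and the enhanced signaling system is represented crudely as a threefold higher effective transcription rate, which need not reflect the actual mechanism of our construct.

```python
from dataclasses import dataclass


@dataclass
class ReporterParams:
    """Rate constants of a two-stage expression model (all values hypothetical)."""
    k_tx: float         # transcription rate, mRNA per minute
    k_tl: float         # translation rate, protein per mRNA per minute
    d_m: float = 0.2    # mRNA degradation rate, 1/min
    d_p: float = 0.01   # protein degradation/dilution rate, 1/min


def simulate(params: ReporterParams, t_end: float = 300.0,
             dt: float = 0.1) -> list[tuple[float, float]]:
    """Forward-Euler integration of
    dm/dt = k_tx - d_m * m
    dp/dt = k_tl * m - d_p * p
    starting from m = p = 0; returns (time, protein) pairs."""
    m = p = 0.0
    trajectory = [(0.0, p)]
    for i in range(1, int(round(t_end / dt)) + 1):
        dm = params.k_tx - params.d_m * m
        dp = params.k_tl * m - params.d_p * p
        m += dm * dt
        p += dp * dt
        trajectory.append((i * dt, p))
    return trajectory


def steady_state_protein(params: ReporterParams) -> float:
    """Analytical steady state p* = k_tx * k_tl / (d_m * d_p)."""
    return params.k_tx * params.k_tl / (params.d_m * params.d_p)


normal = simulate(ReporterParams(k_tx=1.0, k_tl=0.5))
amplified = simulate(ReporterParams(k_tx=3.0, k_tl=0.5))  # hypothetical 3x amplification
for (t, p_n), (_, p_a) in zip(normal[::600], amplified[::600]):
    print(f"t={t:5.0f} min  normal={p_n:7.1f}  amplified={p_a:7.1f}")
```

Because the model is linear and starts from zero, scaling the transcription rate scales the whole protein curve by the same factor; a real amplification circuit would likely show non-linear behavior that this sketch ignores.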
 +
</div>
 +
</div>
 +
<div class="bevel bl"></div>
 +
</div>
 
<div class="contentbox">
 
<div class="contentbox">
 
<div class="bevel tr"></div>
 
<div class="bevel tr"></div>
 
<div class="content">
 
<div class="content">
<h2> Overview </h2>
+
<h3> Short Summary </h3>
<article>
+
<div class="article">
As part of our iGEM project, we face the challenge of adapting the tRNA synthetase (aaRS) to non-canonical amino acids. For this purpose, we carry out several rounds of a positive-negative selection process in the laboratory, as previously described by Schultz and colleagues [Liu et al., 2010].
+
As our project explores possibilities of an expanded genetic code via unnatural bases and non-canonical amino acids,  
However, due to the rapid development in the field of protein and molecular structure analysis, there has been an increase in the availability of molecular 3D structure data. These data are organized in publicly available databases, which provide a foundation for the modeling and simulation of chemical-biological processes in bioinformatics.
+
we set out to complement and improve our lab work via modeling of novel aminoacyl-tRNA synthetases (aaRS) for a non-canonical amino 
For this reason, the practical laboratory work in our project was supplemented by a theoretical approach, involving modeling, simulation and evaluation on the computer. More specifically, our approach was divided into two subprojects, an evolution subproject and an evaluation subproject.
+
acids, which were synthesized in our lab. In order to incorporate non-canonical amino acids into proteins via the translational
Liu, David R., et al. "Engineering a tRNA and aminoacyl-tRNA synthetase for the site-specific incorporation of unnatural amino acids into proteins in vivo." Proceedings of the National Academy of Sciences 94.19 (1997): 10092-10097.
+
process, the aaRS has to attach the amino acid to the respective tRNA. Thus, we designed aaRS sequences which were adjusted
</article>
+
to link our own non-canonical amino acid to a fitting tRNA. Candidates were evaluated and selected based on a ROSETTA score.
<h2>Evolution subproject</h2>
+
The most promising sequences were ordered via gene synthesis for experimental validation.
<article>
+
Figure 1 provides a rough overview of our modeling project. Table 1 below summarizes the realization in practice.
In the evolution subproject, the aim is to design an aaRS for the new non-canonical amino acid CBT. For this purpose, the binding pocket must be evolved so that it specifically recognizes the amino acid and effectively loads it onto the tRNA. As CBT is a large amino acid, we decided to use the tyrosyl-tRNA synthetase of <i>Methanococcus jannaschii</i> as a template. The usual way to do this in the lab is to generate a library with NNK-scheme primers (link to the selection plasmids). An important limitation of this method is that a large number of sequences has to be sampled. Consequently, a large library is needed in order to find a working synthetase. Such extensive libraries are costly, time-consuming to construct, and hard to screen. A modeling approach is more cost- and time-efficient and additionally leads to a better understanding of the function and evolution of the synthetase, as one can examine how the evolution affects the protein structure. Rosetta makes it possible to reduce the library size by generating a set of the most probable candidates for a functional synthetase, which makes the library much more manageable. Hence, for our project, we want to find suitable synthetases using Rosetta, then build and evaluate the best candidates in the lab.
+
</div>
</article>
+
<div class="figure medium">
<h2>Evaluation subproject</h2>
+
<img class="figure image" src="https://static.igem.org/mediawiki/2017/3/35/T--Bielefeld-CeBiTec--CDR-overviewmodeling.png">
<article>
+
<p class="figure subtitle"><b>Figure 1: Modeling Project Overview</b><br> A stylized overview of our modeling project,
In addition to our evolution subproject, we want to establish the classic positive-negative selection process [Liu et al., 1997] with the <i>Methanococcus jannaschii</i> tyrosyl-tRNA synthetase and the non-canonical amino acid nitrophenylalanine. Accordingly, the core of the evaluation subproject is to compare the tRNA synthetase produced in our laboratory with tRNA synthetases constructed in the past.
+
containing both <i>in silico</i> and <i>in vivo</i> components.</p>
</article>
+
</div>
 
+
<p class="table-headline"><b>Table 1: Steps of our modeling project</b> Our modeling project consists of seven main steps, combining <i>in silico</i> and <i>in vivo</i> components.</p>
<h2>Methods</h2>
+
<table style="margin-bottom: 0px;">
<article>
+
<thead>
In both subprojects, we used the open-source software "Rosetta", which was introduced at the University of Washington by David Baker in 1997 [Simons et al., 1997], initially in the context of protein structure prediction. As determining such structures with NMR or similar methods is very expensive and time-consuming, correctly predicting structures computationally holds great potential for future research. Since its release, Rosetta has grown to include numerous modules and is currently widely used in research. In our application, we focus on two Rosetta modules, the "Rosetta Ligand Docking Protocol" and the "Rosetta Enzyme Design Protocol".
+
<tr>
As part of the evaluation subproject, we additionally used the software "Modeller" to carry out homology structure predictions.
+
<th style="width: auto" class="header">Step</th>
Liu, David R., et al. "Engineering a tRNA and aminoacyl-tRNA synthetase for the site-specific incorporation of unnatural amino acids into proteins in vivo." Proceedings of the National Academy of Sciences 94.19 (1997): 10092-10097.
+
<th style="width: auto" class="header">Software/Method</th>
Simons, Kim T., et al. "Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions." Journal of molecular biology 268.1 (1997): 209-225.
+
<th style="width: auto" class="header">Meaning</th>
</article>
+
</tr>
        </div>
+
</thead>
 +
<tbody>
 +
  <tr>
 +
<td>1. Ligand preparation</td>
 +
<td>Manually via <a href="https://avogadro.cc/">Avogadro</a></td>
 +
<td>Due to the novelty of our amino acid, no information on the ligand is available in databases.
Therefore, a conformer ensemble containing, for example, all energetically favorable arrangements of atoms within the molecule had to be generated manually.</td>
 +
  </tr>
 +
  <tr>
 +
<td>2. Scaffold categorization</td>
 +
<td>ROSETTA protocol</td>
 +
<td>The scaffold describes the rough layout of the synthetase. We downloaded the scaffold
<a href="http://www.rcsb.org/pdb/explore/explore.do?structureId=1j1u">1J1U</a>, the aaRS of <i>Methanococcus jannaschii</i>,
as a template and then relaxed its structure to improve the outcome of the ROSETTA algorithm.</td>
 +
  </tr>
 +
  <tr>
 +
<td>3. Set simulation constraints</td>
 +
<td>Manually via ROSETTA</td>
 +
<td>Constraints on possible mutations of the synthetase ensure that the generated sequences fit the amino acid.
For example, we constrained the distance between certain atoms and their angle to a range optimal for hydrogen bonds.</td>
 +
  </tr>
 +
  <tr>
 +
<td>4. <a href="https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d7/dfc/match.html">Enzyme matching</a></td>
 +
<td>ROSETTA protocol</td>
 +
<td>ROSETTA combines information about the ligand and the constraints to find possible hydrogen bonding partners and proposes the shape
of the scaffold within the set constraints.</td>
 +
  </tr>
 +
  <tr>
 +
<td>5. <a href="https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d6/dbc/enzyme_design.html">Enzyme design</a></td>
 +
<td>ROSETTA protocol</td>
 +
<td>An algorithm uses the information from the previous step and information on the ligand to simulate the mutation process and
 +
generate sequences for optimized scaffolds with corresponding scores as measures of fit.</td>
 +
  </tr>
 +
  <tr>
 +
<td>6. Evaluate results <i>in silico</i></td>
 +
<td>Manually</td>
 +
<td>Based on the score values, we ordered the synthesis of the most promising sequences. </td>
 +
  </tr>
 +
  <tr>
 +
  <td>7. Evaluate results <i>in vivo</i></td>
 +
<td>Manually</td>
 +
<td>The synthetases are validated in the lab with the corresponding ncAA via a
 +
<a href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Project/translational_system/library_and_selection">positive-negative selection system.</a> </td>
 +
  </tr>
 +
</tbody>
 +
</table>
 +
</div>
 
<div class="bevel bl"></div>
 
<div class="bevel bl"></div>
</div>
 
 
</div>
 
</div>
 
<div class="container">
 
 
<div class="contentbox">
 
<div class="contentbox">
 
<div class="bevel tr"></div>
 
<div class="bevel tr"></div>
<div class="content">
+
<div class="content">
<h2> Rosetta Ligand Docking </h2>
+
<h3>Introduction</h3>
<h4> Overview </h4>
+
                <h4>Overview</h4>
<article>
+
<div class="contentline">
The Ligand Docking protocol describes a way to assess how well small ligands bind in the binding pocket of a large protein, such as an enzyme [Meiler and Baker, 2006]. The Rosetta Ligand Docking protocol was applied in two ways. First, the lab selection was evaluated (link to the selection plasmids), so that we could evaluate the specificity of our tRNA synthetase and compare the results with the literature. In particular, we have seven aaRS sequences with different mutations that need to be ranked. The aaRS produced in our laboratory is then evaluated on the basis of this ranking.
+
<div class="third">
The second application of ligand docking is to raise the specificity of the aaRSs that we create with the Enzyme Design protocol. After creating aaRSs with that protocol, we will evaluate the best results with the Ligand Docking protocol, thereby limiting the number of possible candidates.
+
<div class="figure large">
</article>
+
<img class="figure image" src="https://static.igem.org/mediawiki/2017/3/36/T--Bielefeld-CeBiTec--CDR-pdb1j1u.png">
<h4> Methods</h4>
+
<p class="figure subtitle"><b>Figure 2: Tyrosyl-tRNA synthetase</b><br> 3D structure based on PDB entry 1J1U, visualized with PyMOL</p>
<article>
+
</div>
The Rosetta ligand docking protocol is useful if small ligands or drugs have to be docked into an arbitrary binding site. However, protein-protein or protein-peptide interactions should be investigated via the protein-protein docking protocol [Gray et al., 2003]. This allows, for example, the simulation of antigen-antibody interactions.
+
</div>
 
+
<div class="third double">
The Rosetta ligand docking protocol expects specifically prepared inputs; in particular, the amino acid ligand and the protein backbone need to be provided in a standardized format.
+
<div class ="article">
The ligand needs to be specified in the MOL, MOL2 or SDF file format. Such a description can be retrieved automatically from a PDB file; hence, the limited availability of PDB files for specific molecules is a major drawback. This conversion process usually augments the data with hydrogen atoms that are typically missing from the PDB file. Alternatively, the ligand can be designed using SMILES or manually using tools such as Avogadro. In the next step, the SDF or MOL(2) file is used to create a conformer ensemble, which is in turn used to generate a Rosetta parameter file. In addition to the specific names of all atoms present in the ligand, this parameter file also stores all bonds between the individual atoms, including the bond angles and bond lengths.
+
As part of our iGEM project, we faced the challenge of adapting the tRNA synthetase
The aaRS backbone is available from the PDB under the identifier “1J1U” in the standard PDB file format. For ligand docking, an additional relaxation step is required for preprocessing. This step had already been performed by Florian Richter, who kindly provided the resulting relaxed 3D structure to us.
+
to non-canonical amino acids. For this purpose, we modeled possible candidates for synthetases as an
To reduce search time, the chain containing the binding site (in our case, chain ‘A’) is selected and the original ligand (tyrosine) is deleted.
+
alternative to carrying out a positive-negative selection in the laboratory according to Liu <i>et al.</i> (2007).
The Rosetta ligand docking process is configured in an XML file. Examples of such configurations can be found in []. An essential parameter is the starting position of the ligand in three-dimensional space. For this purpose, we used http://ligasite.org/index.php?intro to obtain the position of the original tyrosine ligand, but it is sufficient to place the ligand within a radius of 10 Å around the backbone; within this radius, the starting position is chosen at random.
+
Finally, Rosetta execution is started in a shell, providing additional parameters on the command line or in an ‘options’ file.
+
</article>
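The chain selection and ligand removal described above can be done interactively in PyMOL or scripted. The following is a minimal sketch in plain Python that relies only on the fixed-column PDB record layout; it assumes the bound tyrosine is stored as HETATM records with residue name TYR, and it keeps other hetero groups (such as waters) of chain A, since the text only mentions deleting the ligand. The function name is hypothetical.

```python
def strip_to_chain(pdb_lines: list[str], chain_id: str = "A",
                   ligand_resname: str = "TYR") -> list[str]:
    """Keep coordinate records of one chain and drop the bound ligand.

    Relies on the fixed-column PDB format: record name in columns 1-6,
    residue name in columns 18-20 and chain identifier in column 22.
    """
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM") or len(line) < 22:
            continue  # headers, remarks, CONECT, TER, ...
        if line[21] != chain_id:
            continue  # other chains, e.g. the tRNA in chain B
        if record == "HETATM" and line[17:20].strip() == ligand_resname:
            continue  # the co-crystallized ligand
        kept.append(line)
    kept.append("END")
    return kept
```

Usage, with hypothetical file names: read `1j1u.pdb` with `splitlines()`, pass the lines to `strip_to_chain`, and write the result joined by newlines to `1j1u_chainA.pdb`.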
+
+
<h4>Algorithm</h4>
+
<article>
+
The ligand docking algorithm basically consists of the following steps:
<ol>
<li>The starting position is chosen randomly or defined in an .xml file.</li>
<li>The placement of the ligand is modified by a random translation of 0.1 Å in each direction and a rotation of 0.05° around each axis.</li>
<li>The rigid-body orientation and the side-chain angles of the ligand are optimized using the gradient-based Davidon–Fletcher–Powell algorithm. Afterwards, the move is evaluated with the Monte Carlo Metropolis criterion, i.e. it is accepted with probability
P = min(1, exp(-(E<sub>final</sub> - E<sub>start</sub>)/kT)). The move is therefore always accepted if the energy decreases, and accepted with probability P &lt; 1 otherwise.</li>
</ol>
+
To find the optimal binding position, steps 2 and 3 are repeated 50 times.
This whole protocol is in turn repeated N times, where N lies between 1000 and 5000 depending on the size of the ligand, its flexibility (and therefore the size of the conformational ligand ensemble), and the binding site.
The process is summarized in Figure (1).
+
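To make the perturb, minimize and accept loop concrete, here is a toy sketch of steps 1 to 3 for a rigid point "ligand" on a made-up energy surface. Only the loop structure follows the text; everything else is an assumption: the quadratic energy function, the pocket centre and kT are hypothetical, a fixed-step gradient descent stands in for the Davidon–Fletcher–Powell minimizer, and rotations and side-chain flexibility are ignored.

```python
import math
import random

Point = tuple[float, float, float]

POCKET_CENTRE: Point = (1.0, -2.0, 0.5)  # hypothetical energy minimum, in Å


def energy(pos: Point) -> float:
    """Toy energy: squared distance to the hypothetical pocket centre."""
    return sum((p - c) ** 2 for p, c in zip(pos, POCKET_CENTRE))


def gradient(pos: Point) -> Point:
    return tuple(2.0 * (p - c) for p, c in zip(pos, POCKET_CENTRE))


def random_start(centre: Point, radius: float, rng: random.Random) -> Point:
    """Step 1: uniform random point inside a sphere, by rejection sampling."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in range(3)]
        if sum(o * o for o in offset) <= radius ** 2:
            return tuple(c + o for c, o in zip(centre, offset))


def perturb(pos: Point, rng: random.Random, step: float = 0.1) -> Point:
    """Step 2: small random translation (rotations are omitted in this toy)."""
    return tuple(p + rng.uniform(-step, step) for p in pos)


def minimize(pos: Point, iterations: int = 20, rate: float = 0.1) -> Point:
    """Step 3a: fixed-step gradient descent, standing in for the DFP minimizer."""
    for _ in range(iterations):
        pos = tuple(p - rate * g for p, g in zip(pos, gradient(pos)))
    return pos


def metropolis_accept(e_start: float, e_final: float, kt: float,
                      rng: random.Random) -> bool:
    """Step 3b: accept with probability P = min(1, exp(-(E_final - E_start) / kT))."""
    if e_final <= e_start:
        return True
    return rng.random() < math.exp(-(e_final - e_start) / kt)


def dock(seed: int, cycles: int = 50, kt: float = 0.6,
         site: Point = (0.0, 0.0, 0.0), radius: float = 10.0) -> tuple[Point, float]:
    """One docking run: random start, then `cycles` perturb-minimize-accept moves."""
    rng = random.Random(seed)
    pos = random_start(site, radius, rng)
    e = energy(pos)
    for _ in range(cycles):
        candidate = minimize(perturb(pos, rng))
        e_new = energy(candidate)
        if metropolis_accept(e, e_new, kt, rng):
            pos, e = candidate, e_new
    return pos, e


def best_of(n_runs: int) -> tuple[Point, float]:
    """Repeat the whole protocol N times and keep the lowest-energy pose."""
    return min((dock(seed) for seed in range(n_runs)), key=lambda run: run[1])


pose, score = best_of(100)  # the text uses N between 1000 and 5000
print(pose, score)
```

Separate seeded random generators per run keep each of the N repetitions reproducible and independent of the others.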
 
+
  
 +
Due to the rapid development in the field of protein and molecular structure analysis, there has been an
increase in the availability of molecular 3D structure data. These data are organized in publicly available
databases, which provide a foundation for the modeling and simulation of chemical-biological processes in bioinformatics.
As our ncAA was synthesized in our lab, no such comprehensive information is available for it yet.
However, information on similarly structured amino acids can potentially serve as a basis for our modeling.
  
<div class="figure medium">
+
Specifically, we focused on using simulation to
<img class="figure image" src="https://static.igem.org/mediawiki/2017/thumb/3/37/T--Bielefeld-CeBiTec--CDR-flowchartligand.png/538px-T--Bielefeld-CeBiTec--CDR-flowchartligand.png">
+
design an aaRS for the new ncAA <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Project/toolbox/fusing#CBT">CBT-Asparagine</a>.
<p class="figure subtitle"><b>Figure (1): Flowchart Ligand Docking Protocol</b><br></p>
+
</div>
 +
</div>
 +
</div>
 +
<h4>Method</h4>
 +
<div class ="article">
 +
For the main part of our modeling project, we used the open-source software <a href="https://www.rosettacommons.org/docs/latest/getting_started/Getting-Started">"ROSETTA"</a>,
which was introduced at the University of Washington by David Baker in 1997
(Simons <i>et al.</i>, 1997), initially in the context of protein structure prediction. ROSETTA has grown through the addition of numerous modules and is currently widely used in research. In our application, we focus on the Rosetta module called the "Rosetta Enzyme Design Protocol".
 +
</div>
 +
</div>
 +
<div class="bevel bl"></div>
 +
</div>
 +
<div class="contentbox">
 +
<div class="bevel tr"></div>
 +
<div class="content">
 +
<h3>ROSETTA Enzyme Design</h3>
 +
<h4>Overview </h4>
 +
<div class ="article">
 +
Since the non-canonical amino acid synthesized in the laboratory is completely novel, there is as yet no corresponding tRNA synthetase that can load it onto the tRNA. For this reason, we use the enzyme design protocol to design the binding pocket in a way that allows it to form an effective and specific enzyme. The protocol consists of two main steps: matching and design (Richter <i>et al.</i>, 2011).
 +
The enzyme design algorithm is briefly summarized in Figure 3.
 
</div>
 
</div>
 
+
<div class="figure medium">
Meiler, Jens, and David Baker. "ROSETTALIGAND: Protein–small molecule docking with full side‐chain flexibility." Proteins: Structure, Function, and Bioinformatics 65.3 (2006): 538-548.
+
<img class="figure image" src="https://static.igem.org/mediawiki/2017/8/82/T--Bielefeld-CeBiTec--CDR-flowchartdesign.png">
Gray, Jeffrey J., et al. "Protein–protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations." Journal of molecular biology 331.1 (2003): 281-299.
+
<p class="figure subtitle"><b>Figure 3: Flowchart Enzyme Design Protocol.</b><br></p>
</article>
+
</div>
        </div>
+
</div>
<div class="bevel bl"></div>
+
<div class="bevel bl"></div>
 
</div>
 
</div>
 
<div class="container">
 
 
<div class="contentbox">
 
<div class="contentbox">
 
<div class="bevel tr"></div>
 
<div class="bevel tr"></div>
 
<div class="content">
 
<div class="content">
<h2>Modeller</h2>
+
<h4> Matching Step</h4>
<h4>Overview</h4>
+
<div class="contentline">
<article>
+
<div class="third">
Within the scope of the evaluation subproject, 3D structures of the mutated tyrosyl aaRSs of <i>Methanococcus jannaschii</i> are required. Unfortunately, no X-ray structure data for them are available in the literature. Only the sequences of seven aaRSs evolved for nitrophenylalanines have been described previously [Peters et al., 2009]. In order to predict the corresponding homology structures, we use the software “Modeller”.
+
<div class="figure large">
</article>
+
<img class="figure image" src="https://static.igem.org/mediawiki/2017/d/de/T--Bielefeld-CeBiTec--CDR-const.png">
<h4>Method</h4>
+
<p class="figure subtitle"><b>Figure 4: Overview of constraints</b><br> All constraints that can be set; dashed lines indicate hydrogen bonds, solid lines indicate covalent bonds.</p>
<article>
+
</div>
TODO: not yet finished, hence no text yet
+
</div>
Peters, Francis B., et al. "Photocleavage of the polypeptide backbone by 2-nitrophenylalanine." Chemistry & biology 16.2 (2009): 148-152.
+
<div class="third double">
</article>
+
<div class="article">
        </div>
+
The purpose of the matching step is to place amino acids of the scaffold relative to the ligand according to specific constraints, which ensure that the result is sensible and feasible. For this, ROSETTA analyzes the structural formula of the non-canonical amino acid and proposes possible hydrogen bonding partners.
<div class="bevel bl"></div>
+
</div>
 +
<h4>Matching step inputs</h4>
 +
<div class="article">
 +
For the matching step, the following input-files are needed:
 +
<ul>
 +
<li>a “.params”-file specifying information about the ligand</li>
 +
<li>a “.pdb”-file providing a rough scaffold layout</li>
 +
<li>a “.cst”-file to define the bindings between ligand and scaffold</li>
 +
<li>a “.pos”-file to define the positions of the amino acids in the scaffold </li>
 +
<li>a “.flags” file to control all inputs</li>
</ul>
Together, these files describe the ligand and the backbone and specify the parameters of the algorithm.
To read about each file in further detail, please click the technical details button below: <br>
 +
<a class="hidden-expand">SHOW TECHNICAL DETAILS</a>
 +
</div>
 +
<article class="hidden-block">
 +
<ul>
 +
<li><a href="https://www.rosettacommons.org/demos/latest/tutorials/prepare_ligand/prepare_ligand_tutorial">“.params”-file:</a> <br>
A conformer ensemble has to be generated using information about the ligand. As non-canonical amino acids are generally not available in databases like <a href="http://www.rcsb.org/pdb/home/home.do">PDB</a>, they have to be built manually
using tools like <a href="https://pymol.org/2/">PyMOL</a>, <a href="https://avogadro.cc/">Avogadro</a> or <a href="http://www.cambridgesoft.com/software/overview.aspx">ChemDraw</a>. Using these tools,
files can be saved in the desired format. The ligand needs to be specified in the “.sdf”, “.mol” or “.mol2” file format. Such a
file can be obtained automatically by converting the relevant information from a “.pdb” file, if available. This conversion process usually also involves adding hydrogen atoms
if they are missing from the “.pdb” file. Alternatively, the ligand can be designed using the "Simplified Molecular Input Line Entry Specification" <a href="http://daylight.com/smiles/">(SMILES)</a> or manually using tools such as <a href="https://avogadro.cc/">Avogadro</a>, as we did. In the next step, the ligand file is used
to create a conformer ensemble that is in turn used to create a Rosetta parameter (“.params”) file. In addition to the specific names of all atoms present in the ligand, this parameter file also
stores all bonds between the individual atoms, including the bond angles and bond lengths. Rosetta cannot generate the conformer ensemble by itself, so an additional tool is needed.
Different tools are capable of creating the conformer ensemble automatically, but it is best to manually define constraints for the chi1, chi2 and backbone psi torsion angles that define the
orientation of the ligand in the binding pocket. For this, we know of three tools: the first is <a href="https://www.eyesopen.com/omega">OpenEye Omega</a>, but the full license is very costly and the free academic version is hard to obtain.
The second tool is <a href="http://accelrys.com/">Accelrys Discovery Studio</a>, but Accelrys does not provide a free license. The third tool is <a href="https://dasher.wustl.edu/tinker/">TINKER</a>, which is free, but poorly documented and depends on a specific keyfile
that requires a high degree of chemical expertise to generate. Conformers can also be generated without constraints, for which different tools are available; in our case, we used <a href="http://www.conflex.net/">ConFlex</a>.
Conformers need to be stored in one file (“.sdf”, “.mol”, or “.mol2”).
 +
<li> “.pdb”-file: <br>
The input file for the scaffold, in our case the tRNA synthetase, can be downloaded in PDB format from the Protein Data Bank (PDB). It is then necessary to delete the natural ligand from the PDB file,
as we need to incorporate our own ligand. Additionally, the structure should preferably be relaxed in order to allow for flexibility with regard to the simulation outcomes.
For further details, see the <a href="https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax">ROSETTA Relaxing documentation.</a>
 +
<li><a href="https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d5/dd4/match_cstfile_format.html">“.cst”-file:</a> <br>
The .cst-file defines the potential hydrogen bonds between the ligand and the amino acid. For example, the code block delimited by the tags “CST::BEGIN” and “CST::END” specifies the orientation or
catalytic function of the enzyme. <br>
More specifically, the first record of the block begins with “TEMPLATE::ATOM_MAP”, followed by either “atom_name” or “atom_type”, depending on whether a specific residue or a specific type of residue
is provided. In the latter case, it is not important to choose specific atoms. Instead, a catalytic atom type of the amino acid such as “OH” or “Nhis” is specified. The next lines of the TEMPLATE::ATOM_MAP
record define the residues using one-letter or three-letter codes that are prefixed by “residue1” or “residue3”, respectively.
The second record, beginning with the tag “CONSTRAINT”, contains all relevant distance, angle and torsion constraints for the matching. Each constraint is described with five parameters.
In the case of the distance constraint, the first parameter describes the optimal distance “x0” between the chosen residues, the second parameter describes the tolerance “xtol”,
the third parameter defines the strength “k” and the fourth parameter specifies the type of bond (1 for a covalent bond, 0 otherwise). If the absolute difference between the actual distance “x”
and the specified optimal distance is smaller than the tolerance, the penalty score is zero. Otherwise, the constraint adds the term <p>k*(|x-x<sub>0</sub>|-x<sub>tol</sub>)</p>
to the penalty score. For the angle and torsion constraints, the description is similar.
If necessary, additional hydrogen bonds to other atoms of the ligand are specified in terms of additional blocks, using the tag “VARIABLE::CST”.
Finally, most of the blocks described above can optionally be followed by an “ALGORITHM_INFO” record that stores parameter values for the matching algorithm.
We refer to the Rosetta documentation for further details.
 +
<li>“.pos”-file: <br>
The “.pos” file contains the allowed locations in the scaffold for the chosen catalytic residues in each constraint block of the “.cst” file.
 +
</ul> </br>
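The distance-constraint penalty described for the “.cst” file can be written as a small function. This sketch follows the text's linear form k·(|x − x0| − xtol) outside the tolerance window; it illustrates that description rather than reproducing Rosetta's internal scoring code, and the numeric values in the usage lines are hypothetical.

```python
def distance_penalty(x: float, x0: float, x_tol: float, k: float) -> float:
    """Penalty for one distance constraint, following the description above.

    x      actual distance (Å)
    x0     optimal distance (Å)
    x_tol  tolerance (Å); deviations within it are not penalized
    k      strength of the constraint
    """
    deviation = abs(x - x0)
    if deviation < x_tol:
        return 0.0
    return k * (deviation - x_tol)


# Hypothetical hydrogen-bond constraint: optimum 2.8 Å, tolerance 0.2 Å, strength 100
for x in (2.7, 2.95, 3.2):
    print(x, distance_penalty(x, x0=2.8, x_tol=0.2, k=100.0))
```

The flat region inside the tolerance lets the matcher accept any geometry close enough to the optimum, while larger deviations are penalized proportionally to k.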
 +
</article>
 +
</div>
 +
</div>
 +
<h4>Matching step outputs</h4>
 +
<div class="contentline">
 +
<div class="third">
 +
<div class="figure large">
 +
<img class="figure image" src="https://static.igem.org/mediawiki/2017/8/82/T--Bielefeld-CeBiTec--CDR-cst.png">
 +
<p class="figure subtitle"><b>Figure 5: Example of an output PDB file from the matching step</b><br> CBT-Asparagine in purple, amino acid in green, created in PyMOL</p>
 +
</div>
 +
</div>
 +
<div class="third double">
 +
<div class="article">
 +
The output generated in the matching step is the layout of the scaffold as well as one or more states of the amino acid which enable interaction with the ligand. This information is stored as a “.pdb” file and becomes part of the input for the design step. </br>
 +
</div>
 +
<h4>Our results for this step </h4>
 +
<div class="article">
 +
We used the <a href="http://www.rcsb.org/pdb/explore.do?structureId=1j1u">“1J1U”</a> scaffold from PDB for our matching step.
The “1J1U.pdb” file contains the tyrosyl-tRNA synthetase, labeled “Chain A”, the orthogonal tRNA, labeled “Chain B”,
and the natural ligand tyrosine. For our project, we deleted the natural ligand and “Chain B”, because it was not necessary to
change their structure or sequence, and removing them saved compute time.
We designed the ligands manually via <a href="https://avogadro.cc/">Avogadro</a> and, based on a default matching algorithm for both
amino acids, created suitable .cst-files.
 +
</div>
 +
</div>
 +
</div>
 +
</div>
 +
<div class="bevel bl"></div>
 
</div>
 
</div>
 
<div class="container">
 
 
<div class="contentbox">
 
<div class="contentbox">
 
<div class="bevel tr"></div>
 
<div class="bevel tr"></div>
<div class="content">
+
<div class="content">
<h2>Rosetta Enzyme Design</h2>
+
<h4> Design Step</h4>
<h4>Overview</h4>
+
<div class="article">
<article>
+
The design step applies an algorithm such that the binding pocket and the near environment are mutated and the remaining scaffold is
For our project, we have to create a new aaRS for the non-canonical amino acid CBT. As this amino acid is synthesized for the first time, there is currently no suitable tRNA synthetase available to charge the tRNAs. Therefore, we applied the Enzyme Design Protocol in order to design the binding site of the synthetase in a way that allows it to form an effective and specific enzyme.  
+
repacked. Additionally, a badness-of-fit score is generated which indicates how well the mutation fits the amino acid. For every file from the
</article>
+
matching step, a model with a score and a “.pdb” file was generated, specifying where the sequence can be located. Additionally, the “.pdb” file
<h4>Method</h4>
+
makes visual analysis of the 3D-structure possible. Notably, the amino acid structure can be extracted separately.
<article>
+
The following section describes the structure of the design step. Further details on each step can be obtained by showing the Technical Details section. <br>
The ligand needs to be specified in the MOL, MOL2- or SDF-file format. Such a description can be obtained automatically by converting the relevant information from a PDB file if available. This conversion process usually also involves augmenting the data with hydrogen atoms that are typically missing from the PDB file. Alternatively, the ligand can be designed using SMILES or manually using tools such as Avogadro. In the next step, the SDF or MOL(2) file is used to create a conformer ensemble that is used to create a Rosetta parameter file. In addition to the specific names of all atoms present in the ligand, this parameter file also stores all bonds between the individual atoms, including the binding angles and binding distances.
+
1. Optimizing the catalytic interactions </br>
- cst-file TODO: still pending, therefore no text yet
+
For the first alternative, the residue file can be generated either with the Rosetta defaults or as a manually created “.res” file (resfile). For more details, we refer to the  
 
+
<a href="https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d1/d97/resfiles.html">Rosetta documentation</a>.  </br>
The enzyme design algorithm
+
For the latter alternative, residues are automatically categorized by the position of their C&alpha; atom relative to the ligand.
The enzyme design algorithm basically is summarized in Fig.  
+
<a class="hidden-expand">SHOW TECHNICAL DETAILS</a>
</article>
+
</div>
<div class="figure medium">
+
<article class="hidden-block">
<img class="figure image" src="https://static.igem.org/mediawiki/2017/8/82/T--Bielefeld-CeBiTec--CDR-flowchartdesign.png">
+
Residues are categorized as follows: </br>
<p class="figure subtitle"><b>Figure (2): Flowchart Enzyme Design Protocol</b><br></p>
+
<ul>
 +
  <li> Residues whose C&alpha; lies within cut1 &#8491; of any ligand heavy atom are set to designable.
 +
  <li> Residues whose C&alpha; lies within cut2 of any ligand heavy atom and whose C&beta; is closer to that ligand atom than their C&alpha; are set to designable; cut2 has to be larger than cut1.
 +
  <li> Residues whose C&alpha; lies within cut3 of any ligand heavy atom are set to repackable; cut3 has to be larger than cut2.
 +
  <li> Residues whose C&alpha; lies within cut4 of any ligand heavy atom and whose C&beta; is closer to that ligand atom than their C&alpha; are set to repackable; cut4 has to be larger than cut3.
 +
  <li> All residues not in any of the above four groups are kept static.
 +
</ul>  </br>
 +
</article>
 +
<article>
 +
2. Cycles of sequence design and minimization within constraints </br>
 +
To optimize the structure, we applied an iterative optimization algorithm. This algorithm mutates all scaffold residues that are not part
 +
of the catalytic center to alanine, and a short energy minimization then places the ligand in an optimal position relative to the backbone. </br>
 +
<a class="hidden-expand">SHOW TECHNICAL DETAILS</a>
 +
</article>
 +
<article class="hidden-block">
 +
For this approach, the bb_min and chi_min options allow for backbone flexibility and for rotation of the side-chain torsion angles during minimization. An alternative for this minimization step is the
 +
Monte Carlo rigid body ligand sampling. For further information on this method, we refer to the <a href="https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d6/dbc/enzyme_design.html">ROSETTA documentation</a>.
 +
</br>
 +
    </article>
 +
<h4>Design step inputs </h4>
 +
The following input files are relevant for the design procedure:
 +
<ul>
 +
  <li>“.pdb”-file generated in the matching step</li>
 +
  <li>“.cst”-file for the ligand</li>
 +
  <li>“.params”-file for the ligand and the scaffold </li>
 +
  <li>“.flags” to coordinate the inputs</li>
 +
</ul>
 +
For further information on these files, please refer to the description of the matching step inputs above. </br>
 +
 +
<div class="contentline">
 +
<div class="third">
 +
<div class="figure large">
 +
<img class="figure image" src="https://static.igem.org/mediawiki/2017/7/73/T--Bielefeld-CeBiTec--CDR-design.png">
 +
<p class="figure subtitle"><b>Figure 6: Example of CBT-Asparagine in the binding pocket</b><br>CBT-Asparagine in purple, scaffold in green; created in PyMOL</p>
 +
</div>
 +
</div>
 +
<div class="third double">
 +
<h4>Design step outputs</h4>
 +
<article>
 +
The output for the design step is a “.pdb”-file containing the mutated scaffold and a “.score”-file.
 +
For every .pdb-file, a line in the score-file is generated, so it is easy to evaluate the given structure.
 +
The first score in the file is the total score of the model. After that, the numbers of hydrogen bonds
 +
in the protein as a whole and in the constraints are listed, followed by the number of dismissed solutions in the
 +
catalytic residues as well as in the whole protein and in the constraints.
 +
See the technical details below for a full overview of the output information. </br>
 +
<a class="hidden-expand">SHOW TECHNICAL DETAILS</a>
 +
</article>
 +
 +
<article class="hidden-block">
 +
<ul>
 +
<li>total_score: energy (excluding the constraint energy)
 +
<li>fa_rep: full atom repulsive energy
 +
<li>hbond_sc: hbond sidechain energy
 +
<li>all_cst: all constraint energy
 +
<li>tot_pstat_pm: pack statistics, 0-1, 1 = fully packed
 +
<li>total_nlpstat_pm: pack statistics without the ligand present
 +
<li>tot_burunsat_pm: number of buried unsatisfied polar residues (a simple count; higher = more buried unsatisfied polars)
 +
<li>tot_hbond_pm: total number of hbonds
 +
<li>tot_NLconst_pm: total number of non-local contacts (two residues form a non-local
 +
contact if they are more than 8 residues apart in sequence but interact with a Rosetta score lower than -1.0)
 +
</ul> </br>
 +
</article>
 +
</div>
 +
</div>
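Such score files can be ranked with a short script. The Python sketch below assumes a whitespace-separated layout. In that layout, the first line is a header of column names (such as total_score), and each following line describes one model, optionally prefixed with "SCORE:". Check the exact layout of the ROSETTA output before relying on this. The function names are hypothetical. Models are sorted in ascending order because lower ROSETTA energies are more favorable.

```python
from typing import Dict, List, Optional


def parse_score_lines(lines: List[str]) -> List[Dict[str, str]]:
    """Turn a whitespace-separated score table into one dict per model.

    Assumed layout: the first non-empty line is the header; an optional
    leading "SCORE:" token is ignored and repeated header lines are skipped.
    """
    header: Optional[List[str]] = None
    rows: List[Dict[str, str]] = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] == "SEQUENCE:":
            continue
        if tokens[0] == "SCORE:":
            tokens = tokens[1:]
        if header is None:
            header = tokens
        elif tokens != header:
            rows.append(dict(zip(header, tokens)))
    return rows


def best_models(rows: List[Dict[str, str]], column: str = "total_score",
                n: int = 3) -> List[Dict[str, str]]:
    """Return the n models with the lowest (most favorable) value in `column`."""
    return sorted(rows, key=lambda row: float(row[column]))[:n]


if __name__ == "__main__":
    example = [  # toy values, not real results
        "SCORE: total_score all_cst description",
        "SCORE: -10.5 0.2 model_1",
        "SCORE: -12.0 1.5 model_2",
        "SCORE: 3.0 0.0 model_3",
    ]
    for row in best_models(parse_score_lines(example), n=2):
        print(row["description"], row["total_score"])
```

Sorting by all_cst instead of total_score ranks the models by how well they satisfy the constraints.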
 +
</div>
 +
<div class="bevel bl"></div>
 +
</div>
 +
<div class="contentbox">
 +
<div class="bevel tr"></div>
 +
<div class="content">
 +
<h3> Results </h3>
 +
<h4> Results <i>in silico</i> </h4>
 +
<div class="article">
 +
<p>We chose our synthetases based on favorable total and ligand scores. We inspected the corresponding PDB files and rated the placement of the ligand in the binding pocket
 +
as satisfactory, i.e., the ligand presumably does not collide with residues in its immediate environment.
 +
The total scores for CBT are not as good as the scores for NPA. However, the ligand scores are acceptable in both cases. A visual evaluation confirms that the ligand
 +
fits into the binding pocket. </p>
 +
<p><b>Our results for this step </b></p>
 +
<p>We used this algorithm to simulate the evolution of the tyrosyl-tRNA synthetase for the amino acids nitrophenylalanine (NPA) and N<sup>γ</sup>&#x2011;2&#x2011;cyanobenzothiazol&#x2011;6&#x2011;yl&#x2011;L&#x2011;asparagine (CBT-Asparagine). </p>
 +
<p><b>NPA simulation: </b></p>
 +
<p>The .cst-file contained two blocks for the nitro group of NPA. Since there are two oxygen atoms in the nitro group,
 +
we defined two atom name tags. As several possibilities are useful, we defined two possible constraint partners
 +
  for the hydrogen bonds: the first is asparagine (N) or glutamine (Q), and the second is glycine (G). We set the
 +
  target distance to 2.8 &#8491;, a typical donor-acceptor distance for hydrogen bonds, with a tolerance of 0.5 &#8491;.
 +
  We set the angles to 120° with a tolerance of 40°, as recommended by Florian Richter during our discussion in Cologne.
 +
  The torsion angles were set to 180° with a tolerance of 180° and a penalty of 0, such that the torsion angles can rotate
 +
  completely freely (Richter, unpublished data). </p>
 +
<p><b>CBT-Asparagine simulation:</b> </p>
 +
<article><p>CBT-Asparagine can form hydrogen bonds in two ways: a weak hydrogen bond to the
 +
sulfur atom or a regular hydrogen bond to the nitrogen (N<sub>2</sub>)
 +
following the C&gamma;. We wrote three cst-files: one for a possible bond with sulfur, one for a
 +
possible bond with nitrogen, and one for both bonds. As possible corresponding amino acids, we chose serine,
 +
threonine, tyrosine, asparagine, glutamine, and glycine. </p>
 +
<a class="hidden-expand">SHOW TECHNICAL DETAILS</a></article>
 +
<article class="hidden-block">
 +
It is recommended to write a “.flags”-file, because there are several input parameters to be defined,
 +
but it is also possible to define them on the command line. </br>
 +
For the categorization of the scaffold, we chose the automatic determination and set the following cuts, as commonly used by the Baker lab: cut1: 6 &#8491;,</br> cut2: 8 &#8491;,</br> cut3: 10 &#8491;</br> and cut4: 12 &#8491; </br>
 +
</article>
 +
 +
Using this algorithm for nitrophenylalanine (NPA) and CBT-Asparagine, we obtained 13 synthetase sequences for CBT-Asparagine and 43 sequences for NPA that fit well into the binding site according to the ROSETTA score.
 +
The sequences of the best synthetases for NPA are available
 +
<a target="_blank"href="https://static.igem.org/mediawiki/2017/1/12/T--Bielefeld-CeBiTec--DKE_NPAseq.pdf">here</a>.
 +
</div>
 +
<p class="table-headline"><b>Table 2: ROSETTA Enzyme Design Protocol Results</b><br>ROSETTA scores of the best modeled synthetases for CBT-Asparagine and NPA.</p>
 +
<table>
 +
<thead>
 +
<tr>
 +
<th style="width: auto" class="header">Sequence Number</th>
 +
<th style="width: auto" class="header">Total Score</th>
 +
<th style="width: auto" class="header">Ligand Score</th>
 +
<th style="width: auto" class="header">ncAA</th>
 +
</tr>
 +
</thead>
 +
<tbody>
 +
<tr>
 +
<td>15</td>
 +
<td>124.88</td>
 +
<td>-3.77</td>
 +
<td>NPA</td>
 +
</tr>
 +
<tr>
 +
<td>19</td>
 +
<td>23.55</td>
 +
<td>-3.93</td>
 +
<td>NPA</td>
 +
</tr>
 +
<tr>
 +
<td>31</td>
 +
<td>-3.40</td>
 +
<td>-2.47</td>
 +
<td>NPA</td>
 +
</tr>
 +
<tr>
 +
<td>32</td>
 +
<td>-1.57</td>
 +
<td>-3.82 </td>
 +
<td>NPA</td>
 +
</tr>
 +
<tr>
 +
<td>40</td>
 +
<td>11.67</td>
 +
<td>-4.33</td>
 +
<td>NPA</td>
 +
</tr>
 +
<tr>
 +
<td>41</td>
 +
<td>11.55</td>
 +
<td>-2.98</td>
 +
<td>NPA</td>
 +
</tr>
 +
<tr>
 +
<td>43</td>
 +
<td>66.36</td>
 +
<td>-5.05</td>
 +
<td>NPA</td>
 +
</tr>
 +
<tr>
 +
<td>2</td>
 +
<td>38.01</td>
 +
<td>-6.56</td>
 +
<td>CBT-Asparagine</td>
 +
</tr>
 +
<tr>
 +
<td>4</td>
 +
<td>58.45</td>
 +
<td>-4.37</td>
 +
<td>CBT-Asparagine</td>
 +
</tr>
 +
<tr>
 +
<td>5</td>
 +
<td>109.13</td>
 +
<td>-4.25</td>
 +
<td>CBT-Asparagine</td>
 +
</tr>
 +
</tbody>
 +
</table>
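The following Python sketch is a minimal illustration of the selection. It ranks the Table 2 entries for one ncAA by ligand score or total score, using the values listed in the table. It assumes, as usual for ROSETTA energies, that lower values are more favorable. Our actual choice also relied on visual inspection of the structures, which the sketch does not capture. The names TABLE_2 and rank are hypothetical.

```python
from typing import List, Tuple

# (sequence number, total score, ligand score, ncAA), copied from Table 2
TABLE_2: List[Tuple[int, float, float, str]] = [
    (15, 124.88, -3.77, "NPA"),
    (19, 23.55, -3.93, "NPA"),
    (31, -3.40, -2.47, "NPA"),
    (32, -1.57, -3.82, "NPA"),
    (40, 11.67, -4.33, "NPA"),
    (41, 11.55, -2.98, "NPA"),
    (43, 66.36, -5.05, "NPA"),
    (2, 38.01, -6.56, "CBT-Asparagine"),
    (4, 58.45, -4.37, "CBT-Asparagine"),
    (5, 109.13, -4.25, "CBT-Asparagine"),
]


def rank(entries: List[Tuple[int, float, float, str]], ncaa: str,
         by: str = "ligand") -> List[Tuple[int, float, float, str]]:
    """Sort the designs for one ncAA, most favorable (lowest) score first."""
    column = {"total": 1, "ligand": 2}[by]
    return sorted((e for e in entries if e[3] == ncaa), key=lambda e: e[column])


if __name__ == "__main__":
    for ncaa in ("NPA", "CBT-Asparagine"):
        print(ncaa, "by ligand score:", [e[0] for e in rank(TABLE_2, ncaa)])
```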
 +
<p class="table-headline"><b>Table 3: Mutations in the designed CBT-Asparagine synthetases</b><br>Amino acid substitutions at each position of synthetases 2, 4 and 5 compared with the original scaffold.</p>
 +
<table>
 +
<thead>
 +
<tr>
 +
<th style="width: auto" class="header">Position</th>
 +
<th style="width: auto" class="header">Synthetase Number</th>
 +
<th style="width: auto" class="header">Original Amino Acid</th>
 +
<th style="width: auto" class="header">Mutation Amino Acid</th>
 +
</tr>
 +
</thead>
 +
<tbody>
 +
  <tr>
 +
<td>30</td>
 +
<td>5</td>
 +
<td>Serine</td>
 +
<td>Asparagine</td>
 +
  </tr>
 +
  <tr>
 +
<td>32</td>
 +
<td>5</td>
 +
<td>Tyrosine</td>
 +
<td>Threonine</td>
 +
  </tr>
 +
  <tr>
 +
<td>34</td>
 +
<td>2, 4</td>
 +
<td>Glycine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
  <tr>
 +
<td>36</td>
 +
<td>2</td>
 +
<td>Glutamic acid</td>
 +
<td>Isoleucine</td>
 +
  </tr>
 +
  <tr>
 +
<td>61</td>
 +
<td>5</td>
 +
<td>Aspartic acid</td>
 +
<td>Arginine</td>
 +
  </tr>
 +
  <tr>
 +
<td>63</td>
 +
<td>5</td>
 +
<td>Isoleucine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
  <tr>
 +
<td>65</td>
 +
<td>4</td>
 +
<td>Leucine</td>
 +
<td>Glycine</td>
 +
  </tr>
 +
<tr>
 +
<td>65</td>
 +
<td>5</td>
 +
<td>Leucine</td>
 +
<td>Threonine</td>
 +
  </tr>
 +
<tr>
 +
<td>68</td>
 +
<td>4</td>
 +
<td>Aspartic acid</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>69</td>
 +
<td>4</td>
 +
<td>Leucine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>70</td>
 +
<td>2</td>
 +
<td>Histidine</td>
 +
<td>Aspartic acid</td>
 +
  </tr>
 +
<tr>
 +
<td>70</td>
 +
<td>4</td>
 +
<td>Histidine</td>
 +
<td>Glycine</td>
 +
  </tr>
 +
<tr>
 +
<td>72</td>
 +
<td>4</td>
 +
<td>Tyrosine</td>
 +
<td>Glutamic acid</td>
 +
  </tr>
 +
<tr>
 +
<td>73</td>
 +
<td>2</td>
 +
<td>Leucine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>73</td>
 +
<td>4</td>
 +
<td>Leucine</td>
 +
<td>Methionine</td>
 +
  </tr>
 +
<tr>
 +
<td>74</td>
 +
<td>2</td>
 +
<td>Asparagine</td>
 +
<td>Aspartic acid</td>
 +
  </tr>
 +
<tr>
 +
<td>76</td>
 +
<td>2</td>
 +
<td>Lysine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>79</td>
 +
<td>4</td>
 +
<td>Leucine</td>
 +
<td>Arginine</td>
 +
  </tr>
 +
<tr>
 +
<td>101</td>
 +
<td>5</td>
 +
<td>Lysine</td>
 +
<td>Glutamic acid</td>
 +
  </tr>
 +
<tr>
 +
<td>103</td>
 +
<td>5</td>
 +
<td>Valine</td>
 +
<td>Tryptophan</td>
 +
  </tr>
 +
<tr>
 +
<td>104</td>
 +
<td>5</td>
 +
<td>Tyrosine</td>
 +
<td>Valine</td>
 +
  </tr>
 +
<tr>
 +
<td>105</td>
 +
<td>4, 5</td>
 +
<td>Glycine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>107</td>
 +
<td>5</td>
 +
<td>Glutamic acid</td>
 +
<td>Lysine</td>
 +
  </tr>
 +
<tr>
 +
<td>108</td>
 +
<td>4</td>
 +
<td>Phenylalanine</td>
 +
<td>Lysine</td>
 +
  </tr>
 +
<tr>
 +
<td>108</td>
 +
<td>5</td>
 +
<td>Phenylalanine</td>
 +
<td>Arginine</td>
 +
  </tr>
 +
<tr>
 +
<td>109</td>
 +
<td>4, 5</td>
 +
<td>Glutamine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>114</td>
 +
<td>4</td>
 +
<td>Tyrosine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>115</td>
 +
<td>4</td>
 +
<td>Threonine</td>
 +
<td>Tryptophan</td>
 +
  </tr>
 +
<tr>
 +
<td>118</td>
 +
<td>4</td>
 +
<td>Valine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>134</td>
 +
<td>2</td>
 +
<td>Methionine</td>
 +
<td>Asparagine</td>
 +
  </tr>
 +
<tr>
 +
<td>137</td>
 +
<td>2</td>
 +
<td>Isoleucine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>139</td>
 +
<td>2</td>
 +
<td>Arginine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>147</td>
 +
<td>2, 4</td>
 +
<td>Alanine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>148</td>
 +
<td>2</td>
 +
<td>Glutamine</td>
 +
<td>Lysine</td>
 +
  </tr>
 +
  <tr>
 +
<td>149</td>
 +
<td>2</td>
 +
<td>Valine</td>
 +
<td>Threonine</td>
 +
  </tr>
 +
  <tr>
 +
<td>150</td>
 +
<td>2, 4</td>
 +
<td>Isoleucine</td>
 +
<td>Leucine</td>
 +
  </tr>
 +
<tr>
 +
<td>151</td>
 +
<td>2</td>
 +
<td>Tyrosine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>152</td>
 +
<td>2</td>
 +
<td>Proline</td>
 +
<td>Threonine</td>
 +
  </tr>
 +
<tr>
 +
<td>153</td>
 +
<td>2</td>
 +
<td>Isoleucine</td>
 +
<td>Leucine</td>
 +
  </tr>
 +
<tr>
 +
<td>153</td>
 +
<td>4</td>
 +
<td>Isoleucine</td>
 +
<td>Threonine</td>
 +
  </tr>
 +
<tr>
 +
<td>154</td>
 +
<td>2</td>
 +
<td>Methionine</td>
 +
<td>Asparagine</td>
 +
  </tr>
 +
<tr>
 +
<td>154</td>
 +
<td>4, 5</td>
 +
<td>Methionine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>155</td>
 +
<td>2</td>
 +
<td>Glutamine</td>
 +
<td>Glycine</td>
 +
  </tr>
 +
<tr>
 +
<td>155</td>
 +
<td>4, 5</td>
 +
<td>Glutamine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>156</td>
 +
<td>4</td>
 +
<td>Valine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>157</td>
 +
<td>4</td>
 +
<td>Asparagine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>158</td>
 +
<td>4, 5</td>
 +
<td>Asparagine</td>
 +
<td>Tyrosine</td>
 +
  </tr>
 +
<tr>
 +
<td>159</td>
 +
<td>4</td>
 +
<td>Isoleucine</td>
 +
<td>Glycine</td>
 +
  </tr>
 +
<tr>
 +
<td>161</td>
 +
<td>4</td>
 +
<td>Tyrosine</td>
 +
<td>Glutamic acid</td>
 +
  </tr>
 +
<tr>
 +
<td>161</td>
 +
<td>5</td>
 +
<td>Tyrosine</td>
 +
<td>Aspartic acid</td>
 +
  </tr>
 +
<tr>
 +
<td>162</td>
 +
<td>4</td>
 +
<td>Leucine</td>
 +
<td>Methionine</td>
 +
  </tr>
 +
<tr>
 +
<td>162</td>
 +
<td>5</td>
 +
<td>Leucine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
  <tr>
 +
<td>164</td>
 +
<td>5</td>
 +
<td>Valine</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>172</td>
 +
<td>2</td>
 +
<td>Glutamic acid</td>
 +
<td>Lysine</td>
 +
  </tr>
 +
<tr>
 +
<td>173</td>
 +
<td>2</td>
 +
<td>Glutamine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>176</td>
 +
<td>2</td>
 +
<td>Isoleucine</td>
 +
<td>Serine</td>
 +
  </tr>
 +
<tr>
 +
<td>204-210</td>
 +
<td>2, 4, 5</td>
 +
<td>varying</td>
 +
<td>-</td>
 +
  </tr>
 +
<tr>
 +
<td>307</td>
 +
<td>2</td>
 +
<td>-</td>
 +
<td>Alanine</td>
 +
  </tr>
 +
<tr>
 +
<td>307</td>
 +
<td>4, 5</td>
 +
<td>-</td>
 +
<td>Tyrosine</td>
 +
  </tr>
 +
</tbody>
 +
</table>
 
</div>
 
</div>
 +
<div class="bevel bl"></div>
 +
</div>
 +
<div class="contentbox">
 +
<div class="bevel tr"></div>
 +
<div class="content">
 +
<h4> <i>in vivo</i> validation of predicted tRNA synthetase structures </h4>
 +
<div class="article">
 +
In order to test the functionality and specificity of our modeled aaRS, we selected the 11 most promising sequences and ordered seven NPA synthetase sequences from IDT and four
 +
CBT-Asparagine synthetase sequences from GenScript, courtesy of a grant of 500 € that we had previously won from GenScript.
 +
Since the DNA synthesis by GenScript and IDT was delayed by several weeks, we could not perform the best practice characterization of these parts.
 +
In the end, we received only three of the ordered syntheses in sufficient quality for further experiments. All of them encode predicted CBT-Asparagine tRNA synthetases.
 +
We subjected the sequences to a positive selection as an initial characterization. Plasmids encoding the predicted best candidates were co-transformed with our
 +
positive selection plasmid into <i>E. coli</i> BL21(DE3). Because the promoter is IPTG-inducible, we tested different IPTG concentrations: 0 mM, 5 mM,
 +
10 mM, and 15 mM. In addition, we tried different antibiotic combinations: kan15, cm15/kan15, and cm15/kan15/tet5. The resulting numbers of colonies are presented in Figure 10. Our <i>in vivo</i> results indicate that our <i>in silico</i> designed enzymes remain functional and are
 +
able to incorporate amino acids via a tRNA that recognizes the amber codon. Since these results only indicate the acceptance and transfer of the non-canonical
 +
amino acid, additional experiments are required to demonstrate a high specificity of these enzymes. </br>
 +
We offer the predicted sequences to the community for further characterization via the Registry of Standard Biological Parts (<a target="_blank"href="http://parts.igem.org/Part:BBa_K2201300">BBa_K2201300</a>, <a target="_blank"href="http://parts.igem.org/Part:BBa_K2201301">BBa_K2201301</a>, <a target="_blank"href="http://parts.igem.org/Part:BBa_K2201302">BBa_K2201302</a>).
 +
 +
</div>
 +
<div class="contentline">
 +
<div class="third">
 +
<div class="figure large">
 +
<img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8f/T--Bielefeld-CeBiTec--CDR-pl14pos.jpg ">
 +
<p class="figure subtitle"><b>Figure 7: Positive selection of BBa_K2201300</b><br>Positive selection on kanamycin</p>
 +
</div>
 +
</div>
 +
<div class="third">
 +
<div class="figure large">
 +
<img class="figure image" src="https://static.igem.org/mediawiki/2017/c/c2/T--Bielefeld-CeBiTec--CDR-pl9pos.jpg">
 +
<p class="figure subtitle"><b>Figure 8: Positive selection of BBa_K2201301</b><br>Positive selection on kanamycin</p>
 +
</div>
 +
</div>
 +
<div class="third">
 +
<div class="figure large">
 +
<img class="figure image" src="https://static.igem.org/mediawiki/2017/7/7b/T--Bielefeld-CeBiTec--CDR-pl10pos.jpg">
 +
<p class="figure subtitle"><b>Figure 9: Positive selection of BBa_K2201302</b><br>Positive selection on kanamycin</p>
 +
</div>
 +
</div>
 +
</div>
 +
<div class="figure large">
 +
<img class="figure image" src="https://static.igem.org/mediawiki/2017/4/4a/T--Bielefeld-CeBiTec--CDR-resultspositivselection.png">
 +
<p class="figure subtitle"><b>Figure 10: Number of colonies after positive selection</b><br>at different IPTG concentrations</p>
 +
</div>
 +
</div>
 +
<div class="bevel bl"></div>
 +
</div>
  
 
+
<div class="contentbox">
 +
<div class="bevel tr"></div>
 +
<div class="content">
 +
<h3> References </h3>
 +
<div class="article">
 +
All our files can be obtained <a href="https://2017.igem.org/File:Igemfolder1.zip">here.</a> </br>
 +
<b>Liu, W., Brock, A., Chen, S., Chen, S., Schultz, P. G.</b> (2007). Genetic incorporation of unnatural amino acids into proteins in mammalian cells. Nature Methods,<b> 4(3)</b>, 239-244.<br>
 +
<b>Richter, F., Leaver-Fay, A., Khare, S. D., Bjelic, S., Baker, D.</b> (2011). De novo enzyme design using Rosetta3. PLoS ONE,<b> 6(5)</b>, e19230.<br>
 +
<b>Simons, K. T., Kooperberg, C., Huang, E., Baker, D.</b> (1997). Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. Journal of Molecular Biology,<b> 268(1)</b>, 209-225.<br>
 +
</div>
 
         </div>
 
         </div>
<div class="bevel bl"></div>
+
<div class="bevel bl"></div>
 
</div>
 
</div>
 +
 +
 +
 
</body>
 
</body>
 
<script>
 
<script>

Latest revision as of 03:54, 2 November 2017

Modeling

Organization of our modeling projects

On this page, we describe our main modeling project, which was integral to our whole project. However, besides this complex modeling, we also conducted and applied several straightforward stochastic and statistical models to support and guide numerous steps of our laboratory work. Some of these modeling projects are briefly described in this box; however, we recommend reading the linked pages for further information.

Discriminant function model for the iCG prediction:

We conducted a discriminant function analysis for the classification of nucleotides in Oxford Nanopore sequencing reads at a specific position. This model is part of our iCG software module and enabled the successful detection of unnatural bases.
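For readers unfamiliar with discriminant analysis, the following minimal Python sketch shows a standard two-class Fisher linear discriminant with a pooled covariance matrix. It is not the iCG implementation. Nucleotide classification involves more than two classes. The two features per read position (for example, summary statistics of the raw signal) are hypothetical, as are all function names and toy values.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # two hypothetical features per read position


def _covariance(rows: Sequence[Point], mu: Point) -> List[List[float]]:
    """2x2 sample covariance matrix of 2D points around their mean mu."""
    n = len(rows)
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def fit_lda(class_a: Sequence[Point], class_b: Sequence[Point]):
    """Fisher discriminant: w = S^-1 (mu_a - mu_b), threshold at the midpoint.

    Assumes a shared (pooled) covariance and equal class priors.
    """
    mu_a = (mean(r[0] for r in class_a), mean(r[1] for r in class_a))
    mu_b = (mean(r[0] for r in class_b), mean(r[1] for r in class_b))
    ca, cb = _covariance(class_a, mu_a), _covariance(class_b, mu_b)
    na, nb = len(class_a), len(class_b)
    s = [[((na - 1) * ca[i][j] + (nb - 1) * cb[i][j]) / (na + nb - 2)
          for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (mu_a[0] - mu_b[0], mu_a[1] - mu_b[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    threshold = (w[0] * (mu_a[0] + mu_b[0]) + w[1] * (mu_a[1] + mu_b[1])) / 2
    return w, threshold


def classify(x: Point, w, threshold, label_a: str = "A", label_b: str = "B") -> str:
    """Assign x to class A if its projection onto w exceeds the threshold."""
    return label_a if w[0] * x[0] + w[1] * x[1] > threshold else label_b


if __name__ == "__main__":
    toy_a = [(1.0, 1.2), (1.3, 0.9), (0.8, 1.1), (1.1, 0.7)]
    toy_b = [(-1.0, -0.8), (-1.2, -1.1), (-0.9, -1.3), (-0.7, -1.0)]
    w, t = fit_lda(toy_a, toy_b)
    print(classify((0.9, 1.0), w, t), classify((-1.1, -0.9), w, t))
```

With more than two bases per position, a common extension is multi-class LDA, or one such discriminant per pair of classes.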

Calculation of the required library size for the selection system:

We applied combinatorics and statistics to calculate the optimal library size for the tRNA synthetase selection process. Choosing this size is a trade-off between the effort of constructing a very large library and the loss of diversity in a library that is too small. Therefore, we predicted the optimal library size and validated this prediction experimentally by MiSeq analysis of the diversity of a subset of the library.
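The following minimal Python sketch illustrates the kind of calculation involved. It uses a common textbook approximation, not necessarily our exact model. Assume that all V variants are equally likely to be drawn. The expected fraction of distinct variants covered by T transformants is then 1 - exp(-T/V), so covering a fraction F requires about T = -V ln(1 - F) transformants. The function names and the example (five fully randomized positions with 20 amino acids each) are hypothetical.

```python
import math


def expected_coverage(transformants: int, variants: int) -> float:
    """Expected fraction of distinct variants seen at least once,
    assuming uniform sampling (Poisson approximation)."""
    return 1.0 - math.exp(-transformants / variants)


def required_library_size(variants: int, coverage: float) -> int:
    """Transformants needed for the expected coverage to reach `coverage`."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    return math.ceil(-variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    v = 20 ** 5  # hypothetical: 5 fully randomized positions, 20 amino acids each
    for f in (0.63, 0.95, 0.99):
        t = required_library_size(v, f)
        print(f"{f:.0%} coverage: {t:,} transformants ({t / v:.1f}x oversampling)")
```

The oversampling factor T/V depends only on F: about 3x for 95 % and about 4.6x for 99 % coverage. This is why the required effort grows quickly with the desired diversity.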

Strength prediction for a transcription signal amplification system (BBa_K2201373):

We modeled the mRFP production over time for a conventional mRFP reporter system and visually compared it with that of our enhanced signaling system. The model was validated as part of the positive selection process for an adapted tRNA synthetase.
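The following sketch illustrates this kind of comparison. It is not our actual model, whose equations are not given here. A standard two-stage gene expression model describes mRNA m and protein p by dm/dt = alpha - delta_m * m and dp/dt = beta * m - delta_p * p. The sketch integrates this model with a simple forward Euler scheme. It represents the signal amplification purely as a hypothetical fold increase of the transcription rate alpha. All parameter values are placeholders in arbitrary units.

```python
from typing import List, Tuple


def simulate_reporter(alpha: float, beta: float = 1.0, delta_m: float = 0.2,
                      delta_p: float = 0.02, t_end: float = 300.0,
                      dt: float = 0.1) -> List[Tuple[float, float]]:
    """Forward-Euler integration of a two-stage expression model.

    alpha: transcription rate, beta: translation rate per mRNA,
    delta_m / delta_p: first-order degradation/dilution rates.
    Returns (time, protein) pairs, starting from m = p = 0.
    """
    m, p = 0.0, 0.0
    trajectory = [(0.0, p)]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        dm = alpha - delta_m * m
        dp = beta * m - delta_p * p
        m += dm * dt
        p += dp * dt
        trajectory.append((i * dt, p))
    return trajectory


if __name__ == "__main__":
    normal = simulate_reporter(alpha=1.0)
    amplified = simulate_reporter(alpha=5.0)  # hypothetical 5-fold amplification
    for (t, p_n), (_, p_a) in zip(normal[::500], amplified[::500]):
        print(f"t={t:6.1f}  normal={p_n:8.1f}  amplified={p_a:8.1f}")
```

Because this model is linear in alpha, the amplified curve is simply a scaled copy of the normal one. A model with saturating or nonlinear amplification would change the shape of the curve as well.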

Short Summary

As our project explores possibilities of an expanded genetic code via unnatural bases and non-canonical amino acids, we set out to complement and improve our lab work by modeling novel aminoacyl-tRNA synthetases (aaRS) for non-canonical amino acids, which were synthesized in our lab. In order to incorporate non-canonical amino acids into proteins via translation, the aaRS has to attach the amino acid to the respective tRNA. Thus, we designed aaRS sequences adjusted to link our own non-canonical amino acid to a fitting tRNA. Candidates were evaluated and selected based on ROSETTA scores. The most promising sequences were ordered via gene synthesis for experimental validation. Figure 1 provides a rough overview of our modeling project. Table 1 below summarizes the realization in practice.

Figure 1: Modeling Project Overview
A stylized overview of our modeling project, containing both in silico and in vivo components.

Table 1: Steps of our modeling project. Our modeling project consists of seven main steps, combining in silico and in vivo components.

Step | Software/Method | Meaning
1. Ligand preparation | Manually via Avogadro | Due to the novelty of our amino acid, no information on the ligand is available in databases. Therefore, a conformer ensemble, containing for example the energetically favorable arrangements of the atoms within the molecule, had to be generated manually.
2. Scaffold categorization | ROSETTA protocol | The scaffold describes the rough layout of the synthetase. We downloaded the scaffold 1J1U, the aaRS of Methanocaldococcus jannaschii, as a template and then relaxed its structure to improve the outcome of the ROSETTA algorithm.
3. Set simulation constraints | Manually via ROSETTA | Constraints on possible mutations of the synthetase ensure that the generated sequences fit the amino acid. For example, we constrained the distance between certain atoms and their angle to a range optimal for hydrogen bonds.
4. Enzyme matching | ROSETTA protocol | ROSETTA combines information about the ligand and the constraints to find possible hydrogen bonding partners and propose the shape of the scaffold within the set constraints.
5. Enzyme design | ROSETTA protocol | An algorithm uses the information from the previous step and information on the ligand to simulate the mutation process and generate sequences for optimized scaffolds with corresponding scores as measures of fit.
6. Evaluate results in silico | Manually | Based on the score values, we ordered the synthesis of the most promising sequences.
7. Evaluate results in vivo | Manually | The synthetases are validated in the lab with the corresponding ncAA via a positive-negative selection system.

Introduction

Overview

Figure 2: Tyrosyl-tRNA-synthetase
3D-structure based on "1J1U" from PDB edited with pymol

As part of our iGEM project, we faced the challenge of adapting the tRNA synthetase to non-canonical amino acids. For this purpose, we modeled possible candidate synthetases as an alternative to carrying out a positive-negative selection in the laboratory according to Liu et al. (2007). Due to the rapid development in the field of protein and molecular structure analysis, the availability of molecular 3D structure data has increased. These data are organized in publicly available databases that provide a foundation for the modeling and simulation of chemical-biological processes in bioinformatics. As our ncAA was synthesized for the first time in our lab, no such comprehensive information is available for it yet. However, information on similarly structured amino acids can potentially serve as a basis for our modeling. Specifically, we focused on simulations to design an aaRS for the new ncAA CBT-Asparagine.

Method

We used the open-source software "ROSETTA" for the main part of our modeling project. ROSETTA was introduced at the University of Washington by David Baker in 1997 (Simons et al., 1997), initially in the context of protein structure prediction. It has grown through the addition of numerous modules and is currently widely used in research. In our application, we focus on the ROSETTA module called the "Rosetta Enzyme Design Protocol".

ROSETTA Enzyme Design

Overview

Since the non-canonical amino acid synthesized in the laboratory is completely novel, there is no corresponding tRNA synthetase that can load the tRNA yet. For this reason, we use the enzyme design protocol to design the binding pocket in a way that allows it to form an effective and specific enzyme. The protocol consists of two main steps: matching and design (Richter et al., 2011). The enzyme design algorithm is briefly summarized in Figure 3.

Figure 3: Flowchart Enzyme Design Protocol.

Matching Step

Figure 4: overview of constraints
All constraints that can be set; dashed lines indicate hydrogen bonds and solid lines indicate covalent bonds.

The purpose of the matching step is to find positions in the scaffold at which amino acids can bind to the ligand, following specific constraints that ensure that the result is sensible and feasible. For this, ROSETTA analyzes the structural formula of the non-canonical amino acid and proposes possible hydrogen bonding partners.
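The geometric test behind a hydrogen-bond constraint can be sketched as follows. This minimal Python example is not ROSETTA code. It checks the distance between an acceptor and a donor atom, and one angle at the acceptor (base atom, acceptor, donor). It uses the target values from our NPA simulation as defaults: 2.8 Å ± 0.5 Å and 120° ± 40°. The choice of atoms defining the angle and all function names are assumptions.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, with b as the vertex."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def within(value: float, target: float, tol: float) -> bool:
    return abs(value - target) <= tol


def satisfies_hbond_constraint(base: Vec, acceptor: Vec, donor: Vec,
                               d0: float = 2.8, dtol: float = 0.5,
                               a0: float = 120.0, atol: float = 40.0) -> bool:
    """True if donor-acceptor distance and base-acceptor-donor angle fit."""
    distance = math.dist(acceptor, donor)
    theta = angle_deg(base, acceptor, donor)
    return within(distance, d0, dtol) and within(theta, a0, atol)


if __name__ == "__main__":
    acceptor, base = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)  # toy coordinates in Å
    donor = (2.8 * math.cos(math.radians(120)), 2.8 * math.sin(math.radians(120)), 0.0)
    print(satisfies_hbond_constraint(base, acceptor, donor))
```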

Matching step inputs

For the matching step, the following input-files are needed:
  • a “.params”-file specifying information about the ligand
  • a “.pdb”-file providing a rough scaffold layout
  • a “.cst”-file to define the bindings between ligand and scaffold
  • a “.pos”-file to define the positions of the amino acids in the scaffold
  • a “.flags” file to control all inputs
These files are necessary, as they describe the ligand and the backbone and specify the parameters of the algorithm.
Each file is described in further detail below:
  • “.params”-file:
    A conformer ensemble has to be generated using information about the ligand. As non-canonical amino acids are generally not available in databases like PDB, they have to be built manually using tools like PyMOL, Avogadro or ChemDraw, which can save the structure in the desired format. The ligand needs to be specified in the “.sdf”, “.mol” or “.mol2” file format. Such a file can be obtained automatically by converting the relevant information from a “.pdb” file, if available. This conversion process usually also involves adding hydrogen atoms in case they are missing from the “.pdb” file. Alternatively, the ligand can be designed using the "Simplified Molecular Input Line Entry Specification" (SMILES) or manually using tools such as Avogadro, as we did. In the next step, the ligand file is used to create a conformer ensemble that is in turn used to create a Rosetta parameter (“.params”) file. In addition to the specific names of all atoms present in the ligand, this parameter file also stores all bonds between the individual atoms, including the bond angles and bond lengths. Rosetta cannot generate the conformer ensemble by itself, so an additional tool is needed. Different tools can create the conformer ensemble automatically, but it is best to manually define constraints for the chi1, chi2 and backbone psi torsion angles that define the orientation of the ligand in the binding pocket. For this, we know of three tools. The first is OpenEye Omega, but the full license is very costly and the free academic version is hard to obtain. The second tool is Accelrys Discovery Studio, but Accelrys does not provide a free license. The third tool is TINKER, which is free but poorly documented and depends on a specific keyfile whose generation requires considerable chemical expertise. Conformers can also be generated without constraints, for which different tools are available; we used ConFlex.
    Conformers need to be stored in one file (“.sdf”, “.mol”, or “.mol2”).
  • “.pdb”-file:
    The input file for the scaffold, in our case the tRNA synthetase, can be downloaded in PDB format from the Protein Data Bank (PDB). It is then necessary to delete the natural ligand from the PDB file, as we want to place our own ligand, the non-canonical amino acid, in the binding pocket. Additionally, the structure should preferably be relaxed in order to allow for flexibility with regard to the simulation outcomes. For further details, see the ROSETTA Relaxing documentation.
  • “.cst”-file:
    The .cst-file defines the potential hydrogen bonds between the ligand and the amino acids of the scaffold. For example, the code block delimited by the tags “CST::BEGIN” and “CST::END” specifies the orientation or catalytic function of the enzyme.
    More specifically, the first record of the block begins with “TEMPLATE::ATOM_MAP”, followed by either “atom_name” or “atom_type”, depending on whether a specific residue or a specific type of residue is provided. In the latter case, it is not important to choose specific atoms. Instead, a catalytic residue of the amino acid such as “OH” or “Nhis” is specified. The next lines of the TEMPLATE::ATOM_MAP record define the residues using one-letter or three-letter codes that are prefixed by “residue1” or “residue3”, respectively. The second record, beginning with the tag “CONSTRAINT”, contains all relevant distance, angle and torsion constraints for the matching. Each constraint is described with five parameters. In the case of the distance constraint, the first parameter describes the optimal distance “x0” between the chosen residues, the second parameter describes the tolerance “xtol”, the third parameter defines the strength “k” and the fourth parameter specifies the type of bond (1 for a covalent bond, 0 otherwise). If the absolute difference between the actual distance “x” and the specified optimal distance is smaller than the tolerance, the penalty score is zero. Otherwise, the constraint adds the term

    $$k\left(|x - x_0| - x_{\mathrm{tol}}\right)$$

    to the penalty score. For the angle and torsion constraints, the description is similar. If necessary, additional hydrogen bonds to other atoms of the ligand are specified in additional blocks, using the tag “VARIABLE::CST”. Finally, most of the blocks described above can optionally be followed by an “ALGORITHM_INFO” record that stores parameter values for the matching algorithm. We refer to the Rosetta documentation for further details.
  • “.pos”-file:
    The “.pos” file contains the allowed locations in the scaffold for the chosen catalytic residues in each constraint block of the “.cst” file.

Matching step outputs

Figure 5: Example of an output PDB file from the matching step
CBT-asparagine in purple, amino acid in green; created in PyMOL

The output of the matching step consists of the scaffold together with one or more conformations of the amino acid that enable interaction with the ligand. This information is stored as a “.pdb” file and becomes part of the input for the design step.

Our results for this step

We used the “1J1U” scaffold from PDB for our matching step. The “1J1U.pdb” file contains the tyrosyl-tRNA synthetase, labeled as “Chain A”, the orthogonal tRNA, labeled as “Chain B”, and the natural ligand tyrosine. For our project, we deleted the natural ligand and “Chain B”, because their structure and sequence did not need to be changed, and removing them saved computation time. We built the ligands manually in Avogadro and, based on the default matching algorithm, created suitable .cst files for both amino acids.
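Deleting the natural ligand and chain B can also be scripted instead of being done by hand. The following Python sketch is a hypothetical helper (`strip_to_chain` is our own name, not a Rosetta tool). It relies on the fixed-column PDB format and assumes that the ligand is stored as HETATM records, as is usual for ligands in PDB files; note that it also drops waters, header lines and all other non-coordinate records.

```python
def strip_to_chain(pdb_text: str, keep_chain: str = "A") -> str:
    """Keep the protein coordinates of one chain; drop ligands and other chains.

    In the fixed-column PDB format, the record name occupies columns 1-6 and
    the chain identifier is column 22 (index 21 in a Python string).
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "TER"):
            # keep only records that belong to the requested chain
            if len(line) > 21 and line[21] == keep_chain:
                kept.append(line)
        elif record == "END":
            kept.append(line)
        # everything else (HETATM ligands and waters, headers, other chains) is dropped
    return "\n".join(kept) + "\n"
```

For example, `strip_to_chain(open("1J1U.pdb").read(), "A")` returns only the chain A protein records, which can then be written to a new file for the matching step.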

Design Step

The design step applies an algorithm that mutates the binding pocket and its surrounding residues and repacks the remaining scaffold. Additionally, a badness-of-fit score is generated that indicates how well the mutated binding pocket fits the amino acid. For every file from the matching step, a model consisting of a score and a “.pdb” file with the designed structure is generated. The “.pdb” file also makes visual analysis of the 3D structure possible, and the amino acid structure can be extracted separately. The following section describes the structure of the design step; technical details are given for each step.
1. Optimizing the catalytic interactions
The residues to be designed or repacked can be specified in two ways. For the first alternative, a “.res” file (resfile) is written manually in the Rosetta standard format; for more details, we refer to the Rosetta documentation.
For the latter alternative, residues are categorized automatically by the location of their Cα atom.
Residues are categorized as follows:
  • residues that have their Cα within a distance cut1 (in Å) of any ligand heavy atom are set to designable
  • residues that have their Cα within a distance cut2 of any ligand heavy atom and their Cβ closer to that ligand atom than their Cα are set to designable; cut2 has to be larger than cut1
  • residues that have their Cα within a distance cut3 of any ligand heavy atom are set to repackable; cut3 has to be larger than cut2
  • residues that have their Cα within a distance cut4 of any ligand heavy atom and their Cβ closer to that ligand atom than their Cα are set to repackable; cut4 has to be larger than cut3
  • all residues not in any of the above four groups are kept static.
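The automatic categorization can be sketched in a few lines of Python. This is our own simplified re-implementation of the four rules listed above, not Rosetta's code; the function name, the coordinate representation and the treatment of glycine (which has no Cβ, so the Cβ-based rules never apply) are assumptions. The default cuts are the values we used in our simulations (6, 8, 10 and 12 Å).

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def categorize_residue(
    ca: Vec,
    cb: Optional[Vec],
    ligand_heavy_atoms: Sequence[Vec],
    cut1: float = 6.0,
    cut2: float = 8.0,
    cut3: float = 10.0,
    cut4: float = 12.0,
) -> str:
    """Return 'designable', 'repackable' or 'static' following the rules above.

    cb is None for glycine (no Cβ); the Cβ-based rules then never apply.
    """
    if not cut1 < cut2 < cut3 < cut4:
        raise ValueError("cuts must satisfy cut1 < cut2 < cut3 < cut4")

    def cb_points_to(atom: Vec, d_ca: float) -> bool:
        # the Cβ is closer to this ligand atom than the Cα is
        return cb is not None and math.dist(cb, atom) < d_ca

    distances = [(atom, math.dist(ca, atom)) for atom in ligand_heavy_atoms]
    # the designable rules take precedence over the repackable rules
    for atom, d_ca in distances:
        if d_ca < cut1 or (d_ca < cut2 and cb_points_to(atom, d_ca)):
            return "designable"
    for atom, d_ca in distances:
        if d_ca < cut3 or (d_ca < cut4 and cb_points_to(atom, d_ca)):
            return "repackable"
    return "static"
```

For a single ligand atom at the origin, a residue with its Cα 7 Å away and its Cβ pointing towards the ligand (6 Å) is designable, whereas the same Cα with its Cβ pointing away (8 Å) is only repackable.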

2. Cycles of sequence design and minimization within constraints
To optimize the structure, we applied an iterative optimization algorithm. This algorithm mutates all residues except those of the catalytic center to alanine, and a short minimization of the energy function then places the ligand in an optimal position relative to the backbone.
Technical details:
For this approach, the options bb_min and chi_min allow for backbone flexibility and the rotation of the side-chain torsions, respectively. An alternative for this minimization step is Monte Carlo rigid-body ligand sampling. For further information on this method, we refer to the ROSETTA documentation.
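The idea behind Monte Carlo rigid-body ligand sampling can be illustrated with a generic Metropolis Monte Carlo loop. This sketch is not Rosetta's implementation: the ligand is moved only by random translations, and a hypothetical quadratic `toy_energy` stands in for the Rosetta energy function, so rotations, ligand conformers and the real score terms are omitted. The temperature `kT`, the step size and the number of steps are arbitrary illustration values.

```python
import math
import random
from typing import Callable, Tuple

Vec = Tuple[float, float, float]


def toy_energy(pos: Vec) -> float:
    """Hypothetical stand-in for the Rosetta energy: a quadratic well at the origin."""
    return sum(p * p for p in pos)


def mc_rigid_body(
    start: Vec,
    energy: Callable[[Vec], float],
    steps: int = 2000,
    step_size: float = 0.5,
    kT: float = 1.0,
    seed: int = 0,
) -> Tuple[Vec, float]:
    """Metropolis Monte Carlo over rigid-body translations; returns the best pose found."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(steps):
        # propose a random rigid-body translation of the whole ligand
        trial = tuple(c + rng.gauss(0.0, step_size) for c in current)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / kT)
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

With a fixed seed, the run is reproducible; lowering `kT` makes uphill moves rarer, so the search becomes greedier.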

Design step inputs

The following input files are relevant for the design procedure:
  • “.pdb”-file generated in the matching step
  • “.cst”-file for the ligand
  • “.params”-file for the ligand and the scaffold
  • “.flags”-file to coordinate the inputs
For further information on these files, please refer to step 2 above.

Figure 6: Example of CBT-asparagine in the binding pocket
CBT-asparagine in purple, scaffold in green; created in PyMOL

Design step outputs

The output of the design step is a “.pdb” file containing the mutated scaffold and a “.score” file. For every .pdb file, a line in the score file is generated, which makes it easy to evaluate the corresponding structure. The first score in each line is the total score of the model. It is followed by the number of hydrogen bonds in the protein as a whole and in the constraints, and then by the number of dismissed solutions in the catalytic residues, in the whole protein and in the constraints. The technical details below give a full overview of the output information.
Technical details:
  • total_score: energy (excluding the constraint energy)
  • fa_rep: full-atom repulsive energy
  • hbond_sc: side-chain hydrogen bond energy
  • all_cst: all constraint energy
  • tot_pstat_pm: pack statistics, 0-1, 1 = fully packed
  • total_nlpstat_pm: pack statistics without the ligand present
  • tot_burunsat_pm: buried unsatisfied polar residues (just a count; higher = more buried unsatisfied polars)
  • tot_hbond_pm: total number of hydrogen bonds
  • tot_NLconst_pm: total number of non-local contacts (two residues form a non-local contact if they are more than 8 residues apart in sequence but interact with a Rosetta score lower than -1.0)
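Because each model gets one line in the score file, ranking the designs is easy to script. The following sketch assumes a whitespace-separated score file whose header row names the columns (such as total_score and description) and whose lines may start with a `SCORE:` prefix, as in Rosetta score files; the function names are hypothetical, and lower scores are treated as better, as for Rosetta energies.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[float, str]]


def parse_score_file(text: str) -> List[Row]:
    """Parse a whitespace-separated score file into one dict per model."""
    header: List[str] = []
    rows: List[Row] = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "SCORE:":
            tokens = tokens[1:]  # drop the Rosetta-style line prefix
        if not tokens:
            continue
        if not header:
            if "total_score" in tokens:
                header = tokens  # first line that names the columns
            continue
        if tokens == header or len(tokens) != len(header):
            continue  # repeated header or malformed line
        row: Row = {}
        for name, value in zip(header, tokens):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # e.g. the model name in the description column
        rows.append(row)
    return rows


def rank_models(rows: List[Row], key: str = "total_score") -> List[Row]:
    """Sort models so that the lowest (best) score comes first."""
    return sorted(rows, key=lambda row: row[key])
```

Passing another column name as `key`, for example `"all_cst"`, ranks the models by constraint energy instead of total score.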

Results

Results in silico

We chose our synthetases based on a good total score and a good ligand score. We inspected the corresponding PDB files and rated the ligand and the binding pocket as satisfactory, i.e. the ligand presumably does not collide with residues in its vicinity. The total scores for CBT-asparagine are not as good as those for NPA; however, the ligand scores are acceptable in both cases. A visual evaluation confirms that the ligand fits into the binding pocket.

Our results for this step

We used this algorithm to simulate the evolution of the tyrosyl-tRNA synthetase for the amino acids nitrophenylalanine (NPA) and Nγ‑2‑cyanobenzothiazol‑6‑yl‑L‑asparagine (CBT-asparagine).

NPA simulation:

The .cst file contained two blocks for the nitro group of NPA. Since the nitro group contains two oxygen atoms, we defined two atom name tags. To allow several binding options, we defined two possible constraint partners for the hydrogen bonds: the first is asparagine (N) or glutamine (Q), and the second is glycine (G). We set the distance to 2.8 Å, as this is the optimal distance for hydrogen bonds, with a tolerance of 0.5 Å. We set the angles to 120° with a tolerance of 40°, as recommended by Florian Richter during our discussion in Cologne. The torsion angles were set to 180° with a tolerance of 180° and a penalty of 0, so that they can rotate completely freely (Richter, unpublished data).
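The effect of these tolerance windows can be checked with a short sketch. The helper names are hypothetical, and the check is a simple pass/fail test against the windows rather than Rosetta's scoring; we assume distances in Å and angles in degrees, and treat only the torsion as periodic (360°). It shows why a torsion tolerance of 180° around 180° leaves the torsion unrestricted: no torsion angle differs from 180° by more than 180°.

```python
def angular_deviation(x: float, x0: float, period: float = 360.0) -> float:
    """Smallest difference between two periodic angles, in degrees."""
    d = abs(x - x0) % period
    return min(d, period - d)


def within_tolerance(x: float, x0: float, xtol: float, periodic: bool = False) -> bool:
    deviation = angular_deviation(x, x0) if periodic else abs(x - x0)
    return deviation <= xtol


# NPA settings from the text: name -> (optimal value x0, tolerance xtol, periodic?)
NPA_CONSTRAINTS = {
    "distance": (2.8, 0.5, False),    # Å
    "angle": (120.0, 40.0, False),    # degrees
    "torsion": (180.0, 180.0, True),  # degrees, periodic
}


def npa_geometry_ok(distance: float, angle: float, torsion: float) -> bool:
    """True if a hydrogen-bond geometry lies inside all NPA tolerance windows."""
    values = {"distance": distance, "angle": angle, "torsion": torsion}
    return all(
        within_tolerance(values[name], x0, xtol, periodic)
        for name, (x0, xtol, periodic) in NPA_CONSTRAINTS.items()
    )
```

For example, a geometry with a distance of 3.1 Å and an angle of 155° passes for any torsion, whereas a distance of 3.4 Å or an angle of 165° falls outside the windows.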

CBT-Asparagine simulation:

CBT-asparagine can form hydrogen bonds in two ways: a weak hydrogen bond at the sulfur atom, or a regular hydrogen bond at the nitrogen (N2) next to the Cγ. We wrote three .cst files: one for a possible bond with the sulfur, one for a possible bond with the nitrogen, and one for both bonds. As possible interaction partners, we chose serine, threonine, tyrosine, asparagine, glutamine and glycine.

Technical details:
It is recommended to write a “.flags” file, because several input parameters have to be defined, but they can also be defined on the command line.
For the categorization of the scaffold, we chose the automatic determination and set the following cuts, as commonly used by the Baker lab: cut1: 6 Å, cut2: 8 Å, cut3: 10 Å and cut4: 12 Å.
We obtained 13 synthetase sequences for CBT-asparagine and 43 sequences for NPA that fit well into the binding site according to the ROSETTA score. The sequences of the best synthetases for NPA are available here.

Table 2: ROSETTA Enzyme Design Protocol Results
ROSETTA scores of the best modeled synthetases for CBT-Asparagine and NPA.

Sequence number | Total score | Ligand score | ncAA
15 | 124.88 | -3.77 | NPA
19 | 23.55 | -3.93 | NPA
31 | -3.40 | -2.47 | NPA
32 | -1.57 | -3.82 | NPA
40 | 11.67 | -4.33 | NPA
41 | 11.55 | -2.98 | NPA
43 | 66.36 | -5.05 | NPA
2 | 38.01 | -6.56 | CBT-Asparagine
4 | 58.45 | -4.37 | CBT-Asparagine
5 | 109.13 | -4.25 | CBT-Asparagine
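One way to make the two selection criteria (a good total score and a good ligand score) explicit is a Pareto filter over the scores in Table 2: a design is kept if no other design for the same ncAA is at least as good in both scores and strictly better in one. This sketch is an illustration, not the procedure we used, since our selection additionally relied on visual inspection of the binding pockets; it assumes that lower (more negative) scores are better, as for Rosetta energies, and the function name is hypothetical.

```python
from typing import List, Tuple

# (sequence number, total score, ligand score, ncAA), copied from Table 2
ROWS: List[Tuple[int, float, float, str]] = [
    (15, 124.88, -3.77, "NPA"),
    (19, 23.55, -3.93, "NPA"),
    (31, -3.40, -2.47, "NPA"),
    (32, -1.57, -3.82, "NPA"),
    (40, 11.67, -4.33, "NPA"),
    (41, 11.55, -2.98, "NPA"),
    (43, 66.36, -5.05, "NPA"),
    (2, 38.01, -6.56, "CBT-Asparagine"),
    (4, 58.45, -4.37, "CBT-Asparagine"),
    (5, 109.13, -4.25, "CBT-Asparagine"),
]


def pareto_front(rows: List[Tuple[int, float, float, str]], ncaa: str) -> List[int]:
    """Sequence numbers that no other design beats in both scores (lower = better)."""
    candidates = [r for r in rows if r[3] == ncaa]
    front = []
    for seq, total, ligand, _ in candidates:
        dominated = any(
            t <= total and lig <= ligand and (t < total or lig < ligand)
            for s, t, lig, _ in candidates
            if s != seq
        )
        if not dominated:
            front.append(seq)
    return sorted(front)


print(pareto_front(ROWS, "NPA"))             # [31, 32, 40, 43]
print(pareto_front(ROWS, "CBT-Asparagine"))  # [2]
```

Under this criterion, synthetase 2 is the only non-dominated CBT-asparagine design, while four of the seven listed NPA designs are non-dominated.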

Table 3: Mutations in the modeled CBT-asparagine synthetases
Mutated positions of the CBT-asparagine synthetases 2, 4 and 5 (see Table 2) relative to the original scaffold sequence.

Position | Synthetase number | Original amino acid | Mutated amino acid
30 | 5 | Serine | Asparagine
32 | 5 | Tyrosine | Threonine
34 | 2, 4 | Glycine | Alanine
36 | 2 | Glutamic acid | Isoleucine
61 | 5 | Aspartic acid | Arginine
63 | 5 | Isoleucine | Alanine
65 | 4 | Leucine | Glycine
65 | 5 | Leucine | Threonine
68 | 4 | Aspartic acid | Alanine
69 | 4 | Leucine | Alanine
70 | 2 | Histidine | Aspartic acid
70 | 4 | Histidine | Glycine
72 | 4 | Tyrosine | Glutamic acid
73 | 2 | Leucine | Alanine
73 | 4 | Leucine | Methionine
74 | 2 | Asparagine | Aspartic acid
76 | 2 | Lysine | Serine
79 | 4 | Leucine | Arginine
101 | 5 | Lysine | Glutamic acid
103 | 5 | Valine | Tryptophan
104 | 5 | Tyrosine | Valine
105 | 4, 5 | Glycine | Serine
107 | 5 | Glutamic acid | Lysine
108 | 4 | Phenylalanine | Lysine
108 | 5 | Phenylalanine | Arginine
109 | 4, 5 | Glutamine | Alanine
114 | 4 | Tyrosine | Alanine
115 | 4 | Threonine | Tryptophan
118 | 4 | Valine | Serine
134 | 2 | Methionine | Asparagine
137 | 2 | Isoleucine | Alanine
139 | 2 | Arginine | Serine
147 | 2, 4 | Alanine | Serine
148 | 2 | Glutamine | Lysine
149 | 2 | Valine | Threonine
150 | 2, 4 | Isoleucine | Leucine
151 | 2 | Tyrosine | Serine
152 | 2 | Proline | Threonine
153 | 2 | Isoleucine | Leucine
153 | 4 | Isoleucine | Threonine
154 | 2 | Methionine | Asparagine
154 | 4, 5 | Methionine | Serine
155 | 2 | Glutamine | Glycine
155 | 4, 5 | Glutamine | Alanine
156 | 4 | Valine | Alanine
157 | 4 | Asparagine | Alanine
158 | 4, 5 | Asparagine | Tyrosine
159 | 4 | Isoleucine | Glycine
161 | 4 | Tyrosine | Glutamic acid
161 | 5 | Tyrosine | Aspartic acid
162 | 4 | Leucine | Methionine
162 | 5 | Leucine | Alanine
164 | 5 | Valine | Alanine
172 | 2 | Glutamic acid | Lysine
173 | 2 | Glutamine | Serine
176 | 2 | Isoleucine | Serine
204-210 | 2, 4, 5 | varying | -
307 | 2 | - | Alanine
307 | 4, 5 | - | Tyrosine

In vivo validation of predicted tRNA synthetase structures

In order to test the functionality and specificity of our modeled aaRS, we selected the 11 most promising sequences and ordered seven NPA synthetase sequences from IDT and four CBT-asparagine synthetase sequences from GenScript, courtesy of a €500 grant that we had previously won from GenScript. Since the DNA synthesis by GenScript and IDT was delayed by several weeks, we could not perform the best-practice characterization of these parts. In the end, we received only three of the ordered syntheses in sufficient quality for further experiments, all of which encode predicted CBT-asparagine tRNA synthetases. As an initial characterization, we subjected these sequences to a positive selection. Plasmids encoding the predicted best candidates were cotransformed with our positive selection plasmid into E. coli BL21(DE3). Because of the IPTG-inducible promoter, we tested different IPTG concentrations (0 mM, 5 mM, 10 mM and 15 mM). In addition, we tried different antibiotic combinations: kan15, cm15/kan15 and cm15/kan15/tet5. The resulting colony numbers are shown in Figure 10. Our in vivo results show that our in silico designed enzymes kept their native function and are able to incorporate amino acids via a tRNA matching the amber codon. Since these results only indicate the acceptance and transfer of the non-canonical amino acid, additional experiments are required to demonstrate a high specificity of these enzymes.
We offer the predicted sequences to the community for further characterization via the Registry of Standard Biological Parts (BBa_K2201300, BBa_K2201301, BBa_K2201302).

Figure 7: positive selection of BBa_K2201300
positive selection on kanamycin

Figure 8: positive selection of BBa_K2201301
positive selection on kanamycin

Figure 9: positive selection of BBa_K2201302
positive selection on kanamycin

Figure 10: Bar graph of the numbers of colonies
at different IPTG concentrations

References

All our files can be obtained here.
Liu, W., Brock, A., Chen, S., Chen, S., Schultz, P. G. (2007). Genetic incorporation of unnatural amino acids into proteins in mammalian cells. Nature Methods, 4(3), 239-244.
Richter, F., Leaver-Fay, A., Khare, S. D., Bjelic, S., Baker, D. (2011). De novo enzyme design using Rosetta3. PLoS ONE, 6(5), e19230.
Simons, K. T., Kooperberg, C., Huang, E., Baker, D. (1997). Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. Journal of Molecular Biology, 268(1), 209-225.