Line 142: | Line 142: | ||
</ul> | </ul> | ||
To read about each file in further detail, please click the technical detail button below: </br> | To read about each file in further detail, please click the technical detail button below: </br> | ||
− | + | <ul> | |
+ | <li> “.params”-file: </br> | ||
+ | Because non-canonical amino acids are generally not available in databases such as the PDB, the ligand has to be built manually using tools like PyMOL, Avogadro or ChemDraw, which can save files in the desired format. The ligand needs to be specified in the “.sdf”, “.mol” or “.mol2” file format. Such a file can be obtained automatically by converting the relevant information from a “.pdb” file, if available. This conversion usually also involves adding hydrogen atoms in case they are missing from the “.pdb” file. Alternatively, the ligand can be designed using SMILES or manually using tools such as Avogadro, as we did. In the next step, the ligand file is used to create a conformer ensemble, which is in turn used to create a Rosetta parameter (“.params”) file. In addition to the specific names of all atoms present in the ligand, this parameter file also stores all bonds between the individual atoms, including the bond angles and bond lengths. Rosetta cannot generate the conformer ensemble by itself, so an additional tool is needed. Different tools are capable of creating the conformer ensemble automatically, but it is best to manually define constraints for the chi1, chi2 and backbone psi torsion angles that define the orientation of the ligand in the binding pocket. For this, we know of three tools. The first is OpenEye Omega, but the full license is very costly and the free version is hard to obtain. The second is Accelrys Discovery Studio, but Accelrys does not provide a free license. The third is TINKER, which is free but poorly documented and depends on a specific keyfile that requires considerable chemical expertise to generate. Conformers can also be generated without constraints, for which different tools are available; in our case, we used ConFlex.
All conformers need to be stored in one file (“.sdf”, “.mol”, or “.mol2”). | ||
+ | <li> “.pdb”-file: </br> | ||
+ | The input file for the scaffold, in our case the tRNA synthetase, can be downloaded in PDB format from the Protein Data Bank (PDB). It is then necessary to delete the natural ligand from the PDB file, as we need to place our own ligand instead. Additionally, the structure should preferably be relaxed in order to allow for flexibility with regard to the simulation outcomes. For further details, see the documentation (https://www.rosettacommons.org/docs/latest/application_documentation/structure_prediction/relax). | ||
+ | <li> “.cst”-file: </br> | ||
+ | The “.cst”-file defines the potential hydrogen bonds between the ligand and the amino acid residues of the scaffold. Each code block delimited by the tags “CST::BEGIN” and “CST::END” specifies the orientation or catalytic function of the enzyme. </br> | ||
+ | More specifically, the first record of the block begins with “TEMPLATE::ATOM_MAP”, followed by either “atom_name” or “atom_type”, depending on whether specific atoms or a type of atom is provided. In the latter case, it is not necessary to choose specific atoms. Instead, an atom type of the catalytic residue such as “OH” or “Nhis” is specified. The next lines of the TEMPLATE::ATOM_MAP record define the residues using one-letter or three-letter codes that are prefixed by “residue1” or “residue3”, respectively. | ||
+ | The second record, beginning with the tag “CONSTRAINT”, contains all relevant distance, angle and torsion constraints for the matching. Each constraint is described with five parameters. In the case of the distance constraint, the first parameter describes the optimal distance “x0” between the chosen residues, the second parameter describes the tolerance “xtol”, the third parameter defines the strength “k” and the fourth parameter specifies the type of bond (1 for a covalent bond, 0 otherwise). If the absolute difference between the actual distance “x” and the optimal distance “x0” is smaller than the tolerance, the penalty score is zero. Otherwise, the constraint contributes the term | ||
+ | k * ( |x - x0| - xtol ) | ||
+ | to the penalty score. The angle and torsion constraints are described analogously. | ||
+ | If necessary, additional hydrogen bonds to other atoms of the ligand are specified in terms of additional blocks, using the tag “VARIABLE::CST”. | ||
+ | Finally, most of the blocks described above can be optionally followed by an “ALGORITHM_INFO” record that stores details of the matching algorithm by parameter values. We refer to the Rosetta documentation for further details. | ||
+ | <li>“.pos”-file: </br> | ||
+ | The “.pos” file contains the allowed locations in the scaffold for the chosen catalytic residues in each constraint block of the “.cst” file. | ||
+ | </ul> </br> | ||
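The penalty term described for the “.cst”-file can be illustrated with a short Python sketch. Only the formula k * ( |x - x0| - xtol ) is taken from the description above; the function name `cst_penalty` and the example values are illustrative assumptions and not part of Rosetta. The sketch also ignores the periodicity that angle and torsion constraints would need.

```python
def cst_penalty(x: float, x0: float, xtol: float, k: float) -> float:
    """Penalty contributed by one constraint (hypothetical helper).

    Returns zero while |x - x0| stays within the tolerance xtol,
    otherwise k * (|x - x0| - xtol), as described in the text.
    Periodic wrapping of angles is deliberately not handled here.
    """
    deviation = abs(x - x0)
    if deviation <= xtol:
        return 0.0
    return k * (deviation - xtol)


if __name__ == "__main__":
    # Assumed example values: optimal distance 2.8, tolerance 0.5, strength 100.
    for distance in (2.6, 3.3, 3.8):
        print(distance, cst_penalty(distance, x0=2.8, xtol=0.5, k=100.0))
```

With these assumed values, any distance between 2.3 and 3.3 is not penalized, and the penalty grows linearly outside that window.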
Matching step outputs </br> | Matching step outputs </br> | ||
The output generated in the matching step is the layout of the scaffold as well as one or more states of the amino acid which enable interaction with the ligand. This information is stored as a “.pdb” file and becomes part of the input for the design step. </br> | The output generated in the matching step is the layout of the scaffold as well as one or more states of the amino acid which enable interaction with the ligand. This information is stored as a “.pdb” file and becomes part of the input for the design step. </br> | ||
Line 156: | Line 171: | ||
The following section describes the structure of the design step. For further details on each step, click the technical details button. </br> | The following section describes the structure of the design step. For further details on each step, click the technical details button. </br> | ||
1. Optimizing the catalytic interactions </br> | 1. Optimizing the catalytic interactions </br> | ||
− | For the first alternative, the file can be generated either by the Rosetta standard or a manually created .”res”- file. For more details, we refer to the Rosetta documentation. (link:https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d1/d97/resfiles.html) </br> | + | For the first alternative, the file can be either the Rosetta standard or a manually created “.res” file. For more details, we refer to the Rosetta documentation (https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d1/d97/resfiles.html). </br> |
− | + | For the latter alternative, residues are automatically categorized by the location of their Calpha atom. </br> | |
+ | Residues are categorized as follows: </br> | ||
+ | <ul> | ||
+ | <li> Residues that have their Calpha within a distance of cut1 angstroms of any ligand heavy atom are set to designable. | ||
+ | <li> Residues that have their Calpha within a distance of cut2 of any ligand heavy atom and their Cbeta closer to that ligand atom than the Calpha are set to designable. cut2 has to be larger than cut1. | ||
+ | <li> Residues that have their Calpha within a distance of cut3 of any ligand heavy atom are set to repackable. cut3 has to be larger than cut2. | ||
+ | <li> Residues that have their Calpha within a distance of cut4 of any ligand heavy atom and their Cbeta closer to that ligand atom than the Calpha are set to repackable. cut4 has to be larger than cut3. | ||
+ | <li> All residues not in any of the above four groups are kept static. | ||
+ | </ul> </br> | ||
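The categorization rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Rosetta implementation: the function name `categorize_residue`, the coordinate tuples and the treatment of boundary values are hypothetical, and the default cut values are the ones we report using (6, 8, 10 and 12 Å). Residues without a Cbeta (glycine) are not handled.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def categorize_residue(
    calpha: Point,
    cbeta: Point,
    ligand_heavy_atoms: Sequence[Point],
    cut1: float = 6.0,
    cut2: float = 8.0,
    cut3: float = 10.0,
    cut4: float = 12.0,
) -> str:
    """Return 'designable', 'repackable' or 'static' for one residue."""
    if not cut1 < cut2 < cut3 < cut4:
        raise ValueError("cuts must satisfy cut1 < cut2 < cut3 < cut4")

    category = "static"
    for atom in ligand_heavy_atoms:
        d_ca = math.dist(calpha, atom)
        d_cb = math.dist(cbeta, atom)
        # The side chain points towards this ligand atom if Cbeta is closer than Calpha.
        points_at_ligand = d_cb < d_ca
        if d_ca <= cut1 or (d_ca <= cut2 and points_at_ligand):
            return "designable"  # strongest category, no need to look further
        if d_ca <= cut3 or (d_ca <= cut4 and points_at_ligand):
            category = "repackable"
    return category


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0)]  # hypothetical single heavy atom at the origin
    print(categorize_residue((7.0, 0.0, 0.0), (6.0, 0.0, 0.0), ligand))
    print(categorize_residue((7.0, 0.0, 0.0), (8.0, 0.0, 0.0), ligand))
    print(categorize_residue((13.0, 0.0, 0.0), (12.0, 0.0, 0.0), ligand))
```

The loop checks every ligand heavy atom, because a residue only needs to satisfy a rule for one of them to be placed in the corresponding group.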
2. Cycles of sequence design and minimization within constraints </br> | 2. Cycles of sequence design and minimization within constraints </br> | ||
To optimize the structure, we applied an iterative optimization algorithm. This algorithm mutates all residues of the scaffold that are not part of the catalytic center to alanine, and a short minimization with a reduced energy function then places the ligand in an optimal position relative to the backbone. </br> | To optimize the structure, we applied an iterative optimization algorithm. This algorithm mutates all residues of the scaffold that are not part of the catalytic center to alanine, and a short minimization with a reduced energy function then places the ligand in an optimal position relative to the backbone. </br> | ||
− | + | For this approach, bb_min and chi_min allow for backbone flexibility and the rotation of the torsions. An alternative for this minimization step is the Monte Carlo rigid body ligand sampling. For further information on this method, we refer to the ROSETTA documentation (https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/d6/dbc/enzyme_design.html). </br> | |
Design step inputs </br> | Design step inputs </br> | ||
The following input files are relevant for the design procedure: | The following input files are relevant for the design procedure: | ||
Line 175: | Line 198: | ||
The first score in the file is the total score of the model. After that, the number of hydrogen bonds in the protein as a whole and in the constraints is listed, followed by the number of buried unsatisfied polars in the catalytic residues as well as in the whole protein and in the constraints. | The first score in the file is the total score of the model. After that, the number of hydrogen bonds in the protein as a whole and in the constraints is listed, followed by the number of buried unsatisfied polars in the catalytic residues as well as in the whole protein and in the constraints. | ||
See the technical details below for a full overview of the output information. </br> | See the technical details below for a full overview of the output information. </br> | ||
− | + | <ul> | |
+ | <li>total_score: energy (excluding the constraint energy) | ||
+ | <li>fa_rep: full atom repulsive energy | ||
+ | <li>hbond_sc: hbond sidechain energy | ||
+ | <li>all_cst: all constraint energy | ||
+ | <li>tot_pstat_pm: pack statistics, 0-1, 1 = fully packed | ||
+ | <li>total_nlpstat_pm: pack statistics without the ligand present | ||
+ | <li>tot_burunsat_pm: buried unsatisfied polar residues, higher = more buried unsat polars (just a count) | ||
+ | <li>tot_hbond_pm: total number of hbonds | ||
+ | <li>tot_NLconst_pm: total number of non-local contacts ( two residues form a nonlocal | ||
+ | contact if they are farther than 8 residues apart in sequence but interact with a Rosetta score of lower than -1.0 ) | ||
+ | </ul> </br> | ||
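To illustrate how such score terms can be used to shortlist designs, the following Python sketch parses a small whitespace-separated score table and ranks the designs by total_score (lower Rosetta energies are better). The table layout, the `description` column, the design names, all numbers and the threshold on all_cst are assumptions made for this example and are not taken from our runs.

```python
from typing import Dict, List


def parse_score_table(text: str) -> List[Dict[str, str]]:
    """Parse a whitespace-separated table whose first non-empty line is the header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def rank_designs(rows: List[Dict[str, str]], max_all_cst: float = 1.0) -> List[Dict[str, str]]:
    """Drop designs with a high constraint energy, then sort by total_score."""
    kept = [row for row in rows if float(row["all_cst"]) <= max_all_cst]
    return sorted(kept, key=lambda row: float(row["total_score"]))


# Invented example data, for illustration only.
EXAMPLE = """\
description total_score all_cst tot_burunsat_pm
design_a -310.5 0.2 4
design_b -325.1 0.4 6
design_c -340.0 7.9 3
"""

if __name__ == "__main__":
    for row in rank_designs(parse_score_table(EXAMPLE)):
        print(row["description"], row["total_score"])
```

In this invented example, design_c has the lowest total score but is discarded because its constraint energy exceeds the assumed threshold. A numerical ranking like this only complements the visual inspection of the corresponding PDB files.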
We chose our synthetases because of a good total score and a good ligand score. We checked the corresponding PDB files and rated the ligand and the binding pocket as satisfactory, meaning that the ligand presumably does not collide with residues in its immediate environment. | We chose our synthetases because of a good total score and a good ligand score. We checked the corresponding PDB files and rated the ligand and the binding pocket as satisfactory, meaning that the ligand presumably does not collide with residues in its immediate environment. | ||
The total scores for CBT are not as good as the scores for NPA. However, the ligand scores are acceptable in both cases. A visual evaluation confirms that the ligand fits into the binding pocket. </br> | The total scores for CBT are not as good as the scores for NPA. However, the ligand scores are acceptable in both cases. A visual evaluation confirms that the ligand fits into the binding pocket. </br> | ||
Our results for this step </br> | Our results for this step </br> | ||
− | We used this | + | We used this algorithm to simulate the evolution of the tyrosyl-tRNA synthetase with the amino acids nitrophenylalanine (NPA) and CBT-ASP. </br> |
NPA simulation: </br> | NPA simulation: </br> | ||
We created one “.cst”-file block for the nitro group of NPA. Since there are two oxygen atoms in the nitro group, we defined two atom name tags. As several partners are plausible, we defined two possible constraint partners for the hydrogen bonds. The first is asparagine (N) or glutamine (Q) and the second is glycine (G). We set the distance to 2.8 Å, as it is the optimal distance for hydrogen bonds, with a tolerance of 0.5 Å. We set the angles to 120° with a tolerance of 40°, as recommended by Florian Richter during our talk in Cologne. The torsion angles were set to 180° with a tolerance of 180° and a penalty of 0, such that the torsion angles can rotate completely freely. </br> | We created one “.cst”-file block for the nitro group of NPA. Since there are two oxygen atoms in the nitro group, we defined two atom name tags. As several partners are plausible, we defined two possible constraint partners for the hydrogen bonds. The first is asparagine (N) or glutamine (Q) and the second is glycine (G). We set the distance to 2.8 Å, as it is the optimal distance for hydrogen bonds, with a tolerance of 0.5 Å. We set the angles to 120° with a tolerance of 40°, as recommended by Florian Richter during our talk in Cologne. The torsion angles were set to 180° with a tolerance of 180° and a penalty of 0, such that the torsion angles can rotate completely freely. </br> | ||
CBT-ASP simulation: </br> | CBT-ASP simulation: </br> | ||
CBT-ASP can form hydrogen bonds in two ways. The first is a weak hydrogen bond at the sulphur atom and the other is a normal hydrogen bond at the nitrogen (N2) after the C-gamma. We wrote three “.cst”-files: one for a possible bond with sulphur, one for a possible bond with nitrogen, and one for both bonds. As possible corresponding amino acids, we chose serine, threonine, tyrosine, asparagine, glutamine, and glycine. </br> | CBT-ASP can form hydrogen bonds in two ways. The first is a weak hydrogen bond at the sulphur atom and the other is a normal hydrogen bond at the nitrogen (N2) after the C-gamma. We wrote three “.cst”-files: one for a possible bond with sulphur, one for a possible bond with nitrogen, and one for both bonds. As possible corresponding amino acids, we chose serine, threonine, tyrosine, asparagine, glutamine, and glycine. </br> | ||
− | + | It is recommended to write a “.flags” file, because several input parameters have to be defined, although they can also be passed via the command line. </br> | |
+ | For the categorization of the scaffold, we chose the automatic determination and set the following cuts: cut1: 6 Å, cut2: 8 Å, cut3: 10 Å and cut4: 12 Å, as commonly used by the Baker lab. | ||
+ | |||
<h3> Results </h3> | <h3> Results </h3> | ||
<h4> Results in silico </h4> | <h4> Results in silico </h4> | ||
Line 190: | Line 226: | ||
We obtained 13 synthetase sequences for CBT-ASP, and 43 sequences for NPA, which fit well into the binding site according to the ROSETTA score. | We obtained 13 synthetase sequences for CBT-ASP, and 43 sequences for NPA, which fit well into the binding site according to the ROSETTA score. | ||
<h4> Results in vivo </h4> | <h4> Results in vivo </h4> | ||
− | + | In order to test the functionality and specificity of our modeled aaRS, we translated a selection of the most promising amino acid sequences into DNA sequences optimized for E. coli and ordered them via gene synthesis. We then used a positive-negative selection system for characterization. The experiment proceeded as follows: | |
+ | Due to problems with the protein and salt concentrations, we retransformed the gene synthesis constructs, which had been cloned into pSB1C3. In the next step, these constructs were cotransformed into E. coli BL21 together with our positive selection plasmid. | ||
+ | For CBT2, only the original colony could be transformed. For CBT4 and CBT5, we used both the originally isolated clone and its retransformed counterpart. | ||
+ | Because the promoter is IPTG-inducible, we tested variants without IPTG and with 5 mM, 10 mM, and 15 mM added IPTG for all plasmids for the kanamycin resistance. | ||
+ | We also varied the antibiotics, with one variant each of kanamycin alone; kanamycin and chloramphenicol; and kanamycin, chloramphenicol and tetracycline. The numbers of resulting colonies for each variant are summarized in figure X. Our in vivo results show that our in silico designed enzymes did not lose their function. | ||
+ | |||
</article> | </article> | ||
Revision as of 21:41, 30 October 2017
Modeling
Short Summary
Step | Software/Method | Meaning |
---|---|---|
1. Ligand Preparation | Manually via Avogadro | Due to the novelty of our amino acid, no information on the ligand is available in databases. Therefore, all information has to be provided manually and is then used to generate a conformer ensemble, which contains, for example, all energetically favorable arrangements of atoms within the molecule. |
2. Scaffold categorization | ROSETTA protocol | The scaffold describes the rough layout of the synthetase. We downloaded the scaffold 1j1u, the aaRS of Methanocaldococcus jannaschii, as a template, and then relaxed its structure to improve the outcome of the ROSETTA algorithm. |
3. Set simulation constraints | Manually via ROSETTA | Constraints with regard to possible mutations of the synthetase ensure that the generated sequences fit the amino acid. For example, we constrained the distance between certain atoms and their angle to a range optimal for hydrogen bonds. |
4. Enzyme Matching | ROSETTA protocol | ROSETTA combines information about the ligand and the constraints to find possible hydrogen bonding partners and to propose the shape of the scaffold within the set constraints. |
5. Enzyme Design | ROSETTA protocol | An algorithm uses the information from the previous step and information on the ligand to simulate the mutation process and generate sequences for optimized scaffolds with corresponding scores as measures of fit. |
6. Evaluate results in silico | Manually | We evaluate the visual output and the score values and order the sequences with the most promising results via gene synthesis. |
7. Evaluate results in vivo | Manually | The synthetases are validated in the lab with the corresponding ncAA via a positive-negative selection system. |
Introduction
Overview
Method
ROSETTA Enzyme Design
Overview
Figure (2): Flowchart of the Enzyme Design Protocol
Matching Step
- a “.params”-file specifying information about the ligand
- a “.pdb”-file providing a rough scaffold layout
- a “.cst”-file to define the bindings between ligand and scaffold
- a “.pos”-file to define the positions of the amino acids of the scaffold
- a “.flags” file to control all inputs. These files are necessary, as they describe the ligand and backbone and specify the parameters of the algorithm
Design Step
- “.pdb”-file generated in the matching step
- “.cst”-file for the ligand
- “.params”-file for the ligand and the scaffold
- “.flags” to coordinate the inputs