Team:Bielefeld-CeBiTec/Results/translational system/library and selection

Library and Selection

Short Summary

The incorporation of a non-canonical amino acid (ncAA) requires a tRNA/aminoacyl-tRNA synthetase (tRNA/aaRS) pair that accepts and binds the ncAA in order to charge the tRNA with it. Therefore, our aim was to generate a library of aaRS variants with different amino acid binding sites. The aaRS, based on the wild-type Methanococcus jannaschii tyrosyl-tRNA synthetase, is orthogonal to the tRNA/aaRS pairs of E. coli and suitable for incorporating novel ncAAs. To generate a library with randomly mutated amino acid binding sites, we built a template plasmid with an optical control and cloned the tyrosyl-tRNA synthetase library by single-stranded DNA annealing with randomized oligonucleotides. The library consists of approximately 150,000 library plasmids, including more than 27,672 variants of the tyrosyl-tRNA synthetase. This library is the basis for positive and negative selection cycles, from which an optimally adapted tyrosyl-tRNA synthetase variant can be obtained.

Generating the Library

The integration of a new amino acid into the translational cycle requires the creation of a large number of tyrosyl-tRNA synthetase (TyrRS) variants. This can be achieved by generating a library, which then serves as the basis for selection or screening processes from which an optimal tRNA/aaRS pair can be obtained. The library (BBa_K2201411) is based on the pSB1C3 plasmid, with the TyrRS under control of a glnS promoter inserted between the iGEM BioBrick prefix and suffix. To clone the vector, we used the primers 17jj and 17jk to amplify the pSB1C3 backbone and the primers 17hq and 17ht to amplify the TyrRS.

Figure 1: Library plasmid pSB1C3 containing the tyrosyl tRNA synthetase.
Library plasmid based on pSB1C3, containing the tyrosyl-tRNA synthetase under control of a glnS promoter, a pMB1 origin of replication and a chloramphenicol resistance.

To generate the tyrosyl-tRNA synthetase library, we used a method based on the dimerization of two randomized primers through an overlap, so that the product can be integrated at a defined position. The 3’ ends of the two randomized primers face each other and overlap. This homology, combined with the use of the Klenow fragment, is the key to this approach. The Klenow fragment is a commercially available fragment of DNA polymerase I from Escherichia coli. It has 5’-3’ polymerase activity and 3’-5’ exonuclease activity (Beese et al., 1993) and can therefore fill in short sequences along a single-stranded DNA template. We used the Klenow fragment to extend the dimer formed by the primers into a fully double-stranded sequence. The result is a short double-stranded DNA fragment that contains the randomized positions and can be integrated into the desired sequence via its overlaps.
If a marker sequence is inserted at the position that is to be randomized, replacement of this sequence can easily be screened. As an optical control, we used mRFP (BBa_J04450) under control of a lac promoter, lac operator and rrnB T1 terminator. The primers 17vi and 17vj were designed with overlaps homologous to the sequence flanking the binding pocket region of the synthetase, allowing precise integration into the TyrRS. The positions of the TyrRS chosen for randomization are Asp158, Ile159 and Leu162, which lie at the center of the binding pocket. The mRFP is inserted into the TyrRS in place of this binding site to function as an optical control. If the randomized double-stranded DNA is incorporated into the synthetase, the colony color changes due to the absence of mRFP, so E. coli colonies containing the randomized library plasmids can easily be picked for subsequent positive and negative selection.

Figure 3: Generating a synthetase library by using oligo dimers and mRFP as an optical control.
Two primers, one with a randomized position, are designed to form a dimer (1), which is completed to dsDNA by the Klenow fragment. The region of the TyrRS to be modified by randomization is replaced by mRFP as an optical control (2). When the randomized dsDNA is incorporated, the mRFP is replaced and the incorporation is therefore directly visible.
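The fill-in step described above can be illustrated in silico. The following is a minimal sketch, not the protocol or the primers we used: the oligo sequences, the overlap length and the function names `reverse_complement` and `klenow_fill_in` are hypothetical and serve only to show how two oligos with complementary 3’ ends define one double-stranded product.

```python
# IUPAC-aware complement: K (G/T) pairs with M (A/C), N pairs with N.
COMPLEMENT = str.maketrans("ACGTNKM", "TGCANMK")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def klenow_fill_in(forward: str, reverse: str, overlap: int) -> str:
    """Return the top strand (5'->3') of the filled-in duplex.

    forward: 5'->3' oligo on the top strand
    reverse: 5'->3' oligo on the bottom strand
    overlap: number of bases at the 3' ends that anneal to each other
    """
    reverse_as_top = reverse_complement(reverse)
    if forward[-overlap:] != reverse_as_top[:overlap]:
        raise ValueError("3' ends are not complementary over the given overlap")
    # The polymerase extends each 3' end along the other oligo as template.
    return forward + reverse_as_top[overlap:]


# Hypothetical oligos: two NNK codons in the forward oligo, 6 nt overlap.
forward_oligo = "GCTAGCNNKNNKGGTCTA"
reverse_oligo = "AAGCTTCAGTAGACC"
duplex_top = klenow_fill_in(forward_oligo, reverse_oligo, overlap=6)
print(duplex_top)
```

In this toy case the filled-in fragment spans both oligos, with the randomized codons flanked by fixed sequence that could serve as the overlaps for assembly.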

We generated the TyrRS library with Gibson Assembly and, after transformation, plated the cells on LB plates with chloramphenicol. Altogether, we obtained more than 130,000 colonies. Thanks to the optical control in the template used for the non-randomized TyrRS plasmid backbone, we could easily identify the negative colonies. As depicted in Figure 4, approximately 48 of 1,310 colonies did not contain the randomized TyrRS library plasmid. Extrapolating from these data, we obtained approximately 125,237 library plasmids out of 130,000 colonies, corresponding to a cloning efficiency of 96.34 % and offering a wide diversity of TyrRS variants.
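The extrapolation above is simple proportional arithmetic. As a small sketch using only the colony counts stated in the text (the function names are hypothetical):

```python
def cloning_efficiency(negative: int, counted: int) -> float:
    """Fraction of counted colonies that carry the randomized plasmid."""
    return 1.0 - negative / counted


def extrapolate_positive(total_colonies: int, negative: int, counted: int) -> int:
    """Scale the efficiency of the counted sample up to all colonies."""
    return round(total_colonies * cloning_efficiency(negative, counted))


efficiency = cloning_efficiency(negative=48, counted=1310)
library_plasmids = extrapolate_positive(130_000, negative=48, counted=1310)
print(f"{efficiency:.2%}", library_plasmids)
```

The estimate assumes that the counted plate is representative of all plates, which is the same assumption made in the extrapolation above.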

Figure 4: Tyrosyl-tRNA/synthetase library on LB-plate with chloramphenicol.
The library was generated using two primers, one with a randomized position, which are designed to form a dimer. This dimer is completed to dsDNA by the Klenow fragment. As an optical control, mRFP is incorporated at the position to be randomized and is then replaced by the dsDNA. On this basis, we could see that xxx% of the cells contained the template plasmid, while most cells contained the plasmid with the incorporated randomized dsDNA.

Analyzing the Tyrosyl tRNA/Aminoacyl-Synthetase Library

Sequencing by Sanger

Randomizing three positions of the binding pocket of the tyrosyl-tRNA synthetase (TyrRS) with the NNK scheme yields a large variety of sequences. Each NNK codon has 4 × 4 × 2 = 32 possible sequences, so three randomized codons give 32³ = 32,768 different sequence variants. Because several codons encode the same amino acid, these correspond to 20³ = 8,000 different amino acid sequences, each of which can influence the structure of the TyrRS binding pocket. Following equation 1 and using the free statistical software R (R Core Team, 2015), we obtained a statistically required library size of 393,447 randomized plasmids for every possible sequence variant to be expected at least once.

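E(T) = n · H_n, with H_n = 1 + 1/2 + … + 1/n   (1)

Here, n is the number of possible sequence variants, H_n is the n-th harmonic number and E(T) is the expected number of randomly drawn plasmids needed to observe every variant at least once.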
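The NNK combinatorics and equation 1 can be reproduced with a few lines of code. This is a minimal sketch in Python rather than the R script we used (which is not shown here); the function names are hypothetical, and equal incorporation of all nucleotides is assumed. Evaluating equation 1 directly in this way gives roughly 3.6 × 10^5 for n = 32,768, so the figure reported above may rest on a slightly different calculation.

```python
import math
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N = A/C/G/T at positions 1 and 2, K = G/T at position 3.
NNK_CODONS = [a + b + k for a in "ACGT" for b in "ACGT" for k in "GT"]


def nnk_summary(randomized_codons: int) -> dict:
    """Count DNA and peptide variants for a number of NNK codons."""
    encoded = {CODON_TABLE[codon] for codon in NNK_CODONS}
    amino_acids = encoded - {"*"}  # '*' marks a stop codon
    return {
        "codons_per_position": len(NNK_CODONS),
        "amino_acids_per_position": len(amino_acids),
        "dna_variants": len(NNK_CODONS) ** randomized_codons,
        "peptide_variants": len(amino_acids) ** randomized_codons,
    }


def expected_library_size(n: int) -> float:
    """Coupon-collector expectation E(T) = n * H_n from equation 1."""
    harmonic = math.fsum(1.0 / k for k in range(1, n + 1))
    return n * harmonic


summary = nnk_summary(randomized_codons=3)
print(summary)
print(round(expected_library_size(summary["dna_variants"])))
```

The enumeration also shows that NNK keeps all 20 amino acids while leaving only the amber codon TAG as a stop codon.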


We achieved a library size of more than 27,672 different TyrRS plasmids. When analyzing our TyrRS library by Sanger sequencing, we observed an unexpected signal distribution of the nucleotides: with the fluorophores used for labeling in Sanger sequencing, the fluorescence signals of the differently labeled nucleotides are not identical.

Figure 3: Chromatograms of the TyrRS library from Sanger sequencing.
Chromatograms of four TyrRS library replicates sequenced in forward and reverse direction by Sanger sequencing. The positions 158, 159 and 162 of the TyrRS are randomized with the NNK scheme.

At first, the different signal intensities seemed to be caused by the labelling fluorophores used in Sanger sequencing. In general, four-colour labelling reduces the signal-to-noise ratio because of the spectral overlap of the fluorescence emissions, which results in shorter and less accurate reads (Middendorf et al., 2008). In addition, excitation with a single wavelength compromises the sequencing results, because the fluorophores, whose absorption and emission spectra vary over a wide range (500-800 nm), are all excited by one laser at approximately 488 nm (Pfeufer et al., 2015).

In the chromatograms depicted in Figure 3, the maximal fluorescence intensity of thymidine is approximately 75 % to 90 % lower than the maximal fluorescence intensity of guanosine. In comparison, the maximal fluorescence signal of guanosine reaches up to 97 % of the approximate maximal fluorescence signal of cytosine.

Comparing these tendencies with the sequencing results at the randomized positions (NNK) leads to the following assumption: when a library is generated with the NNK scheme, the different nucleotides are not incorporated at equal rates. We originally expected an equal distribution of guanosine and thymidine at the K position. However, the fluorescence signal of thymidine at this position is higher than that of guanosine, even though thymidine generally shows the lower maximal fluorescence intensity. This implies a higher incorporation rate of thymidine when a position is randomized with K. Analogously, the four nucleotides resulting from an N randomization are not evenly distributed either: relative to the other three nucleotides, the cytosine signal is higher, which implies a higher incorporation rate of cytosine at N positions.

Illumina Sequencing

Figure 4: Starting the Illumina MiSeq sequencing.

To determine the total number of sequence variants in our library, we used Illumina next-generation sequencing (NGS). Illumina NGS is based on the binding of ssDNA fragments to adapters presented on the surface of the flow cell. The fragments are amplified, forming double-stranded DNA bridges. Upon denaturation, single-stranded fragments anchored to the surface of the flow cell remain, and repeated cycles result in clusters of the same DNA fragment. To start sequencing, primers, polymerase and four types of labelled reversible terminators are added to the flow cell. After every incorporation, the incorporated nucleotide is detected by laser excitation through its specific fluorescence. By repeating this cycle many times, the sequence of bases in each fragment is determined, and the reads can then be aligned.


We designed oligonucleotides containing an adapter sequence and a unique index to separate our sequences from other libraries. The amplified region of our library had a maximal length of 500 bp, so that the DNA fragments do not become entangled during bridge amplification, which would lead to overlapping clusters. After amplification with these oligonucleotides, the PCR product was purified from a 1 % agarose gel. The quality of the amplified library was checked prior to NGS using the Agilent BioAnalyzer with a High Sensitivity DNA chip. This technology uses capillary electrophoresis for sensitive quantification and sizing of DNA fragments, which allowed us to test whether our library preparation matched the specifications of the Illumina MiSeq technology.


The electropherogram of the Agilent BioAnalyzer High Sensitivity DNA Assay shows our amplified library fragment as the largest peak, with a concentration of 2,874.13 pg/µL, a molarity of 7,735.3 pmol/L and a length of 563 bp, flanked by the two markers (35 bp and 10,380 bp). The gel image, depicted in Figure 6, shows a thick band at 550-600 bp that fades out towards 700 bp, matching the slightly uneven peak of the electropherogram. It is important that as few larger fragments as possible are present, since they could overload the flow cell and cause the sequencing run to abort.
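The reported molarity can be cross-checked from the mass concentration and the fragment length. The sketch below assumes an average molar mass of 660 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the instrument output above; the function name is hypothetical.

```python
# Assumption: average molar mass of one base pair of dsDNA, in g/mol.
AVG_MASS_PER_BP = 660.0


def dsdna_molarity_pmol_per_l(conc_pg_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (pg/µL) into molarity (pmol/L)."""
    grams_per_litre = conc_pg_per_ul * 1e-12 / 1e-6  # pg/µL -> g/L
    molar_mass = AVG_MASS_PER_BP * length_bp         # g/mol for the fragment
    return grams_per_litre / molar_mass * 1e12       # mol/L -> pmol/L


molarity = dsdna_molarity_pmol_per_l(conc_pg_per_ul=2874.13, length_bp=563)
print(f"{molarity:.1f} pmol/L")
```

Under this assumption the result agrees with the BioAnalyzer value to within about one pmol/L.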

The MiSeq sequencing yielded 1,650,024 reads with the NNK motif, comprising 30,440 different variants. According to the rules of combinatorics, 32,768 sequence variants are possible.

When analyzing the sequences for the NNN motif, we obtained 1,652,553 reads. The difference between the read counts for the NNN and the NNK motif amounts to 2,529 reads that do not match the NNK motif, each with a coverage of 1, which implies that reads with a coverage of 1 can be misinterpreted. For this reason, we consider all 2,768 variants with a coverage of 1 as possible false instances. Subtracting these possible false instances from the 30,440 different variants results in a total library size of 27,672 different sequence variants.

Sequence variants with a coverage of at least two translate into 8,787 different peptides, those with a coverage higher than two into 8,464 and those with a coverage higher than three into 8,135. Based on these data, our tyrosyl-tRNA synthetase library of 27,672 variants codes for more than 8,000 different peptides.
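The coverage filtering and translation described above can be sketched as follows. This is an illustrative outline, not our analysis pipeline: it assumes that each read has already been trimmed to the nine nucleotides of the three randomized codons, and the function names and toy reads are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate a DNA sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def summarize_library(reads: list, min_coverage: int = 2) -> tuple:
    """Return (all variants, variants kept, distinct peptides among kept)."""
    coverage = Counter(reads)
    kept = {variant for variant, count in coverage.items() if count >= min_coverage}
    peptides = {translate(variant) for variant in kept}
    return len(coverage), len(kept), len(peptides)


# Toy data: two synonymous variants (both Asp-Ile-Leu) and one singleton.
toy_reads = ["GATATTCTG"] * 3 + ["GACATTCTG"] * 2 + ["GGGTTTAAG"]
print(summarize_library(toy_reads, min_coverage=2))
```

Variants seen only once are dropped as possible sequencing errors, and synonymous DNA variants collapse into a single peptide, which is why the peptide count is lower than the variant count.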

Since we continued generating the library after sequencing and nearly doubled the number of clones, we assume that the tyrosyl-tRNA synthetase library is larger than the analyzed 27,672 different sequences and 8,000 peptides. We were not allowed to submit the complete library.


Figure 5: Electropherogram of the tyrosyl-tRNA synthetase library.
The Agilent BioAnalyzer High Sensitivity DNA Assay was used for the measurement. The library fragments are depicted as the peak in the center (563 bp), flanked by markers.

Figure 6: Gel image of the Agilent BioAnalyzer High Sensitivity DNA Assay of the tyrosyl-tRNA synthetase library.



Therefore, we submitted two versions of the basic library plasmid (BBa_K2201400, BBa_K2201411) so that other teams can generate their own library. In addition, the complete library is available to all future iGEM teams upon request.

Selection

Screening the whole library for the ability to specifically incorporate the desired ncAA would be too time-consuming. We therefore decided to create a high-throughput method for selecting the clones that incorporate the target ncAA. The selection system is based on two selection steps that have to be repeated several times. In the first step, the positive selection, all clones that incorporate an amino acid in response to the amber codon survive. The second step, the negative selection, selects for the specificity of the tyrosyl-tRNA synthetase. For both selection steps, the library is cotransformed with the selection plasmid in pSB3T5 to prevent incompatibility with the library plasmid in pSB1C3. The plasmid charts of the selection plasmids are shown below as BioBricks.

The positive selection plasmid (BBa_K2201900) contains the Methanococcus jannaschii-based tRNA (CUA), with an anticodon for the amber codon, under the constitutive promoter proK. The essential part for the selection is the kanamycin resistance gene with two amber codons downstream of the translation start. If the aminoacyl-tRNA synthetase mutant (encoded on a cotransformed library plasmid) is able to charge the tRNA (CUA) with any amino acid, the cell can express the kanamycin resistance. Thus, these cells survive when plated on LB agar plates containing the ncAA and kanamycin.

Figure 7: Positive selection plasmid BBa_K2201900.
Positive selection plasmid for the incorporation of ncAAs. The positive selection plasmid contains a tRNA and a kanamycin resistance gene with two amber codons. When it is cotransformed with the library of tyrosyl-tRNA synthetases with randomly mutated binding sites, only those clones survive on kanamycin that can charge the tRNA with any amino acid in response to the UAG codon.

Figure 8: Negative selection plasmid BBa_K2201901.
Negative selection plasmid against the incorporation of endogenous amino acids. The negative selection plasmid contains a tRNA with the anticodon for the amber codon and a barnase gene containing amber codons at permissive sites. In the negative selection, the target amino acid is not supplemented to the medium. If the cotransformed clones from the positive selection charge the tRNA with endogenous amino acids, the cells die. This provides a selection method for highly specific aaRS.

In the negative selection, only the cells that specifically incorporate the ncAA, and not any endogenous amino acid, should survive. Therefore, the negative selection plasmid (BBa_K2201901) contains barnase, a toxin for E. coli. Two amber codons are incorporated at permissive sites of the barnase gene, and the plasmid contains the same tRNA (CUA) as the positive selection plasmid. In contrast to the positive selection, the cells are plated on agar that does not contain the ncAA. Thus, the tRNA (CUA) is charged only by those synthetases that accept endogenous amino acids. These cells express the barnase and die.
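The logic of the two selection steps can be summarized as a small truth table. The sketch below is a deliberately simplified model with hypothetical names: it treats charging as all-or-nothing and ignores incorporation efficiency, antibiotic concentration and plasmid carry-over.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synthetase:
    name: str
    charges_ncaa: bool        # accepts the target ncAA
    charges_endogenous: bool  # accepts at least one endogenous amino acid


def survives_positive(aars: Synthetase) -> bool:
    """ncAA and kanamycin present: any amber suppression gives resistance."""
    return aars.charges_ncaa or aars.charges_endogenous


def survives_negative(aars: Synthetase) -> bool:
    """No ncAA present: suppression with an endogenous amino acid
    produces barnase and kills the cell."""
    return not aars.charges_endogenous


def select(library: list) -> list:
    """One positive round followed by one negative round."""
    after_positive = [a for a in library if survives_positive(a)]
    return [a for a in after_positive if survives_negative(a)]


library = [
    Synthetase("inactive", charges_ncaa=False, charges_endogenous=False),
    Synthetase("unspecific", charges_ncaa=True, charges_endogenous=True),
    Synthetase("endogenous only", charges_ncaa=False, charges_endogenous=True),
    Synthetase("specific", charges_ncaa=True, charges_endogenous=False),
]
print([a.name for a in select(library)])
```

Only the variant that charges the ncAA but no endogenous amino acid passes both steps, which is the behaviour the combined selection is designed to enrich.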

Our goal is to generate a tRNA/synthetase pair that is able to incorporate 2-nitro-L-phenylalanine, which is used for photocleavage of the polypeptide backbone.

For the first round of selection, we cotransformed the library plasmid BBa_K2201400 with the positive selection plasmid (BBa_K2201900) and cultivated the cells on LB plates with kanamycin and 2-nitrophenylalanine (2-NPA). Because of the amber stop codons integrated into the kanamycin resistance gene on the positive selection plasmid, only cells with a functional aaRS survive: the resistance is only expressed when a non-canonical or endogenous amino acid is incorporated in response to the amber stop codon. Thus, the aaRS variants are selected for function. To avoid additional pressure on the cells, we did not use tetracycline or chloramphenicol for the cultivation, because expression of the kanamycin resistance already depends on both the library plasmid and the positive selection plasmid.

After the positive selection, we obtained approximately 800 colonies, showing that many of our generated TyrRS variants are able to bind a non-canonical or endogenous amino acid despite the modifications. We washed these colonies off the plates, isolated the plasmids, cotransformed them with the negative selection plasmid (BBa_K2201901) and cultivated the cells on LB plates with tetracycline and chloramphenicol to ensure that both plasmids were retained.



Figure 9: Remaining colonies after the positive selection for 2-NPA, containing the positive selection plasmid and the library plasmid.
The remaining cells carry an aaRS that is able to bind a non-canonical or endogenous amino acid.

Figure 10: Remaining colonies after the positive and the subsequent negative selection for 2-NPA, containing the negative selection plasmid and the library plasmid.
The remaining cells carry an aaRS that is specific enough not to bind an endogenous amino acid. Red colonies carry a positive selection plasmid as a result of the plasmid isolation and can easily be excluded from further selection rounds.

Only cells with an aaRS specific enough not to bind an endogenous amino acid survived, because charging the tRNA with an endogenous amino acid in response to the amber stop codon leads to expression of the barnase. From the negative selection, we obtained fewer than 100 colonies, corresponding to a loss of more than 80 % of the aaRS candidates obtained from the positive selection.
We combined the positive selection plasmid with a strengthening system (BBa_K2201373), containing a T3 RNA polymerase and a reversed mRFP that is transcribed by the T3 RNA polymerase. With this system, mRFP is expressed, so colonies that still carry the positive selection plasmid turn red. This made it easy to identify clones carrying the positive instead of the negative selection plasmid during the negative selection. As can be seen in Figure 10, the transformation efficiency of the positive selection plasmid is low compared with that of the library plasmids, resulting in only a single false colony carrying the positive selection plasmid.

Outlook – Additional Possible Selection Systems

We could show that our high-throughput selection method works and yields promising results. After generating our synthetase library, we performed the described selection process to obtain library mutants with the highest specificity for a given non-canonical amino acid while retaining orthogonality together with the corresponding tRNA. Our selection is performed with positive and negative selection plasmids carrying an antibiotic resistance gene and a toxic gene with amber stop codons, respectively. We used a kanamycin resistance gene for the positive selection plasmid (BBa_K2201900) and a barnase gene for the negative selection plasmid (BBa_K2201901) (Figures 7 and 8). Both plasmids had a low-copy origin of replication (pSB3T5) so as not to burden the cells with overexpression of the genes they carry.

While designing our selection method, we also constructed multiple parts for building further selection plasmids. In addition to our submitted selection plasmids, we provide the iGEM community with those parts, so that any team wanting to perform selection processes for incorporating non-canonical amino acids can assemble selection plasmids based on their needs. In the following, we briefly describe the parts and suggest possible combinations for the positive and negative selection plasmid, respectively.

The most important part needed for both selection plasmids is the tRNAtyr (BBa_K2201408). It is the orthogonal tRNA counterpart to the synthetase library (BBa_K2201409) without any amber codons. This coding sequence can be adapted for use with the promoter of choice and with amber codons at a desired location. Furthermore, we provide the kanamycin resistance gene with two amber codons (BBa_K2201410) under the control of an araBAD promoter (BBa_K808000). In the literature on selection plasmids, the chloramphenicol resistance gene CAT is commonly used (Wang et al., 2001; Neumann et al., 2008). We did not use CAT because chloramphenicol is the resistance of the backbone in which all parts have to be submitted, making it a poor choice for the iGEM community.

Figure 11: Possible Positive Selection Plasmid based on a Resistance Gene. The positive selection plasmid based on resistance contains the tRNAamber and the kanamycin resistance gene with two amber codons (S133Am and S154Am). The parts are shown here in pSB1C3.

Another approach for positive selection is visual confirmation, which can be achieved with a fluorescent protein. Based on Santoro et al. (2002), we provide a T7 RNA polymerase (T7 RNAP) and a GFP under control of the T7 promoter. The T7 RNAP (BBa_K2201405) is under control of the inducible araBAD promoter (BBa_K808000). We also provide the coding sequence of the T7 RNAP (BBa_K2201403) from the E. coli KRX strain without amber codons, to be adapted as needed. The required GFP under the T7 promoter can already be found in the parts registry (BBa_I746909). We reversed this part (BBa_K2201404) to lower the probability of basal GFP expression through read-through by the endogenous RNA polymerase.

Figure 12: Possible Positive Selection Plasmid based on Fluorescence. The positive selection plasmid based on fluorescence contains the tRNAamber, the T7 RNA polymerase with one amber codon and uvGFP under control of the T7 promoter. The parts are shown here in pSB1C3.

All parts for the positive selection could also be combined and used simultaneously (Liu and Schultz, 2010).

Figure 13: Possible Combined Positive Selection Plasmid based on a Resistance Gene and Fluorescence. This positive selection plasmid contains the tRNAamber, the T7 RNA polymerase with one amber codon, uvGFP under control of the T7 promoter and the kanamycin resistance gene with two amber codons (S133Am and S154Am). The parts are shown here in pSB1C3.

For the negative selection plasmid, we provide only barnase as a toxic gene, because ccdB is no longer allowed in the iGEM competition (Umehara et al., 2012). The coding sequence of barnase contains two amber codons (BBa_K2201406) (Liu and Schultz, 1999). We also combined the coding sequence with the inducible araBAD promoter (BBa_K808000) and reversed the whole sequence.

Figure 14: Possible Negative Selection Plasmid. The negative selection plasmid contains the tRNAamber and barnase with two amber codons (Q4Am and D46Am). The barnase is under control of an araBAD promoter. The parts are shown here in pSB1C3.

Interestingly, the question has been raised whether library selection may be influenced by the selection plasmids and selection protocols used (Zhao and Arnold, 1997; Umehara et al., 2012; Guo et al., 2014). Umehara et al. showed that different selection plasmids influence the mutation profile, yielding different synthetases than previously reported for a given library. Recently, the Romesberg lab also showed that a combination of positive selection and deep sequencing was enough to yield functional and specific synthetases (Zhang et al., 2017). During her stay in Berlin, one of our team members learned that the antibiotic concentration in the multiple positive selection steps also influences the selection, because the incorporation efficiency of the synthetase mutants differs. Generally, the concentrations of the main antibiotics (for the backbones) are kept constant throughout the selections, whereas the concentration of the antibiotic for the resistance gene containing the amber codons is raised with every positive selection step. This is done because the incorporation efficiency of the synthetases is assumed to increase with every selection step. Further research in this diverse area might provide compelling results for future projects.

We hope our part collection aids coming iGEM teams in constructing their own selection processes.

References

Beese, L.S., Derbyshire, V., and Steitz, T.A. (1993). Structure of DNA Polymerase I Klenow Fragment Bound to Duplex DNA.
Guo, L.-T., Wang, Y.-S., Nakamura, A., Eiler, D., Kavran, J.M., Wong, M., Kiessling, L.L., Steitz, T.A., O’Donoghue, P., and Söll, D. (2014). Polyspecific pyrrolysyl-tRNA synthetases from directed evolution. Proc. Natl. Acad. Sci. U. S. A. 111: 16724–16729.
Liu, C.C. and Schultz, P.G. (2010). Adding New Chemistries to the Genetic Code. Annu. Rev. Biochem. 79: 413–444.
Liu, D.R. and Schultz, P.G. (1999). Progress toward the evolution of an organism with an expanded genetic code. Proc. Natl. Acad. Sci. 96: 4780–4785.
Middendorf, L.R., Humpfrey, P.G., Narayanan, N., and Roemer, S.C. (2008). Chapter 8: Sequencing Technology. Wiley.
Neumann, H., Peak-Chew, S.Y., and Chin, J.W. (2008). Genetically encoding N(epsilon)-acetyllysine in recombinant proteins. Nat. Chem. Biol. 4: 232–234.
Pfeufer, V. and Schulze, M. (2015). Laser fluorescence powers sequencing advances. BioOptics World.
Santoro, S.W., Wang, L., Herberich, B., King, D.S., and Schultz, P.G. (2002). An efficient system for the evolution of aminoacyl-tRNA synthetase specificity. Nat. Biotechnol. 20: 1044–1048.
Umehara, T., Kim, J., Lee, S., Guo, L.-T., Söll, D., and Park, H.-S. (2012). N-Acetyl lysyl-tRNA synthetases evolved by a CcdB-based selection possess N-acetyl lysine specificity in vitro and in vivo. FEBS Lett. 586: 729–733.
Wang, L., Brock, A., Herberich, B., and Schultz, P.G. (2001). Expanding the Genetic Code of Escherichia coli. Science 292: 498–500.
Zhang, Y., Lamb, B.M., Feldman, A.W., Zhou, A.X., Lavergne, T., Li, L., and Romesberg, F.E. (2017). A semisynthetic organism engineered for the stable expansion of the genetic alphabet. Proc. Natl. Acad. Sci. 114: 1317–1322.
Zhao, H. and Arnold, F.H. (1997). Combinatorial protein design: strategies for screening protein libraries. Curr. Opin. Struct. Biol. 7: 480–485.