Difference between revisions of "Team:Bielefeld-CeBiTec/Results/translational system/library and selection"

Revision as of 02:42, 31 October 2017

Library and Selection

Short Summary

The incorporation of a non‑canonical amino acid (ncAA) requires a tRNA/aminoacyl‑tRNA synthetase (tRNA/aaRS) pair that accepts and binds the ncAA in order to charge the tRNA with it. Therefore, our aim was to generate a library of aaRS variants with different amino acid binding sites. The aaRS, based on the wild-type Methanococcus jannaschii tyrosyl‑tRNA synthetase, is orthogonal to the tRNA/aaRS pairs of E. coli and suitable for incorporating novel ncAAs. To generate a library with randomly mutated amino acid binding sites, we constructed a template plasmid with an optical control and cloned the tyrosyl‑tRNA synthetase library by annealing randomized single-stranded oligonucleotides. The library consists of approximately 150,000 library plasmids, including more than 27,672 variants of the tyrosyl‑tRNA synthetase. The tyrosyl‑tRNA synthetase library is the basis for positive and negative selection cycles, from which an optimally adapted tyrosyl‑tRNA synthetase variant can be obtained.

Generating the library

The integration of a new amino acid into the translational cycle requires the creation of a large number of tRNA/aminoacyl-tRNA synthetase (tRNA/aaRS) variants. This can be achieved by generating a library, which serves as the basis for selection or screening processes from which an optimal tRNA/aaRS pair can be obtained. The library is based on the pSB1C3 plasmid, with the tyrosyl-tRNA synthetase (TyrRS) under control of a glnS promoter inserted between the iGEM BioBrick prefix and suffix. To clone the vector, we used the primers 17jj and 17jk for the amplification of the pSB1C3 backbone and the primers 17hq and 17ht for the amplification of the TyrRS.

Figure 1: Library plasmid pSB1C3 containing the tyrosyl-tRNA synthetase.
Library plasmid based on pSB1C3, containing the tyrosyl-tRNA synthetase under control of a glnS promoter, a pMB1 origin of replication and a chloramphenicol resistance.

For the generation of the tyrosyl-tRNA synthetase library, we used a method based on the dimerization of two randomized primers through an overlap, which allows the product to be integrated at a defined position. The 3’ ends of the two primers face each other and overlap. This homology, combined with the use of the Klenow fragment, is the key to this approach. The Klenow fragment is a commercially available fragment of DNA polymerase I from Escherichia coli. It features a 5’-3’ polymerase activity and a 3’-5’ exonuclease activity (Beese et al., 1993) and can therefore fill in short single-stranded stretches, using the complementary strand as a template. We used the Klenow fragment to complete the primer dimer to a fully double-stranded sequence. The result is a short double-stranded DNA fragment that contains the randomized positions and can be integrated into the desired sequence via its overlaps.
If a marker sequence is inserted at the position that is to be randomized, its replacement can easily be screened. As an optical control we used mRFP (BBa_J04450) under control of a lac promoter, lac operator and rrnB T1 terminator. The primers were designed with overlaps homologous to the sequence flanking the binding pocket region of the synthetase, allowing precise integration into the TyrRS. The positions of the TyrRS chosen for randomization are Asp158, Ile159 and Leu162, which lie at the center of the binding pocket. The mRFP is inserted into the TyrRS in place of this binding site to function as an optical control. If the randomized double-stranded DNA is incorporated into the synthetase, the colony color changes due to the loss of the mRFP, so that E. coli colonies containing randomized library plasmids can be picked for the positive and negative selection.

Figure 2: Generating a synthetase library by using oligo dimers and mRFP as an optical control.
Two primers, one with a randomized position, are designed to form a dimer (1), which is completed to dsDNA by the Klenow fragment. The region of the Tyr-aaRS to be randomized is replaced by mRFP as an optical control (2). When the randomized dsDNA is incorporated, the mRFP is replaced, so that the incorporation is directly visible.
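
The fill-in step described above can be illustrated with a minimal Python sketch. The two oligonucleotide sequences are hypothetical placeholders and not the primers actually used, and the function names are chosen for illustration only. The sketch assumes that the forward oligo carries the NNK codons (two randomized codons, two fixed codons, one randomized codon, mirroring positions 158, 159 and 162) and that the overlap itself contains no degenerate bases.

```python
# Complement table including the degenerate bases N (any) and K (G/T) / M (A/C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
              "N": "N", "K": "M", "M": "K"}


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def fill_in(forward, reverse, min_overlap=8):
    """Return the top strand after polymerase fill-in of a 3' overlapping dimer.

    Both oligos are given 5'->3'. The 3' end of `forward` must pair with the
    3' end of `reverse`, i.e. the tail of `forward` equals the head of the
    reverse complement of `reverse`.
    """
    rc_reverse = reverse_complement(reverse)
    longest = min(len(forward), len(rc_reverse))
    for k in range(longest, min_overlap - 1, -1):
        if forward[-k:] == rc_reverse[:k]:
            # The polymerase extends the forward oligo along the reverse oligo.
            return forward + rc_reverse[k:]
    raise ValueError("no 3' overlap of sufficient length found")


# Hypothetical oligos: NNK NNK, two fixed codons, NNK, then a 10 nt overlap.
forward_oligo = "GGATCCNNKNNKGCAGGTNNKACGTCAGTCCA"
reverse_oligo = "GAATTCGGTCAATGGACTGACG"

top_strand = fill_in(forward_oligo, reverse_oligo)
print(top_strand)
```

The sketch only models the sequence logic of the fill-in; the 3’-5’ exonuclease activity and reaction conditions are not represented.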

Analyzing the tyrosyl tRNA/aminoacyl-synthetase library

Sequencing by Sanger

Generating a tyrosyl‑tRNA synthetase (Tyr‑RS) library using the NNK scheme for the randomization of three positions of the binding pocket leads to a large variety of different sequences. In the NNK scheme, N stands for any of the four nucleotides and K for guanine or thymine, so each randomized codon has 4 × 4 × 2 = 32 possibilities, and three randomized codons yield 32³ = 32,768 different sequence variants. Because several codons code for the same amino acid, these correspond to 20³ = 8,000 different amino acid sequences, each influencing the structure of the binding pocket of the Tyr‑RS. Following equation 1 and using the free statistical software R (R Core Team, 2015), we obtained a statistically required library size of 393,447 randomized plasmids for every possible sequence variant to occur at least once.

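E(T) = n · H_n (1)

Here, n is the number of possible sequence variants and H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number.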
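
As a cross-check of the combinatorics, the following minimal Python sketch enumerates the NNK codons with the standard genetic code and evaluates equation 1 by summing the harmonic series directly. It is not the original R computation, which is not shown here, so the value it yields depends on how H_n is evaluated and need not coincide with the library size quoted above. All variable and function names are chosen for illustration.

```python
from itertools import product
from math import fsum

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = [codon for codon in CODON_TABLE if codon[2] in "GT"]
encoded_amino_acids = {CODON_TABLE[c] for c in nnk_codons} - {"*"}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

n_positions = 3  # three randomized codons
n_dna_variants = len(nnk_codons) ** n_positions
n_protein_variants = len(encoded_amino_acids) ** n_positions


def expected_draws(n):
    """Coupon-collector expectation E(T) = n * H_n for n equally likely variants."""
    return n * fsum(1.0 / k for k in range(1, n + 1))


print(len(nnk_codons), "NNK codons,", len(encoded_amino_acids), "amino acids")
print("stop codons within NNK:", stop_codons)
print("DNA variants:", n_dna_variants, "protein variants:", n_protein_variants)
print("E(T) for all DNA variants:", round(expected_draws(n_dna_variants)))
```

Note that equation 1 assumes that all variants are equally likely, and that the only stop codon remaining within the NNK scheme is the amber codon TAG.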

We achieved a library size of more than 27,672 Tyr‑RS variants. When analyzing our Tyr‑RS library by Sanger sequencing, an unexpected signal distribution of the nucleotides occurred: with the fluorophores used for labeling in Sanger sequencing, the fluorescence signals of the differently labeled nucleotides are not identical.

Figure 3: Chromatograms of the Tyr-RS library obtained by Sanger sequencing.
Chromatograms of four Tyr‑RS library replicates sequenced forward and reverse by Sanger sequencing. Positions 158, 159 and 162 of the Tyr‑RS are randomized according to the NNK scheme.

At first, the different signal intensities seemed to be caused by the labelling fluorophores used in Sanger sequencing. In general, when four-colour labelling of the nucleotides is used, the signal-to-noise ratio is reduced because of the spectral overlap of the fluorescence emissions. This results in shorter and less accurate reads (Middendorf et al., 2008). In addition, excitation with a single wavelength compromises the sequencing results, because a laser of approximately 488 nm has to excite fluorophores whose absorption and emission spectra vary over a wide range (500‑800 nm) (Pfeufer et al., 2015).

In the chromatograms depicted in Figure 3, the maximal fluorescence intensity of thymidine is approximately 75 % to 90 % lower than the maximal fluorescence intensity of guanosine. In comparison, the maximal fluorescence signal of guanosine reaches up to 97 % of the maximal fluorescence signal of cytosine.

Comparing these tendencies with the sequencing results of the randomized positions (NNK) leads to the following assumption: when a library is generated using the NNK scheme, the incorporation rates of the different nucleotides are not evenly distributed. We originally expected an equal distribution of guanosine and thymidine at the K position. However, although the maximal fluorescence intensity of thymidine is generally lower than that of guanosine, the thymidine signal at this position is higher than the guanosine signal. This implies a higher incorporation rate of thymidine when the position is randomized with K. Analogously, the four nucleotides resulting from an N randomization are not equally distributed either: relative to the other three nucleotides, the cytosine signal at these positions is higher, which also implies a higher incorporation rate of cytosine when the N randomization is used.

Illumina Sequencing

To determine the total number of sequence variants in our library, we used Illumina next-generation sequencing (NGS). Illumina NGS is based on the binding of ssDNA fragments to adapters presented on the surface of the flow cell. The fragments are amplified, forming double-stranded DNA bridges. Upon denaturation, single-stranded fragments anchored to the surface of the flow cell remain, resulting in clusters of identical DNA fragments. To start sequencing, primers, polymerase and four differently labelled reversible terminators are added to the flow cell. After every incorporation, the incorporated nucleotide is detected by laser excitation through its specific fluorescence. By repeating this cycle numerous times, the sequence of bases in each fragment can be determined, and the resulting reads can be aligned.

We designed oligonucleotides containing an adapter sequence and a unique index to distinguish our sequences from other libraries. The amplified region of our library had a maximal length of 500 bp, so that the DNA fragments do not become entangled during bridge amplification, which would lead to overlapping clusters. After amplification with these oligonucleotides, the PCR product was purified from a 1 % agarose gel. The quality of the amplified library was checked prior to NGS using the Agilent BioAnalyzer with a High Sensitivity DNA chip. This technology uses capillary electrophoresis for sensitive quantification and sizing of DNA fragments, allowing us to test whether our library preparation matches the specifications of the Illumina MiSeq technology. The electropherogram of the Agilent BioAnalyzer High Sensitivity DNA Assay shows our amplified library fragment as the largest peak, with a concentration of 2,874.13 pg/µL, a molarity of 7,735.3 pmol/L and a length of 563 bp, flanked by the two markers (35 bp and 10,380 bp). The gel image, depicted in Figure 5, shows a thick band at 550-600 bp, fading out towards the 700 bp band, which matches the slightly uneven peak of the electropherogram. It is important that as few larger fragments as possible are present, since they could cause an overload and thereby abort the sequencing run.
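
The relation between the reported mass concentration and molarity can be reproduced with a short Python sketch. It assumes an average molar mass of 660 g/mol per base pair of double-stranded DNA, a common approximation. The instrument software may use a slightly different value, so small deviations from the reported molarity are to be expected. The function name is chosen for illustration.

```python
# Average molar mass of one base pair of dsDNA in g/mol (approximation, assumed here).
AVG_BP_MASS = 660.0


def molarity_pmol_per_l(conc_pg_per_ul, length_bp, bp_mass=AVG_BP_MASS):
    """Convert a dsDNA mass concentration (pg/µL) to molarity (pmol/L)."""
    grams_per_litre = conc_pg_per_ul * 1e-12 / 1e-6   # pg/µL -> g/L
    mol_per_litre = grams_per_litre / (length_bp * bp_mass)
    return mol_per_litre * 1e12                       # mol/L -> pmol/L


# Values of the library peak as read from the electropherogram.
library_molarity = molarity_pmol_per_l(2874.13, 563)
print(f"{library_molarity:.1f} pmol/L")
```

With these assumptions, the result lies within a few pmol/L of the molarity reported for the 563 bp peak.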

Figure 4: Electropherogram of the tyrosyl-tRNA synthetase library.
The measurement was performed with the Agilent BioAnalyzer High Sensitivity DNA Assay. The library fragments are depicted as the peak in the center (563 bp), flanked by the markers.

Figure 5: Gel image of the Agilent BioAnalyzer High Sensitivity DNA Assay of the tyrosyl-tRNA synthetase library.

The MiSeq sequencing yielded 1,650,024 reads with the NNK motif, comprising 30,440 different variants, out of the 32,768 sequence variants that are combinatorially possible. When analyzing the sequences for the NNN motif, we obtained 1,652,553 reads. The difference between the number of reads for the NNN motif and for the NNK motif amounts to 2,529 reads, each representing a variant with a coverage of 1, which implies that variants with a coverage of 1 are likely to be misread. For this reason, we consider all 2,768 variants with a coverage of 1 as possible false instances. Subtracting these possible false instances from the 30,440 different variants results in a total library size of 27,672 different sequence variants.
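
The arithmetic behind these numbers can be retraced in a few lines of Python. The sketch uses only the read and variant counts stated above; the variable names are chosen for illustration.

```python
# Counts reported for the MiSeq run.
reads_nnk = 1_650_024        # reads matching the NNK motif
reads_nnn = 1_652_553        # reads matching the NNN motif
variants_nnk = 30_440        # distinct NNK variants observed
singleton_variants = 2_768   # variants with a coverage of 1
possible_variants = 32 ** 3  # three NNK codons

# Reads that match NNN but not NNK.
non_nnk_reads = reads_nnn - reads_nnk

# Variants remaining after discarding all variants with a coverage of 1.
retained_variants = variants_nnk - singleton_variants

# Share of the combinatorially possible variants that is retained.
retained_fraction = retained_variants / possible_variants

print(non_nnk_reads, retained_variants, f"{retained_fraction:.1%}")
```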

We found that the variants with a coverage of at least two encode 8,787 different peptides. Variants with a coverage higher than two encode 8,464 different peptides, and those with a coverage higher than three encode 8,135. Based on these data, our tyrosyl-tRNA synthetase library of 27,672 variants codes for more than 8,000 different peptides. Considering that we continued generating the library after sequencing, nearly doubling the number of clones, we assume the library to be larger than the analyzed 27,672 different sequences and 8,000 peptides. We were not allowed to submit the complete library.
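
How distinct peptides can be counted at different coverage thresholds is sketched below in Python. The read counts in `variant_counts` are invented toy data and do not stem from the actual sequencing run, and the function names are hypothetical. The sketch assumes that each variant is represented by the concatenation of its three randomized codons.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def distinct_peptides(counts, min_coverage):
    """Return the set of peptides encoded by variants with at least `min_coverage` reads."""
    return {translate(seq) for seq, n in counts.items() if n >= min_coverage}


# Toy data (invented): three randomized codons per variant -> read count.
variant_counts = {
    "GATATTCTG": 120,  # Asp-Ile-Leu
    "GATATTTTG": 3,    # synonymous variant, also Asp-Ile-Leu
    "GCGATTCTG": 2,    # Ala-Ile-Leu
    "GGTGCTTGG": 3,    # Gly-Ala-Trp
    "TAGATTCTG": 1,    # amber stop at the first position
}

for threshold in (2, 3, 4):
    peptides = distinct_peptides(variant_counts, threshold)
    print(threshold, len(peptides), sorted(peptides))
```

Because synonymous DNA variants collapse into the same peptide, the number of peptides is lower than the number of sequence variants at every threshold.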

Therefore, we submitted two versions of the basic library plasmid (BBa_K2201400, BBa_K2201411), which allow other teams to generate their own library. In addition, the complete library is available to all future iGEM teams upon request.

Selection

Screening the whole library for the ability to specifically incorporate the desired ncAA would be too time-consuming. We therefore decided to establish a high-throughput method for selecting the clones that incorporate the target ncAA. The selection system is based on two selection steps that have to be repeated several times. In the first step, the positive selection, all clones that incorporate an amino acid in response to the amber codon survive. The second step, the negative selection, selects for the specificity of the tRNA/aminoacyl-tRNA synthetase. For both selection steps, the library is cotransformed with the respective selection plasmid in pSB3T5, a backbone chosen to avoid incompatibility with the library plasmid in pSB1C3. The plasmid charts of the selection plasmids are shown below as BioBricks.

The positive selection plasmid (BBa_K2201900) contains the Methanococcus jannaschii-based tRNA(CUA), whose anticodon recognizes the amber codon, under the constitutive promoter proK. The essential part for the selection is the kanamycin resistance gene with two amber codons downstream of the translation start. If the tRNA/aminoacyl-tRNA synthetase mutant encoded on the cotransformed library plasmid is able to charge the tRNA(CUA) with any amino acid, the cell can express the kanamycin resistance. Thus, these cells survive when plated on LB agar plates containing the ncAA and kanamycin.

Figure 6: Positive selection plasmid.

Figure 7: Negative selection plasmid.

In the negative selection, only the cells that specifically incorporate the ncAA, and not any endogenous amino acid, should survive. Therefore, the negative selection plasmid (BBa_K2201901) encodes barnase, a toxin for E. coli. Two amber codons are introduced at permissive sites of the barnase gene, and the plasmid contains the same tRNA(CUA) as the positive selection plasmid. In contrast to the positive selection, the cells are plated on agar not containing the ncAA. Thus, the tRNA(CUA) is charged only by synthetases that accept endogenous amino acids. These cells express the barnase and die.

To prevent false positive results the positive and negative selection should be repeated at least three times.
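
The logic of the two selection steps can be summarized as a small truth table. The following Python sketch is a deliberately simplified toy model, assuming that each synthetase variant either does or does not charge the tRNA(CUA) with the ncAA and with endogenous amino acids. Real variants show graded activities, which is one reason why several rounds are needed. The function names are chosen for illustration.

```python
from itertools import product


def survives_positive(charges_ncaa, charges_endogenous):
    """Positive selection: ncAA and kanamycin are present.

    Any charging of the tRNA(CUA) suppresses the amber codons in the
    kanamycin resistance gene, so the cell survives.
    """
    return charges_ncaa or charges_endogenous


def survives_negative(charges_endogenous):
    """Negative selection: no ncAA present.

    Charging with an endogenous amino acid suppresses the amber codons in
    the barnase gene, so the toxin is expressed and the cell dies.
    """
    return not charges_endogenous


def passes_round(charges_ncaa, charges_endogenous):
    """A variant passes one full round if it survives both selection steps."""
    return (survives_positive(charges_ncaa, charges_endogenous)
            and survives_negative(charges_endogenous))


# Enumerate all four idealized phenotypes (charges ncAA, charges endogenous aa).
survivors = [(ncaa, endo)
             for ncaa, endo in product([True, False], repeat=2)
             if passes_round(ncaa, endo)]
print(survivors)
```

In this idealized model, only variants that charge the ncAA but no endogenous amino acid pass both steps.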

References

Beese L.S., Derbyshire V., Steitz T.A. (1993). Structure of DNA Polymerase I Klenow Fragment Bound to Duplex DNA.
Middendorf L.R., Humphrey P.G., Narayanan N., Roemer S.C. (2008). Chapter 8: Sequencing Technology. Wiley.
Pfeufer V., Schulze M. (2015). Laser fluorescence powers sequencing advances. BioOptics World.

@@ Line 18: / Line 18: @@
 					<!-- Ueberschriften -->
 					<h2> Short Summary </h2>
-					<h4> </h4>
 					<div class="article">
@@ Line 32: / Line 31: @@
 					<!-- Ueberschriften -->
 					<h2> Generating the library </h2>
-					<h4> </h4>
 					<!-- Normaler Text -->
@@ Line 138: / Line 136: @@
 						<div class="half right">
 							<div class="figure large">
-								<img class="figure image" src="https://2017.igem.org/File:T--Bielefeld-CeBiTec--Tyr-RS_Library_gel_image_Agilent.jpg">
+								<img class="figure image" src="https://static.igem.org/mediawiki/2017/8/87/T--Bielefeld-CeBiTec--Tyr-RS_Library_gel_image_Agilent.jpg">
 								<p class="figure subtitle"><b>Figure 5: Gel image of the Agilent BioAnalyzer High Sensitivity DNA Assay of the tyrosyl- synthetase library.</b> </p>
 							</div>
@@ Line 199: / Line 197: @@
 			</div>
 		</div>