Difference between revisions of "Team:Bielefeld-CeBiTec/Results/unnatural base pair/biosynthesis"

Revision as of 21:15, 1 November 2017

Biosynthesis

Short Summary

The plant Croton tiglium is of great importance to our project due to its ability to produce isoguanosine. To fully understand how this nucleoside is produced, we aimed to reproduce the biosynthesis pathway of C. tiglium. To this end, we collected samples from our own plant, which we obtained from the botanical garden in Marburg. First, we prepared RNA libraries from all plant tissues. The expression libraries generated by Trinity assembly were used to identify essential enzymes of purine metabolism in Croton tiglium. After extracting the coding sequences of these proteins of interest from the library, we expressed them in Escherichia coli and purified them using the IMPACT® Kit (New England Biolabs). After estimating the concentration of the purified proteins, we verified their functionality in iso-GMP and isoguanosine formation assays. Finally, the identity of the reaction products was confirmed by liquid chromatography (LC) and mass spectrometry (MS) analysis.

RNA extraction

Croton tiglium samples were kindly provided by the botanical garden of the Philipps University in Marburg. Total RNA was extracted from frozen tissue samples of young leaves, stem, inflorescence, seeds and roots. The material was ground in liquid nitrogen using mortar and pestle. The Spectrum Plant Total RNA Kit was used for RNA extraction according to the supplier's instructions. mRNAs were enriched based on their poly(A) tail via oligo-dT beads. A DropSense16 (Trinean) was used for quality control.

Library Preparation

Enriched mRNA samples from different tissues were used for the construction of a normalized library (vertis Biotechnologie AG). In parallel, tissue-specific samples were fragmented prior to reverse transcription into cDNA via ProtoScript II (NEB) according to the supplier's recommendations. The Illumina TruSeq Stranded mRNA Sample Preparation Guide was followed to generate five tissue-specific libraries with an average insert size of 400 bp. These libraries represent young leaves, stem, inflorescence, seed, and root.

Sequencing

Sequencing of the normalized library was performed on two lanes of an Illumina HiSeq1500 generating about 47.4 million 2x250 nt paired-end reads. Sequencing of the tissue specific libraries was performed on a HiSeq1500 generating between 20 and 44 million 2x75 nt paired-end reads per tissue-specific library (Table 1).

Table 1: Sequenced tissue-specific libraries

Tissue	Number of sequenced fragments
Young leaves	24,704,122
Stem	20,032,1422
Inflorescence	22,804,752
Seed	43,447,889
Root	28,007,089

Data Processing

FastQC was applied to check the quality of all sequencing data. Low-quality regions and adapter fragments were removed from the reads via Trimmomatic 0.36 {Bolger, 2014}. Adapter removal was based on all known Illumina adapter sequences with the options 2:30:10 (seed mismatches, palindrome clip threshold, simple clip threshold). A sliding window of length 4 was used to clip reads once the average PHRED score dropped below 15. Reads below the length cutoff of 100 nt were discarded. Pairs with only one surviving read were dropped after trimming. About 96 % of all paired-end reads survived this process.
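
The quality clipping and pair filtering described above can be illustrated with a minimal Python sketch. It is not Trimmomatic's implementation, which may differ in details such as how the cut position inside the failing window is chosen; the function names are hypothetical, and the window size (4), quality threshold (15) and length cutoff (100 nt) are taken from the text.

```python
def sliding_window_trim(qualities, window=4, min_avg=15):
    """Return how many leading bases to keep.

    Simplified sketch: scan from the 5' end and cut at the start of the
    first window whose mean PHRED score is below min_avg. Reads shorter
    than the window are kept unchanged.
    """
    for start in range(len(qualities) - window + 1):
        if sum(qualities[start:start + window]) / window < min_avg:
            return start
    return len(qualities)


def trim_pair(qual_r1, qual_r2, min_len=100):
    """Trim both mates; drop the whole pair if either mate ends up too short."""
    keep_r1 = sliding_window_trim(qual_r1)
    keep_r2 = sliding_window_trim(qual_r2)
    if keep_r1 < min_len or keep_r2 < min_len:
        return None  # pairs with only one surviving read are discarded
    return keep_r1, keep_r2
```

A pair is therefore kept only if both mates are still at least 100 nt long after clipping, which matches the filtering order given in the text.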

Transcriptome assembly

Trinity v2.4.0 {Haas, 2014} was applied with default parameters for the de novo transcriptome assembly based on all 2x250 nt paired-end reads of the normalized library. The initial assembly was followed by the quality assessment and processing steps as recommended by {Haas, 2014}. Assembly completeness was investigated by computing assemblies for subsets of the data as well as through remapping of the reads to the assembly.

Expression quantification

Reads from the tissue-specific data sets were mapped to the initial transcriptome assembly via STAR {Dobin, 2013}, requiring 90 % length and 95 % identity. A customized Python script was developed to generate a matching annotation file for the assembly. featureCounts {Liao, 2014} was applied to quantify the expression of all sequences in the assembly. Since most transcripts are represented by multiple contigs corresponding to different splice variants, we decided to include multi-mapped reads. Dedicated Python scripts were used for further investigation of the expression data as well as for the generation of expression heatmaps via the seaborn module (http://seaborn.pydata.org/). Venn diagram generation was performed at Bioinformatics.psb (http://bioinformatics.psb.ugent.be/webtools/Venn/).
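
The customized annotation script itself is not shown here. As a minimal sketch of what such a script could look like, the following Python code writes one feature per contig in the tab-separated SAF layout that featureCounts accepts (GeneID, Chr, Start, End, Strand). Treating every contig as a single plus-strand feature spanning its full length, and the choice of SAF over GTF, are assumptions made for illustration; the function names and the contig names in the usage note are made up.

```python
import io


def fasta_lengths(handle):
    """Yield (contig_name, length) for every record in a FASTA file handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name = line[1:].split()[0]  # keep only the ID before the first space
            length = 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def write_saf(fasta_handle, out_handle):
    """Write one SAF row per contig, covering the contig from 1 to its length."""
    out_handle.write("GeneID\tChr\tStart\tEnd\tStrand\n")
    for name, length in fasta_lengths(fasta_handle):
        # Assumption: the contig is both the feature and its own reference.
        out_handle.write(f"{name}\t{name}\t1\t{length}\t+\n")
```

For example, `write_saf(io.StringIO(">contig_1\nACGT\n"), out)` writes the header line followed by `contig_1`, `contig_1`, `1`, `4` and `+` separated by tabs.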

Results

In total, 45.5 million 2x250 nt paired-end reads were assembled into a 431.8 Mbp transcriptome comprising 388,181 contigs. The high continuity of the assembled contigs is reflected by an E90N50 of 2,246 bp and an E90N90 of 452 bp. The completeness check indicated that a sufficient amount of sequencing data was generated.
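
To make the continuity statistics more concrete, the following Python sketch computes Nx and ExNx values from contig lengths and expression values. It assumes the usual convention in which ExN50 is the N50 of the most highly expressed contigs that together account for x % of the total expression; the function names and the example numbers are hypothetical and are not data from this assembly.

```python
def nx(lengths, x=50):
    """Length L such that contigs of length >= L contain at least x % of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    raise ValueError("no contigs given")


def exnx(contigs, e=90, x=50):
    """Nx restricted to the top-expressed contigs covering e % of total expression.

    contigs: list of (length_bp, expression) tuples.
    """
    total_expr = sum(expr for _, expr in contigs)
    kept, running = [], 0.0
    for length, expr in sorted(contigs, key=lambda c: c[1], reverse=True):
        kept.append(length)
        running += expr
        if running * 100 >= total_expr * e:
            break
    return nx(kept, x)


# Made-up example: (length in bp, normalized expression)
example = [(1000, 50.0), (500, 30.0), (2000, 15.0), (300, 5.0)]
e90n50 = exnx(example, e=90, x=50)
e90n90 = exnx(example, e=90, x=90)
```

Restricting the statistic to well-expressed contigs reduces the influence of the many short, lowly expressed contigs that a de novo transcriptome assembly typically contains.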

Usage of the Data Generated by the Trinity Assembly

Trinity is a method for the de novo reconstruction of transcriptomes from RNA sequencing data. The expression library of C. tiglium generated by this method revealed many interesting genes and functions associated with isoguanosine production. We first retrieved the sequences of the enzymes of interest from databases such as UniProt and then used BLAST to identify them among all Trinity sequences. Among these hits, we identified several that play an interesting role in the purine pathway.

Identification of Candidate Genes

The synthesis of purine bases is central to the production of nucleobases. As described under unnatural base pairs, several reactions are needed for the biosynthesis of the final products (GMP and AMP). In addition, many reactions are needed for the organism's general metabolism as well as for the conversion of precursors and intermediates that are important for further reactions. A closer examination of the enzyme reactions of purine metabolism allowed us to identify some that might be of specific interest for the incorporation and biosynthesis of unnatural bases in Croton tiglium.

First, there is guanosine monophosphate synthase (GMPS), an enzyme from the class of ligases that form carbon-nitrogen bonds with glutamine as amido-N-donor (see KEGG for more information). It is also known as guanosine monophosphate synthetase and is abbreviated GuaA. GMPS is of special interest to us because it catalyzes the amination of XMP (xanthosine monophosphate) to GMP and, in the case of Croton tiglium, may also be able to produce iso-GMP. GMPS is found in many organisms apart from Croton tiglium, including Homo sapiens and E. coli. Comparing known GMPS sequences with the Trinity assembly yielded two slightly different sequences. These sequences have a size of 314 amino acids and a molecular mass of 59.46 kDa.
Another interesting enzyme from purine metabolism is inosine monophosphate dehydrogenase (IMPDH), which matched three Trinity sequences. IMPDH is an enzyme from the class of oxidoreductases that act on CH-OH groups of donors with NAD+ or NADP+ as acceptors (see KEGG). The different forms of IMPDH found within the Trinity assembly have a molecular mass of 53-58 kDa, depending on the exact sequence, and a length of approximately 500 to 550 amino acids. In purine metabolism, IMPDH catalyzes the synthesis of XMP from inosine monophosphate (IMP). For C. tiglium, this means that it could possibly enable the biosynthesis of an isoform of XMP that might in turn be a substrate for the production of iso-GMP.
Furthermore, cytidine deaminase (CDA) seemed to have great potential. CDA, which belongs to the class of hydrolases acting on carbon-nitrogen bonds other than peptide bonds (see KEGG), usually deaminates cytidine to uridine. However, CDA may also catalyze the reverse reaction. For our purposes, we considered a reaction from xanthosine to isoguanosine. The CDA amino acid sequence found in the Trinity assembly is 535 amino acids long and has a molecular mass of 33.95 kDa.
Aside from these enzymes, we considered adenylosuccinate synthetase (ADSS) an interesting part of the purine pathway. ADSS, which belongs to the class of ligases forming carbon-nitrogen bonds (see KEGG), was recovered from the Trinity assembly as a single sequence. It has a molecular mass of 53.32 kDa and a size of 489 amino acids. In Croton tiglium, it is expected to catalyze the reaction of IMP to adenylosuccinate, which is then further processed into AMP.
Finally, we focused on the enzyme xanthine dehydrogenase (XDH). XDH usually converts xanthine into urate, which is then processed further. XDH is an enzyme from the class of oxidoreductases acting on CH or CH2 groups with NAD+ or NADP+ as acceptor (see KEGG), and it matched as many as six sequences of the Trinity assembly. It has a molecular mass of 64.12 kDa and a size of 587 amino acids.
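
A quick plausibility check for such length and mass pairs is the mass per residue, which is close to 110 Da for typical proteins. The Python sketch below applies this rule of thumb to the values quoted above; the average residue mass and the 15 % tolerance are assumptions chosen for illustration, and the function names are hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # assumed rule-of-thumb mass of one amino acid residue

# (length in amino acids, molecular mass in kDa) as quoted in the text
reported = {
    "GMPS": (314, 59.46),
    "CDA": (535, 33.95),
    "ADSS": (489, 53.32),
    "XDH": (587, 64.12),
}


def da_per_residue(length_aa, mass_kda):
    """Average mass per residue in Da implied by a length and mass pair."""
    return mass_kda * 1000.0 / length_aa


def is_plausible(length_aa, mass_kda, tolerance=0.15):
    """True if the implied residue mass is within the tolerance of 110 Da."""
    deviation = abs(da_per_residue(length_aa, mass_kda) - AVG_RESIDUE_DA)
    return deviation / AVG_RESIDUE_DA <= tolerance


check = {name: is_plausible(*values) for name, values in reported.items()}
```

With these inputs, ADSS and XDH come out at about 109 Da per residue, whereas the pairs quoted for GMPS and CDA deviate strongly from the rule of thumb. Both would fall back into the expected range if their two lengths were interchanged, although this check alone cannot decide which figures are correct.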

Extraction of Enzyme DNA out of the cDNA Library

After we had identified all genes of interest from the purine pathway, we had to extract them from the cDNA of the tissue samples. We therefore designed primers for all sequences that could be found within the Trinity assembly of Croton tiglium. In total, we used 13 primer pairs in separate PCRs with all tissue samples. From these PCRs, we obtained at least one candidate gene for each protein of interest. However, as there were two possible variants of GMPS that could work differently, we codon-optimized the one we could not obtain from the tissues and ordered it via gene synthesis.

Protein purification

After the extraction, we had to find a way to obtain the expressed proteins. We therefore used a modification of the NEB IMPACT® Kit protocol, which you can find here. Protein purification via the IMPACT system relies on an intein tag; IMPACT is short for “Intein Mediated Purification with an Affinity Chitin-binding Tag”. In short, the target gene sequence is fused to an intein tag that enables the purification of the protein. As a vector, we used pTXB1, which provides a C-terminal fusion of the intein tag. The modified version of the strain ER2566 that we used additionally contains the plasmid pRARE-2. This plasmid compensates for poor codon usage, as it encodes several rare tRNAs.

Estimation of the Protein Concentration

After the protein purification, the concentrations were estimated. For this, we used Roti®-Nanoquant by Carl Roth, a modification of the Bradford assay (Bradford, M., 1976). You can find the protocol here. Depending on the protein, we reached concentrations from 1.35 up to 5.31 grams per liter. See Table 2 for more information:

Table 2: Concentrations of the proteins, estimated with Roti®-Nanoquant. Multiple values result from repeated extractions of those proteins, which were carried out because of their importance.

Protein name	Concentration in mg/mL
GMPS iso-form 1	4.1775 2.1507 1.4610
GMPS iso-form 2	1.3497
IMPDH form 1	4.4007
IMPDH form 2	4.1763
ADSS	4.2616
XDH	4.1302 1.9349
CDA	5.3092 1.8054 1.3881

Afterwards, we examined whether we had extracted the proteins correctly using SDS-PAGE as well as MALDI.

Investigation of enzyme activity

After we knew that all proteins had been extracted properly, our next step was to test their functionality. To do so, we used the plate reader “Tecan infinite® 200” and the program “Tecan i-control, 1.10.4.0”. All enzyme reactions were run at room temperature to match the plant's preferred growing temperature in the original botanical garden. For the final analysis, we calculated the mean value and the standard deviation of the three replicates. These were plotted as graphs showing their development before and after the addition of the enzyme or water (Figures 2-3 as well as 5-6, also see Final Discussion). All of them showed activity. We also ran overnight enzyme activity assays. These did not show any significant changes after the first hour; the enzyme reactions appeared to be very fast, taking place mainly within the first minutes.
Further, we wanted to show that the reactions yielded the desired products. For this purpose, we used the HPLC (high performance liquid chromatography) system “LaChrom Ultra” from VWR (find more information here) in combination with the MicroTofQ mass spectrometer from Bruker to determine the structure of the substances. The combination of these allowed us to separate the substances of the reaction mixtures, analyze their molecular weight and compare them with standards. Generally, in an HPLC measurement, substances (or sample mixtures) are pumped through a separation column containing a stationary phase that interacts with the analytes. The stronger the interaction, the longer the analyte needs to pass through the column. This retention time is recorded by a detector so that conclusions about the analytes can be drawn. In combination with the MicroTofQ system, a mass spectrometer, not only the retention time can be measured but also the molecular mass of the substances can be estimated. In general, mass spectrometers transfer the analytes into the gas phase and ionize them. The ions are then accelerated and transferred to the analyzer, which separates them according to their masses. Combined, these two systems provide valuable information about the substances contained in a reaction mixture.
For our purposes, we used parameters for the MicroTofQ as in (Ruwe et al., 2017), measuring in negative mode, where the measured masses are lower by the mass of one H atom. However, since we wanted to differentiate between different forms of substances with the same mass, we had to try additional measurement methods for the HPLC. Eventually, we used the “ZIC-pHILIC” column from Merck with a length of 150 mm and a diameter of 2.1 mm. For the mobile phase, we used ammonium bicarbonate (pH 9.3) and acetonitrile in a ratio of 27 % to 73 %. This was run in isocratic mode with a flow rate of 0.2 mL/min. The injection volume was set to 2 µL of the reaction mixture from the corresponding enzyme assay. The separations took place at 40 °C.
Since our main goal was to produce iso-GMP or isoguanosine using the purified enzymes of Croton tiglium, we focused on the most promising candidate enzymes: both iso-forms of GMPS (see parts BBa_K220160 and BBa_K220161) and CDA (part BBa_K220162).
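
The replicate statistics described above can be sketched in a few lines of Python. The readings below are made-up numbers, not measured values, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not state which variant was calculated.

```python
from statistics import mean, stdev


def summarize(readings_per_timepoint):
    """Return (mean, sample standard deviation) of the replicates per time point."""
    return [(mean(r), stdev(r)) for r in readings_per_timepoint]


# Hypothetical absorbance readings: one tuple of three replicates per time point
enzyme_wells = [(0.50, 0.52, 0.54), (0.40, 0.41, 0.42)]
water_wells = [(0.50, 0.51, 0.52), (0.50, 0.51, 0.52)]

enzyme_summary = summarize(enzyme_wells)
water_summary = summarize(water_wells)
```

Plotting the means with the standard deviations as error bars for both the enzyme and the water wells gives the kind of before/after comparison shown in the activity figures.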

CDA

First, we set up an enzyme activity assay for CDA with cytidine to confirm its activity, following the protocol by Robert M. Cohen and Richard Wolfenden from 1971, which states that the disappearance of cytidine can be followed by the decrease in absorption at 282 nm. We set up a reaction mixture containing 50 mM TRIS-HCl buffer (pH 7.5) and 0.167 mM cytidine as a substrate.
We prepared six identical measurement samples with 196 µL of the mixture each and measured them for about 20 min (one measurement every 30 s) with the Tecan. We then paused the measurement program to add 4 µL (6 µg) of the previously extracted CDA or 4 µL of water to three samples each. Then, we immediately continued the measurement for about an hour.
After the general activity of the CDA had been tested, we set up a possible reaction with xanthosine instead of cytidine, all other components being the same. However, since we found no literature on this reaction, we first had to determine the wavelength at which xanthosine can be measured. This was done using a general spectrum analysis of different mixtures, three samples each:

1. without xanthosine, without CDA
2. with xanthosine, without CDA
3. without xanthosine, with CDA
4. with xanthosine, with CDA

We then calculated the mean absorption of the three samples of each mixture and compared the mixtures in different combinations, always calculating the absolute difference between them:

A (1+3): difference between a reaction mixture with and without CDA
B (1+2): difference between a reaction mixture with and without xanthosine
C (2+4): difference between no reaction and a possible reaction

In this way, we could determine the wavelength at which xanthosine can be measured (B) and ensure that the peak was independent of the CDA (A). Furthermore, we could identify the absorbance of CDA at about 254-260 nm (A and C) (Figure 1).
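
The comparison of the mixtures can be sketched as follows in Python. The spectra are hypothetical numbers on a coarse wavelength grid, chosen only to show the calculation (mean over replicates, absolute difference, wavelength of the largest difference); the function names are not taken from the original analysis.

```python
from statistics import mean


def mean_spectrum(replicates):
    """Average several spectra given as {wavelength_nm: absorbance} dicts."""
    return {wl: mean(rep[wl] for rep in replicates) for wl in sorted(replicates[0])}


def difference_spectrum(spec_a, spec_b):
    """Absolute difference between two mean spectra at every wavelength."""
    return {wl: abs(spec_a[wl] - spec_b[wl]) for wl in spec_a}


def peak_wavelength(diff):
    """Wavelength at which two mixtures differ most."""
    return max(diff, key=diff.get)


# Made-up triplicates: mixture 1 (no xanthosine, no CDA), mixture 2 (xanthosine only)
mix1 = [{250: 0.10, 260: 0.11, 282: 0.10}] * 3
mix2 = [{250: 0.12, 260: 0.15, 282: 0.40},
        {250: 0.13, 260: 0.16, 282: 0.42},
        {250: 0.11, 260: 0.14, 282: 0.41}]

diff_b = difference_spectrum(mean_spectrum(mix1), mean_spectrum(mix2))
best = peak_wavelength(diff_b)
```

The same three functions cover the other two comparisons by passing in the mean spectra of the corresponding mixtures.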

Figure (1): Results of the analysis of the absorbance of xanthosine at different wavelengths.
All measurements were made with the Tecan at room temperature. The difference between a mixture with and without xanthosine (red) is clearly visible at about 282 nm.

Afterwards, we set up new activity assays, using 196 µL of the reaction mixture in each of six wells of the plate. After measuring the absorbance at 282 nm, we added 4 µL of either water or the enzyme (6 µg) to three biological replicates each and continued the measurements for about an hour. As can be seen in Figure 2, the absorption of cytidine at 282 nm began to decrease continuously after the addition of the cytidine deaminase, whereas the absorption remained more or less constant when only water was added. These results demonstrate the activity of our extracted cytidine deaminase.
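
The volumes given above imply a small dilution when the enzyme is added. As a worked example, the Python sketch below calculates the resulting concentrations, assuming that the stated 0.167 mM cytidine refers to the 196 µL mixture before the 4 µL addition; the function name is hypothetical.

```python
def diluted(concentration, volume_ul, total_volume_ul):
    """Concentration after bringing volume_ul of a solution to total_volume_ul."""
    return concentration * volume_ul / total_volume_ul


mixture_ul = 196.0   # reaction mixture per well
added_ul = 4.0       # enzyme solution or water
enzyme_ug = 6.0      # amount of enzyme contained in the 4 µL
total_ul = mixture_ul + added_ul

cytidine_final_mM = diluted(0.167, mixture_ul, total_ul)   # substrate after addition
enzyme_added_mg_per_ml = enzyme_ug / added_ul              # µg/µL equals mg/mL
enzyme_final_ug_per_ml = enzyme_ug / total_ul * 1000.0     # enzyme in the 200 µL well
```

The dilution of the substrate is only about 2 %, so the step in absorbance caused by the addition itself is small compared with the decrease expected from an active enzyme.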

Figure (2): Enzyme activity assay for the reaction of the cytidine deaminase with cytidine. The reaction took place at room temperature. Three biological replicates were used each. After the addition of water, the absorbance at 282 nm stayed the same, whereas it decreased after the addition of the CDA.

The reaction of the cytidine deaminase with xanthosine gave inconclusive results (Figure 3). A slight decrease of the xanthosine concentration could be seen here as well, but it was not significant.

Figure (3): Enzyme activity assay for the reaction of the cytidine deaminase with xanthosine as a substrate. The reaction was set up at room temperature, using three biological replicates each. After adding CDA to the reaction mixture, a slight decrease in the absorbance at 282 nm was visible. However, as there is also a very small decrease after the addition of water, no significant statement can be made.

The HPLC-MicroTofQ measurements detected only xanthosine and various other substances. There were no significant masses or peaks for guanosine or isoguanosine (Figure 4).

Figure (4): HPLC-MicroTofQ measurement for the products of the reaction of CDA with xanthosine. Reaction conditions as described earlier. Although many different masses were found, none of them could be matched to guanosine or isoguanosine. For these, a peak would be expected at about 282 g/mol.

With only a slight decrease of the absorbance and no detectable products in the HPLC, it seems likely that only a very small amount of xanthosine, if any, is converted to isoguanosine, since this reaction is not the specific reaction of CDA and is thus rare. However, supplementary tests and experiments with different reaction mixtures would be needed to analyze this further.

GMPS

We set up the reaction mixture for the two iso-forms of GMPS following a protocol for the enzyme activity assay by Abbott, J., Newell, J., Lightcap, C. et al. We also consulted the original paper from 1985, which reports the absorbance at 290 nm for the given amount of XMP within the mixture. We set up the following reaction mixture:

60 mM HEPES
5 mM ATP
0.2 mM XMP
20 mM MgCl₂
200 mM NH₄Cl
0.1 mM DTT
0.8 mM EDTA
Filled up with ddH₂O

Due to their instability, XMP and ATP were always added freshly. After the samples were set up, we measured their absorbance at 290 nm with the Tecan reader for about 20 minutes. Afterwards, either 4 µL of water or 4 µL (6 µg) of one of the GMPS iso-forms (iso-form 1: BBa_K220160 and iso-form 2: BBa_K220161) was added to three samples each. The measurement was continued for approximately an hour. The activity assays of iso-forms 1 and 2 both showed that the GMPS enzymes work correctly, reducing the amount of XMP in the reaction mixture significantly. The absorption at 290 nm dropped sharply after iso-form 1 of GMPS was added to the solution, whereas the initial decrease was weaker for the codon-optimized iso-form 2. However, both decreased the amount of XMP by about the same extent within the hour in which the reaction was measured. Thus, both iso-form 1 and iso-form 2 work as expected (see Figure 5 and Figure 6 for comparison).

Figure (5): Enzyme activity assay of iso-form 2 of the guanosine monophosphate synthetases. The reaction was set up at room temperature using three biological replicates. A significant decrease in the absorption at 290 nm can be seen after the addition of the synthesized GMPS, whereas the negative control with water stays at the same absorption.

Figure (6): Enzyme activity assay of iso-form 1 of the guanosine monophosphate synthetases. Three biological replicates were used. The reaction was set up at room temperature. A significant decrease in the absorption at 290 nm can be seen after the addition of the GMPS, whereas the negative control with water stays at the same absorption.

As described earlier, it took some time to find suitable conditions for the HPLC-MicroTofQ measurements, since iso-GMP and GMP have exactly the same mass and can thus only be separated by their structure. With the method chosen in the end, however, it was possible to identify analytes that appear to represent iso-GMP. First, the substances within the reaction mix had to be identified to ensure that only those representing GMP/iso-GMP would be included in the analyses. The general analysis showed significant values for all the substrates and products of interest that should be within the reaction mix, including AMP, ADP and ATP, some remaining traces of XMP and, of course, GMP/iso-GMP (Figures 7 and 8).

Figure (7): HPLC-MicroTofQ measurement for the substances within the reaction mixture of the GMPS extracted from the plant. Reaction conditions as described earlier. In addition to the substrates ATP and XMP, resulting substances such as AMP and GMP can be found.

Figure (8): HPLC-MicroTofQ measurement for the substances within the reaction mixture of the GMPS with the synthesized sequence. Reaction conditions as described earlier. In addition to the substrates ATP and XMP, resulting substances such as AMP and GMP can be found.

We then compared the resulting form of GMP with a GMP standard (10^-5 diluted solution) using the exact measurements of the HPLC. For both iso-form 2 and iso-form 1 of GMPS, the elution peaks found at the molecular mass of GMP and iso-GMP (approximately 363.22 g/mol, in the graph at approximately 362 g/mol because of the missing H due to the measurement method) were significantly shifted to the right compared to the standard. Thus, the form of GMP created in the enzyme reactions of the two iso-forms of GMPS, the extracted one and the one from gene synthesis, has to be another form of GMP, most likely iso-GMP (Figure 9).
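
The mass shift in negative mode can be reproduced with a short calculation. The Python sketch below computes the monoisotopic mass and the expected [M-H]- signal from a sum formula, assuming C10H14N5O8P for GMP (and therefore also for iso-GMP) and C10H13N5O5 for guanosine. Note that a mass spectrometer observes the monoisotopic mass, which is slightly lower than the average molar mass of 363.22 g/mol quoted above; the function names are hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes in u
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
}
PROTON = 1.00727647  # mass of the proton removed on deprotonation


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_negative_mode(formula):
    """Expected m/z of the singly deprotonated ion [M-H]-."""
    return monoisotopic_mass(formula) - PROTON


GMP = {"C": 10, "H": 14, "N": 5, "O": 8, "P": 1}   # same formula for iso-GMP
GUANOSINE = {"C": 10, "H": 13, "N": 5, "O": 5}     # same formula for isoguanosine

mz_gmp = mz_negative_mode(GMP)
mz_guanosine = mz_negative_mode(GUANOSINE)
```

Because GMP and iso-GMP share one sum formula, this calculation gives the same m/z of about 362.05 for both, which is why the distinction has to come from the chromatographic separation rather than from the mass.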

Figure (9): HPLC-MicroTofQ measurement comparing the elution of the GMP standard and of the reaction products. The product of the enzyme from gene synthesis is shown in red, the product of iso-form 1 of GMPS in blue, and the standard in green. Even though the standard and the mixtures contained compounds of the same molecular mass, they behave differently on the HPLC. The ordinary GMP eluted significantly earlier than the one generated in the enzyme reactions. Thus, the form of GMP that results from the reactions is likely to be iso-GMP.

In conclusion, we not only worked out the synthesis pathways in Croton tiglium but could even recreate part of them, showing that the enzymes expressed in Croton tiglium are likely to generate a different form of GMP (presumably iso-GMP).

References

“Cytidine Deaminase from E. coli – Purification, Properties and Inhibition by the Potential Transition State Analog 3,4,5,6-Tetrahydrouridine” by Robert M. Cohen and Richard Wolfenden, published in “The Journal of Biological Chemistry” on December 25, 1971
“The Effects of Removing the GAT Domain from E. coli GMP Synthetase” (Abbott, J., Newell, J., Lightcap, C. et al. Protein J (2006) 25: 483. https://doi.org/10.1007/s10930-006-9032-5)
“GMP Synthetase” by Howard Zalkin (https://doi.org/10.1016/S0076-6879(85)13037-5)

@@ Line 163: / Line 163: @@
 		<!-- Normaler Text -->
 		<article>
-Reads from tissue specific data sets were mapped to the initial transcriptome assembly via STAR {Dobin, 2013} with 90&nbsp;% length and 95&nbsp;% identity. A customized python script was developed to generate a matching annotation file for the assembly. featureCounts {Liao, 2014} was applied to quantify the expression of all sequences in the assembly. Since most transcripts are represented by multiple contigs representing different splice variants, we decided to include multi-mapped reads. Dedicated python scripts were used for further investigation of the expression data as well as for the generation of expression heatmaps via the <a href="http://seaborn.pydata.org/">seaborn module</a>. VENN diagram generation was performed at <a href="ttp://bioinformatics.psb.ugent.be/webtools/Venn/">Bioinformatics.psb</a>h.
+Reads from tissue specific data sets were mapped to the initial transcriptome assembly via STAR {Dobin, 2013} with 90&nbsp;% length and 95&nbsp;% identity. A customized python script was developed to generate a matching annotation file for the assembly. featureCounts {Liao, 2014} was applied to quantify the expression of all sequences in the assembly. Since most transcripts are represented by multiple contigs representing different splice variants, we decided to include multi-mapped reads. Dedicated python scripts were used for further investigation of the expression data as well as for the generation of expression heatmaps via the <a href="http://seaborn.pydata.org/">seaborn module</a>. VENN diagram generation was performed at <a href="http://bioinformatics.psb.ugent.be/webtools/Venn/">Bioinformatics.psb</a>.
 </article>
@@ Line 178: / Line 178: @@
 		<article>
 In total, 45.5 million 2x250&nbsp;nt paired-end reads were assembled into the 431.8&nbsp;Mbp transcriptome comprising 388,181 contigs. The high continuity of the assembled contigs can be described by the E90N50 of 2,246&nbsp;bp and the E90N90 of 452&nbsp;bp. The completeness check indicated a sufficient amount of sequencing data were generated.
-BUSCO analysis revealed the presence of 94&nbsp;% complete BUSCOs. In addition, 3&nbsp;% are present in fragmented form and only 3&nbsp;% are missing in this transcriptome assembly.
+<a href="http://busco.ezlab.org/">BUSCO analysis</a> revealed the presence of 94&nbsp;% complete BUSCOs. In addition, 3&nbsp;% are present in fragmented form and only 3&nbsp;% are missing in this transcriptome assembly.
 </article>