|
|
Line 225: |
Line 225: |
| <div class="figure large"> | | <div class="figure large"> |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/0/0b/T--Bielefeld-CeBiTec--PCR-UBP-A-H.png"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/0/0b/T--Bielefeld-CeBiTec--PCR-UBP-A-H.png"> |
− | <p class="figure subtitle"><b>Figure 9: PCRs</b> with Titanium Taq (<b>A</b>), Go Taq G2 (<b>B</b>), Allin HiFi DNA Polymerase (<b>C</b>), innuDRY polymerase (<b>D</b>), BioMaster-HS Taq PCR polymerase (<b>E</b>), FirePol DNA polymerase (<b>F</b>), Phusion DNA polymerase (<b>G</b>) and Q5 DNA polymerase (<b>H</b>). The template is pSB1C3_RuBisCo with the inserts mutA, mutT, mutG, mutC (5 ng µL<sup>-1</sup>) and UBP_target (25 ng µL<sup>-1</sup>) after the restriction digest with <i>Eci</i>I (mutA) and <i>Sap</i>I (mutG) for 2 h and <i>Bsa</i>I (mutT) and <i>Mnl</i>I (mutC) for 15 h.</p> | + | <p class="figure subtitle"><b>Figure 8: PCRs</b> with Titanium Taq (<b>A</b>), Go Taq G2 (<b>B</b>), Allin HiFi DNA Polymerase (<b>C</b>), innuDRY polymerase (<b>D</b>), BioMaster-HS Taq PCR polymerase (<b>E</b>), FirePol DNA polymerase (<b>F</b>), Phusion DNA polymerase (<b>G</b>) and Q5 DNA polymerase (<b>H</b>). The template is pSB1C3_RuBisCo with the inserts mutA, mutT, mutG, mutC (5 ng µL<sup>-1</sup>) and UBP_target (25 ng µL<sup>-1</sup>) after the restriction digest with <i>Eci</i>I (mutA) and <i>Sap</i>I (mutG) for 2 h and <i>Bsa</i>I (mutT) and <i>Mnl</i>I (mutC) for 15 h.</p> |
| </div> | | </div> |
| | | |
Line 239: |
Line 239: |
| <thead> | | <thead> |
| <tr> | | <tr> |
− | <th style="width: auto">Position in Figure 9</th> | + | <th style="width: auto">Position in Figure 8</th> |
| <th style="width: auto">DNA polymerase</th> | | <th style="width: auto">DNA polymerase</th> |
| <th style="width: auto">Distributor</th> | | <th style="width: auto">Distributor</th> |
Line 387: |
Line 387: |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
| <br> | | <br> |
− | <b>Fig. 1: Annealed oligonucleotides used for reference sample preparation.</b> | + | <b>Figure 9: Annealed oligonucleotides used for reference sample preparation.</b> |
| Sequences at the position of interest of DNA samples used as references for nanopore sequencing. | | Sequences at the position of interest of DNA samples used as references for nanopore sequencing. |
| </p> | | </p> |
Line 402: |
Line 402: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/4/4f/T--Bielefeld-CeBiTec--UBP_oligos_ms_UPLC.jpg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/4/4f/T--Bielefeld-CeBiTec--UBP_oligos_ms_UPLC.jpg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 2: UPLC and MS data from oligos containing isoguanine or 5‑methyl isocytosine.</b> | + | <b>Figure 10: UPLC and MS data from oligos containing isoguanine or 5‑methyl isocytosine.</b> |
| Oligonucleotides containing isoguanine or 5‑methyl isocytosine were synthesized by Biolegio and analysed by ultra performance liquid chromatography (UPLC) and mass spectrometry (MS). Shown are the results of the UPLC (top) and MS (bottom) analyses for each of the complementary oligos containing either isoguanine (A) or 5‑methyl isocytosine (B). | | Oligonucleotides containing isoguanine or 5‑methyl isocytosine were synthesized by Biolegio and analysed by ultra performance liquid chromatography (UPLC) and mass spectrometry (MS). Shown are the results of the UPLC (top) and MS (bottom) analyses for each of the complementary oligos containing either isoguanine (A) or 5‑methyl isocytosine (B). |
| </p> | | </p> |
Line 416: |
Line 416: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8d/T--Bielefeld-CeBiTec--library_prep.jpg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8d/T--Bielefeld-CeBiTec--library_prep.jpg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 3: Library preparation for Oxford Nanopore sequencing.</b> Purification of DNA containing the unnatural base pair, after the adapter ligation step of library preparation for Oxford Nanopore sequencing. | + | <b>Figure 11: Library preparation for Oxford Nanopore sequencing.</b> Purification of DNA containing the unnatural base pair, after the adapter ligation step of library preparation for Oxford Nanopore sequencing. |
| </p> | | </p> |
| </div> | | </div> |
Line 424: |
Line 424: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/4/42/T--Bielefeld-CeBiTec--sequencer.jpg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/4/42/T--Bielefeld-CeBiTec--sequencer.jpg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 4: MinIon sequencer with R9.4 flowcell.</b> The MinIon sequencer that we used in our experiments, together with a R9.4 flowcell. | + | <b>Figure 12: MinION sequencer with R9.4 flow cell.</b> The MinION sequencer that we used in our experiments, together with an R9.4 flow cell. |
| </p> | | </p> |
| </div> | | </div> |
Line 432: |
Line 432: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/3/39/T--Bielefeld-CeBiTec--sequencing_run.jpg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/3/39/T--Bielefeld-CeBiTec--sequencing_run.jpg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 5: Status of the pore grid during sequencing.</b> While sequencing, the software MinKnow gives online feedback about the pores in the flowcell. | + | <b>Figure 13: Status of the pore grid during sequencing.</b> While sequencing, the software MinKNOW gives online feedback about the pores in the flow cell. |
| </p> | | </p> |
| </div> | | </div> |
Line 453: |
Line 453: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8e/T--Bielefeld-CeBiTec--signal_traces.svg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8e/T--Bielefeld-CeBiTec--signal_traces.svg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 6: Normalized signal traces of analyzed DNA samples.</b> | + | <b>Figure 14: Normalized signal traces of analyzed DNA samples.</b> |
| Overlaid, normalized signal traces of DNA samples containing either isoG/isoC<sup>m</sup> or any natural base at the position of interest in the analyzed sequence context. The reads displayed in these plots were selected from their respective sequencing runs with iCG filter, using the same filter settings for all DNA samples. To remove contaminating reads from previous sequencing runs, the 70 % most deviating reads were removed prior to plotting with the help of iCG model. | | Overlaid, normalized signal traces of DNA samples containing either isoG/isoC<sup>m</sup> or any natural base at the position of interest in the analyzed sequence context. The reads displayed in these plots were selected from their respective sequencing runs with iCG filter, using the same filter settings for all DNA samples. To remove contaminating reads from previous sequencing runs, the 70 % most deviating reads were removed prior to plotting with the help of iCG model. |
| </p> | | </p> |
Line 470: |
Line 470: |
| | | |
| <div class=article> | | <div class=article> |
− | Based on the data presented in Figure 5, a cluster analysis based on linear discriminant analysis was conducted using iCG model. Figure 6 shows dot-plots for the forward and reverse models, presenting the linear discriminants of the reads each respective model was created with. The direct comparison of both models reveals that the model created upon the data of the reverse strand seems to perform better in terms of classification of the sequencing reads. Except for the groups containing A and G at the position of interest, which slightly overlap with each other, all other groups are well seperated from each other. On the other hand, the linear discriminant analysis of the data of the forward strand was unable to properly separate the reads containing A, G and iC<sup>m</sup> from each other, mainly due to widely scattered reads of the iC<sup>m</sup> group. Both results coincide with the visual assessment of the signal traces in Figure 5. | + | Based on the data presented in Figure 14, a cluster analysis based on linear discriminant analysis was conducted using iCG model. Figure 15 shows dot-plots for the forward and reverse models, presenting the linear discriminants of the reads each respective model was created with. The direct comparison of both models reveals that the model built on the data of the reverse strand seems to perform better in terms of classification of the sequencing reads. Except for the groups containing A and G at the position of interest, which slightly overlap with each other, all groups are well separated from each other. In contrast, the linear discriminant analysis of the data of the forward strand was unable to properly separate the reads containing A, G and iC<sup>m</sup> from each other, mainly due to widely scattered reads of the iC<sup>m</sup> group. Both results agree with the visual assessment of the signal traces in Figure 14. |
| </div> | | </div> |
| | | |
Line 476: |
Line 476: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/a/a5/T--Bielefeld-CeBiTec--model_dotplots.png"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/a/a5/T--Bielefeld-CeBiTec--model_dotplots.png"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 7: Dot-plots of the linear discriminant models of the forward and reverse strand.</b> Dot-plots of the linear discriminants of the reads used for the creation of the statistical models for base prediction at the position of interest in the forward and reverse strand. The data used for the linear discriminant analysis was previously filtered by removing 70 % of reads from each group, based on their deviation from the groups median signal in the neighboring sequence context of the position of interest. | + | <b>Figure 15: Dot-plots of the linear discriminant models of the forward and reverse strand.</b> Dot-plots of the linear discriminants of the reads used for the creation of the statistical models for base prediction at the position of interest in the forward and reverse strand. The data used for the linear discriminant analysis was previously filtered by removing 70 % of the reads from each group, based on their deviation from the group's median signal in the neighboring sequence context of the position of interest. |
| </p> | | </p> |
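The filtering step described in the caption above (discarding the 70 % of reads that deviate most from their group's median signal) can be sketched as follows. This is only a minimal illustration and not the code of iCG: the function name `filter_most_deviating`, the representation of a read as a list of normalized signal values, and the use of the summed absolute distance as deviation measure are all assumptions made for this example.

```python
from statistics import median


def filter_most_deviating(traces, drop_fraction=0.7):
    """Keep the reads that lie closest to the per-position group median.

    traces: list of equally long lists of normalized signal values
            (hypothetical representation of one read group).
    drop_fraction: share of the most deviating reads to discard.
    """
    if not traces:
        return []
    n_positions = len(traces[0])
    # Median signal of the group at each position of the analysed window.
    group_median = [median(t[i] for t in traces) for i in range(n_positions)]

    # Assumed deviation measure: summed absolute distance to the group median.
    def deviation(trace):
        return sum(abs(x - m) for x, m in zip(trace, group_median))

    # Always keep at least one read so that small groups do not vanish.
    n_keep = max(1, round(len(traces) * (1 - drop_fraction)))
    # sorted() is stable, so reads with equal deviation keep their input order.
    return sorted(traces, key=deviation)[:n_keep]
```

With `drop_fraction=0.7`, a group of ten reads is reduced to the three reads closest to the group median. A different deviation measure, for example the squared distance, could be substituted without changing the rest of the sketch.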
Line 482: |
Line 482: |
| | | |
| <div class="article"> | | <div class="article"> |
− | Since a statistical model should not be tested with the very data it was created with, we prepared a new set of DNA samples to properly evaluate the performance of both models concerning the prediction of bases at the position of interest in their respective sequence context. For this purpose, we modified the RuBisCo plasmid that was used for the first sample preparation by cloning five different sequences downstream RuBisCo with standard BioBrick assembly (<a target="_blank" href="S05406"></a>, <a target="_blank" href="S05407"></a>, <a target="_blank" href="S05408"></a>, <a target="_blank" href="S05409"></a>, <a target="_blank" href="S05410"></a>). Each of these plasmids contains a 25 nt sequence that is unique, while the remaining plasmid sequence is the same. These unique sequences can be used for identification assignment of sequencing reads comparable to the Nanopore barcoding approach. Starting with these five plasmids, we prepared new DNA samples according to the same procedure explained above. After ligation, all five samples were pooled in approximately equimolar proportion and further prepared for sequencing. After sequencing and basecalling of this pooled sample, the reads were assigned to their respective group by using iCG filter with the "--barcode" argument and each plasmid's unique sequence. After filtering, 50 reads of every group were randomly selected in order to be used for evaluating the performance of the linear discriminant models with iCG predict. The results of this evaluation are summarized in Figure 7. | + | Since a statistical model should not be tested with the very data it was created with, we prepared a new set of DNA samples to properly evaluate the performance of both models concerning the prediction of bases at the position of interest in their respective sequence context. 
For this purpose, we modified the RuBisCo plasmid that was used for the first sample preparation by cloning five different sequences downstream of RuBisCo with standard BioBrick assembly (<a target="_blank" href="S05406"></a>, <a target="_blank" href="S05407"></a>, <a target="_blank" href="S05408"></a>, <a target="_blank" href="S05409"></a>, <a target="_blank" href="S05410"></a>). Each of these plasmids contains a unique 25 nt sequence, while the remaining plasmid sequence is identical. These unique sequences can be used to assign sequencing reads to their sample, comparable to the Nanopore barcoding approach. Starting with these five plasmids, we prepared new DNA samples according to the same procedure explained above. After ligation, all five samples were pooled in approximately equimolar proportions and further prepared for sequencing. After sequencing and basecalling of this pooled sample, the reads were assigned to their respective group by using iCG filter with the "--barcode" argument and each plasmid's unique sequence. After filtering, 50 reads of every group were randomly selected to evaluate the performance of the linear discriminant models with iCG predict. The results of this evaluation are summarized in Figure 16. |
| </div> | | </div> |
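The evaluation described above can be summarized in two numbers per model: the share of each predicted base across all test reads, and the fraction of reads in each group that received the correct prediction. The following sketch shows one way to compute both. It does not reproduce iCG predict; the function name `evaluate_predictions`, the input format (a dictionary mapping the true base of a group to the list of bases predicted for its reads) and the fixed random seed are assumptions made for this example.

```python
import random
from collections import Counter


def evaluate_predictions(predicted_by_group, n_per_group=50, seed=0):
    """Summarize base predictions for randomly selected test reads.

    predicted_by_group: dict mapping the true base of a group (e.g. "isoG")
                        to the list of bases predicted for its reads
                        (hypothetical input format).
    Returns (shares, fidelity):
      shares   - fraction of all sampled reads predicted as each base
      fidelity - fraction of correctly predicted reads per group
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    distribution = Counter()
    fidelity = {}
    for true_base, predictions in predicted_by_group.items():
        if not predictions:
            continue  # nothing to evaluate for an empty group
        # Randomly select up to n_per_group reads from this group.
        sample = rng.sample(predictions, min(n_per_group, len(predictions)))
        counts = Counter(sample)
        distribution.update(counts)
        fidelity[true_base] = counts[true_base] / len(sample)
    total = sum(distribution.values())
    shares = {base: n / total for base, n in distribution.items()}
    return shares, fidelity
```

With five groups of 50 reads each and perfect prediction, every base would reach a share of 20 % and every group a fidelity of 1.0.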
| | | |
Line 488: |
Line 488: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/c/cc/T--Bielefeld-CeBiTec--model_test.png"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/c/cc/T--Bielefeld-CeBiTec--model_test.png"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 8: Evaluation of the linear discriminant analysis models.</b> | + | <b>Figure 16: Evaluation of the linear discriminant analysis models.</b> |
| Evaluation results of the linear discriminant analysis models for the forward and reverse strand. (A) Linear discriminants of the test data colored in accordance with their respective base prediction. (B) Distribution of predicted bases. Based on the assumption that every read in the test data set was correctly assigned with the barcoding approach, equal portions of 20 % for each base would be ideal, corresponding to 50 reads per test data group. (C) Fidelity of base prediction, revealing which base predictions were made for the reads of each group individually. | | Evaluation results of the linear discriminant analysis models for the forward and reverse strand. (A) Linear discriminants of the test data colored in accordance with their respective base prediction. (B) Distribution of predicted bases. Based on the assumption that every read in the test data set was correctly assigned with the barcoding approach, equal portions of 20 % for each base would be ideal, corresponding to 50 reads per test data group. (C) Fidelity of base prediction, revealing which base predictions were made for the reads of each group individually. |
| </p> | | </p> |
Line 494: |
Line 494: |
| | | |
| <div class="article"> | | <div class="article"> |
− | The results in Figure 8 indicate that the linear discriminant model for the reverse strand orientation is performing better than the model for the sense strand. The base prediction fidelity is especially high for reads containing an adenine, a cytosine or an isoguanine at the position of interest. Due to the hydrolysis of isoC<sup>m</sup> to T and the tautomerisation of isoG, leading to mispairing with T, the most common mutation that leading to a loss of the unnatural base pair between isoG and isoC<sup>m</sup> is the mutation from isoG to A (Bande et al., 2015). Considering the fidelity of base prediction for both A and isoG with the reverse strand model, we conclude that this linear discriminant analysis model is well suited for the discrimination between isoG and all natural bases in the sequence context <font face="courier new">5'-ggNct-3'</font>. Therefore, we could show that the software package iCG is applicable for the analysis of experiments | + | The results in Figure 16 indicate that the linear discriminant model for the reverse strand orientation performs better than the model for the forward strand. The base prediction fidelity is especially high for reads containing an adenine, a cytosine or an isoguanine at the position of interest. Due to the hydrolysis of isoC<sup>m</sup> to T and the tautomerisation of isoG, which leads to mispairing with T, the most common mutation leading to a loss of the unnatural base pair between isoG and isoC<sup>m</sup> is the mutation from isoG to A (Bande et al., 2015). Considering the fidelity of base prediction for both A and isoG with the reverse strand model, we conclude that this linear discriminant analysis model is well suited for the discrimination between isoG and all natural bases in the sequence context <font face="courier new">5'-ggNct-3'</font>. Therefore, we could show that the software package iCG is applicable for the analysis of experiments with unnatural bases. |
| </div> | | </div> |
| | | |
Line 506: |
Line 506: |
| <div class="bevel tr"></div> | | <div class="bevel tr"></div> |
| <div class="content"> | | <div class="content"> |
| + | |
| + | |
| + | <h3>Orthogonal Analysis with M.A.X. and iCG</h3> |
| + | |
| + | <div class="article"> |
| + | To test the orthogonality of the two methods we developed for the analysis of DNA containing unnatural bases, we performed a further PCR with a template containing isoG and isoC<sup>m</sup> and analyzed the amplified DNA with both M.A.X. and iCG. The experimental setup was identical to that of our previous PCR experiments, except for the choice of primers and the reaction volume. A GoTaq PCR with the template containing the unnatural bases was prepared in a total reaction volume of 250 μL. The standard iGEM sequencing primers VR and VF2 were used as primers. |
| + | </div> |
| | | |
| <div class="figure large"> | | <div class="figure large"> |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/0/04/T--Bielefeld-CeBiTec--PCR_MAX_iCG.png"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/0/04/T--Bielefeld-CeBiTec--PCR_MAX_iCG.png"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 5: PCR of DNA containing isoG/isoC<sup>m</sup> with GoTaq analyzed with iCG and M.A.X.</b> A DNA template containing an unnatural base pair between isoG and isoC<sup>m</sup> was analyzed in an orthogonal approach with M.A.X. and iCG. | + | <b>Figure 17: PCR of DNA containing isoG/isoC<sup>m</sup> with GoTaq analyzed with iCG and M.A.X.</b> A DNA template containing an unnatural base pair between isoG and isoC<sup>m</sup> was analyzed in an orthogonal approach with M.A.X. and iCG. (A) After PCR and PCR cleanup, four separate restriction reactions were performed with either <i>Eci</i>I, <i>Bsa</i>I, <i>Sap</i>I or no restriction enzyme. The amplified DNA contains a recognition sequence for these enzymes if isoG has mutated to A, T or G, respectively. The reaction without restriction enzyme served as a control to check whether a fragment of the expected size of 761 bp was amplified. (B) After PCR cleanup, the sample was sequenced with Oxford Nanopore sequencing and subsequently analyzed with the software suite iCG. Shown are a dot-plot of the linear discriminants of each filtered read after base prediction, the numerical result of the base prediction and a plot of the normalized signal traces at the position of interest. |
| </p> | | </p> |
| </div> | | </div> |
| + | |
| + | <div class="article"> |
| + | The results of both analysis methods presented in Figure 17 reveal that the fidelity of UBP replication is not perfect for this experimental setup. In the gel electrophoresis after the digestion with the enzymes of the M.A.X. system, a distinct band is visible for the <i>Eci</i>I digest, indicating that mutations of isoG to A occurred during the PCR. The analysis with Nanopore sequencing and iCG reveals a similar result, showing that approximately 54 % of the analyzed reads contained isoG at the position of interest, while 39 % were predicted to have an A at this position. Therefore, both methods reveal that the mutation of isoG to A is the most frequent mutational event leading to a loss of the unnatural base pair between isoG and isoC<sup>m</sup>, which is in accordance with our expectations and the respective results in the literature (Switzer <i>et al.</i>, 1993, Johnson <i>et al.</i>, 2004). Interestingly, the replication fidelity seems to be considerably lower than in our first PCR experiments with the GoTaq polymerase. This difference is possibly due to the longer storage time of the DNA template at 4 ℃. It was shown that d-isoCTP is quite unstable even when stored at -20 ℃ (Switzer <i>et al.</i>, 1993), possibly due to hydrolysis of isoC. Therefore, we postulate that the reduced PCR fidelity might be caused by a reduced template integrity. |
| + | </div> |
| + | <div class="article"> |
| + | The results in Figure 17 indicate that the analysis with Nanopore sequencing and iCG and the analysis with M.A.X. are orthogonal methods that produce the same results. Arguably, the results produced by Nanopore sequencing are more precise, allowing the detection of uncommon mutation events that are not detectable with M.A.X. Nevertheless, the analysis with restriction digests is much faster and more cost-efficient. |
| + | </div> |
| + | |
| | | |
| </div> | | </div> |