(5 intermediate revisions by 3 users not shown) |
Line 18: |
Line 18: |
| <div class="content"> | | <div class="content"> |
| | | |
− | <h3>Background: Detection of Unnatural Bases in DNA</h3> | + | <h3>Short summary</h3> |
| | | |
| <div class="article"> | | <div class="article"> |
− | When working with unnatural bases, one of the major challenges is the detection of unnatural base pairs (UBPs) in DNA. For the analysis of UBP retention <i>in vivo</i>, <i>in vitro</i> replication and PCR experiments, it is mandatory to have a reliable method for UBP detection. In most cases, scientists working with unnatural bases have to develop their own methods specifically suitable for the detection of the very unnatural bases they are working with. Unfortunately, these methods often come along with unfavorable circumstances. | + | For our experiments, we developed two new methods which are potentially applicable for the analysis of experiments with most unnatural bases: the restriction-enzyme-based Mutation Analysis Xplorer (M.A.X) and Oxford Nanopore sequencing of unnatural bases with the help of our software iCG. Both are comparatively cost-efficient methods that have the major advantage of enabling the direct analysis of mutational events in addition to the mere detection of unnatural bases. In our experiments, we showed that both methods are applicable for the analysis of experiments with the unnatural base pair formed by isoG and isoC<sup>m</sup>. Additionally, we analyzed the replication efficiency of this unnatural base pair with both methods, showing that these orthogonal approaches produce similar results. |
| </div> | | </div> |
− | <br> | + | |
| + | </div> |
| + | <div class="bevel bl"></div> |
| + | </div> |
| + | |
| + | |
| + | <div class="contentbox"> |
| + | <div class="bevel tr"></div> |
| + | <div class="content"> |
| + | |
| + | <h3>Background: Detection of Unnatural Bases in DNA</h3> |
| + | |
| <div class="article"> | | <div class="article"> |
− | One method commonly applied is the usage of molecular beacons, as described by Johnson <i>et al.</i>, 2004 for the detection if isoG and isoC<sup>m</sup> in PCR experiments. This is a quite circumstantial and expensive method, as for every ssDNA sample, an individual fluorescence labeled, specific oligonucleotide containing the unnatural nucleotide complementary to the one inverstigated is needed. Additionally, the influence of the analyzed unnatural bases on the annealing temperature has to be investigated previously, to prevent unspecific hybridization from influencing the analysis results. A different method used primarily for UBP retention analysis is the biotin shift assay (Zhang <i>et al.</i>, 2017). A DNA sample analyzed by thid approach is used as a template in a PCR reaction with biotin-labeled nucleotide triphosphates of one of the unnatural bases. Through specific interaction with Spreptavidin after amplification, molecules containing these labeled nucleotides are shifted upwards in comparison to DNA molecules of the same length in subsequent PAGE analysis. The use of specially labeled variants of the analyzed unnatural bases is generally not recommended, as even small structural changes might have a great impact on the interaction with other biomolecules and could lead to falsified experiment results. In addition, the necessity of a PCR amplification for UBP detection is disadvantageous too, as the proportion of DNA molecules with sustained and lost UBPs might be greatly impacted, particularly - but not exclusively - due to influences of the unnatural nucleoside triphosphates on the PCR reaction. | + | When working with unnatural bases, one of the major challenges is the detection of unnatural base pairs (UBPs) in the DNA. For the analysis of UBP retention <i>in vivo</i>, <i>in vitro</i> replication and PCR experiments, it is mandatory to have a reliable method for UBP detection. 
In most cases, scientists working with unnatural bases have to develop their own methods specifically suitable for the detection of the very unnatural bases they are working with. Unfortunately, these methods often come along with unfavorable circumstances. |
| </div> | | </div> |
| <br> | | <br> |
| <div class="article"> | | <div class="article"> |
− | For our experiments, we developed two new methods which are potentially applicable for the analysis of experiments with most unnatural bases: The Mutational Analysis Xpolorer (M.A.X) and Oxford Nanopore sequencing of unnatural bases with iCG. Both are comparably cost efficient methods that have the major advantage of enabling the direct analysis of mutational events in addition to the mere detection of unnatural bases. In our experiments, we showed that both methods are applicable for the analysis of experiments with the unnatural bases isoG and isoC<sup>m</sup>. | + | One commonly applied method is the use of molecular beacons, as described by Johnson <i>et al.</i>, 2004 for the detection of isoG and isoC<sup>m</sup> in PCR experiments. This is a rather cumbersome and expensive method, as every ssDNA sample requires an individual, fluorescence-labeled, specific oligonucleotide containing the unnatural nucleotide complementary to the one under investigation. Additionally, the influence of the analyzed unnatural bases on the annealing temperature has to be investigated beforehand, to prevent unspecific hybridization from influencing the analysis results. A different method used primarily for UBP retention analysis is the biotin shift assay (Zhang <i>et al.</i>, 2017). In this approach, a DNA sample is used as a template in a PCR reaction with biotin-labeled nucleoside triphosphates of one of the unnatural bases. Through specific interaction with streptavidin after amplification, molecules containing these labeled nucleotides are shifted upwards in comparison to DNA molecules of the same length in subsequent PAGE analysis. The use of specially labeled variants of the analyzed unnatural bases is generally not recommended, as even small structural changes might have a great impact on the interaction with other biomolecules and could lead to falsified experimental results. 
In addition, the need for PCR amplification for UBP detection is a disadvantage, as the proportion of DNA molecules with retained and lost UBPs might be greatly affected, particularly - but not exclusively - due to influences of the unnatural nucleoside triphosphates on the PCR reaction. |
| </div> | | </div> |
| | | |
Line 42: |
Line 53: |
| | | |
| <!-- Ueberschriften --> | | <!-- Ueberschriften --> |
− | | + | <span class="anchor-jump" id="MAX"></span> |
− | <h3> Mutation Analysis Xplorer – Results </h3> | + | <div class="section"></div> |
| + | <h3> Analysis of DNA containing isoG and isoC<sup>m</sup> with the Mutation Analysis Xplorer</h3> |
| | | |
| <h4>Primer annealing</h4> | | <h4>Primer annealing</h4> |
Line 196: |
Line 208: |
| | | |
| <!-- Ueberschriften --> | | <!-- Ueberschriften --> |
− | | + | <span class="anchor-jump" id="UBPPCR"></span> |
| + | <div class="section"></div> |
| <h3> PCR with UBPs </h3> | | <h3> PCR with UBPs </h3> |
| | | |
Line 225: |
Line 238: |
| <div class="figure large"> | | <div class="figure large"> |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/0/0b/T--Bielefeld-CeBiTec--PCR-UBP-A-H.png"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/0/0b/T--Bielefeld-CeBiTec--PCR-UBP-A-H.png"> |
− | <p class="figure subtitle"><b>Figure 9: PCRs</b> with Titanium Taq (<b>A</b>), Go Taq G2 (<b>B</b>), Allin HiFi DNA Polymerase (<b>C</b>), innuDRY polymerase (<b>D</b>), BioMaster-HS Taq PCR polymerase (<b>E</b>), FirePol DNA polymerase (<b>F</b>), Phusion DNA polymerase (<b>G</b>) and Q5 DNA polymerase (<b>H</b>). The template is pSB1C3_RuBisCo with the inserts mutA, mutT, mutG, mutC (5 ng µL<sup>-1</sup>) and UBP_target (25 ng µL<sup>-1</sup>) after the restriction digest with <i>Eci</i>I (mutA) and <i>Sap</i>I (mutG) for 2 h and <i>Bsa</i>I (mutT) and <i>Mnl</i>I (mutC) for 15 h.</p> | + | <p class="figure subtitle"><b>Figure 8: PCRs</b> with Titanium Taq (<b>A</b>), Go Taq G2 (<b>B</b>), Allin HiFi DNA Polymerase (<b>C</b>), innuDRY polymerase (<b>D</b>), BioMaster-HS Taq PCR polymerase (<b>E</b>), FirePol DNA polymerase (<b>F</b>), Phusion DNA polymerase (<b>G</b>) and Q5 DNA polymerase (<b>H</b>). The template is pSB1C3_RuBisCo with the inserts mutA, mutT, mutG, mutC (5 ng µL<sup>-1</sup>) and UBP_target (25 ng µL<sup>-1</sup>) after the restriction digest with <i>Eci</i>I (mutA) and <i>Sap</i>I (mutG) for 2 h and <i>Bsa</i>I (mutT) and <i>Mnl</i>I (mutC) for 15 h.</p> |
| </div> | | </div> |
| | | |
Line 239: |
Line 252: |
| <thead> | | <thead> |
| <tr> | | <tr> |
− | <th style="width: auto">Position in Figure 9</th> | + | <th style="width: auto">Position in Figure 8</th> |
| <th style="width: auto">DNA polymerase</th> | | <th style="width: auto">DNA polymerase</th> |
| <th style="width: auto">Distributor</th> | | <th style="width: auto">Distributor</th> |
Line 363: |
Line 376: |
| <div class="bevel tr"></div> | | <div class="bevel tr"></div> |
| <div class="content"> | | <div class="content"> |
− | | + | <span class="anchor-jump" id="ONSseq"></span> |
| + | <div class="section"></div> |
| <h3>Isoguanine & 5-methyl isocytosine in Nanopore Sequencing</h3> | | <h3>Isoguanine & 5-methyl isocytosine in Nanopore Sequencing</h3> |
| <h4>Nanopore Sequencing</h4> | | <h4>Nanopore Sequencing</h4> |
| <div class="article"> | | <div class="article"> |
− | Oxford Nanopore Technology's (ONT) sequencing technology offers a great potential as a tool for the detection of unnatural bases in DNA. In ONT sequencing, protein nanopores are distributed inside a synthetic membrane of high electrical resistance. When applying an electrical field across this membrane, an ionic current passes through each nanopore which is being measured and recorded. If a biomolecule, such as proteins, RNA or DNA are located inside the Nanopore, the ionic current is influenced. These characteristic changes can be used to identify which molecule is passing through the nanopore. (Feng <i>et al.</i>, 2015) This way, an algorithm called the "basecaller" is able to predict the nucleotide sequence of a single stranded DNA or RNA molecule based on the raw data that is recorded when it is pulled through a nanopore. Since the commercial availability of the portable sequencer MinION in 2015, strong improvements have been made in terms of increasing the bascalling accuracy. Even though the error rate is still high compared to other sequencing techniques, the advantage of having long reads of several kilobases is often preferential regarding sequencing of DNA containing repetitive sequences or mobile genetic elements like transposable elements (Debladis <i>et al.</i>, 2017) More recently, efforts have been made towards the analysis of epigenetic information based on the identification of modified bases in nucleic acids with nanopore sequencing. For example, methylated cytosine was shown to be distinguishable from unmodified cytosine by training a hidden Markov model (Simpson <i>et al.</i>, 2017). | + | <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Partners">Oxford Nanopore Technologies' (ONT)</a> sequencing technology offers great potential as a tool for the detection of unnatural bases in DNA. In ONT sequencing, protein nanopores are distributed inside a synthetic membrane of high electrical resistance. 
When an electrical field is applied across this membrane, an ionic current passes through each nanopore and is measured and recorded. If a biomolecule, such as a protein, RNA or DNA, is located inside the nanopore, the ionic current is influenced. These characteristic changes can be measured and used to identify which molecule is passing through the nanopore (Feng <i>et al.</i>, 2015). This way, an algorithm called the "basecaller" is able to predict the nucleotide sequence of a single-stranded DNA or RNA molecule based on the raw data that is recorded when it is pulled through the nanopore. Since the commercial availability of the portable sequencer MinION in 2015 (Check Hayden, 2015), strong improvements have been made in basecalling accuracy. Even though the error rate is still high compared to other sequencing techniques, the long reads of several kilobases are often advantageous for sequencing DNA containing repetitive sequences or mobile genetic elements such as transposable elements (Debladis <i>et al.</i>, 2017). More recently, efforts have been made towards the analysis of epigenetic information based on the identification of modified bases in nucleic acids with nanopore sequencing. For example, methylated cytosine was shown to be distinguishable from unmodified cytosine by training a hidden Markov model (Simpson <i>et al.</i>, 2017). |
| </div> | | </div> |
| <br> | | <br> |
| <div class="article"> | | <div class="article"> |
− | Compared to other sequencing technologies, nanopore sequencing offers several advantages regarding the detection of unnatural bases. Most importantly, no PCR amplification of the DNA sample is needed in the process of library preparation. This way, no information gets lost prior to sequencing as a result of a potentially lower PCR amplification fidelity of the unnatural base pair. Another big advantage is that no additional chemistry is needed in the process of sequencing. Other sequencing technologies such as 454, Sanger, Illumina and PacBio are based on polymerases that synthesize a DNA strand complement to a template being sequenced. When a specially labeled nucleotide is incorporated, a detectable signal is emitted. This is problematic regarding sequencing of DNA containing unnatural bases, as additional labeled nucleotides would be needed for a continuous strand synthesis and to produce a unique signal for the unnatural bases. Considering the development costs for this new chemistry, the necessary process adaptations and increased complexity of data analysis, the sequencing of orthogonal unnatural bases is unlikely to be feasible with these technologies. In contrast, nanopore sequencing omits the necessity for additional chemistry and it is unlikely that sequencing will be interrupted by unnatural bases passing through the nanopore. On top of that, Nanopore sequencing was shown to be applicable for direct sequencing of RNA, without prior transcription into cDNA (Garalde et al., 2016). Therefore, it promises to be suitable for transcription studies involving unnatural bases too. | + | Compared to other sequencing technologies, nanopore sequencing offers several advantages regarding the detection of unnatural bases. Most importantly, no PCR amplification of the DNA sample is needed in the process of library preparation. This way, no information gets lost prior to sequencing due to potentially lower PCR amplification fidelity of the unnatural base pair. 
Another big advantage is that no additional chemistry is needed in the process of sequencing. Other sequencing technologies such as 454, Sanger, Illumina and PacBio are based on polymerases that synthesize a DNA strand complement to a template being sequenced. When a specially labeled nucleotide is incorporated, a detectable signal is emitted. This is problematic regarding sequencing of DNA containing unnatural bases, as additional labeled nucleotides would be needed for a continuous strand synthesis and to produce a unique signal for the unnatural bases. Considering the development costs for this new chemistry, the necessary process adaptations and increased complexity of data analysis, the sequencing of orthogonal unnatural bases is unlikely to be feasible with these technologies. In contrast, nanopore sequencing omits the necessity for additional chemistry and it is unlikely that sequencing will be interrupted by unnatural bases passing through the nanopore. On top of that, Nanopore sequencing was shown to be applicable for direct sequencing of RNA, without prior transcription into cDNA (Garalde <i>et al.</i>, 2016). Therefore, it promises to be suitable for transcription studies involving unnatural bases too. |
| </div> | | </div> |
| <br> | | <br> |
| <div class="article"> | | <div class="article"> |
− | We aim to examine if Oxford Nanopore sequencing is suitable for sequencing DNA containing unnatural bases. Therefore, we sequenced different DNA samples containing either the unnatural nucleotides isoguanosine and 5‑methyl isocytidine or any natural bases in the same sequence context to see if the output signal differs significantly between these groups. The data processing and evaluation was performed with the help of our own <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">software iCG</a>, that we developed specifically for analyzing Nanopore sequencing data of DNA containing unnatural bases. Our aim is to create a linear discriminant analysis model that is able to discriminate between isoG/isoC<sup>m</sup> and natural bases in the given neighboring sequence context of two bases upstream and two bases downstream of the position of interest. For a detailed description of how the software works, please refer to our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software">Software</a> page. | + | We aimed to examine if Oxford Nanopore sequencing is suitable for sequencing DNA containing unnatural bases. Therefore, we sequenced different DNA samples containing either the unnatural bases isoG and isoC<sup>m</sup> or any natural bases in the same sequence context to see if the output signal differs significantly between these groups. The data processing and evaluation was performed with the help of our own <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">software iCG</a>, that we developed specifically for analyzing Nanopore sequencing data of DNA containing unnatural bases. Our aim is to create a linear discriminant analysis model that is able to discriminate between isoG/isoC<sup>m</sup> and natural bases in the given neighboring sequence context of two bases upstream and two bases downstream of the position of interest. 
For a detailed description of how the software works, please refer to our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software">Software</a> page. |
| </div> | | </div> |
| | | |
Line 381: |
Line 395: |
| <h4>Reference Sample Preparation & Sequencing</h4> | | <h4>Reference Sample Preparation & Sequencing</h4> |
| <div class="article"> | | <div class="article"> |
− | In order to examine if the unnatural bases isoG and isoC<sup>m</sup> are differentiable from the natural bases through nanopore sequencing, five different DNA samples were prepared that differed only at a single sequence position, containing either an unnatural base or one of the four natural bases at this position of interest. For our experiments, we started by sequencing isoC<sup>m</sup> in the sequence context <font face="courier new">5'‑AG\iC<sup>m</sup>\CC‑3'</font> and, on the reverse strand, isoG in the sequence context <font face="courier new">5'‑GG\iG<sup>\CT‑3'</font>. For this purpose, we constructed five reference DNA samples with the following sequences: | + | In order to examine if the unnatural bases isoG and isoC<sup>m</sup> are differentiable from the natural bases through nanopore sequencing, five different DNA samples were prepared that differed only at a single sequence position, containing either an unnatural base or one of the four natural bases at this position of interest. For our experiments, we started by sequencing isoC<sup>m</sup> in the sequence context <font face="courier new">5'‑AG\iC<sup>m</sup>\CC‑3'</font> and, on the reverse strand, isoG in the sequence context <font face="courier new">5'‑GG\iG\CT‑3'</font>. For this purpose, we constructed five reference DNA samples with the following sequences: |
| </div> | | </div> |
| <div class="figure seventy"> | | <div class="figure seventy"> |
Line 387: |
Line 401: |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
| <br> | | <br> |
− | <b>Fig. 1: Annealed oligonucleotides used for reference sample preparation.</b> | + | <b>Figure 9: Annealed oligonucleotides used for reference sample preparation.</b> |
| Sequences at the position of interest of DNA samples used as references for nanopore sequencing. | | Sequences at the position of interest of DNA samples used as references for nanopore sequencing. |
| </p> | | </p> |
| </div> | | </div> |
| <div class="article"> | | <div class="article"> |
− | Each reference DNA sample was prepared starting from a pair of complementary synthetic oligonucleotides. The oligos containing isoguanine or 5‑methyl isocytosine were synthesized by <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Partners">Biolegio</a>, including subsequent purification through polyacrylamid gel electrophoresis (PAGE). Mass spectrometry and ultra performance liquid chromatography data (Figure 1) provided by Biolegio indicate that the concentrations of unmodified side products are below detection limits. In general, the manufacturer specifies the purity of PAGE purified oligonuleotides containing modified bases to be greater than 95 %. The oligonucleotides containing exclusively natural bases were ordered from metabion and were purified by desalting. | + | Each reference DNA sample was prepared starting from a pair of complementary synthetic oligonucleotides. The oligos containing isoguanine or 5‑methyl isocytosine were synthesized by <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Partners">Biolegio</a>, including subsequent purification through polyacrylamide gel electrophoresis (PAGE). Mass spectrometry and ultra performance liquid chromatography data (Figure 10) provided by Biolegio indicate that the concentrations of unmodified side products are below detection limits. In general, the manufacturer specifies the purity of PAGE-purified oligonucleotides containing modified bases to be greater than 95 %. The oligonucleotides containing exclusively natural bases were ordered from <a target="_blank" href="http://www.metabion.com/">metabion</a> and were purified by desalting. |
| </div> | | </div> |
| <br> | | <br> |
Line 402: |
Line 416: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/4/4f/T--Bielefeld-CeBiTec--UBP_oligos_ms_UPLC.jpg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/4/4f/T--Bielefeld-CeBiTec--UBP_oligos_ms_UPLC.jpg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 2: UPLC and MS data from oligos containing isoguanine or 5‑methyl isocytosine.</b> | + | <b>Figure 10: UPLC and MS data from oligos containing isoguanine or 5‑methyl isocytosine.</b> |
− | Oligonucleotides containing isoguanine or 5‑methyl isocytosine were synthesized by Biolegio and analysed by ultra performance liquid chromatography (UPLC) and mass spectrometry (MS). Shown above are the results from the UPLC (above) and MS (below) analysis for each of the complementary oligos containing either isoguanine (A) or 5‑methyl isocytosine (B). | + | Oligonucleotides containing isoguanine or 5‑methyl isocytosine were synthesized by Biolegio, who also analysed them by ultra performance liquid chromatography (UPLC) and mass spectrometry (MS). Shown are the results of the UPLC (above) and MS (below) analysis for each of the complementary oligos containing either isoguanine (A) or 5‑methyl isocytosine (B). |
| </p> | | </p> |
| </div> | | </div> |
| | | |
| <div class="article"> | | <div class="article"> |
− | For each DNA sample, a complementary pair of oligonucleotides was <a target="_blank" href="https://static.igem.org/mediawiki/2017/d/d1/T--Bielefeld-CeBiTec--protocol_ssDNA_annealing.pdf">annealed</a> and ligated into a plasmid backbone (<a target="_blank" href="http://parts.igem.org/Part:BBa_K1465202">BBa_K1465202</a>) previously linearized by <i>Xba</i>I and <i>Bmt</i>I. For this purpose, a leading <i>Bmt</i>I and a tailing <i>Spe</i>I recognition site were included into the oligonucleotide sequences. After ligation, re-ligated backbone was linearized by digestion with <i>Xba</i>I. After consecutive digestion of double and single stranded linear DNA fragments with lambda exonuclease and E. coli exonuclease I, the DNA samples were linearized through <i>Eco</i>RV digestion and purified for sequencing library preparation. An individual library was prepared for each DNA sample, according to the 1D Library Protocol for SQK-LSK108, starting from the end repair step. | + | For each DNA sample, a complementary pair of oligonucleotides was <a target="_blank" href="https://static.igem.org/mediawiki/2017/d/d1/T--Bielefeld-CeBiTec--protocol_ssDNA_annealing.pdf">annealed</a> and ligated into a plasmid backbone (<a target="_blank" href="http://parts.igem.org/Part:BBa_K1465202">BBa_K1465202</a>) previously linearized by <i>Xba</i>I and <i>Bmt</i>I. For this purpose, a leading <i>Bmt</i>I and a tailing <i>Spe</i>I recognition site were included into the oligonucleotide sequences. After ligation, re-ligated backbone was linearized by digestion with <i>Xba</i>I. After consecutive digestion of double and single stranded linear DNA fragments with lambda exonuclease and <i>E. coli</i> exonuclease I, the DNA samples were linearized through <i>Eco</i>RV digestion and purified for sequencing library preparation. An individual library was prepared for each DNA sample, according to the 1D Library Protocol for SQK-LSK108, starting from the end repair step. |
| </div> | | </div> |
| | | |
Line 416: |
Line 430: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8d/T--Bielefeld-CeBiTec--library_prep.jpg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8d/T--Bielefeld-CeBiTec--library_prep.jpg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 3: Library preparation for Oxford Nanopore sequencing.</b> Purification of DNA containing the unnatural base pair, after the adapter ligation step of library preparation for Oxford Nanopore sequencing. | + | <b>Figure 11: Library preparation for Oxford Nanopore sequencing.</b> Purification of DNA containing the unnatural base pair, after the adapter ligation step of library preparation for Oxford Nanopore sequencing. |
| </p> | | </p> |
| </div> | | </div> |
Line 424: |
Line 438: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/4/42/T--Bielefeld-CeBiTec--sequencer.jpg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/4/42/T--Bielefeld-CeBiTec--sequencer.jpg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 4: MinIon sequencer with R9.4 flowcell.</b> The MinIon sequencer that we used in our experiments, together with a R9.4 flowcell. | + | <b>Figure 12: MinION sequencer with R9.4 flowcell.</b> The MinION sequencer that we used in our experiments, together with an R9.4 flowcell. |
| </p> | | </p> |
| </div> | | </div> |
Line 432: |
Line 446: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/3/39/T--Bielefeld-CeBiTec--sequencing_run.jpg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/3/39/T--Bielefeld-CeBiTec--sequencing_run.jpg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 5: Status of the pore grid during sequencing.</b> While sequencing, the software MinKnow gives online feedback about the pores in the flowcell. | + | <b>Figure 13: Status of the pore grid during sequencing.</b> During sequencing, the software MinKNOW gives live feedback about the status of the pores in the flowcell. |
| </p> | | </p> |
| </div> | | </div> |
Line 447: |
Line 461: |
| </div> | | </div> |
| <div class="article"> | | <div class="article"> |
− | In the first step, the reads were filtered by iCG filter in order to identify reads that contain the region of interest and have a high basecalling quality. Regarding the parameters minimum length, maximum length and minimum mean quality qscore, the the default argument settings of iCG filter were used for filtering. Of the remaining reads, only those containing the neighboring sequence context of 15 bases upstream and downstream of the POI were selected, without considering the close sequence context (blur region) of 3 ±1 bases around the POI, where influences of the unnatural bases may lead to unpredictable behavior of the basecaller. The matching reads were allowed to contain a maximum of 2 mismatches, including indels. The maximum deviation in length was set to 1 base and reads containing the region of interest multiple times were rejected. Additionally, the selected reads were further filtered for a minimum mean quality score of 14 in this restricted sequence context and sorted by their stand orientation. For further information about, please read more about iCG on our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">Software</a> page. | + | In the first step, the reads were filtered by iCG filter in order to identify reads that contain the region of interest and have a high basecalling quality. For the parameters minimum length, maximum length and minimum mean Phred quality score, the default argument settings of iCG filter were used. Of the remaining reads, only those containing the neighboring sequence context of 15 bases upstream and downstream of the position of interest (POI) were selected, without considering the close sequence context (blur region) of 3 ±1 bases around the POI, where influences of the unnatural bases may lead to unpredictable behavior of the basecaller. The matching reads were allowed to contain a maximum of 2 mismatches, including indels. 
The maximum deviation in length was set to 1 base and reads containing the region of interest multiple times were rejected. Additionally, the selected reads were further filtered for a minimum mean quality score of 14 in this restricted sequence context and sorted by their strand orientation. For further information, please read more about iCG on our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">Software</a> page. |
| </div> | | </div> |
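The context-based filtering described above can be illustrated with a minimal Python sketch. This is not the iCG filter implementation: the function names, the use of a plain arithmetic mean of Phred scores, the Phred+33 encoding, and the restriction to substitutions (indels and the rejection of reads with multiple hits are not handled) are all simplifying assumptions made for illustration.

```python
from statistics import mean


def mean_phred(quality, offset=33):
    """Arithmetic mean of the Phred scores in a FASTQ quality string (Phred+33 assumed)."""
    return mean(ord(ch) - offset for ch in quality)


def hamming(a, b):
    """Number of mismatching positions between two equally long strings."""
    return sum(x != y for x, y in zip(a, b))


def find_context(seq, upstream, downstream, blur=3, blur_tol=1, max_mismatches=2):
    """Return (start, gap, mismatches) of the best flank match, or None.

    The 'gap' between the flanks plays the role of the blur region, which is
    not compared at all. Only substitutions are counted (assumption).
    """
    best = None
    for gap in range(blur - blur_tol, blur + blur_tol + 1):
        span = len(upstream) + gap + len(downstream)
        for start in range(len(seq) - span + 1):
            up = seq[start:start + len(upstream)]
            down_start = start + len(upstream) + gap
            down = seq[down_start:down_start + len(downstream)]
            mismatches = hamming(up, upstream) + hamming(down, downstream)
            if mismatches <= max_mismatches and (best is None or mismatches < best[2]):
                best = (start, gap, mismatches)
    return best


def keep_read(seq, quality, upstream, downstream, min_context_q=14.0):
    """Keep a read if the flanks are found and the matched region has sufficient mean quality."""
    hit = find_context(seq, upstream, downstream)
    if hit is None:
        return False
    start, gap, _ = hit
    end = start + len(upstream) + gap + len(downstream)
    return mean_phred(quality[start:end]) >= min_context_q
```

In this sketch the flanks are hypothetical 15-base sequences supplied by the caller; a read passes only if both flanks are found around a 2 to 4 base gap and the mean quality over the matched region reaches the threshold of 14.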
| | | |
Line 453: |
Line 467: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8e/T--Bielefeld-CeBiTec--signal_traces.svg"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/8/8e/T--Bielefeld-CeBiTec--signal_traces.svg"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 6: Normalized signal traces of analyzed DNA samples.</b> | + | <b>Figure 14: Normalized signal traces of analyzed DNA samples.</b> |
− | Overlayed, normalized signal traces of DNA samples containing either isoG/isoC<sup>m</sup> or any natural base at the position of interest in the analyzed sequence context. The reads displayed in these plots were selected from their respective sequencing runs by using iCG filter, using the same filter settings for all DNA samples. To remove contaminating reads from previous sequencing runs, a quantile of 0.7 of the most deviating reads was removed previous to plotting with the help of iCG model. | + | Overlaid, normalized signal traces of DNA samples containing either isoG/isoC<sup>m</sup> or any natural base at the position of interest in the analyzed sequence context. The reads displayed in these plots were selected from their respective sequencing runs with iCG filter, using the same filter settings for all DNA samples. To remove contaminating reads from previous sequencing runs, a quantile of 0.7 of the most deviating reads was removed prior to plotting with the help of <i>iCG model</i>. |
| </p> | | </p> |
| </div> | | </div> |
| | | |
| <div class="article"> | | <div class="article"> |
− | Afterwards, iCG model was used to create linear discriminant models based on the filtered groups of template reads gathered by iCG filter. Different setting for the amount of removed, deviating reads were tested. Figure 5 shows plots of the Region of interest for both the forward and the reverse strand and all five template groups, with a quantile of 0.7 removed reads. For both strand orientations, there is a distinct difference in the mean, normalized signal trace detectable comparing the sequences containing an unnatural base with those containing a natural base at the position of interest. | + | Afterwards, <i>iCG model</i> was used to create linear discriminant models based on the filtered groups of template reads gathered by iCG filter. Different settings for the amount of removed, deviating reads were tested. Figure 14 shows plots of the region of interest for both the forward and the reverse strand and all five template groups, with a quantile of 0.7 removed reads. For both strand orientations, a distinct difference in the mean normalized signal trace is detectable when comparing the sequences containing an unnatural base with those containing a natural base at the position of interest. |
| </div> | | </div> |
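<div class="article">
The removal of the most deviating reads can be sketched as follows. This is a hedged illustration, not the code of iCG model: the function name, the use of the per-position median as reference, and the sum of absolute differences as deviation measure are assumptions.
</div>

```python
from statistics import median


def trim_deviating(traces, quantile=0.7):
    """Drop the `quantile` fraction of signal traces that deviate most from
    the per-position median trace of their group (assumed deviation measure:
    sum of absolute differences). All traces must have the same length."""
    n_pos = len(traces[0])
    med = [median(t[k] for t in traces) for k in range(n_pos)]

    def deviation(trace):
        return sum(abs(x - m) for x, m in zip(trace, med))

    ranked = sorted(traces, key=deviation)
    n_keep = max(1, round(len(traces) * (1 - quantile)))
    return ranked[:n_keep]
```

<div class="article">
With a quantile of 0.7, a group of 10 traces would be reduced to the 3 traces closest to the group median in this sketch.
</div>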
| <br> | | <br> |
Line 470: |
Line 484: |
| | | |
| <div class=article> | | <div class=article> |
− | Based on the data presented in Figure 5, a cluster analysis based on linear discriminant analysis was conducted using iCG model. Figure 6 shows dot-plots for the forward and reverse models, presenting the linear discriminants of the reads each respective model was created with. The direct comparison of both models reveals that the model created upon the data of the reverse strand seems to perform better in terms of classification of the sequencing reads. Except for the groups containing A and G at the position of interest, which slightly overlap with each other, all other groups are well seperated from each other. On the other hand, the linear discriminant analysis of the data of the forward strand was unable to properly separate the reads containing A, G and iC<sup>m</sup> from each other, mainly due to widely scattered reads of the iC<sup>m</sup> group. Both results coincide with the visual assessment of the signal traces in Figure 5. | + | Based on the data presented in Figure 14, a cluster analysis based on linear discriminant analysis was conducted using <i>iCG model</i>. Figure 15 shows dot-plots for the forward and reverse models, presenting the linear discriminants of the reads each respective model was created with. The direct comparison of both models reveals that the model created upon the data of the reverse strand seems to perform better in terms of classification of the sequencing reads. Except for the groups containing A and G at the position of interest, which slightly overlap with each other, all other groups are well separated from each other. On the other hand, the linear discriminant analysis of the data of the forward strand was unable to properly separate the reads containing A, G and iC<sup>m</sup> from each other, mainly due to widely scattered reads of the iC<sup>m</sup> group. Both results coincide with the visual assessment of the signal traces in Figure 14. |
| </div> | | </div> |
| | | |
Line 476: |
Line 490: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/a/a5/T--Bielefeld-CeBiTec--model_dotplots.png"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/a/a5/T--Bielefeld-CeBiTec--model_dotplots.png"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 7: Dot-plots of the linear discriminant models of the forward and reverse strand.</b> Dot-plots of the linear discriminants of the reads used for the creation of the statistical models for base prediction at the position of interest in the forward and reverse strand. The data used for the linear discriminant analysis was previously filtered by removing 70 % of reads from each group, based on their deviation from the groups median signal in the neighboring sequence context of the position of interest. | + | <b>Figure 15: Dot-plots of the linear discriminant models of the forward and reverse strand.</b> Dot-plots of the linear discriminants of the reads used for the creation of the statistical models for base prediction at the position of interest in the forward and reverse strand. The data used for the linear discriminant analysis was previously filtered by removing 70 % of reads from each group, based on their deviation from the group's median signal in the neighboring sequence context of the position of interest. |
− | Lorem Ipsum.
| + | |
| </p> | | </p> |
| </div> | | </div> |
| | | |
| <div class="article"> | | <div class="article"> |
− | Since a statistical model should not be tested with the very data it was created with, we prepared a new set of DNA samples to properly evaluate the performance of both models concerning the prediction of bases at the position of interest in their respective sequence context. For this purpose, we modified the RuBisCo plasmid that was used for the first sample preparation by cloning five different sequences downstream RuBisCo with standard BioBrick assembly (<a target="_blank" href="S05406"></a>, <a target="_blank" href="S05407"></a>, <a target="_blank" href="S05408"></a>, <a target="_blank" href="S05409"></a>, <a target="_blank" href="S05410"></a>). Each of these plasmids contains a 25 nt sequence that is unique, while the remaining plasmid sequence is the same. These unique sequences can be used for identification assignment of sequencing reads comparable to the Nanopore barcoding approach. Starting with these five plasmids, we prepared new DNA samples according to the same procedure explained above. After ligation, all five samples were pooled in approximately equimolar proportion and further prepared for sequencing. After sequencing and basecalling of this pooled sample, the reads were assigned to their respective group by using iCG filter with the "--barcode" argument and each plasmid's unique sequence. After filtering, 50 reads of every group were randomly selected in order to be used for evaluating the performance of the linear discriminant models with iCG predict. The results of this evaluation are summarized in Figure 7. | + | Since a statistical model should not be tested with the very data it was created with, we prepared a new set of DNA samples to properly evaluate the performance of both models concerning the prediction of bases at the position of interest in their respective sequence context. 
For this purpose, we modified the RuBisCO plasmid that was used for the first sample preparation by cloning five different sequences downstream of RuBisCO with <a target="_blank" href="https://static.igem.org/mediawiki/2017/8/84/T--Bielefeld-CeBiTec--protocol_Standard_Biobrick.pdf">Standard BioBrick assembly</a> (part numbers <a target="_blank" href="S05406"></a>, <a target="_blank" href="S05407"></a>, <a target="_blank" href="S05408"></a>, <a target="_blank" href="S05409"></a>, <a target="_blank" href="S05410"></a>). Each of these plasmids contains a unique 25 nt sequence, while the remaining plasmid sequence is the same. These unique sequences can be used for the identification and assignment of sequencing reads, comparable to the Nanopore barcoding approach. Starting with these five plasmids, we prepared new DNA samples according to the same procedure explained above. After ligation, all five samples were pooled in approximately equimolar proportions and further prepared for sequencing. After sequencing and basecalling of this pooled sample, the reads were assigned to their respective group by using iCG filter with the "--barcode" argument and each plasmid's unique sequence. After filtering, 50 reads of every group were randomly selected for evaluating the performance of the linear discriminant models with iCG predict. The results of this evaluation are summarized in Figure 16. |
| </div> | | </div> |
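<div class="article">
The assignment of pooled reads to their sample group and the random selection of test reads can be sketched as below. The function names, the exact-match search, and the fixed random seed are assumptions for illustration; iCG filter may match barcodes differently, for example by tolerating mismatches or by also searching the reverse complement.
</div>

```python
import random


def assign_group(seq, barcodes):
    """Return the name of the only barcode found in the read, or None if no
    barcode or more than one barcode matches (exact matching assumed)."""
    hits = [name for name, barcode in barcodes.items() if barcode in seq]
    return hits[0] if len(hits) == 1 else None


def sample_test_set(reads, barcodes, n=50, seed=0):
    """Group reads by barcode and draw up to n reads per group at random."""
    groups = {name: [] for name in barcodes}
    for seq in reads:
        group = assign_group(seq, barcodes)
        if group is not None:
            groups[group].append(seq)
    rng = random.Random(seed)
    return {name: rng.sample(members, min(n, len(members)))
            for name, members in groups.items()}
```

<div class="article">
Drawing the same number of reads from every group keeps the test set balanced, which is why equal portions of 20 % per base are the ideal outcome for five groups.
</div>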
| | | |
Line 488: |
Line 501: |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/c/cc/T--Bielefeld-CeBiTec--model_test.png"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/c/cc/T--Bielefeld-CeBiTec--model_test.png"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 8: Evaluation of the linear discriminant analysis models.</b> | + | <b>Figure 16: Evaluation of the linear discriminant analysis models.</b> |
| Evaluation results of the linear discriminant analysis models for the forward and reverse strand. (A) Linear discriminants of the test data colored in accordance with their respective base prediction. (B) Distribution of predicted bases. Based on the assumption that every read in the test data set was correctly assigned with the barcoding approach, equal portions of 20 % for each base would be ideal, corresponding to 50 reads per test data group. (C) Fidelity of base prediction, revealing which base predictions were made for the reads of each group individually. | | Evaluation results of the linear discriminant analysis models for the forward and reverse strand. (A) Linear discriminants of the test data colored in accordance with their respective base prediction. (B) Distribution of predicted bases. Based on the assumption that every read in the test data set was correctly assigned with the barcoding approach, equal portions of 20 % for each base would be ideal, corresponding to 50 reads per test data group. (C) Fidelity of base prediction, revealing which base predictions were made for the reads of each group individually. |
| </p> | | </p> |
Line 494: |
Line 507: |
| | | |
| <div class="article"> | | <div class="article"> |
− | The results in Figure 8 indicate that the linear discriminant model for the reverse strand orientation is performing better than the model for the sense strand. The base prediction fidelity is especially high for reads containing an adenine, a cytosine or an isoguanine at the position of interest. Due to the hydrolysis of isoC<sup>m</sup> to T and the tautomerisation of isoG, leading to mispairing with T, the most common mutation that leading to a loss of the unnatural base pair between isoG and isoC<sup>m</sup> is the mutation from isoG to A (Bande et al., 2015). Considering the fidelity of base prediction for both A and isoG with the reverse strand model, we conclude that this linear discriminant analysis model is well suited for the discrimination between isoG and all natural bases in the sequence context <font face="courier new">5'-ggNct-3'</font>. Therefore, we could show that the software package iCG is applicable for the analysis of experiments | + | The results in Figure 16 indicate that the linear discriminant model for the reverse strand orientation performs better than the model for the forward strand. The base prediction fidelity is especially high for reads containing an adenine, a cytosine or an isoguanine at the position of interest. Due to the hydrolysis of isoC<sup>m</sup> to T and the tautomerisation of isoG, leading to mispairing with T, the most common mutation leading to a loss of the unnatural base pair between isoG and isoC<sup>m</sup> is the mutation from isoG to A (Bande <i>et al.</i>, 2015). Considering the fidelity of base prediction for both A and isoG with the reverse strand model, we conclude that this linear discriminant analysis model is well suited for the discrimination between isoG and all natural bases in the sequence context <font face="courier new">5'-ggNct-3'</font>. 
Therefore, we could show that the <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">software package iCG</a> is applicable for the analysis of experiments with unnatural bases. |
| </div> | | </div> |
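<div class="article">
The fidelity of base prediction per group, as evaluated above, corresponds to a row-normalized confusion table. The following sketch shows one way to compute it; the function name is hypothetical and any labels passed to it in an example are illustrative, not measured data.
</div>

```python
from collections import Counter


def prediction_fidelity(true_labels, predicted_labels):
    """For every true group, return the fraction of its reads that received
    each predicted base (a row-normalized confusion table)."""
    table = {}
    for true, predicted in zip(true_labels, predicted_labels):
        table.setdefault(true, Counter())[predicted] += 1
    return {true: {pred: count / sum(counts.values())
                   for pred, count in counts.items()}
            for true, counts in table.items()}
```

<div class="article">
The diagonal entries of this table, for example the fraction of isoG reads predicted as isoG, are the per-group fidelities; the off-diagonal entries show which bases a group is confused with.
</div>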
| | | |
Line 506: |
Line 519: |
| <div class="bevel tr"></div> | | <div class="bevel tr"></div> |
| <div class="content"> | | <div class="content"> |
| + | |
| + | |
| + | <h3>Orthogonal Analysis with M.A.X and iCG</h3> |
| + | |
| + | <div class="article"> |
| + | In order to test the orthogonality of both methods we developed for the analysis of DNA containing unnatural bases, we performed a further PCR reaction with a template containing isoG and isoC<sup>m</sup> and analyzed the amplified DNA with both M.A.X. and iCG. The experimental setup was identical to that of our previous PCR experiments, except for the choice of primers and the reaction volume. A GoTaq PCR reaction with the template containing the unnatural bases was prepared with a total reaction volume of 250 μL. As primers, the standard iGEM sequencing primers VR and VF2 were used in the reaction. |
| + | </div> |
| | | |
| <div class="figure large"> | | <div class="figure large"> |
| <img class="figure image" src="https://static.igem.org/mediawiki/2017/0/04/T--Bielefeld-CeBiTec--PCR_MAX_iCG.png"> | | <img class="figure image" src="https://static.igem.org/mediawiki/2017/0/04/T--Bielefeld-CeBiTec--PCR_MAX_iCG.png"> |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
− | <b>Fig. 5: PCR of DNA containing isoG/isoC<sup>m</sup> with GoTaq analyzed with iCG and M.A.X.</b> A DNA template containing an unnatural base pair between isoG and isoC<sup>m</sup> was analyzed in an orthogonal approach with M.A.X. and iCG. | + | <b>Figure 17: PCR of DNA containing isoG/isoC<sup>m</sup> with GoTaq analyzed with iCG and M.A.X.</b> A DNA template containing an unnatural base pair between isoG and isoC<sup>m</sup> was analyzed in an orthogonal approach with M.A.X. and iCG. (A) After PCR reaction and PCR cleanup, four separate restriction reactions were performed with the addition of either <i>Eci</i>I, <i>Bsa</i>I, <i>Sap</i>I or no restriction enzyme. In the mentioned order, the amplified DNA contains a recognition sequence for these enzymes in case of a mutation of isoG to A, T and G. The reaction without restriction enzyme was used to check whether a fragment of the expected size of 761 bp was amplified. (B) After PCR cleanup, the sample was sequenced with Oxford Nanopore sequencing and subsequently analyzed with the software suite iCG. A dot-plot showing the linear discriminants of each filtered read after base prediction is shown, as well as the numerical result of the base prediction and a plot showing the normalized signal traces at the position of interest. |
| </p> | | </p> |
| </div> | | </div> |
| + | |
| + | <div class="article"> |
| + | The results of both analysis methods presented in Figure 17 reveal that the fidelity of UBP replication is not perfect for this experimental setup. In the gel electrophoresis after the digestion with the enzymes of the M.A.X system, a distinct band is visible for the <i>Eci</i>I digest, indicating that mutations of isoG to A occurred during the PCR reaction. The analysis with Nanopore sequencing and iCG reveals a similar result, showing that approximately 54 % of the analyzed reads contained isoG at the position of interest, while 39 % were predicted to have an A at this position. Therefore, both methods reveal that the mutation of isoG to A is the most frequent mutational event leading to a loss of the unnatural base pair between isoG and isoC<sup>m</sup>, which is in accordance with our expectations and respective results in literature (Switzer <i>et al.</i>, 1993; Johnson <i>et al.</i>, 2004). Interestingly, the replication fidelity seems to be considerably lower compared to our first PCR experiments with the GoTaq polymerase. Possibly, this difference is due to the longer storage time of the DNA template at 4 ℃. It was shown that d-isoCTP is quite unstable even when stored at -20 ℃ (Switzer <i>et al.</i>, 1993), possibly due to hydrolysis of isoC. Therefore, we postulate that the reduced PCR fidelity might be caused by a reduced template integrity. |
| + | </div> |
| + | <div class="article"> |
| + | The results in Figure 17 indicate that the analysis with Nanopore sequencing and iCG as well as the analysis with M.A.X. are orthogonal methods producing consistent analysis results. Arguably, the results produced by Nanopore sequencing are more precise, allowing the detection of uncommon mutation events that are not detectable with M.A.X. Nevertheless, the analysis with restriction digests is much faster and more cost-efficient. |
| + | </div> |
| + | |
| | | |
| </div> | | </div> |
Line 528: |
Line 556: |
| <div class="article"> | | <div class="article"> |
| <b>Bande, O., Abu El Asrar, R., Braddick, D., Dumbre, S., Pezo, V., Schepers, G., Pinheiro, V.B., Lescrinier, E., Holliger, P., Marlière, P., and Herdewijn, P.</b> (2015). Isoguanine and 5-Methyl-Isocytosine Bases, In Vitro and In Vivo. Chem. - A Eur. J. <b>21</b>: 5009–5022. | | <b>Bande, O., Abu El Asrar, R., Braddick, D., Dumbre, S., Pezo, V., Schepers, G., Pinheiro, V.B., Lescrinier, E., Holliger, P., Marlière, P., and Herdewijn, P.</b> (2015). Isoguanine and 5-Methyl-Isocytosine Bases, In Vitro and In Vivo. Chem. - A Eur. J. <b>21</b>: 5009–5022. |
| + | </div> |
| + | <div class="article"> |
| + | <b>Check Hayden, E.</b> (2015). Pint-sized DNA sequencer impresses first users. Nature <b>521</b>: 15–16. |
| </div> | | </div> |
| <div class="article"> | | <div class="article"> |