|
|
Line 18: |
Line 18: |
| <div class="content"> | | <div class="content"> |
| | | |
− | <h3>Background: Detection of Unnatural Bases in DNA</h3> | + | <h3>Short summary</h3> |
| | | |
| <div class="article"> | | <div class="article"> |
− | When working with unnatural bases, one of the major challenges is the detection of unnatural base pairs (UBPs) in DNA. For the analysis of UBP retention <i>in vivo</i>, <i>in vitro</i> replication and PCR experiments, it is mandatory to have a reliable method for UBP detection. In most cases, scientists working with unnatural bases have to develop their own methods specifically suitable for the detection of the very unnatural bases they are working with. Unfortunately, these methods often come along with unfavorable circumstances. | + | For our experiments, we developed two new methods that are potentially applicable to the analysis of experiments with most unnatural bases: the restriction-enzyme-based Mutation Analysis Xplorer (M.A.X) and Oxford Nanopore sequencing of unnatural bases with the help of our software iCG. Both are comparatively cost-efficient methods with the major advantage of enabling the direct analysis of mutational events in addition to the mere detection of unnatural bases. In our experiments, we showed that both methods are applicable to the analysis of experiments with the unnatural base pair between isoG and isoC<sup>m</sup>. Additionally, we analyzed the replication efficiency of this unnatural base pair with both methods, showing that they produce similar results by orthogonal approaches. |
| </div> | | </div> |
− | <br> | + | |
| + | </div> |
| + | <div class="bevel bl"></div> |
| + | </div> |
| + | |
| + | |
| + | <div class="contentbox"> |
| + | <div class="bevel tr"></div> |
| + | <div class="content"> |
| + | |
| + | <h3>Background: Detection of Unnatural Bases in DNA</h3> |
| + | |
| <div class="article"> | | <div class="article"> |
− | One method commonly applied is the usage of molecular beacons, as described by Johnson <i>et al.</i>, 2004 for the detection if isoG and isoC<sup>m</sup> in PCR experiments. This is a quite circumstantial and expensive method, as for every ssDNA sample, an individual fluorescence labeled, specific oligonucleotide containing the unnatural nucleotide complementary to the one inverstigated is needed. Additionally, the influence of the analyzed unnatural bases on the annealing temperature has to be investigated previously, to prevent unspecific hybridization from influencing the analysis results. A different method used primarily for UBP retention analysis is the biotin shift assay (Zhang <i>et al.</i>, 2017). A DNA sample analyzed by thid approach is used as a template in a PCR reaction with biotin-labeled nucleotide triphosphates of one of the unnatural bases. Through specific interaction with Spreptavidin after amplification, molecules containing these labeled nucleotides are shifted upwards in comparison to DNA molecules of the same length in subsequent PAGE analysis. The use of specially labeled variants of the analyzed unnatural bases is generally not recommended, as even small structural changes might have a great impact on the interaction with other biomolecules and could lead to falsified experiment results. In addition, the necessity of a PCR amplification for UBP detection is disadvantageous too, as the proportion of DNA molecules with sustained and lost UBPs might be greatly impacted, particularly - but not exclusively - due to influences of the unnatural nucleoside triphosphates on the PCR reaction. | + | When working with unnatural bases, one of the major challenges is the detection of unnatural base pairs (UBPs) in the DNA. For the analysis of UBP retention <i>in vivo</i>, <i>in vitro</i> replication and PCR experiments, it is mandatory to have a reliable method for UBP detection. 
In most cases, scientists working with unnatural bases have to develop their own methods tailored to the specific unnatural bases they are working with. Unfortunately, these methods often come with considerable drawbacks. |
| </div> | | </div> |
| <br> | | <br> |
| <div class="article"> | | <div class="article"> |
− | For our experiments, we developed two new methods which are potentially applicable for the analysis of experiments with most unnatural bases: The Mutational Analysis Xpolorer (M.A.X) and Oxford Nanopore sequencing of unnatural bases with iCG. Both are comparably cost efficient methods that have the major advantage of enabling the direct analysis of mutational events in addition to the mere detection of unnatural bases. In our experiments, we showed that both methods are applicable for the analysis of experiments with the unnatural bases isoG and isoC<sup>m</sup>. | + | One commonly applied method is the use of molecular beacons, as described by Johnson <i>et al.</i>, 2004 for the detection of isoG and isoC<sup>m</sup> in PCR experiments. This is a rather cumbersome and expensive method, as every ssDNA sample requires an individual, fluorescence-labeled, specific oligonucleotide containing the unnatural nucleotide complementary to the one under investigation. Additionally, the influence of the analyzed unnatural bases on the annealing temperature has to be investigated beforehand, to prevent unspecific hybridization from influencing the analysis results. A different method used primarily for UBP retention analysis is the biotin shift assay (Zhang <i>et al.</i>, 2017). In this approach, a DNA sample is used as a template in a PCR reaction with biotin-labeled nucleoside triphosphates of one of the unnatural bases. Through specific interaction with streptavidin after amplification, molecules containing these labeled nucleotides are shifted upwards in comparison to DNA molecules of the same length in subsequent PAGE analysis. The use of specially labeled variants of the analyzed unnatural bases is generally not recommended, as even small structural changes might have a great impact on the interaction with other biomolecules and could lead to falsified experimental results. 
In addition, the need for PCR amplification prior to UBP detection is a disadvantage, as the proportion of DNA molecules with retained and lost UBPs might be strongly affected, particularly - but not exclusively - by influences of the unnatural nucleoside triphosphates on the PCR reaction. |
| </div> | | </div> |
| | | |
Line 42: |
Line 53: |
| | | |
| <!-- Ueberschriften --> | | <!-- Ueberschriften --> |
− | <span class="anchor-jump" id="M.A.X"></span> | + | |
− | <div class="section"></div>
| + | <h3> Analysis of DNA containing isoG and isoC<sup>m</sup> with the Mutation Analysis Xplorer</h3> |
− | <h3> Mutation Analysis Xplorer – Results </h3>
| + | |
| | | |
| <h4>Primer annealing</h4> | | <h4>Primer annealing</h4> |
Line 365: |
Line 375: |
| <div class="bevel tr"></div> | | <div class="bevel tr"></div> |
| <div class="content"> | | <div class="content"> |
− | <span class="anchor-jump" id="ONSseq"></span>
| + | |
− | <div class="section"></div>
| + | |
| <h3>Isoguanine & 5-methyl isocytosine in Nanopore Sequencing</h3> | | <h3>Isoguanine & 5-methyl isocytosine in Nanopore Sequencing</h3> |
| <h4>Nanopore Sequencing</h4> | | <h4>Nanopore Sequencing</h4> |
| <div class="article"> | | <div class="article"> |
− | Oxford Nanopore Technology's (ONT) sequencing technology offers a great potential as a tool for the detection of unnatural bases in DNA. In ONT sequencing, protein nanopores are distributed inside a synthetic membrane of high electrical resistance. When applying an electrical field across this membrane, an ionic current passes through each nanopore which is being measured and recorded. If a biomolecule, such as proteins, RNA or DNA are located inside the Nanopore, the ionic current is influenced. These characteristic changes can be used to identify which molecule is passing through the nanopore. (Feng <i>et al.</i>, 2015) This way, an algorithm called the "basecaller" is able to predict the nucleotide sequence of a single stranded DNA or RNA molecule based on the raw data that is recorded when it is pulled through a nanopore. Since the commercial availability of the portable sequencer MinION in 2015, strong improvements have been made in terms of increasing the bascalling accuracy. Even though the error rate is still high compared to other sequencing techniques, the advantage of having long reads of several kilobases is often preferential regarding sequencing of DNA containing repetitive sequences or mobile genetic elements like transposable elements (Debladis <i>et al.</i>, 2017) More recently, efforts have been made towards the analysis of epigenetic information based on the identification of modified bases in nucleic acids with nanopore sequencing. For example, methylated cytosine was shown to be distinguishable from unmodified cytosine by training a hidden Markov model (Simpson <i>et al.</i>, 2017). | + | <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Partners">Oxford Nanopore Technology's (ONT)</a> sequencing technology offers a great potential as a tool for the detection of unnatural bases in DNA. In ONT sequencing, protein nanopores are distributed inside a synthetic membrane of high electrical resistance. 
When an electrical field is applied across this membrane, an ionic current passes through each nanopore and is measured and recorded. If a biomolecule such as a protein, RNA or DNA is located inside the nanopore, the ionic current is influenced. These characteristic changes can be measured and used to identify which molecule is passing through the nanopore (Feng <i>et al.</i>, 2015). This way, an algorithm called the "basecaller" is able to predict the nucleotide sequence of a single-stranded DNA or RNA molecule based on the raw data recorded while it is pulled through the nanopore. Since the portable sequencer MinION became commercially available in 2015 (Check Hayden, 2015), basecalling accuracy has improved considerably. Even though the error rate is still high compared to other sequencing techniques, the advantage of long reads of several kilobases is often decisive when sequencing DNA containing repetitive sequences or mobile genetic elements such as transposable elements (Debladis <i>et al.</i>, 2017). More recently, efforts have been made towards the analysis of epigenetic information based on the identification of modified bases in nucleic acids with nanopore sequencing. For example, methylated cytosine was shown to be distinguishable from unmodified cytosine by training a hidden Markov model (Simpson <i>et al.</i>, 2017). |
| </div> | | </div> |
| <br> | | <br> |
| <div class="article"> | | <div class="article"> |
− | Compared to other sequencing technologies, nanopore sequencing offers several advantages regarding the detection of unnatural bases. Most importantly, no PCR amplification of the DNA sample is needed in the process of library preparation. This way, no information gets lost prior to sequencing as a result of a potentially lower PCR amplification fidelity of the unnatural base pair. Another big advantage is that no additional chemistry is needed in the process of sequencing. Other sequencing technologies such as 454, Sanger, Illumina and PacBio are based on polymerases that synthesize a DNA strand complement to a template being sequenced. When a specially labeled nucleotide is incorporated, a detectable signal is emitted. This is problematic regarding sequencing of DNA containing unnatural bases, as additional labeled nucleotides would be needed for a continuous strand synthesis and to produce a unique signal for the unnatural bases. Considering the development costs for this new chemistry, the necessary process adaptations and increased complexity of data analysis, the sequencing of orthogonal unnatural bases is unlikely to be feasible with these technologies. In contrast, nanopore sequencing omits the necessity for additional chemistry and it is unlikely that sequencing will be interrupted by unnatural bases passing through the nanopore. On top of that, Nanopore sequencing was shown to be applicable for direct sequencing of RNA, without prior transcription into cDNA (Garalde et al., 2016). Therefore, it promises to be suitable for transcription studies involving unnatural bases too. | + | Compared to other sequencing technologies, nanopore sequencing offers several advantages regarding the detection of unnatural bases. Most importantly, no PCR amplification of the DNA sample is needed in the process of library preparation. This way, no information gets lost prior to sequencing due to potentially lower PCR amplification fidelity of the unnatural base pair. 
Another big advantage is that no additional chemistry is needed in the process of sequencing. Other sequencing technologies such as 454, Sanger, Illumina and PacBio are based on polymerases that synthesize a DNA strand complementary to the template being sequenced. When a specially labeled nucleotide is incorporated, a detectable signal is emitted. This is problematic for sequencing DNA containing unnatural bases, as additional labeled nucleotides would be needed for continuous strand synthesis and to produce a unique signal for the unnatural bases. Considering the development costs for this new chemistry, the necessary process adaptations and the increased complexity of data analysis, the sequencing of orthogonal unnatural bases is unlikely to be feasible with these technologies. In contrast, nanopore sequencing requires no additional chemistry, and it is unlikely that sequencing will be interrupted by unnatural bases passing through the nanopore. On top of that, nanopore sequencing was shown to be applicable to direct sequencing of RNA, without prior reverse transcription into cDNA (Garalde <i>et al.</i>, 2016). Therefore, it promises to be suitable for transcription studies involving unnatural bases too. |
| </div> | | </div> |
| <br> | | <br> |
| <div class="article"> | | <div class="article"> |
− | We aim to examine if Oxford Nanopore sequencing is suitable for sequencing DNA containing unnatural bases. Therefore, we sequenced different DNA samples containing either the unnatural nucleotides isoguanosine and 5‑methyl isocytidine or any natural bases in the same sequence context to see if the output signal differs significantly between these groups. The data processing and evaluation was performed with the help of our own <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">software iCG</a>, that we developed specifically for analyzing Nanopore sequencing data of DNA containing unnatural bases. Our aim is to create a linear discriminant analysis model that is able to discriminate between isoG/isoC<sup>m</sup> and natural bases in the given neighboring sequence context of two bases upstream and two bases downstream of the position of interest. For a detailed description of how the software works, please refer to our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software">Software</a> page. | + | We aimed to examine if Oxford Nanopore sequencing is suitable for sequencing DNA containing unnatural bases. Therefore, we sequenced different DNA samples containing either the unnatural bases isoG and isoC<sup>m</sup> or any natural bases in the same sequence context to see if the output signal differs significantly between these groups. The data processing and evaluation was performed with the help of our own <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">software iCG</a>, that we developed specifically for analyzing Nanopore sequencing data of DNA containing unnatural bases. Our aim is to create a linear discriminant analysis model that is able to discriminate between isoG/isoC<sup>m</sup> and natural bases in the given neighboring sequence context of two bases upstream and two bases downstream of the position of interest. 
For a detailed description of how the software works, please refer to our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software">Software</a> page. |
| </div> | | </div> |
| | | |
Line 384: |
Line 393: |
| <h4>Reference Sample Preparation & Sequencing</h4> | | <h4>Reference Sample Preparation & Sequencing</h4> |
| <div class="article"> | | <div class="article"> |
− | In order to examine if the unnatural bases isoG and isoC<sup>m</sup> are differentiable from the natural bases through nanopore sequencing, five different DNA samples were prepared that differed only at a single sequence position, containing either an unnatural base or one of the four natural bases at this position of interest. For our experiments, we started by sequencing isoC<sup>m</sup> in the sequence context <font face="courier new">5'‑AG\iC<sup>m</sup>\CC‑3'</font> and, on the reverse strand, isoG in the sequence context <font face="courier new">5'‑GG\iG<sup>\CT‑3'</font>. For this purpose, we constructed five reference DNA samples with the following sequences: | + | In order to examine if the unnatural bases isoG and isoC<sup>m</sup> are differentiable from the natural bases through nanopore sequencing, five different DNA samples were prepared that differed only at a single sequence position, containing either an unnatural base or one of the four natural bases at this position of interest. For our experiments, we started by sequencing isoC<sup>m</sup> in the sequence context <font face="courier new">5'‑AG\iC<sup>m</sup>\CC‑3'</font> and, on the reverse strand, isoG in the sequence context <font face="courier new">5'‑GG\iG\CT‑3'</font>. For this purpose, we constructed five reference DNA samples with the following sequences: |
| </div> | | </div> |
| <div class="figure seventy"> | | <div class="figure seventy"> |
Line 395: |
Line 404: |
| </div> | | </div> |
| <div class="article"> | | <div class="article"> |
− | Each reference DNA sample was prepared starting from a pair of complementary synthetic oligonucleotides. The oligos containing isoguanine or 5‑methyl isocytosine were synthesized by <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Partners">Biolegio</a>, including subsequent purification through polyacrylamid gel electrophoresis (PAGE). Mass spectrometry and ultra performance liquid chromatography data (Figure 1) provided by Biolegio indicate that the concentrations of unmodified side products are below detection limits. In general, the manufacturer specifies the purity of PAGE purified oligonuleotides containing modified bases to be greater than 95 %. The oligonucleotides containing exclusively natural bases were ordered from metabion and were purified by desalting. | + | Each reference DNA sample was prepared starting from a pair of complementary synthetic oligonucleotides. The oligos containing isoguanine or 5‑methyl isocytosine were synthesized by <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Partners">Biolegio</a>, including subsequent purification through polyacrylamide gel electrophoresis (PAGE). Mass spectrometry and ultra performance liquid chromatography data (Figure 1) provided by Biolegio indicate that the concentrations of unmodified side products are below detection limits. In general, the manufacturer specifies the purity of PAGE-purified oligonucleotides containing modified bases to be greater than 95 %. The oligonucleotides containing exclusively natural bases were ordered from <a target="_blank" href="http://www.metabion.com/">metabion</a> and were purified by desalting. |
| </div> | | </div> |
| <br> | | <br> |
Line 406: |
Line 415: |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
| <b>Figure 10: UPLC and MS data from oligos containing isoguanine or 5‑methyl isocytosine.</b> | | <b>Figure 10: UPLC and MS data from oligos containing isoguanine or 5‑methyl isocytosine.</b> |
− | Oligonucleotides containing isoguanine or 5‑methyl isocytosine were synthesized by Biolegio and analysed by ultra performance liquid chromatography (UPLC) and mass spectrometry (MS). Shown above are the results from the UPLC (above) and MS (below) analysis for each of the complementary oligos containing either isoguanine (A) or 5‑methyl isocytosine (B). | + | Oligonucleotides containing isoguanine or 5‑methyl isocytosine were synthesized and analysed by Biolegio by ultra performance liquid chromatography (UPLC) and mass spectrometry (MS). Shown above are the results from the UPLC (above) and MS (below) analysis for each of the complementary oligos containing either isoguanine (A) or 5‑methyl isocytosine (B). |
| </p> | | </p> |
| </div> | | </div> |
| | | |
| <div class="article"> | | <div class="article"> |
− | For each DNA sample, a complementary pair of oligonucleotides was <a target="_blank" href="https://static.igem.org/mediawiki/2017/d/d1/T--Bielefeld-CeBiTec--protocol_ssDNA_annealing.pdf">annealed</a> and ligated into a plasmid backbone (<a target="_blank" href="http://parts.igem.org/Part:BBa_K1465202">BBa_K1465202</a>) previously linearized by <i>Xba</i>I and <i>Bmt</i>I. For this purpose, a leading <i>Bmt</i>I and a tailing <i>Spe</i>I recognition site were included into the oligonucleotide sequences. After ligation, re-ligated backbone was linearized by digestion with <i>Xba</i>I. After consecutive digestion of double and single stranded linear DNA fragments with lambda exonuclease and E. coli exonuclease I, the DNA samples were linearized through <i>Eco</i>RV digestion and purified for sequencing library preparation. An individual library was prepared for each DNA sample, according to the 1D Library Protocol for SQK-LSK108, starting from the end repair step. | + | For each DNA sample, a complementary pair of oligonucleotides was <a target="_blank" href="https://static.igem.org/mediawiki/2017/d/d1/T--Bielefeld-CeBiTec--protocol_ssDNA_annealing.pdf">annealed</a> and ligated into a plasmid backbone (<a target="_blank" href="http://parts.igem.org/Part:BBa_K1465202">BBa_K1465202</a>) previously linearized by <i>Xba</i>I and <i>Bmt</i>I. For this purpose, a leading <i>Bmt</i>I and a tailing <i>Spe</i>I recognition site were included into the oligonucleotide sequences. After ligation, re-ligated backbone was linearized by digestion with <i>Xba</i>I. After consecutive digestion of double and single stranded linear DNA fragments with lambda exonuclease and <i>E. coli</i> exonuclease I, the DNA samples were linearized through <i>Eco</i>RV digestion and purified for sequencing library preparation. An individual library was prepared for each DNA sample, according to the 1D Library Protocol for SQK-LSK108, starting from the end repair step. |
| </div> | | </div> |
| | | |
Line 450: |
Line 459: |
| </div> | | </div> |
| <div class="article"> | | <div class="article"> |
− | In the first step, the reads were filtered by iCG filter in order to identify reads that contain the region of interest and have a high basecalling quality. Regarding the parameters minimum length, maximum length and minimum mean quality qscore, the the default argument settings of iCG filter were used for filtering. Of the remaining reads, only those containing the neighboring sequence context of 15 bases upstream and downstream of the POI were selected, without considering the close sequence context (blur region) of 3 ±1 bases around the POI, where influences of the unnatural bases may lead to unpredictable behavior of the basecaller. The matching reads were allowed to contain a maximum of 2 mismatches, including indels. The maximum deviation in length was set to 1 base and reads containing the region of interest multiple times were rejected. Additionally, the selected reads were further filtered for a minimum mean quality score of 14 in this restricted sequence context and sorted by their stand orientation. For further information about, please read more about iCG on our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">Software</a> page. | + | In the first step, the reads were filtered by iCG filter in order to identify reads that contain the region of interest and have a high basecalling quality. Regarding the parameters minimum length, maximum length and minimum mean Phred qscore, the default argument settings of iCG filter were used for filtering. Of the remaining reads, only those containing the neighboring sequence context of 15 bases upstream and downstream of the POI were selected, without considering the close sequence context (blur region) of 3 ±1 bases around the POI, where influences of the unnatural bases may lead to unpredictable behavior of the basecaller. The matching reads were tolerated to contain a maximum of 2 mismatches, including indels. 
The maximum deviation in length was set to 1 base and reads containing the region of interest multiple times were rejected. Additionally, the selected reads were further filtered for a minimum mean quality score of 14 in this restricted sequence context and sorted by their strand orientation. For further information, please read more about iCG on our <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">Software</a> page. |
| </div> | | </div> |
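The read selection described above can be sketched in a few lines of Python. This is only an illustration of the idea and not the iCG filter implementation: the function names `find_region` and `keep_read` are hypothetical, mismatches are counted as simple substitutions (indels inside the flanks are not modelled), and the mean quality is taken as a plain arithmetic mean of per-base scores.

```python
def hamming(a, b):
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_region(read, upstream, downstream, blur=3, blur_tol=1, max_mismatches=2):
    """Return (start, blur_length) for every place where both flanks match.

    The blur region between the flanks is ignored; only its length
    (blur +/- blur_tol bases) is constrained.
    """
    hits = []
    u, d = len(upstream), len(downstream)
    last_start = len(read) - u - d - (blur - blur_tol)
    for start in range(last_start + 1):
        up_mm = hamming(read[start:start + u], upstream)
        if up_mm > max_mismatches:
            continue
        for blur_len in range(blur - blur_tol, blur + blur_tol + 1):
            ds = start + u + blur_len
            window = read[ds:ds + d]
            if len(window) == d and up_mm + hamming(window, downstream) <= max_mismatches:
                hits.append((start, blur_len))
                break
    return hits


def keep_read(read, quals, upstream, downstream, min_mean_q=14):
    """Keep a read only if the region occurs exactly once with sufficient quality."""
    hits = find_region(read, upstream, downstream)
    if len(hits) != 1:  # region missing or present multiple times
        return False
    start, blur_len = hits[0]
    end = start + len(upstream) + blur_len + len(downstream)
    region_q = quals[start:end]
    return sum(region_q) / len(region_q) >= min_mean_q
```

A call such as `keep_read(read, quals, upstream, downstream)` would then be applied to every basecalled read, with `upstream` and `downstream` being the 15-base flanks of the position of interest.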
| | | |
Line 457: |
Line 466: |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
| <b>Figure 14: Normalized signal traces of analyzed DNA samples.</b> | | <b>Figure 14: Normalized signal traces of analyzed DNA samples.</b> |
− | Overlayed, normalized signal traces of DNA samples containing either isoG/isoC<sup>m</sup> or any natural base at the position of interest in the analyzed sequence context. The reads displayed in these plots were selected from their respective sequencing runs by using iCG filter, using the same filter settings for all DNA samples. To remove contaminating reads from previous sequencing runs, a quantile of 0.7 of the most deviating reads was removed previous to plotting with the help of iCG model. | + | Overlayed, normalized signal traces of DNA samples containing either isoG/isoC<sup>m</sup> or any natural base at the position of interest in the analyzed sequence context. The reads displayed in these plots were selected from their respective sequencing runs by using iCG filter, using the same filter settings for all DNA samples. To remove contaminating reads from previous sequencing runs, a quantile of 0.7 of the most deviating reads was removed previous to plotting with the help of <i>iCG model</i>. |
| </p> | | </p> |
| </div> | | </div> |
| | | |
| <div class="article"> | | <div class="article"> |
− | Afterwards, iCG model was used to create linear discriminant models based on the filtered groups of template reads gathered by iCG filter. Different setting for the amount of removed, deviating reads were tested. Figure 5 shows plots of the Region of interest for both the forward and the reverse strand and all five template groups, with a quantile of 0.7 removed reads. For both strand orientations, there is a distinct difference in the mean, normalized signal trace detectable comparing the sequences containing an unnatural base with those containing a natural base at the position of interest. | + | Afterwards, <i>iCG model</i> was used to create linear discriminant models based on the filtered groups of template reads gathered by iCG filter. Different settings for the proportion of removed, deviating reads were tested. Figure 5 shows plots of the region of interest for both the forward and the reverse strand and all five template groups, with a quantile of 0.7 of the reads removed. For both strand orientations, a distinct difference in the mean normalized signal trace is detectable when comparing the sequences containing an unnatural base with those containing a natural base at the position of interest. |
| </div> | | </div> |
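The removal of the most deviating reads can be illustrated as follows. This is a minimal sketch under assumptions, not the code of iCG model: the function name `remove_deviating` is hypothetical, each read is represented as an equally long list of normalized signal values, and the deviation is measured as the summed absolute distance to the per-position median of the group.

```python
from statistics import median


def remove_deviating(traces, quantile=0.7):
    """Drop the given fraction of traces that deviate most from the group median."""
    # Per-position median signal across all traces of the group.
    med = [median(column) for column in zip(*traces)]

    def deviation(trace):
        return sum(abs(x - m) for x, m in zip(trace, med))

    ranked = sorted(traces, key=deviation)
    n_keep = len(traces) - int(round(quantile * len(traces)))
    return ranked[:n_keep]
```

With `quantile=0.7`, a group of 10 traces is reduced to the 3 traces closest to the group median.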
| <br> | | <br> |
Line 473: |
Line 482: |
| | | |
| <div class=article> | | <div class=article> |
− | Based on the data presented in Figure 14, a cluster analysis based on linear discriminant analysis was conducted using iCG model. Figure 15 shows dot-plots for the forward and reverse models, presenting the linear discriminants of the reads each respective model was created with. The direct comparison of both models reveals that the model created upon the data of the reverse strand seems to perform better in terms of classification of the sequencing reads. Except for the groups containing A and G at the position of interest, which slightly overlap with each other, all other groups are well seperated from each other. On the other hand, the linear discriminant analysis of the data of the forward strand was unable to properly separate the reads containing A, G and iC<sup>m</sup> from each other, mainly due to widely scattered reads of the iC<sup>m</sup> group. Both results coincide with the visual assessment of the signal traces in Figure 14. | + | Based on the data presented in Figure 14, a cluster analysis based on linear discriminant analysis was conducted using <i>iCG model</i>. Figure 15 shows dot-plots for the forward and reverse models, presenting the linear discriminants of the reads each respective model was created with. The direct comparison of both models reveals that the model created upon the data of the reverse strand seems to perform better in terms of classification of the sequencing reads. Except for the groups containing A and G at the position of interest, which slightly overlap with each other, all other groups are well separated from each other. On the other hand, the linear discriminant analysis of the data of the forward strand was unable to properly separate the reads containing A, G and iC<sup>m</sup> from each other, mainly due to widely scattered reads of the iC<sup>m</sup> group. Both results coincide with the visual assessment of the signal traces in Figure 14. |
| </div> | | </div> |
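To make the principle of linear discriminant analysis concrete, the following sketch fits a discriminant for only two groups and two features per read. It is a deliberately reduced illustration and not the model built by iCG model, which separates five groups; the names `fit_lda` and `predict` are hypothetical, and equal group priors are assumed.

```python
def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Within-group scatter matrix entries (sxx, sxy, syy)."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_lda(class0, class1):
    """Fisher discriminant: w = Sw^-1 (m1 - m0), threshold at the midpoint of the means."""
    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled scatter [[a, b], [b, c]]
    det = a * c - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    threshold = w[0] * (m0[0] + m1[0]) / 2 + w[1] * (m0[1] + m1[1]) / 2
    return w, threshold


def predict(model, point):
    """Return 1 if the point projects beyond the threshold, else 0."""
    w, threshold = model
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0
```

Each read is projected onto the direction `w`; groups that are well separated along this direction, as in the dot-plots, can be classified reliably, while overlapping groups cannot.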
| | | |
Line 480: |
Line 489: |
| <p class="figure subtitle"> | | <p class="figure subtitle"> |
| <b>Figure 15: Dot-plots of the linear discriminant models of the forward and reverse strand.</b> Dot-plots of the linear discriminants of the reads used for the creation of the statistical models for base prediction at the position of interest in the forward and reverse strand. The data used for the linear discriminant analysis was previously filtered by removing 70 % of reads from each group, based on their deviation from the groups median signal in the neighboring sequence context of the position of interest. | | <b>Figure 15: Dot-plots of the linear discriminant models of the forward and reverse strand.</b> Dot-plots of the linear discriminants of the reads used for the creation of the statistical models for base prediction at the position of interest in the forward and reverse strand. The data used for the linear discriminant analysis was previously filtered by removing 70 % of reads from each group, based on their deviation from the groups median signal in the neighboring sequence context of the position of interest. |
− | Lorem Ipsum.
| |
| </p> | | </p> |
| </div> | | </div> |
| | | |
| <div class="article"> | | <div class="article"> |
− | Since a statistical model should not be tested with the very data it was created with, we prepared a new set of DNA samples to properly evaluate the performance of both models concerning the prediction of bases at the position of interest in their respective sequence context. For this purpose, we modified the RuBisCo plasmid that was used for the first sample preparation by cloning five different sequences downstream RuBisCo with standard BioBrick assembly (<a target="_blank" href="S05406"></a>, <a target="_blank" href="S05407"></a>, <a target="_blank" href="S05408"></a>, <a target="_blank" href="S05409"></a>, <a target="_blank" href="S05410"></a>). Each of these plasmids contains a 25 nt sequence that is unique, while the remaining plasmid sequence is the same. These unique sequences can be used for identification assignment of sequencing reads comparable to the Nanopore barcoding approach. Starting with these five plasmids, we prepared new DNA samples according to the same procedure explained above. After ligation, all five samples were pooled in approximately equimolar proportion and further prepared for sequencing. After sequencing and basecalling of this pooled sample, the reads were assigned to their respective group by using iCG filter with the "--barcode" argument and each plasmid's unique sequence. After filtering, 50 reads of every group were randomly selected in order to be used for evaluating the performance of the linear discriminant models with iCG predict. The results of this evaluation are summarized in Figure 16. | + | Since a statistical model should not be tested with the very data it was created with, we prepared a new set of DNA samples to properly evaluate the performance of both models concerning the prediction of bases at the position of interest in their respective sequence context. 
For this purpose, we modified the RuBisCO plasmid that was used for the first sample preparation by cloning five different sequences downstream of RuBisCO with <a target="_blank" href="https://static.igem.org/mediawiki/2017/8/84/T--Bielefeld-CeBiTec--protocol_Standard_Biobrick.pdf">Standard BioBrick assembly</a> (part numbers <a target="_blank" href="S05406"></a>, <a target="_blank" href="S05407"></a>, <a target="_blank" href="S05408"></a>, <a target="_blank" href="S05409"></a>, <a target="_blank" href="S05410"></a>). Each of these plasmids contains a unique 25 nt sequence, while the remaining plasmid sequence is identical. These unique sequences can be used to assign sequencing reads to their plasmid of origin, comparable to the Nanopore barcoding approach. Starting with these five plasmids, we prepared new DNA samples according to the same procedure explained above. After ligation, all five samples were pooled in approximately equimolar proportions and further prepared for sequencing. After sequencing and basecalling of this pooled sample, the reads were assigned to their respective group by using iCG filter with the "--barcode" argument and each plasmid's unique sequence. After filtering, 50 reads of every group were randomly selected to evaluate the performance of the linear discriminant models with iCG predict. The results of this evaluation are summarized in Figure 16. |
| </div> | | </div> |
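The read assignment and subsampling described above can be illustrated with a minimal Python sketch. This is not the implementation of iCG filter. It assumes exact matching of each unique sequence on either strand, which is a simplification because real Nanopore reads contain errors. The function names, the barcode labels and the fixed random seed are hypothetical.

```python
import random


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (natural bases only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def assign_reads(reads, barcodes):
    """Group reads by the unique sequence they contain.

    Assumption: exact substring match on either strand; reads matching
    no barcode or more than one barcode are discarded.
    """
    groups = {name: [] for name in barcodes}
    for read in reads:
        read = read.upper()
        hits = [
            name
            for name, barcode in barcodes.items()
            if barcode.upper() in read or reverse_complement(barcode) in read
        ]
        if len(hits) == 1:
            groups[hits[0]].append(read)
    return groups


def sample_per_group(groups, n=50, seed=0):
    """Randomly select up to n reads from every group (seed is hypothetical)."""
    rng = random.Random(seed)
    return {name: rng.sample(reads, min(n, len(reads)))
            for name, reads in groups.items()}
```

With five barcodes of 25 nt each, `sample_per_group(assign_reads(reads, barcodes), n=50)` would return the 50 reads per plasmid used for evaluation, provided each group contains at least 50 assigned reads.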
| | | |
Line 497: |
Line 505: |
| | | |
| <div class="article"> | | <div class="article"> |
− | The results in Figure 16 indicate that the linear discriminant model for the reverse strand orientation is performing better than the model for the sense strand. The base prediction fidelity is especially high for reads containing an adenine, a cytosine or an isoguanine at the position of interest. Due to the hydrolysis of isoC<sup>m</sup> to T and the tautomerisation of isoG, leading to mispairing with T, the most common mutation that leading to a loss of the unnatural base pair between isoG and isoC<sup>m</sup> is the mutation from isoG to A (Bande et al., 2015). Considering the fidelity of base prediction for both A and isoG with the reverse strand model, we conclude that this linear discriminant analysis model is well suited for the discrimination between isoG and all natural bases in the sequence context <font face="courier new">5'-ggNct-3'</font>. Therefore, we could show that the software package iCG is applicable for the analysis of experiments with unnatural bases. | + | The results in Figure 16 indicate that the linear discriminant model for the reverse strand orientation performs better than the model for the sense strand. The base prediction fidelity is especially high for reads containing an adenine, a cytosine or an isoguanine at the position of interest. Due to the hydrolysis of isoC<sup>m</sup> to T and the tautomerisation of isoG, which leads to mispairing with T, the most common mutation leading to a loss of the unnatural base pair between isoG and isoC<sup>m</sup> is the mutation from isoG to A (Bande <i>et al.</i>, 2015). Considering the fidelity of base prediction for both A and isoG with the reverse strand model, we conclude that this linear discriminant analysis model is well suited for the discrimination between isoG and all natural bases in the sequence context <font face="courier new">5'-ggNct-3'</font>. 
Therefore, we could show that the <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#iCG">software package iCG</a> is applicable for the analysis of experiments with unnatural bases. |
| </div> | | </div> |
| | | |
Line 546: |
Line 554: |
| <div class="article"> | | <div class="article"> |
| <b>Bande, O., Abu El Asrar, R., Braddick, D., Dumbre, S., Pezo, V., Schepers, G., Pinheiro, V.B., Lescrinier, E., Holliger, P., Marlière, P., and Herdewijn, P.</b> (2015). Isoguanine and 5-Methyl-Isocytosine Bases, In Vitro and In Vivo. Chem. - A Eur. J. <b>21</b>: 5009–5022. | | <b>Bande, O., Abu El Asrar, R., Braddick, D., Dumbre, S., Pezo, V., Schepers, G., Pinheiro, V.B., Lescrinier, E., Holliger, P., Marlière, P., and Herdewijn, P.</b> (2015). Isoguanine and 5-Methyl-Isocytosine Bases, In Vitro and In Vivo. Chem. - A Eur. J. <b>21</b>: 5009–5022. |
| + | </div> |
| + | <div class="article"> |
| + | <b>Check Hayden, E.</b> (2015). Pint-sized DNA sequencer impresses first users. Nature <b>521</b>: 15–16. |
| </div> | | </div> |
| <div class="article"> | | <div class="article"> |