MarkusHaak (Talk | contribs) |
Line 33: | Line 33: | ||
<h3>Nanopore Sequencing of DNA Containing Unnatural Bases</h3> | <h3>Nanopore Sequencing of DNA Containing Unnatural Bases</h3> | ||
<div class="article"> | <div class="article"> | ||
− | As described on the <a target="_blank" href="">New Methods</a> page of this wiki, Nanopore sequencing promises to be especially beneficial regarding the analysis of experiments involving unnatural bases in DNA and RNA. As a single-molecule sequencing technology without the need for special, expensive chemistry or PCR amplification of DNA samples prior to analysis, it could serve as an innovative, novel approach for the analysis of experiments involving unnatural bases. | + | As described on the <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Results/unnatural_base_pair/development_of_new_methods">New Methods</a> page of this wiki, Nanopore sequencing promises to be especially beneficial regarding the analysis of experiments involving unnatural bases in DNA and RNA. As a single-molecule sequencing technology without the need for special, expensive chemistry or PCR amplification of DNA samples prior to analysis, it could serve as an innovative, novel approach for the analysis of experiments involving unnatural bases. |
</div> | </div> | ||
<br> | <br> | ||
Line 118: | Line 118: | ||
<br> | <br> | ||
<div class="article"> | <div class="article"> | ||
− | With all dependencies installed, download the latest version of iCG together with example data sets in the <a target="_blank" href=""> | + | With all dependencies installed, download the latest version of iCG together with example data sets in the <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#Downloads">Downloads</a> section of this page. To learn about how to use iCG, please continue reading or start right ahead by executing the following command in a terminal, having navigated to the folder where the downloaded code is located: |
</div> | </div> | ||
<div class="codeblock"> | <div class="codeblock"> | ||
Line 244: | Line 244: | ||
<div class="article"> | <div class="article"> | ||
− | Further help regarding the usage of iCG filter is given in the <a target="_blank" href="">help pages</a> that are accessible by passing the argument <font face="courier new">-h</font> or <font face="courier new">--help</font> to the program on execution. They explain every positional and optional argument in detail, including their default values. | + | Further help regarding the usage of iCG filter is given in the <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#FilterHelp">help pages</a> that are accessible by passing the argument <font face="courier new">-h</font> or <font face="courier new">--help</font> to the program on execution. They explain every positional and optional argument in detail, including their default values. |
</div> | </div> | ||
− | <h5>Help pages</h5> | + | <h5 id="FilterHelp">Help pages</h5> |
<font face="courier new"> | <font face="courier new"> | ||
Line 457: | Line 457: | ||
<br> | <br> | ||
<div class="article"> | <div class="article"> | ||
− | This approach is best described by taking a closer look at an example. Let us take aside the unnatural bases for a moment and consider that we wanted to create a model that is able to discriminate between the natural bases A, G, T and C at the position of interest in the sequence context presented in | + | This approach is best described by taking a closer look at an example. Let us set aside the unnatural bases for a moment and consider that we wanted to create a model that is able to discriminate between the natural bases A, G, T and C at the position of interest in the sequence context presented in Figure 2. First of all, we take sequencing data of four different DNA samples, each one having a different natural base at the position of interest. With the help of iCG filter, the individual sets of sequencing data are filtered for reads containing the region of interest. Now, the paths of each folder containing one of the four groups are passed to iCG model. Figure 4 shows the group means of the five positions (mean_0 to mean_5), the coefficients of three linear discriminants resulting from the linear discriminant analysis, and a two-dimensional dot-plot showing the linear discriminants of the template dataset. |
</div> | </div> | ||
<div class="figure large"> | <div class="figure large"> | ||
Line 506: | Line 506: | ||
<div class="article"> | <div class="article"> | ||
− | Further help regarding the usage of iCG model is given in the <a target="_blank" href="">help pages</a> that are accessible by passing the argument <font face="courier new">-h</font> or <font face="courier new">--help</font> to the program on execution. They explain every positional and optional argument in detail, including their default values. | + | Further help regarding the usage of iCG model is given in the <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#ModelHelp">help pages</a> that are accessible by passing the argument <font face="courier new">-h</font> or <font face="courier new">--help</font> to the program on execution. They explain every positional and optional argument in detail, including their default values. |
</div> | </div> | ||
− | <h5>Help pages</h5> | + | <h5 id="ModelHelp">Help pages</h5> |
<font face="courier new"> | <font face="courier new"> | ||
Line 681: | Line 681: | ||
<div class="article"> | <div class="article"> | ||
− | Further help regarding the usage of iCG predict is given in the <a target="_blank" href="">help pages</a> that are accessible by passing the argument <font face="courier new">-h</font> or <font face="courier new">--help</font> to the program on execution. They explain every positional and optional argument in detail, including their default values. | + | Further help regarding the usage of iCG predict is given in the <a target="_blank" href="https://2017.igem.org/Team:Bielefeld-CeBiTec/Software#PredictHelp">help pages</a> that are accessible by passing the argument <font face="courier new">-h</font> or <font face="courier new">--help</font> to the program on execution. They explain every positional and optional argument in detail, including their default values. |
</div> | </div> | ||
− | <h5>Help pages</h5> | + | <h5 id="PredictHelp">Help pages</h5> |
<font face="courier new"> | <font face="courier new"> | ||
Line 741: | Line 741: | ||
+ | <div class="contentbox"> | ||
+ | <div class="bevel tr"></div> | ||
+ | <div class="content"> | ||
+ | |||
+ | <h3 id="Downloads"> | ||
+ | |||
+ | </div> | ||
+ | <div class="bevel bl"></div> | ||
+ | </div> | ||
Revision as of 16:51, 1 November 2017
Software
Nanopore Sequencing of DNA Containing Unnatural Bases
iCG
Background
Overview
Fig. 1: Overview of the iCG functionality.
Dependencies & Installation
> cd bwa; make
> ./bwa index ref.fa
> ./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz
> ./bwa mem ref.fa read1.fq read2.fq | gzip -3 > aln-pe.sam.gz
> install.packages("cowplot")
> install.packages("MASS")
> pip2.7 install biopython
> pip2.7 install -Iv rpy2==2.8.0 # on 10-26-2017, rpy2 v2.9 was incompatible with python2.7
> pip2.7 install nanoraw[plot]
> python2 iCG.py model iCG_test/results -UBP iCG_test/UBP/filtered -AT iCG_test/A/filtered -GC iCG_test/G/filtered -TA iCG_test/T/filtered -CG iCG_test/C/filtered --UBPsense iCme --UBPantisense iG
> python2 iCG.py predict iCG_test/results/antisense_0.3.model iCG_test/results/antisense_test_reads --dst_dir results/prediction
iCG filter
Description
Fig. 2: Terminology and regular expression used for pattern matching. To identify reads containing the region of interest, a regular expression is used for pattern matching that omits the immediate sequence context of the position of interest, called the "blur" region. The displayed regular expression results from the default parameter settings for the exemplary reference sequence.
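The pattern construction described in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the iCG source: the function names `build_pattern` and `reverse_complement`, the `max_errors` parameter and the seven placeholder bases in the middle of the example reference are assumptions. Only the two 12-base flanks and the resulting patterns are taken from the example log in the Usage & Output section. The sketch uses one reading of the parameters that reproduces those patterns: a radius of 15 bases, of which the innermost 3 on each side are blurred together with the position of interest. The `{e<=2}` notation is the fuzzy-matching syntax of the third-party Python `regex` module, so the resulting string would have to be compiled with that module rather than with the standard `re`.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def build_pattern(reference, poi, radius=15, blur=3, blur_deviation=1, max_errors=2):
    """Build a fuzzy pattern that skips the blur region around the position of interest.

    poi is the 0-based index of the position of interest in `reference`.
    max_errors is an assumed name for the number of tolerated errors per flank.
    """
    upstream = reference[poi - radius:poi - blur]
    downstream = reference[poi + blur + 1:poi + radius + 1]
    blur_length = 2 * blur + 1  # blur on both sides plus the position of interest
    return "({up}){{e<={err}}}.{{{lo},{hi}}}?({down}){{e<={err}}}".format(
        up=upstream, down=downstream, err=max_errors,
        lo=blur_length - blur_deviation, hi=blur_length + blur_deviation)


# Flanks taken from the example log; the seven middle bases are placeholders.
reference = "TCTAGTACCGAA" + "ACGTACG" + "CCTATCATCGCT"
poi = 15  # the central base of the placeholder region

sense = build_pattern(reference, poi)
antisense = build_pattern(reverse_complement(reference), len(reference) - 1 - poi)
print(sense)
print(antisense)
```

Building the antisense pattern from the reverse complement of the same reference keeps both patterns in sync when the reference or the blur settings change.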
Usage & Output
67402 removed (40.81 % of remaining)
97739 remaining (59.19 % of total)
filtering for minimal length of 1887.75
removed 91117 (93.22 % of remaining)
6622 remaining (4.01 % of total)
filtering for maximal length of 2517
removed 1330 (20.08 % of remaining)
5292 remaining (3.20 % of total)
adding fastq information
filtering for sequence context: 15 bases upstream and downstream, with 3 bases of blur and 1 bases blur_deviation
sense pattern: (TCTAGTACCGAA){e<=2}.{6,8}?(CCTATCATCGCT){e<=2}
antisense pattern: (AGCGATGATAGG){e<=2}.{6,8}?(TTCGGTACTAGA){e<=2}
deviations in sequence length of remaining sequences (* selected):
sense antisense
-5: 6 3
-4: 9 9
-3: 34 10
*-2: 71 32
*-1: 135 94
* 0: 875 705
* 1: 84 47
* 2: 44 24
3: 14 12
4: 1 1
5: 4 0
------ ------
* 1209 902
removed 3181 (60.11 % of remaining), 19 due to multiple RE matches
2111 remaining (1.28 % of total)
filtering for context quality higher than 14.0 (* selected)
sense antisense
7-8: 0 1
8-9: 3 1
9-10: 12 5
10-11: 18 6
11-12: 38 24
12-13: 70 41
13-14: 108 82
*14-15: 142 101
*15-16: 193 128
*16-17: 155 109
*17-18: 167 116
*18-19: 127 95
*19-20: 84 78
*20-21: 56 54
*21-22: 25 29
*22-23: 7 19
*23-24: 3 10
*24-25: 1 1
*25-26: 0 2
------ ------
* 960 742
removed 409 (19.37 % of remaining)
1702 remaining (1.03 % of total)
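The percentages in the log above follow a simple convention: the share of removed reads is relative to the reads that entered the step, while the share of remaining reads is relative to the initial total. The following sketch recomputes the figures from the counts shown in the log. The function name `report` and the step labels are assumptions for illustration. The name of the first step is not visible in the log and is therefore left generic.

```python
def report(step_name, removed, remaining_before, total):
    """Format one filtering step the way the example log presents it."""
    remaining = remaining_before - removed
    lines = [
        "{}: removed {} ({:.2f} % of remaining)".format(
            step_name, removed, 100.0 * removed / remaining_before),
        "{} remaining ({:.2f} % of total)".format(
            remaining, 100.0 * remaining / total),
    ]
    return remaining, lines


total = 67402 + 97739  # reads before the first filtering step in the example log
remaining = total
for name, removed in [("first step", 67402), ("minimal length", 91117),
                      ("maximal length", 1330), ("sequence context", 3181),
                      ("context quality", 409)]:
    remaining, lines = report(name, removed, remaining, total)
    print("\n".join(lines))
```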
Help pages
usage: filter.py [-h] --bwa_mem_exe BWA_MEM_EXE -r REFERENCE
[--dst_dir_name DST_DIR_NAME]
[--min_mean_qscore MIN_MEAN_QSCORE] [--min_length MIN_LENGTH]
[--max_length MAX_LENGTH] [--radius RADIUS] [--blur BLUR]
[--blur_deviation BLUR_DEVIATION]
[--context_deviation CONTEXT_DEVIATION] [--no_indels]
[--max_length_deviation MAX_LENGTH_DEVIATION]
[--min_mean_context_qscore MIN_MEAN_CONTEXT_QSCORE]
[--greedy_regex_search] [--cpts_limit CPTS_LIMIT]
[--normalization_type {median,pA,pA_raw,none}]
[--processes PROCESSES] [--plot_poi]
[--disp_bases DISP_BASES] [--barcode BARCODE]
[--barcode_deviation BARCODE_DEVIATION]
[basecalled_dirs [basecalled_dirs ...]]
This sub-script takes a set of directories containing basecalled Nanopore reads as input and filters the contained reads by several criteria given as arguments to this script. Afterwards, the remaining reads are resquiggled by nanoraw and copied to a new folder inside the basecalling directory.
positional arguments:
optional arguments:
iCG model
Description
Fig. 3: Determination of Mean Normalized Signals at positions -2 to +2. The plot on the left-hand side shows the normalized signal traces of reads originating from a DNA sample containing C at the position of interest, plotted against the normalized and corrected sequence position using nanoraw. For positions -2 to +2 relative to the position of interest, the mean normalized signal is calculated for every read individually. The table on the right-hand side lists the average characteristics of this group regarding these sequence positions.
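The per-read calculation described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the iCG implementation: the signal is assumed to be already normalized and resquiggled, so that every base has a known range of signal samples. The function name `mean_signal_per_position`, the `base_starts` layout and the example values are hypothetical.

```python
def mean_signal_per_position(signal, base_starts, poi_index, flank=2):
    """Mean signal of each base from poi_index - flank to poi_index + flank.

    base_starts[i] is the index of the first signal sample assigned to base i;
    the final entry marks the end of the last base.
    """
    means = {}
    for offset in range(-flank, flank + 1):
        start = base_starts[poi_index + offset]
        end = base_starts[poi_index + offset + 1]
        segment = signal[start:end]
        means[offset] = sum(segment) / float(len(segment))
    return means


# Hypothetical normalized signal of one read covering five bases.
signal = [1.0, 1.0, 2.0, 2.0, 2.0, 0.0, 0.0, -1.0, -1.0, -1.0, 3.0, 3.0]
base_starts = [0, 2, 5, 7, 10, 12]
means = mean_signal_per_position(signal, base_starts, poi_index=2)
for offset in sorted(means):
    print("position {:+d}: {:.2f}".format(offset, means[offset]))
```

Each read then contributes one vector of five means, which is the kind of input a linear discriminant analysis expects.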
Fig. 4: Example output of iCG model. Group means, coefficients of linear discriminants, proportion of trace and dot-plot of linear discriminants as an exemplary result of a linear discriminant analysis of a template dataset containing sequencing reads with natural bases at the position of interest. The proportion of trace indicates the contribution of each linear discriminant to the discrimination of the different groups. In this particular example, the first two linear discriminants are sufficient for classification, as the effect of the third linear discriminant is negligible.
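The proportion of trace mentioned in this caption can be illustrated with a short calculation. To our understanding, R's MASS package, which is listed among the iCG dependencies, reports it as each discriminant's squared singular value divided by the sum of all squared singular values. The sketch below assumes exactly that definition. The function name and the singular values are hypothetical and are not taken from Figure 4.

```python
def proportion_of_trace(singular_values):
    """Share of the between-group separation attributed to each discriminant."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [value / total for value in squared]


# Hypothetical singular values for three linear discriminants.
proportions = proportion_of_trace([12.0, 5.0, 0.5])
for index, share in enumerate(proportions, start=1):
    print("LD{}: {:.4f}".format(index, share))
```

With values like these, the third discriminant contributes well under one percent, which is the situation in which it can be ignored for classification.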
Fig. 5: Effect of removing the most deviating reads on the normalized signal traces. With an increasing portion of reads removed due to their deviation from the median signal at the position of interest, the normalized signal traces become increasingly focused on the mean signal of the group being analyzed. Contaminations from previous sequencing libraries are effectively removed.
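The removal of deviating reads described in this caption can be sketched as follows. This is a simplified illustration, not the iCG code: it assumes that every read is summarized by a single mean normalized signal at the position of interest and that a fixed fraction of reads with the largest absolute deviation from the group median is discarded. The function names, the read identifiers and the values are hypothetical.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def remove_most_deviating(read_means, fraction):
    """Drop the given fraction of reads whose signal deviates most from the median.

    read_means maps a read identifier to its mean normalized signal at the
    position of interest.
    """
    center = median(list(read_means.values()))
    ranked = sorted(read_means, key=lambda read_id: abs(read_means[read_id] - center))
    n_keep = len(ranked) - int(round(fraction * len(ranked)))
    return {read_id: read_means[read_id] for read_id in ranked[:n_keep]}


# Hypothetical group with one contaminating read.
read_means = {"read_a": 0.10, "read_b": 0.12, "read_c": 0.08,
              "read_d": 0.11, "read_e": 1.50}
kept = remove_most_deviating(read_means, fraction=0.2)
print(sorted(kept))
```

Using the median rather than the mean as the reference keeps a few strongly deviating reads from shifting the center they are compared against.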
Fig. 6: Changes in the dot-plot representation of the linear discriminant analysis model with an increasing portion of removed, deviating reads. With an increasing portion of reads that are removed due to their deviation from the median signal at the position of interest, the linear discriminant analysis generates linear discriminant coefficients that are increasingly
Usage & Output
Help pages
usage: model.py [-h] -UBP UBP_FLTRD_FAST5_DIR [-AT AT_FLTRD_FAST5_DIR]
[-GC GC_FLTRD_FAST5_DIR] [-TA TA_FLTRD_FAST5_DIR]
[-CG CG_FLTRD_FAST5_DIR] [-NN NN_FLTRD_FAST5_DIR]
[--print_group_characteristics] [--no_data_plots]
[--disp_bases DISP_BASES]
[-q [QUANTILS_TO_REMOVE [QUANTILS_TO_REMOVE ...]]]
[--no_model_plots] [--no_model_stats] [--UBsense UBSENSE]
[--UBantisense UBANTISENSE]
dst_dir
This sub-script creates a linear discriminant analysis model to discriminate between unnatural bases and natural bases at a given position of interest in a reference sequence. Therefore, the script is provided with paths to directories containing reads that were previously filtered and resquiggled by the script "iCG filter".
positional arguments:
optional arguments:
iCG predict
Description
Fig. 5: Example of a base prediction result created by iCG predict. Example of the text-based output (left) and dot-plot (right) of a prediction result based on a linear discriminant model that needs only one linear discriminant for base prediction.
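A prediction with a single linear discriminant, as in this figure, can be pictured as a projection followed by a comparison with the group means. The sketch below is a deliberately simplified illustration and not what iCG predict runs: it assumes equal prior probabilities for all groups, in which case assigning a read to the nearest group mean on the discriminant axis is a reasonable stand-in for the full classification. The function name, the coefficients, the group means and the feature values are hypothetical.

```python
def predict_base(features, coefficients, group_means_ld1):
    """Project a read onto one linear discriminant and pick the nearest group.

    features: mean normalized signals of one read (e.g. positions -2 to +2)
    coefficients: coefficients of the first linear discriminant, one per feature
    group_means_ld1: maps each group label to its mean score on that discriminant
    """
    score = sum(f * c for f, c in zip(features, coefficients))
    label = min(group_means_ld1, key=lambda g: abs(group_means_ld1[g] - score))
    return label, score


# Hypothetical model with two groups and five signal features.
coefficients = [0.5, -1.0, 2.0, 1.0, -0.5]
group_means_ld1 = {"UBP": 2.0, "natural": -1.0}
label, score = predict_base([0.2, -0.3, 0.8, 0.1, 0.0], coefficients, group_means_ld1)
print("{} (LD1 = {:.2f})".format(label, score))
```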
Usage & Output
Help pages
usage: predict.py [-h] [--dst_dir DST_DIR] model in_dir
Based on a linear discriminant analysis model previously created by "iCG model", this sub-script classifies the base at a given position of interest for a set of filtered reads as input.
positional arguments:
optional arguments: