Difference between revisions of "Team:Munich/Software"

Latest revision as of 01:47, 2 November 2017


Software
###################################################################
#                                                                 #
#                        CascAID V1.0                             #
#                                                                 #
#                   Thu Nov  2 04:23:54 2017                      #
#                                                                 #
#                      IGEM Munich 2017                           #
#                                                                 #
###################################################################
CascAID is a potentially universal tool for nucleic acid detection. Fast adaptation of our platform to new targets requires in silico verification of the crRNA design. Two factors are crucial for these designs: the binding of the crRNA to Cas13a, which is mainly determined by the secondary structure of the crRNA, and the uniqueness of the targeting sequence in the transcriptome (to rule out false positive results). To ensure the integrity of the Cas13a-crRNA complex, we developed a Python script that uses two established secondary structure prediction packages, NUPACK and Mfold. To verify the specificity of the targeting sequence, we used the blastn-short program to check for similar sequences in a transcriptome database. Additionally, we created a database of crRNA designs that have already worked and made it as extensive as possible given the limited time, seeking collaboration with other teams working with Cas13a, mainly TU Delft. The second branch of software we developed is needed for hardware control in our project. It allows users' devices, such as computers and smartphones, to control our hardware, Heatbringer and Lightbringer. The repository for our software can be found at https://github.com/igemsoftware2017/igem_munich_2017.
crRNA Design Verification There are two main problems regarding the design of crRNA for a diagnostic test. First, the secondary structure of the crRNA required for Cas13a activity needs to be verified. Second, the sequence targeted by the crRNA has to be specific, i.e. there must be no identical sequence in the reference transcriptome of a healthy patient. Otherwise, off-target effects will lead to false positive results, since Cas13a is activated even though the pathogen is not present. To address these issues, we developed software relying on bioinformatic methods such as secondary structure prediction and the Basic Local Alignment Search Tool (BLAST).
Secondary Structure Prediction For secondary structure prediction of the crRNA, we used the two established program packages in the field, NUPACK and Mfold, to compare newly designed crRNAs with the secondary structures of crRNAs already known to be active. These reference structures were obtained either from crystallography data of crRNA in complex with Cas13a or from structure prediction data of experimentally tested crRNAs. Using secondary structure verification, we were able to rule out misfolding crRNA designs prior to experiment. We developed a script that automates this procedure for the end user.
NUPACK NUPACK is an RNA secondary structure prediction package developed by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech). The source code is available free of charge for academic usage. NUPACK allows the analysis of the partition function, the minimum free energy and the equilibrium base-pairing probabilities of an RNA sequence. For offline usage, we implemented NUPACK locally, whereas Mfold is implemented as a webserver request. We use both packages because we experienced that in certain cases only one of them was able to predict the secondary structure of crRNA as described in previous papers, predominantly the paper of Liu et al. published in Cell in 2017, "Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities". A local run also gives you the possibility of using the full spectrum of NUPACK's programs. Using several of these parameters together with the final structure prediction, we estimated whether the crRNA would be active in Cas13a. Furthermore, we experienced that NUPACK sometimes predicts the right secondary structure, just not as the most stable structure. With NUPACK's subopt, it is possible to predict more than just the most stable structure. This enables looking at less stable structures, which might be more favourable when bound to the protein, and comparing these to the structure databank. The output of a suboptimal prediction is given below as the second example. Explanations are included as comments after '#':
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66                                                                 # length of the sequence
-9.400                                                             # free energy
.....................(.((((((.((((....)))).)))))).)............... # secondary structure
22    51                                                           # IDs of bases that form basepairs
24    49                                                           # form basepairs
25    48                                                           # this would mean base 22
26    47                                                           # pairs with base 51
27    46
28    45
29    44
31    42
32    41
33    40
34    39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66
-9.300
.....................(((((((..((((....)))).)))))))................
22    50
23    49
24    48
25    47
26    46
27    45
28    44
31    42
32    41
33    40
34    39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
From this, we can extract the secondary structure in Vienna notation as well as the free energies of the RNA structures. Together with the full partition function, the free energies allow us to predict the probability that a structure forms in solution. Using these results, the user can make qualitative assumptions about the activity of the corresponding Cas13a-crRNA complex.
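The record layout shown above (length, free energy, dot-bracket structure, then one base pair per line) can be read with a few lines of Python. The following is a minimal sketch, not the CascAID script itself: the function names are hypothetical, and the Boltzmann comparison assumes free energies in kcal/mol at 37 °C. It gives only the relative weight of two structures; an absolute probability would additionally need the full partition function.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K); assumes energies are reported in kcal/mol

# First record of the suboptimal prediction shown above.
RECORD = """\
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66       # length of the sequence
-9.400   # free energy
.....................(.((((((.((((....)))).)))))).)...............
22 51
24 49
25 48
26 47
27 46
28 45
29 44
31 42
32 41
33 40
34 39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
"""


def pairs_from_dot_bracket(structure):
    """Return the sorted 1-based (i, j) base pairs of a dot-bracket string."""
    stack, pairs = [], []
    for position, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(position)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % position)
            pairs.append((stack.pop(), position))
        elif char != ".":
            raise ValueError("unexpected character %r" % char)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def parse_subopt_record(lines):
    """Parse one record; text after '#' and '%' separator lines are ignored."""
    fields = [line.split("#")[0].strip() for line in lines]
    fields = [f for f in fields if f and not f.startswith("%")]
    length, energy, structure = int(fields[0]), float(fields[1]), fields[2]
    pairs = sorted(tuple(int(x) for x in f.split()) for f in fields[3:])
    return length, energy, structure, pairs


def boltzmann_ratio(energy_a, energy_b, temperature_k=310.15):
    """Population of structure a relative to structure b at equilibrium."""
    return math.exp(-(energy_a - energy_b) / (GAS_CONSTANT * temperature_k))


if __name__ == "__main__":
    length, energy, structure, pairs = parse_subopt_record(RECORD.splitlines())
    # The listed pairs must agree with the dot-bracket string.
    print(pairs == pairs_from_dot_bracket(structure))
    # Relative weight of the -9.3 structure versus the -9.4 structure.
    print(round(boltzmann_ratio(-9.300, -9.400), 2))
```

Under these assumptions the second structure (-9.300) is populated at roughly 0.85 times the level of the first (-9.400), which illustrates why it can be worth inspecting suboptimal structures and not only the most stable one.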
Mfold Mfold is a webserver for RNA secondary structure prediction developed by Michael Zuker, based on his paper "Mfold web server for nucleic acid folding and hybridization prediction", which was published in Nucleic Acids Research in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a script that automatically requests a standardised RNA Fold job from the server, making it available on all operating systems. Using the result obtained from this request, the secondary structure is checked via a string comparison in so-called "Vienna" notation. This notation gives base pairing as a string of dots and brackets, where a dot represents an unpaired base and brackets represent paired bases, with an opening bracket "(" at the 5'-end and a closing bracket ")" at the 3'-end of each paired sequence. An example of the output of the program is given below:
#######################################################################################
#################### CascAID Secondary Structure Verification #########################
#######################################################################################
#######################################################################################
##################### NUPACK Secondary Structure Verification #########################
#######################################################################################
GOOD NEWS! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE!
YOUR SEQUENCE WAS:
5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC  3'
      (((((....((((.........)))).)))))                               ####    MATCHED SECONDARY STRUCTURE
   ...(((((....((((.........)))).)))))..((((..........)))).........  ####  PREDICTED SECONDARY STRUCTURE
___________________________________________________________________
		YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK
		IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a
______________________________________________________________________________________
Job ended normally. Sun Oct 29 23:46:28 2017
Do you have internet connectivity? [yes/no]yes
#######################################################################################
#################### MFOLD SECONDARY STRUCTURE VERIFICATION ###########################
#######################################################################################
		#################### CAUTION! #####################
		mFOLD SECONDARY STRUCTURE DOES NOT FIT OUR DATA BANK
		#################### CAUTION! #####################
YOUR SEQUENCE AND MOST STABLE PREDICTED STRUCTURE IS:
5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGG 3'
   ..(((((((((.((((.........)))).......((.....)).....)))).)))))....
______________________________________________________________________________________
Job ended normally. Sun Oct 29 23:47:06 2017
This example also shows that one program may recognize the crRNA secondary structure while the other does not. In this case, NUPACK predicted the expected structure while Mfold did not. Even though this is an experimental construct that worked, we did not add its Mfold structure prediction to the database, since Mfold was unable to predict the right structure.
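The string comparison in Vienna notation described above can be sketched as a simple substring search: a known-good structure motif is looked up inside the predicted dot-bracket string. This is an illustrative sketch only; the function name, the dictionary layout and the label `lwaCas13a` as a key are assumptions, and the actual CascAID script may compare structures differently. The motif and both predictions are copied from the sample output.

```python
# Reference motif taken from the "MATCHED SECONDARY STRUCTURE" line above.
# Keying it by "lwaCas13a" is an assumption made for illustration.
REFERENCE_STRUCTURES = {
    "lwaCas13a": "(((((....((((.........)))).)))))",
}

NUPACK_PREDICTION = "...(((((....((((.........)))).)))))..((((..........))))........."
MFOLD_PREDICTION = "..(((((((((.((((.........)))).......((.....)).....)))).)))))...."


def find_reference_structure(predicted, references=REFERENCE_STRUCTURES):
    """Return (name, 0-based offset) of the first reference motif found, else None."""
    for name, motif in references.items():
        offset = predicted.find(motif)
        if offset != -1:
            return name, offset
    return None


if __name__ == "__main__":
    # NUPACK's prediction contains the motif, Mfold's most stable one does not.
    print(find_reference_structure(NUPACK_PREDICTION))
    print(find_reference_structure(MFOLD_PREDICTION))
```

A plain substring test is strict: a single shifted or missing base pair makes the match fail, which is consistent with the Mfold result in the sample run being reported as not fitting the databank.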
Off-Target Effects In order to rule out off-target effects for the designed crRNA in diagnostic applications, we developed a script that is able to BLAST the sequence either against whole databases online or against a custom database we compiled. This database contains the human transcriptome and those of bacteria common in the human nasal tract as well as model organisms used in our project: Homo sapiens, Escherichia coli, Bacillus subtilis, Staphylococcus aureus, Corynebacterium diphtheriae, Streptococcus diphtheriae, Haemophilus influenzae. Transcriptomes of organisms that are common in the nasal tract but were not available include, among others: the Neisseria family, Staphylococcus epidermidis and Streptococcus pyogenes. All data was retrieved from the Transcriptome Release #90 of the ENSEMBL project. The output is derived from a blastn-short run; in the example below it lists all sequences that show a sequence identity of 18 bp or higher. For an actual run, the identity would need to be 26 bp or higher to indicate a real off-target effect, since Cas13a tolerates up to 2 point mutations with respect to crRNA binding and subsequent RNase activity. The expectation value describes the number of hits one can expect to find by chance in a random database of the same size as the database used for the blastn-short run.
##################################################################
####################### Input Sequence ###########################
##################################################################
Your Sequence was:
GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC
Your target sequence thus is:
UGAUAAAGAAGACAGUCAUAAGUGCGGC
Sun Oct 29 23:49:22 2017
##################################################################
##################################################################
####### Following possible off-targets have been identified ######
##################################################################
>seq 0                                                               # ID of alignment
sequence:gnl|BL_ORD_ID|91933 ENSBTAT00000042836.3 cds chromosome:    # location of gene
UMD3.1:3:107697178:107698719:1 gene:ENSBTAG00000030338.3             # gene name and ID
gene_biotype:protein_coding transcript_biotype:protein_coding        # type of gene
gene_symbol:GJA9 description:gap junction protein alpha 9            # Gene description
[Source:HGNC Symbol;Acc:HGNC:19155]
length:1542                                                          # Length of gene
e value:1.50536                                                      # expectation value
identity:18                                                          # sequence identity count
ATAAAGAAGACAGTCATAA...                                               # alignment
|||||||||||| ||||||...                                               # match/mismatches
ATAAAGAAGACACTCATAA...                                               # alignment
>seq 1
sequence:gnl|BL_ORD_ID|69018 ENSBTAT00000042836.3 cdna
chromosome:UMD3.1:3:107697178:107698719:1 gene:ENSBTAG00000030338.3
gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:GJA9
description:gap junction protein alpha 9 [Source:HGNC Symbol;Acc:HGNC:19155]
length:1542
e value:1.50536
identity:18
ATAAAGAAGACAGTCATAA...
|||||||||||| ||||||...
ATAAAGAAGACACTCATAA...
___________________________________________________________
Job ended normally. Sun Oct 29 23:49:22 2017
These results have also been saved in: off_target.out
The full BLAST output can be found in: seq.xml
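The identity threshold discussed above (26 of 28 bases, i.e. at most 2 mismatches) can be turned into a small filter over parsed hits. The sketch below is a hedged illustration and not the project's code: the function names and the dictionary layout of a hit are hypothetical, and the 37-nt direct repeat is simply the part of the sample crRNA that precedes the reported target sequence.

```python
# Direct repeat of the sample crRNA: the input sequence minus the reported target.
DIRECT_REPEAT = "GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAAC"
MAX_TOLERATED_MISMATCHES = 2  # from the text: identity of 26 bp or higher for a 28-nt target


def split_crrna(crrna, direct_repeat=DIRECT_REPEAT):
    """Split a crRNA into (direct repeat, target sequence)."""
    if not crrna.startswith(direct_repeat):
        raise ValueError("crRNA does not start with the expected direct repeat")
    return direct_repeat, crrna[len(direct_repeat):]


def to_dna(rna):
    """Convert an RNA string to the DNA alphabet used by transcriptome databases."""
    return rna.replace("U", "T")


def flag_off_targets(hits, target, max_mismatches=MAX_TOLERATED_MISMATCHES):
    """Keep hits whose identity count is high enough to be a plausible off-target."""
    min_identity = len(target) - max_mismatches
    return [hit for hit in hits if hit["identity"] >= min_identity]


if __name__ == "__main__":
    crrna = "GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC"
    _, target = split_crrna(crrna)
    # Hits as they might look after parsing the report above (hypothetical layout).
    hits = [
        {"id": "ENSBTAT00000042836.3 cds", "identity": 18, "evalue": 1.50536},
        {"id": "ENSBTAT00000042836.3 cdna", "identity": 18, "evalue": 1.50536},
    ]
    print(target)
    print(flag_off_targets(hits, target))
```

With the two hits from the sample run (identity 18), nothing is flagged, which matches the statement that the example uses a deliberately low reporting threshold of 18.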
Database The database program provides an interface to the MySQL database of crRNAs that have been shown to work experimentally.
############ Available Detection Targets #################
[1] Virus
[2] Bacteria
[0] Go back one step
What would you like to detect?2
############ Available Detection Targets #################
[1] Escherichia coli
[2] Bacillus subtillis
[0] Go back one step
What would you like to detect?1
############ Choose your Target #################
[1] E. Coli 16s rRNA
[0] Go back one step
What would you like to detect?1
########### The sequence thou art looking for is : ################
ACUUUACUCCCUUCCUCCCCGCUGAAA
[9] Exit
[0] Go back one step
However, these sequences still need to be tested experimentally for off-target effects, since in silico screening can confirm specificity only with limited certainty.
Hardware control The software for reading out the fluorescence detector is described in the Hardware section. All software developed for hardware can be found in our GitHub repository.
References
R. M. Dirks, J. S. Bois, J. M. Schaeffer, E. Winfree and N. A. Pierce. "Thermodynamic analysis of interacting nucleic acid strands." (2007) SIAM Rev. 49:65-88.
R. M. Dirks and N. A. Pierce. "An algorithm for computing nucleic acid base-pairing probabilities including pseudoknots." (2004) J. Comput. Chem. 25:1295-1304.
R. M. Dirks and N. A. Pierce. "A partition function algorithm for nucleic acid secondary structure including pseudoknots." (2003) J. Comput. Chem. 24:1664-1677.
M. Zuker, D. H. Mathews and D. H. Turner. "Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide." (1999) RNA Biochemistry and Biotechnology 11-43, J. Barciszewski and B. F. C. Clark, eds., NATO ASI Series, Kluwer Academic Publishers, Dordrecht, NL.
J.-M. Rouillard, M. Zuker and E. Gulari. "OligoArray 2.0: Thermodynamically improved oligonucleotide design for microarrays." (2003) Nucleic Acids Res. 31:12, 3057-3062.
S. F. Altschul, W. Gish, W. Miller, E. W. Myers and D. J. Lipman. "Basic local alignment search tool." (1990) J. Mol. Biol. 215:403-410.

@@ Line 24: / Line 24: @@
 #myContent *{
-  color: #919191;
+  color: #444444;
 }
@@ Line 77: / Line 77: @@
 <tr><td colspan=6 align=left valign=center>
 <font size=7 color=#51a7f9><b style="color: #51a7f9">Software</b></font>
+<pre>
+                    ###################################################################
+                    #                                                                 #
+                    #                        CascAID V1.0                             #
+                    #                                                                 #
+                    #                   Thu Nov  2 04:23:54 2017                      #
+                    #                                                                 #
+                    #                      IGEM Munich 2017                           #
+                    #                                                                 #
+                    ###################################################################
+</pre>
 </td>
 </tr>
@@ Line 82: / Line 95: @@
 	<td  colspan = 6 align="left">
 		<p class="introduction">
-We mainly developed two branches of Software needed for our project. On the one hand, we developed Software to allow user's devices such as Computers and Smartphones to control our Hardware's devices, Heatbringer and Lightbringer. On the other hand, we used scripting in order to improve the performance of the Cas13a protein regarding a diagnostic device test. This involved the post-design verification of crRNA regarding secondary structure and transcriptomal uniqueness as well as the development of a database of crRNA designs that have already worked. We tried to make the latter as extensive as possible given the limited time, checking for collaboration with other teams working with Cas13a, mainly TU Delft.
+CascAID is a potentially universal tool for nucleic acid detection.
+Fast adaptation of our platform to new targets requires <i>in silico</i> verification of the crRNA design.
+Crucial factors for the development of these crRNA designs are the binding of the crRNA to Cas13a, which is
+mainly determined by its secondary structure, and the uniqueness of the targeting sequence in the transcriptome (to rule out false positive results). To ensure the integrity of the Cas13a-crRNA complex, we developed
+a Python script that uses the established program packages for secondary structures, NUPACK and Mfold.
+In order to verify the specificity of the targeting sequence, we used the BLASTN-short program to
+check for similar sequences in a transcriptome databank. Additionally, we created a database of crRNA designs
+that have already worked and made it
+as extensive as possible given the limited time, checking for collaboration with other teams working with Cas13a,
+mainly TU Delft.
+The second branch of software we developed is needed for hardware control in our project.
+It allows users' devices such as computers and smartphones to control
+our hardware, Heatbringer and Lightbringer.
+The repository to our software can be found <a class="myLink" href="https://github.com/igemsoftware2017/igem_munich_2017">here</a>.
                  </p>
 	</td>
@@ Line 95: / Line 121: @@
 <h3>crRNA Design Verification</h3>
 <p>
-There are two main problems regarding the crRNA design of Cas13a for a diagnostic device. First of all, one needs to make sure that the secondary structure of the crRNA needed for Cas13a activity is achieved. Second, one needs to make sure that the sequence targeted by the crRNA is specific, i.e. there is no off-target effects in the transcriptome of the organisms present in the sample. If this is not the case, false positive results will occur. The software we developed relies mainly on bioinformatic principles such as Secondary Structure Prediction and Basic Local Alignment Searches Tools (BLAST).
+There are two main problems regarding the design of crRNA for a diagnostic test.
+First, the secondary structure of the crRNA needed for Cas13a activity needs to be verified.
+Secondly, the sequence targeted by the crRNA has to be specific, i.e. there must be no identical sequence in the
+reference transcriptome of a healthy patient. Otherwise, off-target effects will lead to
+false positive results since Cas13a is activated even though the pathogen is not present.
+To address these issues, we developed software relying on bioinformatic principles such as
+secondary structure prediction and Basic Local Alignment Search Tool (BLAST).
 </p>
 </td>
@@ Line 109: / Line 141: @@
 <h3>Secondary Structure Prediction</h3>
 <p>
-For secondary structure prediction of the crRNA we utilised the two mainly used porgram packages in the field, NUPACK and Mfold. With the help of these packages, we were able to compare newly designed crRNA with secondary structures of crRNAs that were already known to be active, either from actual crystallography data of crRNA in complex with Cas13a, or from structure prediction data of experimentally tested crRNAs. Through this, we could prior to experiments already sort out certain crRNA designs that would not fit the secondary structures. We developed a script for the end user automatising this procedure.
+For secondary structure prediction of the crRNA we utilised the two established program packages
-</p>
+in the field, NUPACK and Mfold to compare newly designed crRNA with secondary structures of crRNAs that
-</td>
+were already known to be active. These reference crRNA structures were either obtained from actual
-</tr>
+crystallography data of crRNA in complex with Cas13a, or from structure prediction data of experimentally
+tested crRNAs. Using secondary structure verification we were able to rule out misfolding crRNA
-<tr class="lastRow"><td colspan=6 align=center valign=center>
+designs prior to experiment. We developed a script for the end user automating this procedure.
-<h4>Mfold</h4>
-<br>
-<p>
-Mfold is a webserver for RNA secondary structure prediction developed by Michael Zuker based on his paper "Mfold web server for nucleic acid folding and hybridization prediction" that published in <i>Nucleic Acids Research</i>  in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a script that automatically requests a standardised RNA Fold job to the server, therefore making it available throughout all operating systems. Using the result obtained from this request, the secondary structure is checked via a string comparison in so-called "Vienna" notation. This notation gives base pairing as a string of dots and brackets where a dot represents a non-bonded base and brackets form the base-pairs, clarified by a opening bracket "(" at the 5'-end of the base-pair and a closing bracket ")" at the 3'-end. An example taken from the sample output of the program is given below:
-<pre style="text-align: left;">
-Example 1: Secondary Structure Prediction
-NICE! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE!
-YOUR SEQUENCE WAS:
-GAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACACUUUACUCCCUUCCUCCCCGCUGAAAGAU
-                     (.((((((.((((....)))).)))))).)                   ######## MATCHED SECONDARY STRUCTURE
-.....................(.((((((.((((....)))).)))))).)..............     ######## PREDICTED SECONDARY STRUCTURE
-YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK
-IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a
-</pre>
-</p>
-<p>
-A more visual output from Mfold is in progress, though not needed for the preliminary usage of the program.
 </p>
 </td>
@@ Line 142: / Line 155: @@
 <br>
 <p>
-For offline usage and second validation, we implemented NUPACK locally. This decision was made because we experienced that in certain cases, only one of the program packages was able to predict the secondary structure of crRNA as described in previous papers, predominantly the paper of Liu et al. published in <i>Cell</i> in 2017 "Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities". Also, it gives you the opportunity to use the program without access to the internet. NUPACK is a RNA Secondary Structure Prediction program package developed by several contributors under the guidance of Prof. Niles A. Pierce at the California Insitute of Technology (Caltech). The source-code is available free-of-charge for academic usage. We implemented it on a Mac running Mac OS Sierra. NUPACK allows the analysis of the partition function, the minimum free energy and the equillibrium base-pairing probabilities of a RNA sequence. By the use of several of these parameters and the final structure prediction, we estimated whether the crRNA would be active in Cas13a. Furthermore, it is possible to predict more than just the most stable structure. This enables looking at less stable structures since the protein may compensate for non-ideal structures by giving the right environment for stabilisation. The output of a suboptimal prediction is given in Example 2:
+NUPACK is an RNA secondary structure prediction package developed
+by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech).
+The source-code is available free-of-charge for academic usage.
+NUPACK allows the analysis of the partition function, the minimum free energy and the equilibrium base-pairing
+probabilities of an RNA sequence.
+For offline usage, we implemented NUPACK locally. We proceeded to implement Mfold as a webserver request.
+This decision was made because we experienced that in certain cases, only one of the program packages
+was able to predict the secondary structure of crRNA as described in previous papers, predominantly the paper of Liu et al. published in <i>Cell</i> in 2017
+"Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities". Also, a local run gives you
+the possibility of using the full spectrum of NUPACK's programs.
+By the use of several of these parameters and the final structure prediction, we estimated whether the
+crRNA would be active in Cas13a.
+Furthermore, we experienced that NUPACK sometimes predicts the right secondary structure, just not as
+the most stable structure. With NUPACK's subopt, it is possible to predict more than just
+the most stable structure. This enables looking at less stable structures which might be more favourable when bound to the protein and comparing these to the
+structure databank. The output of a suboptimal prediction
+is given below as the second example. Explanations are included as comments after '#':
 <pre style="text-align: left;">
 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
+66                                                                 # length of the sequence
--9.400
+-9.400                                                             # free energy
-.....................(.((((((.((((....)))).)))))).)...............
+.....................(.((((((.((((....)))).)))))).)............... # secondary structure
-22    51
+22    51                                                         # IDs of bases that form basepairs
-24    49
+24    49                                                         # form basepairs
-25    48
+25    48                                                         # this would mean base 22
-26    47
+26    47                                                         # pairs with base 51
 27    46
 28    45
@@ Line 183: / Line 213: @@
 <p>
-From this, one can extract the secondary structure in Vienna notation as well as the Free Energies of the RNA structure to predict the probability of formation in solution with help of the calculation of the full partition function. Using these, we predicted qualitative activity of the corresponding Cas13a-crRNA complex.
+From this, we can extract the secondary structure in Vienna notation as well as the free energies
+of the RNA structure to predict the probability of formation in solution with help of the calculation
+of the full partition function. Using these results, the user can make qualitative assumptions about
+the activity of the corresponding Cas13a-crRNA complex.
 </p>
 </td>
 </tr>
+<tr class="lastRow"><td colspan=6 align=center valign=center>
+<h4>Mfold</h4>
+<br>
+<p>
+Mfold is a webserver for RNA secondary structure prediction developed by Michael Zuker based on his paper
+"Mfold web server for nucleic acid folding and hybridization prediction" that was published in <i>Nucleic Acids Research</i>
+in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a
+script that automatically requests a standardised RNA Fold job from the server, therefore making it available
+throughout all operating systems. Using the result obtained from this request, the secondary structure is
+checked via a string comparison in so-called "Vienna" notation. This notation gives base pairing as a string
+of dots and brackets where a dot represents a non-bonded base and brackets represent paired bases, clarified by
+an opening bracket "(" at the 5'-end and a closing bracket ")" at the 3'-end of each paired sequence. An example for the output
+of the program is given below:
+<pre style="text-align: left;">
+#######################################################################################
+#################### CascAID Secondary Structure Verification #########################
+#######################################################################################
+#######################################################################################
+##################### NUPACK Secondary Structure Verification #########################
+#######################################################################################
+GOOD NEWS! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE!
+YOUR SEQUENCE WAS:
+5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC  3'
+      (((((....((((.........)))).)))))                               ####    MATCHED SECONDARY STRUCTURE
+   ...(((((....((((.........)))).)))))..((((..........)))).........  ####  PREDICTED SECONDARY STRUCTURE
+___________________________________________________________________
+		YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK
+		IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a
+______________________________________________________________________________________
+Job ended normally. Sun Oct 29 23:46:28 2017
+Do you have internet connectivity? [yes/no]yes
+#######################################################################################
+#################### MFOLD SECONDARY STRUCTURE VERIFICATION ###########################
+#######################################################################################
+		#################### CAUTION! #####################
+		mFOLD SECONDARY STRUCTURE DOES NOT FIT OUR DATA BANK
+		#################### CAUTION! #####################
+YOUR SEQUENCE AND MOST STABLE PREDICTED STRUCTURE IS:
+5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGG 3'
+   ..(((((((((.((((.........)))).......((.....)).....)))).)))))....
+______________________________________________________________________________________
+Job ended normally. Sun Oct 29 23:47:06 2017
+</pre>
+</p>
+<p>
+This example also shows that one program may recognize the
+crRNA secondary structure while the other does not. In this case, NUPACK predicted the expected structure
+while Mfold did not. Even though this is an experimental construct
+that worked, we did not add its Mfold structure prediction to the database,
+since Mfold was unable to predict the right structure.
+</p>
+</td>
+</tr>
@@ Line 194: / Line 304: @@
 <p>
-In order to rule out off-target effects for the designed crRNA in diagnostic applications, we developed a script that is able to blast the sequence against either whole databases online or a sub-database we created from transcriptome data of human and bacterial transcriptomes that are commonly found inside the nose and modell organisms used in our project including:
+In order to rule out off-target effects for the designed crRNA in diagnostic applications,
+we developed a script that is able to BLAST the sequence either against whole databases
+online or against a custom database we compiled. This database contains the human transcriptome and those of bacteria common in the human nasal tract as well as model organisms used in our project:
 <ol style="list-style-type:disc; list-style-position:left; text-align: left;">
 <li>Homo sapiens</li>
@@ Line 206: / Line 318: @@
 </p>
 <p>
-Transcriptomes that would be necessary but were not available are:
+Transcriptomes that are common in the nasal tract but were not available are,
+among others:
 </p>
 <ol style="list-style-type:disc; list-style-position:left; text-align: left;">
@@ Line 215: / Line 328: @@
 <br>
 <p>
-All data was retreived from www.ensembl.org webpage from the Transcriptome Release #90.
+All data was retrieved from the Transcriptome Release #90 of the ENSEMBL project. The output is generated
+from a blastn-short run and, in the example below, consists of all sequences that show sequence identity of 18 bp or
+higher. For an actual run, the identity would need to be 26 bp or higher in order to actually show off-target effects since Cas13a is
+selective up to 2 point mutations regarding the binding of crRNA and subsequent RNase activity.
+The expectation value here describes the number of hits one can expect to find
+in a random database the same size as the database used for the blastn-short run.
 </p>
 </p>
 <pre style="text-align: left;">
+##################################################################
+####################### Input Sequence ###########################
+##################################################################
+Your Sequence was:
+GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC
+Your target sequence thus is:
+UGAUAAAGAAGACAGUCAUAAGUGCGGC
+Sun Oct 29 23:49:22 2017
+##################################################################
 ##################################################################
 ####### Following possible off-targets have been identified ######
 ##################################################################
->seq 0
-sequence:gnl|BL_ORD_ID|2 KJJ58724 cdna:annotated supercontig:ASM95397v1:scaffold_31:1584:1937:1
-gene:NG01_11520 gene_biotype:protein_coding
-transcript_biotype:protein_coding description:hypothetical protein
-length:354
-e value:2.42551e-24
-identity:60
-GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT...
-||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...
-GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT...
-</pre>
-</td>
+>seq 0                                                               # ID of alignment
+sequence:gnl|BL_ORD_ID|91933 ENSBTAT00000042836.3 cds chromosome:    # location of gene
+UMD3.1:3:107697178:107698719:1 gene:ENSBTAG00000030338.3             # gene name and ID
+gene_biotype:protein_coding transcript_biotype:protein_coding        # type of gene
+gene_symbol:GJA9 description:gap junction protein alpha 9            # Gene description
+[Source:HGNC Symbol;Acc:HGNC:19155]
+length:1542                                                          # Length of gene
+e value:1.50536                                                      # expectation value
+identity:18                                                          # sequence identity count
+ATAAAGAAGACAGTCATAA...                                               # alignment
+|||||||||||| ||||||...                                               # match/mismatches
+ATAAAGAAGACACTCATAA...                                               # alignment
+>seq 1
+sequence:gnl|BL_ORD_ID|69018 ENSBTAT00000042836.3 cdna
+chromosome:UMD3.1:3:107697178:107698719:1 gene:ENSBTAG00000030338.3
+gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:GJA9
+description:gap junction protein alpha 9
+[Source:HGNC Symbol;Acc:HGNC:19155]
+length:1542
+e value:1.50536
+identity:18
+ATAAAGAAGACAGTCATAA...
+|||||||||||| ||||||...
+ATAAAGAAGACACTCATAA...
+___________________________________________________________
+Job ended normally. Sun Oct 29 23:49:22 2017
+These results have also been saved in:              off_target.out
+The full BLAST output can be found in:                     seq.xml
+</pre>
 </tr>
+</td>
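For further processing, the plain-text report shown above can be read back into records. The following is a minimal sketch that assumes the report keeps the layout of the example (`>seq N` headers followed by `length:`, `e value:` and `identity:` lines, with optional trailing `#` annotations). The function name `parse_off_target_report` is hypothetical and not part of the CascAID scripts.

```python
import re

# Numeric fields of one alignment record, as they appear in the example report.
FIELD = re.compile(r"^(length|e value|identity):\s*(\S+)$")


def parse_off_target_report(text):
    """Return one dict per '>seq N' record with its numeric fields."""
    records, current = [], None
    for raw in text.splitlines():
        # Drop trailing '# ...' annotations; banner lines made of '#' become empty.
        line = raw.split("#", 1)[0].strip()
        if line.startswith(">seq"):
            current = {"id": line[1:]}
            records.append(current)
            continue
        match = FIELD.match(line)
        if current is not None and match:
            key, value = match.groups()
            parsed = float(value) if key == "e value" else int(value)
            current[key.replace(" ", "_")] = parsed
    return records


sample = """\
>seq 0                                                               # ID of alignment
length:1542                                                          # Length of gene
e value:1.50536                                                      # expectation value
identity:18                                                          # sequence identity count
>seq 1
length:1542
e value:1.50536
identity:18
"""

records = parse_off_target_report(sample)
for record in records:
    print(record)
```

Lines that do not match a known field, such as the description, the alignment rows and the closing job summary, are simply ignored.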
@@ Line 242: / Line 400: @@
 <br>
 <p>
-The database program gives you an interface to interact with the MySQL database created for crRNAs that have been shown experimentally to work.
+The database program provides an interface to the MySQL database of
+crRNAs that have been shown experimentally to work.
 </p>
 <pre style="text-align: left;">
-###################################################################
-##############        Welcome to CasCAID2GO      ##################
-###################################################################
+############         Available Detection Targets         #################
-############          Target clarified           #################
 [1] Virus
 [2] Bacteria
-[3] Resistance
 [0] Go back one step
@@ Line 260: / Line 414: @@
 What would you like to detect?2
-############          Target clarified           #################
+############         Available Detection Targets         #################
-[1] E. Coli
+[1] Escherichia coli
+[2] Bacillus subtilis
 [0] Go back one step
@@ Line 268: / Line 423: @@
 What would you like to detect?1
-############        Specific Target chosen               ################
+############             Choose your Target              #################
+[1] E. Coli 16s rRNA
-[1] rRNA Ribosome
 [0] Go back one step
@@ Line 280: / Line 433: @@
 ###########      The sequence thou art looking for is : ################
-GTGTGAGCTCCTAATACGACTCACTATAGGGACCACCCCAAAAATGAAGGGGACTAAAACAACTTTACTCCCTTCCTCCCCGCTGAAAGAT
+ACUUUACUCCCUUCCUCCCCGCUGAAA
-[1] Order from IDT
 [9] Exit
 [0] Go back one step
 </pre>
 <p>
-However, these still need to be tested for off-target effects experimentally since <i>in silico</i> screening can only confirm specificity to a certain amount of certainty.
+However, these crRNAs still need to be tested experimentally for off-target effects, since <i>in silico</i>
+screening can only confirm specificity with limited certainty.
 </p>
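The stepwise selection in the menu (category, then organism, then target, then sequence) corresponds to a series of filtered lookups. The sketch below illustrates this with Python's built-in `sqlite3` module in place of MySQL so that it runs without a server. The table name `crrna`, its columns and the helper functions are assumptions made for this example and do not reflect the actual schema. The single row is the entry shown in the session above.

```python
import sqlite3

# In-memory stand-in for the MySQL database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crrna (category TEXT, organism TEXT, target TEXT, sequence TEXT)"
)
conn.execute(
    "INSERT INTO crrna VALUES (?, ?, ?, ?)",
    ("Bacteria", "Escherichia coli", "16s rRNA", "ACUUUACUCCCUUCCUCCCCGCUGAAA"),
)


def organisms(conn, category):
    """List the organisms available for a category such as 'Bacteria'."""
    rows = conn.execute(
        "SELECT DISTINCT organism FROM crrna WHERE category = ? ORDER BY organism",
        (category,),
    )
    return [row[0] for row in rows]


def targets(conn, organism):
    """List the detection targets stored for one organism."""
    rows = conn.execute(
        "SELECT DISTINCT target FROM crrna WHERE organism = ? ORDER BY target",
        (organism,),
    )
    return [row[0] for row in rows]


def sequence(conn, organism, target):
    """Return the stored sequence, or None if there is no such entry."""
    row = conn.execute(
        "SELECT sequence FROM crrna WHERE organism = ? AND target = ?",
        (organism, target),
    ).fetchone()
    return row[0] if row else None


print(organisms(conn, "Bacteria"))
print(targets(conn, "Escherichia coli"))
print(sequence(conn, "Escherichia coli", "16s rRNA"))
```

All queries use bound parameters rather than string formatting, which keeps user input from the menu out of the SQL text.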
 </td>
 </tr>
+<tr><td colspan=6 align=center valign=center>
+<h3>Hardware control</h3>
+<p>
+The software for reading out the fluorescence detector is described in the
+<a class="myLink" href="https://2017.igem.org/Team:Munich/Hardware/Detector">Hardware</a> section.
+All software developed for hardware can be found in our <a class="myLink" href="https://github.com/igemsoftware2017/igem_munich_2017">GitHub</a> repository.
+</p>
+</td>
+</tr>
+<tr><td colspan=6 align=center valign=center>
+<h3>References</h3>
+<p>
+    <ol style="text-align: left">
+    	<li id="ref_1">R. M. Dirks, J. S. Bois, J. M. Schaeffer, E. Winfree, and N. A. Pierce.
+    	"Thermodynamic analysis of interacting nucleic acid strands." (2007) <i>SIAM Rev</i>, 49:65-88.</li>
+		<li id="ref_2">R. M. Dirks and N. A. Pierce. "An algorithm for computing nucleic acid base-pairing probabilities including pseudoknots."
+		(2004) <i>J. Comput. Chem.</i>, 25:1295-1304.</li>
+		<li id="ref_3">R. M. Dirks and N. A. Pierce. "A partition function algorithm for nucleic acid secondary structure including pseudoknots."
+		(2003) <i>J Comput Chem</i>, 24:1664-1677.</li>
+		<li id="ref_4">M. Zuker, D. H. Mathews and D. H. Turner. "Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide"
+		(1999) <i>RNA Biochemistry and Biotechnology</i> 11-43 J. Barciszewski and B. F. C. Clark, eds.,
+		NATO ASI Series, Kluwer Academic Publishers, Dordrecht, NL </li>
+		<li id="ref_5">J.-M. Rouillard, M. Zuker and E. Gulari. "OligoArray 2.0: Thermodynamically improved
+		oligonucleotide design for microarrays." (2003) <i>Nucleic Acids Res.</i> 31:12, 3057-3062. </li>
+		<li>S.F. Altschul, W. Gish, W. Miller, E.W. Myers and D.J. Lipman "Basic local alignment search tool." (1990)
+		 <i>J. Mol. Biol. </i> 215:403-410.</li>
+    </ol>
+</p>
+</td>
+</tr>
 <tr><td class="no-padding" colspan=6 align=right valign=center height=10>
