(38 intermediate revisions by 5 users not shown) | |||
Line 24: | Line 24: | ||
#myContent *{ | #myContent *{ | ||
− | color: # | + | color: #444444; |
} | } | ||
Line 77: | Line 77: | ||
<tr><td colspan=6 align=left valign=center> | <tr><td colspan=6 align=left valign=center> | ||
<font size=7 color=#51a7f9><b style="color: #51a7f9">Software</b></font> | <font size=7 color=#51a7f9><b style="color: #51a7f9">Software</b></font> | ||
+ | <pre> | ||
+ | ################################################################### | ||
+ | # # | ||
+ | # CascAID V1.0 # | ||
+ | # # | ||
+ | # Thu Nov 2 04:23:54 2017 # | ||
+ | # # | ||
+ | # IGEM Munich 2017 # | ||
+ | # # | ||
+ | ################################################################### | ||
+ | |||
+ | |||
+ | </pre> | ||
</td> | </td> | ||
</tr> | </tr> | ||
Line 82: | Line 95: | ||
<td colspan = 6 align="left"> | <td colspan = 6 align="left"> | ||
<p class="introduction"> | <p class="introduction"> | ||
− | + | CascAID is a potentially universal tool for nucleic acid detection. | |
+ | Fast adaptation of our platform to new targets requires <i>in silico</i> verification of the crRNA design. | ||
+ | Crucial factors for the development of these crRNA designs are the binding of the crRNA to Cas13a, which is | ||
+ | mainly determined by its secondary structure, and the uniqueness of the targeting sequence in the transcriptome (to rule out false positive results). To ensure the integrity of the Cas13a-crRNA complex, we developed | ||
+ | a Python script that uses two established packages for secondary structure prediction, NUPACK and Mfold. | |
+ | In order to verify the specificity of the targeting sequence, we used the BLASTN-short program to | |
+ | check for similar sequences in a transcriptome database. Additionally, we created a database of crRNA designs | |
+ | that have already worked and made it | |
+ | as extensive as possible given the limited time, seeking collaboration with other teams working with Cas13a, | |
+ | mainly TU Delft. | |
+ | The second branch of software we developed is needed for hardware control in our project. | |
+ | It allows users' devices such as computers and smartphones to control | |
+ | our hardware, Heatbringer and Lightbringer. | |
+ | The repository for our software can be found <a class="myLink" href="https://github.com/igemsoftware2017/igem_munich_2017">here</a>. | |
</p> | </p> | ||
</td> | </td> | ||
Line 95: | Line 121: | ||
<h3>crRNA Design Verification</h3> | <h3>crRNA Design Verification</h3> | ||
<p> | <p> | ||
− | There are two main problems regarding the | + | There are two main problems regarding the design of crRNA for a diagnostic test. |
+ | First, the secondary structure of the crRNA needed for Cas13a activity needs to be verified. | ||
+ | Second, the sequence targeted by the crRNA has to be specific, i.e. there must be no identical sequence in the | |
+ | reference transcriptome of a healthy patient. Otherwise, off-target effects will lead to | |
+ | false positive results since Cas13a is activated even though the pathogen is not present. | |
+ | To address these issues, we developed software relying on bioinformatic principles such as | |
+ | secondary structure prediction and Basic Local Alignment Search Tool (BLAST). | ||
</p> | </p> | ||
</td> | </td> | ||
Line 109: | Line 141: | ||
<h3>Secondary Structure Prediction</h3> | <h3>Secondary Structure Prediction</h3> | ||
<p> | <p> | ||
− | For secondary structure prediction of the crRNA we utilised the two | + | For secondary structure prediction of the crRNA we utilised the two established program packages |
− | + | in the field, NUPACK and Mfold, to compare newly designed crRNA with secondary structures of crRNAs that | |
− | + | were already known to be active. These reference crRNA structures were either obtained from actual | |
− | + | crystallography data of crRNA in complex with Cas13a, or from structure prediction data of experimentally | |
− | + | tested crRNAs. Using secondary structure verification we were able to rule out misfolding crRNA | |
− | + | designs prior to experiment. We developed a script for the end user automating this procedure. | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
</p> | </p> | ||
</td> | </td> | ||
Line 142: | Line 155: | ||
<br> | <br> | ||
<p> | <p> | ||
− | For offline usage | + | NUPACK is an RNA secondary structure prediction package developed | |
+ | by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech). | |
+ | The source code is available free of charge for academic use. | |
+ | NUPACK allows the analysis of the partition function, the minimum free energy and the equilibrium base-pairing | |
+ | probabilities of an RNA sequence. | |
+ | For offline usage, we implemented NUPACK locally. We proceeded to implement Mfold as a webserver request. | |
+ | We included both packages because we observed that in certain cases, only one of them | |
+ | was able to predict the secondary structure of crRNA as described in previous papers, predominantly the paper of Liu et al. published in <i>Cell</i> in 2017, | |
+ | "Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities". Also, a local run gives you | |
+ | the possibility of using the full spectrum of NUPACK's programs. | |
+ | Using several of these programs to obtain the final structure prediction, we estimated whether the | |
+ | crRNA would be active in Cas13a. | |
+ | Furthermore, we observed that NUPACK sometimes predicts the right secondary structure, but not as | |
+ | the most stable structure. With NUPACK's subopt, it is possible to predict more than just | |
+ | the most stable structure. This enables looking at less stable structures which might be more favourable when bound to the protein and comparing these to the | |
+ | structure databank. The output of a suboptimal prediction | ||
+ | is given below as the second example. Explanations are included as comments after '#': | ||
+ | |||
<pre style="text-align: left;"> | <pre style="text-align: left;"> | ||
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % | % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % | ||
− | 66 | + | 66 # length of the sequence |
− | -9.400 | + | -9.400 # free energy |
− | .....................(.((((((.((((....)))).)))))).)............... | + | .....................(.((((((.((((....)))).)))))).)............... # secondary structure |
− | 22 51 | + | 22 51 # each row lists the IDs of two | |
− | 24 49 | + | 24 49 # bases that form a base pair; | |
− | 25 48 | + | 25 48 # e.g. the first row means that base 22 | |
− | 26 47 | + | 26 47 # pairs with base 51 | |
27 46 | 27 46 | ||
28 45 | 28 45 | ||
Line 183: | Line 213: | ||
<p> | <p> | ||
− | From this, | + | From this, we can extract the secondary structure in Vienna notation as well as the free energy | |
+ | of each RNA structure. Combined with the full partition function, the free energy gives | |
+ | the probability that a structure forms in solution. Using these results, the user can make qualitative assumptions about | |
+ | the activity of the corresponding Cas13a-crRNA complex. | |
</p> | </p> | ||
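The extraction step described above can be sketched in Python. This is a minimal illustration, not the script from our repository: the names `SuboptRecord`, `parse_subopt` and `boltzmann_probability` are hypothetical, the record layout (length, free energy, structure, then pair rows) is assumed from the example output shown above, and the free energy is assumed to be given in kcal/mol.

```python
import math
from typing import List, NamedTuple, Tuple

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


class SuboptRecord(NamedTuple):
    length: int                   # number of bases
    energy: float                 # free energy as printed in the record
    structure: str                # dot-bracket (Vienna) string
    pairs: List[Tuple[int, int]]  # 1-based indices of paired bases


def parse_subopt(text: str) -> List[SuboptRecord]:
    """Parse records laid out like the example output (assumed format)."""
    # Drop trailing "# ..." annotations and blank lines.
    lines = [raw.split("#", 1)[0].strip() for raw in text.splitlines()]
    lines = [line for line in lines if line]
    records = []
    i = 0
    while i < len(lines):
        # Assumption: a record starts with a digits-only line (the length).
        if not lines[i].isdigit() or i + 2 >= len(lines):
            i += 1
            continue
        length = int(lines[i])
        energy = float(lines[i + 1])
        structure = lines[i + 2]
        i += 3
        pairs = []
        # Pair rows consist of exactly two integers.
        while i < len(lines):
            fields = lines[i].split()
            if len(fields) != 2 or not all(f.isdigit() for f in fields):
                break
            pairs.append((int(fields[0]), int(fields[1])))
            i += 1
        records.append(SuboptRecord(length, energy, structure, pairs))
    return records


def boltzmann_probability(energy: float, partition_function: float,
                          temperature_c: float = 37.0) -> float:
    """Equilibrium probability exp(-dG / RT) / Z of a single structure."""
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    return math.exp(-energy / rt) / partition_function


EXAMPLE = """
% %%%%%%%% %
6          # length of the sequence
-1.200     # free energy
((..))     # secondary structure
1 6
2 5
% %%%%%%%% %
"""

if __name__ == "__main__":
    for record in parse_subopt(EXAMPLE):
        print(record.structure, record.energy, record.pairs)
```

The partition function value passed to `boltzmann_probability` has to come from a separate calculation; the sketch only applies the Boltzmann weight to one parsed structure.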
</td> | </td> | ||
</tr> | </tr> | ||
+ | |||
+ | <tr class="lastRow"><td colspan=6 align=center valign=center> | ||
+ | <h4>Mfold</h4> | ||
+ | <br> | ||
+ | <p> | ||
+ | Mfold is a webserver for RNA secondary structure prediction developed by Michael Zuker and described in his paper | |
+ | "Mfold web server for nucleic acid folding and hybridization prediction", published in <i>Nucleic Acids Research</i> | |
+ | in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a | |
+ | script that automatically requests a standardised RNA Fold job from the server, thereby making it available | |
+ | on all operating systems. Using the result obtained from this request, the secondary structure is | |
+ | checked via a string comparison in so-called "Vienna" notation. This notation gives base pairing as a string | |
+ | of dots and brackets where a dot represents an unpaired base and brackets represent paired bases, with | |
+ | an opening bracket "(" at the 5'-end and a closing bracket ")" at the 3'-end of each paired sequence. An example of the output | |
+ | of the program is given below: | |
+ | <pre style="text-align: left;"> | ||
+ | |||
+ | ####################################################################################### | ||
+ | #################### CascAID Secondary Structure Verification ######################### | ||
+ | ####################################################################################### | ||
+ | |||
+ | |||
+ | |||
+ | ####################################################################################### | ||
+ | ##################### NUPACK Secondary Structure Verification ######################### | ||
+ | ####################################################################################### | ||
+ | |||
+ | |||
+ | GOOD NEWS! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE! | ||
+ | YOUR SEQUENCE WAS: | ||
+ | |||
+ | 5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC 3' | ||
+ | |||
+ | (((((....((((.........)))).))))) #### MATCHED SECONDARY STRUCTURE | ||
+ | ...(((((....((((.........)))).)))))..((((..........))))......... #### PREDICTED SECONDARY STRUCTURE | ||
+ | ___________________________________________________________________ | ||
+ | |||
+ | YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK | ||
+ | IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a | ||
+ | ______________________________________________________________________________________ | ||
+ | |||
+ | Job ended normally. Sun Oct 29 23:46:28 2017 | ||
+ | Do you have internet connectivity? [yes/no]yes | ||
+ | |||
+ | |||
+ | ####################################################################################### | ||
+ | #################### MFOLD SECONDARY STRUCTURE VERIFICATION ########################### | ||
+ | ####################################################################################### | ||
+ | |||
+ | |||
+ | |||
+ | #################### CAUTION! ##################### | ||
+ | mFOLD SECONDARY STRUCTURE DOES NOT FIT OUR DATA BANK | ||
+ | #################### CAUTION! ##################### | ||
+ | |||
+ | |||
+ | YOUR SEQUENCE AND MOST STABLE PREDICTED STRUCTURE IS: | ||
+ | |||
+ | 5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGG 3' | ||
+ | ..(((((((((.((((.........)))).......((.....)).....)))).))))).... | ||
+ | ______________________________________________________________________________________ | ||
+ | |||
+ | Job ended normally. Sun Oct 29 23:47:06 2017 | ||
+ | |||
+ | |||
+ | </pre> | ||
+ | </p> | ||
+ | <p> | ||
+ | This example also shows that one program may recognize the | |
+ | crRNA secondary structure while the other does not. In this case, NUPACK predicted the expected structure | |
+ | while Mfold did not. Even though this is an experimental construct | |
+ | that worked, we did not add its Mfold structure prediction to the database, | |
+ | since Mfold was unable to predict the right structure. | |
+ | </p> | ||
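The string comparison in Vienna notation can be illustrated with a short Python sketch. It is only an approximation of the idea under stated assumptions: the function names `dot_bracket_pairs` and `find_reference` are hypothetical, and the comparison is assumed to be a plain substring search of a reference structure within the predicted structure. The three structure strings are copied from the example output above.

```python
from typing import List, Tuple


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Convert a dot-bracket string into sorted 1-based base pairs."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("closing bracket without a partner")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError("unexpected symbol: " + repr(symbol))
    if stack:
        raise ValueError("opening bracket without a partner")
    return sorted(pairs)


def find_reference(predicted: str, reference: str) -> int:
    """Return the 0-based offset of the reference motif, or -1 if absent."""
    return predicted.find(reference)


# Structures copied from the example output above.
REFERENCE = "(((((....((((.........)))).)))))"
NUPACK_PREDICTION = (
    "...(((((....((((.........)))).)))))..((((..........))))........."
)
MFOLD_PREDICTION = (
    "..(((((((((.((((.........)))).......((.....)).....)))).)))))...."
)

if __name__ == "__main__":
    print("NUPACK offset:", find_reference(NUPACK_PREDICTION, REFERENCE))
    print("Mfold offset:", find_reference(MFOLD_PREDICTION, REFERENCE))
    print("Reference pairs:", dot_bracket_pairs(REFERENCE))
```

An exact substring match is strict: a single shifted bracket makes the comparison fail, which is consistent with the Mfold result in the example, but a production check might prefer a more tolerant comparison of base pairs.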
+ | </td> | ||
+ | </tr> | ||
+ | |||
+ | |||
Line 194: | Line 304: | ||
<p> | <p> | ||
− | In order to rule out off-target effects for the designed crRNA in diagnostic applications, we developed a script that is able to | + | In order to rule out off-target effects for the designed crRNA in diagnostic applications, |
+ | we developed a script that is able to BLAST the sequence either against whole databases | ||
+ | online or a custom database we compiled. This database contains the human transcriptome and those of bacteria common in the human nasal tract as well as model organisms used in our project: | |
<ol style="list-style-type:disc; list-style-position:left; text-align: left;"> | <ol style="list-style-type:disc; list-style-position:left; text-align: left;"> | ||
<li>Homo sapiens</li> | <li>Homo sapiens</li> | ||
Line 206: | Line 318: | ||
</p> | </p> | ||
<p> | <p> | ||
− | Transcriptomes that | + | Transcriptomes of organisms that are common in the nasal tract but were not available include, | |
+ | among others: | |
</p> | </p> | ||
<ol style="list-style-type:disc; list-style-position:left; text-align: left;"> | <ol style="list-style-type:disc; list-style-position:left; text-align: left;"> | ||
Line 215: | Line 328: | ||
<br> | <br> | ||
<p> | <p> | ||
− | All data was retreived | + | All data was retrieved from the Transcriptome Release #90 of the ENSEMBL project. The output is generated | |
+ | from a blastn-short run and, in the example below, consists of all sequences that show a sequence identity of 18 bp or | |
+ | higher. For an actual run, the identity would need to be 26 bp or higher to indicate a likely off-target effect, since Cas13a | |
+ | tolerates up to 2 point mutations in the 28 bp target sequence regarding the binding of crRNA and subsequent RNase activity. | |
+ | The expectation value here describes the number of hits one can expect to find by chance | |
+ | in a random database of the same size as the database used for the blastn-short run. | |
</p> | </p> | ||
</p> | </p> | ||
<pre style="text-align: left;"> | <pre style="text-align: left;"> | ||
+ | ################################################################## | ||
+ | ####################### Input Sequence ########################### | ||
+ | ################################################################## | ||
+ | |||
+ | Your Sequence was: | ||
+ | |||
+ | GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC | ||
+ | |||
+ | Your target sequence thus is: | ||
+ | UGAUAAAGAAGACAGUCAUAAGUGCGGC | ||
+ | |||
+ | Sun Oct 29 23:49:22 2017 | ||
+ | |||
+ | ################################################################## | ||
+ | |||
+ | |||
+ | |||
################################################################## | ################################################################## | ||
####### Following possible off-targets have been identified ###### | ####### Following possible off-targets have been identified ###### | ||
################################################################## | ################################################################## | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | </ | + | >seq 0 # ID of alignment |
+ | sequence:gnl|BL_ORD_ID|91933 ENSBTAT00000042836.3 cds chromosome: # location of gene | ||
+ | UMD3.1:3:107697178:107698719:1 gene:ENSBTAG00000030338.3 # gene name and ID | ||
+ | gene_biotype:protein_coding transcript_biotype:protein_coding # type of gene | ||
+ | gene_symbol:GJA9 description:gap junction protein alpha 9 # Gene description | ||
+ | [Source:HGNC Symbol;Acc:HGNC:19155] | ||
+ | length:1542 # Length of gene | ||
+ | e value:1.50536 # expectation value | ||
+ | identity:18 # sequence identity count | ||
+ | ATAAAGAAGACAGTCATAA... # alignment | ||
+ | |||||||||||| ||||||... # match/mismatches | ||
+ | ATAAAGAAGACACTCATAA... # alignment | ||
+ | |||
+ | >seq 1 | ||
+ | sequence:gnl|BL_ORD_ID|69018 ENSBTAT00000042836.3 cdna | ||
+ | chromosome:UMD3.1:3:107697178:107698719:1 gene:ENSBTAG00000030338.3 | ||
+ | gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:GJA9 | ||
+ | description:gap junction protein alpha 9 | ||
+ | [Source:HGNC Symbol;Acc:HGNC:19155] | ||
+ | length:1542 | ||
+ | e value:1.50536 | ||
+ | identity:18 | ||
+ | ATAAAGAAGACAGTCATAA... | ||
+ | |||||||||||| ||||||... | ||
+ | ATAAAGAAGACACTCATAA... | ||
+ | ___________________________________________________________ | ||
+ | |||
+ | Job ended normally. Sun Oct 29 23:49:22 2017 | ||
+ | |||
+ | |||
+ | These results have also been saved in: off_target.out | ||
+ | The full BLAST output can be found in: seq.xml | ||
+ | </pre> | ||
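The identity threshold discussed above can be expressed as a small Python sketch. It does not call BLAST and is not taken from our repository: the helper names `rna_to_dna`, `count_matches` and `is_potential_off_target` are hypothetical, and the defaults (28 nt target, at most 2 mismatches, hence 26 identical positions) are taken from the description above.

```python
def rna_to_dna(sequence: str) -> str:
    """Transcribe an RNA string to the DNA alphabet used in the BLAST hits."""
    return sequence.upper().replace("U", "T")


def count_matches(query: str, subject: str) -> int:
    """Count identical positions in two aligned, gap-free strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    return sum(a == b for a, b in zip(query, subject))


def is_potential_off_target(identity: int, target_length: int = 28,
                            max_mismatches: int = 2) -> bool:
    """Flag a hit whose identity count is within the tolerated mismatches."""
    return identity >= target_length - max_mismatches


# Target and alignment copied from the example output above.
TARGET_RNA = "UGAUAAAGAAGACAGUCAUAAGUGCGGC"
QUERY_ALIGNED = "ATAAAGAAGACAGTCATAA"
SUBJECT_ALIGNED = "ATAAAGAAGACACTCATAA"

if __name__ == "__main__":
    identity = count_matches(QUERY_ALIGNED, SUBJECT_ALIGNED)
    print("identity:", identity)
    print("potential off-target:", is_potential_off_target(identity))
```

With the defaults, the GJA9 hit from the example (identity 18) would not be flagged, whereas any hit with 26 or more identical positions would be.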
</tr> | </tr> | ||
+ | </td> | ||
+ | |||
Line 242: | Line 400: | ||
<br> | <br> | ||
<p> | <p> | ||
− | The database program gives you an interface to interact with the MySQL database created for crRNAs that have been shown | + | The database program gives you an interface to interact with the MySQL database created for |
+ | crRNAs that have been shown to work experimentally. | |
</p> | </p> | ||
<pre style="text-align: left;"> | <pre style="text-align: left;"> | ||
− | |||
− | |||
− | |||
− | + | ############ Available Detection Targets ################# | |
− | ############ | + | |
[1] Virus | [1] Virus | ||
[2] Bacteria | [2] Bacteria | ||
− | |||
[0] Go back one step | [0] Go back one step | ||
Line 260: | Line 414: | ||
What would you like to detect?2 | What would you like to detect?2 | ||
− | ############ | + | ############ Available Detection Targets ################# |
− | [1] | + | [1] Escherichia coli |
+ | [2] Bacillus subtillis | ||
[0] Go back one step | [0] Go back one step | ||
Line 268: | Line 423: | ||
What would you like to detect?1 | What would you like to detect?1 | ||
− | ############ | + | ############ Choose your Target ################# |
− | + | [1] E. Coli 16s rRNA | |
− | + | ||
− | [1] rRNA | + | |
[0] Go back one step | [0] Go back one step | ||
Line 280: | Line 433: | ||
########### The sequence thou art looking for is : ################ | ########### The sequence thou art looking for is : ################ | ||
− | + | ACUUUACUCCCUUCCUCCCCGCUGAAA | |
+ | |||
− | |||
[9] Exit | [9] Exit | ||
[0] Go back one step | [0] Go back one step | ||
+ | |||
</pre> | </pre> | ||
<p> | <p> | ||
− | However, these still need to be tested for off-target effects experimentally since <i>in silico</i> screening can only confirm specificity to a certain amount of certainty. | + | However, these still need to be tested for off-target effects experimentally since <i>in silico</i> |
+ | screening can only confirm specificity to a limited degree of certainty. | |
</p> | </p> | ||
</td> | </td> | ||
</tr> | </tr> | ||
+ | |||
+ | <tr><td colspan=6 align=center valign=center> | ||
+ | <h3>Hardware control</h3> | ||
+ | <p> | ||
+ | The software for reading out the fluorescence detector is described in the | ||
+ | <a class="myLink" href="https://2017.igem.org/Team:Munich/Hardware/Detector">Hardware</a> section. | ||
+ | All software developed for hardware can be found in our <a class="myLink" href="https://github.com/igemsoftware2017/igem_munich_2017">GitHub</a> repository. | ||
+ | </p> | ||
+ | </td> | ||
+ | </tr> | ||
+ | |||
+ | <tr><td colspan=6 align=center valign=center> | ||
+ | <h3>References</h3> | ||
+ | <p> | ||
+ | <ol style="text-align: left"> | ||
+ | <li id="ref_1">R. M. Dirks, J. S. Bois, J. M. Schaeffer, E. Winfree, and N. A. Pierce. | |
+ | "Thermodynamic analysis of interacting nucleic acid strands." (2007) <i>SIAM Rev.</i>, 49:65-88.</li> | |
+ | <li id="ref_2">R. M. Dirks and N. A. Pierce. "An algorithm for computing nucleic acid base-pairing probabilities including pseudoknots." | |
+ | (2004) <i>J. Comput. Chem.</i>, 25:1295-1304.</li> | |
+ | <li id="ref_3">R. M. Dirks and N. A. Pierce. "A partition function algorithm for nucleic acid secondary structure including pseudoknots." | |
+ | (2003) <i>J. Comput. Chem.</i>, 24:1664-1677.</li> | |
+ | <li id="ref_4">M. Zuker, D. H. Mathews and D. H. Turner. "Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide" | ||
+ | (1999) <i>RNA Biochemistry and Biotechnology</i> 11-43 J. Barciszewski and B. F. C. Clark, eds., | ||
+ | NATO ASI Series, Kluwer Academic Publishers, Dordrecht, NL </li> | ||
+ | <li id="ref_5">J.-M. Rouillard, M. Zuker and E. Gulari. "OligoArray 2.0: design of oligonucleotide probes for DNA | |
+ | microarrays using a thermodynamic approach." (2003) <i>Nucleic Acids Res.</i> 31:12, 3057-3062. </li> | |
+ | <li>S.F. Altschul, W. Gish, W. Miller, E.W. Myers and D.J. Lipman "Basic local alignment search tool." (1990) | ||
+ | <i>J. Mol. Biol. </i> 215:403-410.</li> | ||
+ | |||
+ | |||
+ | </ol> | ||
+ | </p> | ||
+ | </td> | ||
+ | </tr> | ||
+ | |||
<tr><td class="no-padding" colspan=6 align=right valign=center height=10> | <tr><td class="no-padding" colspan=6 align=right valign=center height=10> |
Latest revision as of 01:47, 2 November 2017