Sven klumpe (Talk | contribs) |
Sven klumpe (Talk | contribs) |
||
Line 69: | Line 69: | ||
<!-- Head End --> | <!-- Head End --> | ||
<!-- Content Begin --> | <!-- Content Begin --> | ||
− | <img id="TopPicture" width=" | + | <img id="TopPicture" width="960" src="https://static.igem.org/mediawiki/2017/7/78/T--Munich--FrontPagePictures_Software.svg"> |
<table width="960" border=0 cellspacing=0 cellpadding=10> | <table width="960" border=0 cellspacing=0 cellpadding=10> | ||
<tr> | <tr> | ||
Line 80: | Line 80: | ||
</tr> | </tr> | ||
<tr><td colspan=6 align=left valign=center> | <tr><td colspan=6 align=left valign=center> | ||
− | <font size=7 color=#51a7f9><b style="color: #51a7f9"> | + | <font size=7 color=#51a7f9><b style="color: #51a7f9">Software</b></font> |
+ | <pre> | ||
+ | ################################################################### | ||
+ | # # | ||
+ | # CascAID V1.0 # | ||
+ | # # | ||
+ | # Wed Nov 1 04:23:54 2017 # | ||
+ | # # | ||
+ | # IGEM Munich 2017 # | ||
+ | # # | ||
+ | # # | ||
+ | # # | ||
+ | # # | ||
+ | # Please send bug reports to: # | ||
+ | # # | ||
+ | # Sven Klumpe # | ||
+ | # # | ||
+ | # E-Mail: sven.klumpe@tum.de # | ||
+ | # # | ||
+ | ################################################################### | ||
+ | |||
+ | |||
+ | </pre> | ||
</td> | </td> | ||
</tr> | </tr> | ||
<tr class="lastRow"> | <tr class="lastRow"> | ||
− | <td colspan=6 align="left"> | + | <td colspan = 6 align="left"> |
<p class="introduction"> | <p class="introduction"> | ||
− | + | CascAID is a potentially universal tool for nucleic acid detection. | |
− | + | Fast adaptation of our platform to new targets requires <i>in silico</i> verification of the crRNA design. | |
− | + | Crucial factors for the development of these crRNA designs are the binding of the crRNA to Cas13a, | |
− | + | which is mainly determined by its secondary structure, and the uniqueness of the targeting sequence in the transcriptome, | |
− | + | which rules out false positive results. To ensure the integrity of the Cas13a-crRNA complex, we developed | |
− | + | a Python script that uses the established secondary structure prediction packages NUPACK and Mfold. | |
− | + | In order to verify the specificity of the targeting sequence, we used the BLASTN-short program to | |
+ | check for similar sequences in a transcriptome databank. Additionally, we created a database of crRNA designs | ||
+ | that have already been shown to work and made it | ||
+ | as extensive as possible in the limited time available, also seeking collaboration with other teams working with Cas13a, | ||
+ | mainly TU Delft. | ||
+ | The second branch of software we developed for our project is the software for hardware control. | ||
+ | It allows users' devices such as computers and smartphones to control | ||
+ | our hardware devices, Heatbringer and Lightbringer. | ||
+ | The repository for our software can be found <a class="myLink" href="https://github.com/igemsoftware2017/igem_munich_2017">here</a>. | ||
+ | </p> | ||
</td> | </td> | ||
</tr> | </tr> | ||
− | <tr><td colspan=6 align=center valign=center> | + | |
− | < | + | |
+ | |||
+ | |||
+ | <tr class="lastRow"><td colspan=6 align=center valign=center> | ||
+ | <h3>crRNA Design Verification</h3> | ||
<p> | <p> | ||
− | + | There are two main problems regarding the design of crRNA for a diagnostic test. | |
− | + | First, the secondary structure of the crRNA needed for Cas13a activity needs to be verified. | |
− | + | Second, the sequence targeted by the crRNA has to be specific, i.e., there must be no identical sequence in the | |
− | + | reference transcriptome of a healthy patient. Otherwise, off-target effects will lead to | |
− | + | false positive results since Cas13a is activated even though the pathogen is not present. | |
− | + | To address these issues, we developed software relying on bioinformatic principles such as | |
− | + | secondary structure prediction and the Basic Local Alignment Search Tool (BLAST). | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
</p> | </p> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
</td> | </td> | ||
+ | |||
+ | |||
</tr> | </tr> | ||
+ | |||
+ | |||
<tr><td colspan=6 align=center valign=center> | <tr><td colspan=6 align=center valign=center> | ||
− | < | + | <h3>Secondary Structure Prediction</h3> |
<p> | <p> | ||
− | + | For secondary structure prediction of the crRNA, we utilised the two established program packages | |
− | + | in the field, NUPACK and Mfold, to compare newly designed crRNAs with secondary structures of crRNAs that | |
− | + | were already known to be active. These reference crRNA structures were obtained either from actual | |
− | + | crystallography data of crRNA in complex with Cas13a or from structure prediction data of experimentally | |
− | + | tested crRNAs. Using secondary structure verification, we were able to rule out misfolding crRNA | |
− | + | designs prior to experiments. We developed a script for the end user that automates this procedure. | |
</p> | </p> | ||
− | + | </td> | |
− | + | </tr> | |
− | + | ||
− | + | <tr class="lastRow"><td colspan=6 align=center valign=center> | |
− | + | <h4>NUPACK</h4> | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | < | + | |
− | + | ||
− | < | + | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
<br> | <br> | ||
+ | <p> | ||
+ | NUPACK is an RNA secondary structure prediction program package developed | ||
+ | by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech). | ||
+ | The source code is available free of charge for academic usage. | ||
+ | NUPACK allows the analysis of the partition function, the minimum free energy and the equilibrium base-pairing | ||
+ | probabilities of an RNA sequence. | ||
+ | For offline usage we implemented NUPACK locally. We proceeded to implement Mfold as a webserver request. | ||
+ | We decided to use both because we found that in certain cases only one of the program packages | ||
+ | was able to predict the secondary structure of crRNA as described in previous papers, predominantly the paper of Liu et al. published in <i>Cell</i> in 2017, | ||
+ | "Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities". Also, a local run gives you | ||
+ | the possibility of using the full spectrum of NUPACK's programs. | ||
+ | Using several of these programs for the final structure prediction, we estimated whether the | ||
+ | crRNA would be active in Cas13a. | ||
+ | Furthermore, we found that NUPACK sometimes predicts the right secondary structure, but not as | ||
+ | the most stable structure. With NUPACK's subopt, it is possible to predict more than just | ||
+ | the most stable structure. This makes it possible to look at less stable structures, since the protein may compensate for | ||
+ | non-ideal structures by providing the right environment for stabilisation, and to compare these to the | ||
+ | structure databank. The output of a suboptimal prediction | ||
+ | is given in Example 2; explanations are added as comments after '#': | ||
+ | |||
+ | |||
+ | <pre style="text-align: left;"> | ||
+ | % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % | ||
+ | 66 #### length of the sequence | ||
+ | -9.400 #### free energy of the structure | ||
+ | .....................(.((((((.((((....)))).)))))).)............... #### secondary structure in Vienna notation | ||
+ | 22 51 #### IDs of bases that form basepairs | ||
+ | 24 49 #### this would mean base 24 pairs with base 49 | ||
+ | 25 48 | ||
+ | 26 47 | ||
+ | 27 46 | ||
+ | 28 45 | ||
+ | 29 44 | ||
+ | 31 42 | ||
+ | 32 41 | ||
+ | 33 40 | ||
+ | 34 39 | ||
+ | % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % | ||
+ | |||
+ | % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % | ||
+ | 66 | ||
+ | -9.300 | ||
+ | .....................(((((((..((((....)))).)))))))................ | ||
+ | 22 50 | ||
+ | 23 49 | ||
+ | 24 48 | ||
+ | 25 47 | ||
+ | 26 46 | ||
+ | 27 45 | ||
+ | 28 44 | ||
+ | 31 42 | ||
+ | 32 41 | ||
+ | 33 40 | ||
+ | 34 39 | ||
+ | % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % | ||
+ | |||
+ | </pre> | ||
</p> | </p> | ||
<p> | <p> | ||
− | + | ||
− | + | From this, we can extract the secondary structure in Vienna notation as well as the free energies | |
− | + | of the RNA structure to predict the probability of formation in solution with the help of the calculation | |
− | + | of the full partition function. Using these results, the user can make qualitative assumptions about | |
− | + | the activity of the corresponding Cas13a-crRNA complex. | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
</p> | </p> | ||
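<p>
The extraction step described above can be sketched in Python. This is a minimal illustration and not the script from our repository: the function names <code>parse_subopt</code> and <code>relative_weights</code> are hypothetical, the parser assumes the block layout shown in the example output, and the weights are Boltzmann factors normalised only over the listed structures at an assumed temperature of 37 °C. They are therefore a rough stand-in for, not a replacement of, the full partition function calculation.
</p>

```python
import math

# The two records from the example output, with '#' comments as printed there.
EXAMPLE_OUTPUT = """
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66 #### length of the sequence
-9.400 #### free energy of the structure
.....................(.((((((.((((....)))).)))))).)...............
22 51
24 49
25 48
26 47
27 46
28 45
29 44
31 42
32 41
33 40
34 39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66
-9.300
.....................(((((((..((((....)))).)))))))................
22 50
23 49
24 48
25 47
26 46
27 45
28 44
31 42
32 41
33 40
34 39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
"""


def _to_record(lines):
    """Turn the lines of one block into a dictionary."""
    return {
        "length": int(lines[0]),
        "energy": float(lines[1]),  # kcal/mol
        "structure": lines[2],      # dot-bracket ("Vienna") notation
        "pairs": [tuple(int(x) for x in line.split()) for line in lines[3:]],
    }


def parse_subopt(text):
    """Split subopt-style output into one record per predicted structure."""
    records, block = [], []
    for raw in text.splitlines():
        line = raw.split("#")[0].strip()  # drop explanatory '#' comments
        if not line:
            continue
        if set(line) <= set("% "):        # separator line made of '%' signs
            if block:
                records.append(_to_record(block))
                block = []
            continue
        block.append(line)
    if block:
        records.append(_to_record(block))
    return records


def relative_weights(energies, temperature_c=37.0):
    """Boltzmann weights normalised over the given structures only."""
    rt = 0.0019872 * (temperature_c + 273.15)  # gas constant in kcal/(mol K)
    lowest = min(energies)
    factors = [math.exp(-(e - lowest) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


if __name__ == "__main__":
    records = parse_subopt(EXAMPLE_OUTPUT)
    weights = relative_weights([r["energy"] for r in records])
    for record, weight in zip(records, weights):
        print(record["energy"], record["structure"], round(weight, 2))
```

<p>
Because the two structures differ by only 0.1 kcal/mol, their relative weights are close to each other, which illustrates why it is worth looking beyond the single most stable structure.
</p>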
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
</td> | </td> | ||
</tr> | </tr> | ||
− | <tr><td colspan=6 align=center valign=center> | + | <tr class="lastRow"><td colspan=6 align=center valign=center> |
− | < | + | <h4>Mfold</h4> |
+ | <br> | ||
<p> | <p> | ||
− | + | Mfold is a webserver for RNA secondary structure prediction developed by Michael Zuker based on his paper | |
− | we | + | "Mfold web server for nucleic acid folding and hybridization prediction" that published in <i>Nucleic Acids Research</i> |
− | + | in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a | |
− | + | script that automatically requests a standardised RNA Fold job from the server, therefore making it available | |
− | + | throughout all operating systems. Using the result obtained from this request, the secondary structure is | |
− | + | checked via a string comparison in so-called "Vienna" notation. This notation gives base pairing as a string | |
− | + | of dots and brackets where a dot represents a non-bonded base and brackets form the base-pairs, clarified by | |
− | + | an opening bracket "(" at the 5'-end of the base-pair and a closing bracket ")" at the 3'-end. An example for the output | |
− | + | of the program is given below: | |
− | + | <pre style="text-align: left;"> | |
− | the | + | |
− | + | ||
− | + | ||
− | + | ||
− | < | + | |
+ | ####################################################################################### | ||
+ | #################### CascAID Secondary Structure Verification ######################### | ||
+ | ####################################################################################### | ||
− | + | ||
− | + | ||
+ | ####################################################################################### | ||
+ | ##################### NUPACK Secondary Structure Verification ######################### | ||
+ | ####################################################################################### | ||
+ | |||
+ | |||
+ | GOOD NEWS! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE! | ||
+ | YOUR SEQUENCE WAS: | ||
+ | |||
+ | 5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC 3' | ||
+ | |||
+ | (((((....((((.........)))).))))) ######## MATCHED SECONDARY STRUCTURE | ||
+ | ...(((((....((((.........)))).)))))..((((..........))))......... ######## PREDICTED SECONDARY STRUCTURE | ||
+ | ___________________________________________________________________ | ||
+ | |||
+ | YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK | ||
+ | IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a | ||
+ | ______________________________________________________________________________________ | ||
+ | |||
+ | Job ended normally. Sun Oct 29 23:46:28 2017 | ||
+ | Do you have internet connectivity? [yes/no]yes | ||
+ | |||
+ | |||
+ | ####################################################################################### | ||
+ | #################### MFOLD SECONDARY STRUCTURE VERIFICATION ########################### | ||
+ | ####################################################################################### | ||
+ | |||
+ | |||
+ | |||
+ | #################### CAUTION! ##################### | ||
+ | mFOLD SECONDARY STRUCTURE DOES NOT FIT OUR DATA BANK | ||
+ | #################### CAUTION! ##################### | ||
+ | |||
+ | |||
+ | YOUR SEQUENCE AND MOST STABLE PREDICTED STRUCTURE IS: | ||
+ | |||
+ | 5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGG 3' | ||
+ | ..(((((((((.((((.........)))).......((.....)).....)))).))))).... | ||
+ | ______________________________________________________________________________________ | ||
+ | |||
+ | Job ended normally. Sun Oct 29 23:47:06 2017 | ||
+ | |||
+ | |||
+ | </pre> | ||
</p> | </p> | ||
<p> | <p> | ||
− | + | This example also shows that one program may recognize the | |
− | + | crRNA secondary structure while the other does not. Here, NUPACK predicted the expected structure | |
− | + | while Mfold did not. Even though this is an experimental construct | |
− | + | that worked, we did not add its secondary structure prediction to the database for Mfold, | |
− | + | since Mfold was unable to predict the right structure. | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | the | + | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
</p> | </p> | ||
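<p>
The string comparison in Vienna notation can be illustrated with a short Python sketch. It is not the code from our repository: the function names are hypothetical, and the check assumes that a prediction "matches" when the reference structure occurs as a contiguous substring of the predicted dot-bracket string, which is how the matched and predicted structures line up in the output above.
</p>

```python
def pairs_from_dot_bracket(structure):
    """Return the 1-based (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)          # 5' partner, wait for its ")"
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % position)
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def contains_motif(predicted, reference):
    """True if the reference structure occurs within the predicted structure."""
    return reference in predicted


if __name__ == "__main__":
    # Structures copied from the example output above.
    reference = "(((((....((((.........)))).)))))"
    nupack = "...(((((....((((.........)))).)))))..((((..........))))........."
    mfold = "..(((((((((.((((.........)))).......((.....)).....)))).))))))...."
    print("NUPACK:", contains_motif(nupack, reference))
    print("Mfold: ", contains_motif(mfold, reference))
    print(pairs_from_dot_bracket(reference))
```

<p>
A plain substring test is strict: a single shifted base pair makes it fail. Comparing the base-pair lists returned by <code>pairs_from_dot_bracket</code> would allow a more tolerant check, at the cost of choosing a similarity threshold.
</p>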
</td> | </td> | ||
− | </tr> | + | </tr> |
− | <tr><td colspan=6 align=center valign=center> | + | |
− | < | + | |
+ | |||
+ | |||
+ | <tr class="lastRow"><td colspan=6 align=center valign=center> | ||
+ | <h3>Off-Target Effects</h3> | ||
<p> | <p> | ||
− | + | ||
− | + | In order to rule out off-target effects for the designed crRNA in diagnostic applications, we developed a script that can BLAST the sequence either against whole databases online or against a sub-database we created from transcriptome data of humans, of bacteria that are commonly found inside the nose and of model organisms used in our project, including: | |
− | + | <ol style="list-style-type:disc; list-style-position:left; text-align: left;"> | |
+ | <li>Homo sapiens</li> | ||
+ | <li>Escherichia coli</li> | ||
+ | <li>Bacillus subtilis</li> | ||
+ | <li>Staphylococcus aureus</li> | ||
+ | <li>Corynebacterium diphtheriae</li> | ||
+ | <li>Streptococcus diphtheriae</li> | ||
+ | <li>Haemophilus influenzae</li> | ||
+ | </ol> | ||
</p> | </p> | ||
− | |||
− | |||
<p> | <p> | ||
− | + | Transcriptomes that would be necessary but were not available are: | |
</p> | </p> | ||
− | </ | + | <ol style="list-style-type:disc; list-style-position:left; text-align: left;"> |
+ | <li>Neisseria family</li> | ||
+ | <li>Staphylococcus epidermidis</li> | ||
+ | <li>Streptococcus pyogenes</li> | ||
+ | </ol> | ||
+ | <br> | ||
<p> | <p> | ||
− | + | All data were retrieved from the www.ensembl.org webpage, Transcriptome Release #90. | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
</p> | </p> | ||
+ | </p> | ||
+ | <pre style="text-align: left;"> | ||
+ | ################################################################## | ||
+ | ####### Following possible off-targets have been identified ###### | ||
+ | ################################################################## | ||
+ | >seq 0 | ||
+ | sequence:gnl|BL_ORD_ID|2 KJJ58724 cdna:annotated supercontig:ASM95397v1:scaffold_31:1584:1937:1 | ||
+ | gene:NG01_11520 gene_biotype:protein_coding | ||
+ | transcript_biotype:protein_coding description:hypothetical protein | ||
+ | length:354 | ||
+ | e value:2.42551e-24 | ||
+ | identity:60 | ||
+ | GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT... | ||
+ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||... | ||
+ | GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT... | ||
+ | </pre> | ||
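<p>
A search of this kind can be sketched in Python around the BLAST+ command line tool. This is an illustration under stated assumptions, not the script from our repository: it assumes that <code>blastn</code> is installed and that a nucleotide database has already been built, the function names and the E-value cutoff are hypothetical, and it uses the standard tabular output (<code>-outfmt 6</code>) rather than the report format shown above.
</p>

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}


def build_blastn_short_command(query_fasta, database, evalue=10.0):
    """Assemble a blastn call that uses the blastn-short task."""
    return [
        "blastn",
        "-task", "blastn-short",   # tuned for short query sequences
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def parse_outfmt6(text):
    """Parse tabular BLAST output into a list of dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def find_off_targets(query_fasta, database, max_evalue=1e-3):
    """Run blastn and keep hits at or below the (assumed) E-value cutoff."""
    command = build_blastn_short_command(query_fasta, database)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return [hit for hit in parse_outfmt6(result.stdout) if hit["evalue"] <= max_evalue]
```

<p>
Keeping command construction and output parsing in separate functions means that both can be checked without a BLAST installation; only <code>find_off_targets</code> needs the external program and a database.
</p>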
+ | |||
</td> | </td> | ||
</tr> | </tr> | ||
− | <tr><td colspan=6 align=center valign=center> | + | |
− | < | + | <tr class="lastRow"><td colspan=6 align=center valign=center> |
+ | <h3>Database</h3> | ||
+ | <br> | ||
<p> | <p> | ||
− | + | The database program gives you an interface to interact with the MySQL database created for crRNAs that have been shown experimentally to work. | |
− | + | ||
</p> | </p> | ||
+ | <pre style="text-align: left;"> | ||
− | + | ############ Available Detection Targets ################# | |
− | + | ||
− | + | [1] Virus | |
− | + | [2] Bacteria | |
− | + | ||
− | + | [0] Go back one step | |
− | + | ||
− | + | What would you like to detect?2 | |
− | + | ||
− | + | ############ Available Detection Targets ################# | |
− | + | ||
+ | [1] Escherichia coli | ||
+ | [2] Bacillus subtillis | ||
+ | |||
+ | [0] Go back one step | ||
+ | |||
+ | What would you like to detect?1 | ||
+ | |||
+ | ############ Choose your Target ################# | ||
+ | |||
+ | [1] E. Coli 16s rRNA | ||
+ | |||
+ | [0] Go back one step | ||
+ | |||
+ | What would you like to detect?1 | ||
+ | |||
+ | ########### The sequence thou art looking for is : ################ | ||
+ | |||
+ | ACUUUACUCCCUUCCUCCCCGCUGAAA | ||
+ | |||
+ | |||
+ | |||
+ | [9] Exit | ||
+ | [0] Go back one step | ||
+ | |||
+ | |||
+ | </pre> | ||
+ | <p> | ||
+ | |||
+ | However, these crRNA designs still need to be tested for off-target effects experimentally, since <i>in silico</i> | ||
+ | screening can only confirm specificity with a limited degree of certainty. | ||
</p> | </p> | ||
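<p>
The lookup behind this menu can be sketched as follows. The sketch deliberately swaps MySQL for Python's built-in <code>sqlite3</code> module so that it runs without a database server, and the table layout, column names and function names are assumptions made for illustration only. The single entry is the E. coli 16s rRNA sequence from the session above.
</p>

```python
import sqlite3


def create_demo_database():
    """Build an in-memory stand-in for the crRNA database (assumed schema)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE crrna_targets ("
        "category TEXT, organism TEXT, target TEXT, sequence TEXT)"
    )
    connection.execute(
        "INSERT INTO crrna_targets VALUES (?, ?, ?, ?)",
        ("Bacteria", "Escherichia coli", "E. Coli 16s rRNA",
         "ACUUUACUCCCUUCCUCCCCGCUGAAA"),
    )
    connection.commit()
    return connection


def list_organisms(connection, category):
    """Return the organisms available for one detection category."""
    rows = connection.execute(
        "SELECT DISTINCT organism FROM crrna_targets "
        "WHERE category = ? ORDER BY organism",
        (category,),
    )
    return [row[0] for row in rows]


def get_sequence(connection, organism, target):
    """Return the stored sequence for a target, or None if it is unknown."""
    row = connection.execute(
        "SELECT sequence FROM crrna_targets WHERE organism = ? AND target = ?",
        (organism, target),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = create_demo_database()
    print(list_organisms(db, "Bacteria"))
    print(get_sequence(db, "Escherichia coli", "E. Coli 16s rRNA"))
```

<p>
The queries use parameter placeholders instead of string formatting, so menu input is never interpreted as SQL.
</p>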
</td> | </td> | ||
− | </tr> | + | </tr> |
<tr><td class="no-padding" colspan=6 align=right valign=center height=10> | <tr><td class="no-padding" colspan=6 align=right valign=center height=10> |
Revision as of 04:30, 1 November 2017
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|