Team:Munich/Software


Revision as of 17:43, 25 October 2017


Software

We developed two main branches of software for our project. First, we wrote software that lets users' devices, such as computers and smartphones, control our hardware devices, Heatbringer and Lightbringer. Second, we used scripting to improve the performance of Cas13a in our diagnostic test. This involved post-design verification of crRNAs with respect to secondary structure and uniqueness within the transcriptome, as well as a database of crRNA designs that have already been shown to work. Given the limited time, we tried to make this database as extensive as possible and sought collaboration with other teams working with Cas13a, mainly TU Delft.

crRNA Design Verification

There are two main challenges in designing crRNAs for a Cas13a-based diagnostic device. First, the crRNA must adopt the secondary structure that Cas13a requires for activity. Second, the sequence targeted by the crRNA must be specific, i.e. there must be no off-target matches in the transcriptomes of the organisms present in the sample; otherwise, false-positive results can occur. Our software therefore relies mainly on two bioinformatic methods: secondary structure prediction and the Basic Local Alignment Search Tool (BLAST).

Secondary Structure Prediction

For secondary structure prediction of the crRNA, we used the two program packages most commonly used in the field, NUPACK and Mfold. With them, we compared newly designed crRNAs with the secondary structures of crRNAs already known to be active, taken either from crystal structures of crRNA in complex with Cas13a or from structure predictions of experimentally tested crRNAs. This allowed us to discard, before running any experiment, crRNA designs whose predicted structures did not match. We developed a script that automates this procedure for the end user.
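One common way to quantify how far a new design's predicted structure is from a reference structure of an active crRNA is the base-pair distance: the number of base pairs present in exactly one of the two dot-bracket structures. The sketch below is an illustration of that metric, not the team's script (their Mfold check uses an exact string comparison). It assumes both structures cover the same aligned region, for example the backbone, and therefore have equal length; the function names and the `max_distance` threshold are hypothetical.

```python
from __future__ import annotations


def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Return the 1-based (i, j) base pairs encoded in a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)                 # remember the 5' partner
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))     # close the innermost open pair
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(a) ^ base_pairs(b))


def screen_designs(designs: dict[str, str], reference: str, max_distance: int) -> list[str]:
    """Keep the names of designs whose structure is close enough to the reference."""
    return [name for name, structure in designs.items()
            if base_pair_distance(structure, reference) <= max_distance]


if __name__ == "__main__":
    reference = "((((....))))"
    designs = {"design_A": "((((....))))",   # identical to the reference
               "design_B": "(((......)))",   # loses one pair
               "design_C": "............"}   # unfolded
    print(screen_designs(designs, reference, max_distance=1))
```

With a threshold of one base pair, `design_A` and `design_B` pass and the unfolded `design_C` (distance 4) is discarded. Unlike a substring match, this metric tolerates small local differences, which may or may not be desirable for Cas13a activity.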

Mfold


Mfold is a web server for RNA secondary structure prediction developed by Michael Zuker and described in his paper "Mfold web server for nucleic acid folding and hybridization prediction", published in Nucleic Acids Research in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a script that automatically submits a standardised RNA folding job to the server, making it usable on any operating system. The returned structure is then checked by a string comparison in the so-called "Vienna" (dot-bracket) notation. This notation writes base pairing as a string of dots and brackets: a dot represents an unpaired base, an opening bracket "(" marks the 5' partner of a base pair and the matching closing bracket ")" marks its 3' partner. An example taken from the sample output of the program is given below:

Example 1: Secondary Structure Prediction

NICE! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE!
YOUR SEQUENCE WAS: 
GAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACACUUUACUCCCUUCCUCCCCGCUGAAAGAU
                     (.((((((.((((....)))).)))))).)                   ######## MATCHED SECONDARY STRUCTURE
.....................(.((((((.((((....)))).)))))).)..............     ######## PREDICTED SECONDARY STRUCTURE
YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK
IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a

A more visual Mfold output is in progress, although it is not needed for preliminary use of the program.
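The string comparison in Vienna notation can be sketched as follows. This is an illustration, not the team's code; the function names are hypothetical, and it assumes the predicted structure is a plain dot-bracket string in which the reference motif (for example, the backbone structure of an active crRNA) must occur as an exact substring.

```python
from typing import Optional


def is_valid_dot_bracket(structure: str) -> bool:
    """True if the string uses only '.', '(' and ')' with balanced brackets."""
    depth = 0
    for char in structure:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:          # a ')' appeared before its '('
                return False
        elif char != ".":
            return False
    return depth == 0


def find_structure_motif(predicted: str, motif: str) -> Optional[int]:
    """Return the 1-based start of motif within predicted, or None if absent.

    Because both strings are balanced, an exact substring match implies that
    the motif's brackets pair only with each other inside the prediction.
    """
    if not (is_valid_dot_bracket(predicted) and is_valid_dot_bracket(motif)):
        raise ValueError("invalid dot-bracket string")
    index = predicted.find(motif)
    return None if index == -1 else index + 1


if __name__ == "__main__":
    motif = "(.((((((.((((....)))).)))))).)"
    predicted = "." * 21 + motif + "." * 15   # 66 nt, laid out like Example 1
    print(find_structure_motif(predicted, motif))  # 22
```

For a 66-nt prediction laid out like Example 1, the motif is found starting at position 22. An exact match is strict by design: a single shifted or missing base pair makes the check fail.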

NUPACK


For offline use and as a second validation, we installed NUPACK locally. We made this decision because we found that in certain cases only one of the two packages predicted the crRNA secondary structure described in previous papers, primarily Liu et al., "Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities", published in Cell in 2017. A local installation also makes it possible to use the program without internet access. NUPACK is an RNA secondary structure prediction package developed by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech). The source code is available free of charge for academic use. We installed it on a Mac running macOS Sierra. NUPACK computes the partition function, the minimum free energy (MFE) structure and the equilibrium base-pairing probabilities of an RNA sequence. Using several of these quantities together with the final structure prediction, we estimated whether a crRNA would be active with Cas13a. NUPACK can also predict suboptimal structures in addition to the most stable one. This is useful because the protein may compensate for a non-ideal structure by providing an environment that stabilises it. The output of a suboptimal prediction is given in Example 2:

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66
-9.400
.....................(.((((((.((((....)))).)))))).)...............
22      51
24      49
25      48
26      47
27      46
28      45
29      44
31      42
32      41
33      40
34      39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66
-9.300
.....................(((((((..((((....)))).)))))))................
22      50
23      49
24      48
25      47
26      46
27      45
28      44
31      42
32      41
33      40
34      39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %

Each entry lists the sequence length, the free energy of the structure (in kcal/mol), the structure in Vienna notation and the 1-based positions of the paired bases. Combined with the full partition function, these free energies give the probability that each structure forms in solution. From these values, we made a qualitative prediction of the activity of the corresponding Cas13a-crRNA complex.
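A minimal sketch of reading entries laid out as in Example 2 and converting their free energies into relative Boltzmann weights, p(s) proportional to exp(-ΔG(s)/RT). Assumptions: entries are delimited by "%" lines and contain the length, the energy in kcal/mol, the dot-bracket structure and the 1-based pairs, in that order. The weights are normalised only over the listed structures, so they are relative populations, not the true equilibrium probabilities, which need the full partition function Z that NUPACK computes. The function and class names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


@dataclass
class SuboptStructure:
    length: int                    # sequence length (first line of an entry)
    energy: float                  # free energy in kcal/mol
    dot_bracket: str               # structure in Vienna notation
    pairs: list[tuple[int, int]]   # 1-based (i, j) base pairs


def parse_subopt(text: str) -> list[SuboptStructure]:
    """Parse '%'-delimited entries laid out as in Example 2."""
    entries: list[SuboptStructure] = []
    block: list[str] = []
    for raw in text.splitlines() + ["%"]:  # trailing '%' flushes the last entry
        line = raw.strip()
        if line.startswith("%"):
            if block:
                length, energy, dot_bracket, *pair_lines = block
                pairs = [tuple(map(int, p.split())) for p in pair_lines]
                entries.append(SuboptStructure(int(length), float(energy), dot_bracket, pairs))
                block = []
        elif line:
            block.append(line)
    return entries


def relative_boltzmann_weights(energies: list[float], temperature_c: float = 37.0) -> list[float]:
    """Boltzmann weights normalised over the listed structures only (not the full Z)."""
    rt = R_KCAL * (temperature_c + 273.15)
    lowest = min(energies)
    # Shifting by the lowest energy avoids overflow and cancels out on normalisation.
    factors = [math.exp(-(e - lowest) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]
```

For the two entries in Example 2 (-9.400 and -9.300 kcal/mol), `relative_boltzmann_weights([-9.4, -9.3])` gives roughly 0.54 and 0.46 at 37 °C. A 0.1 kcal/mol difference therefore leaves both folds substantially populated, which is one reason to look beyond the MFE structure.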

Off-Target Effects

To rule out off-target effects of the designed crRNAs in diagnostic applications, we developed a script that BLASTs the sequence either against complete online databases or against a sub-database we built from transcriptome data of humans, of bacteria commonly found in the nose and of model organisms used in our project, including:

  1. Homo sapiens
  2. Escherichia coli
  3. Bacillus subtilis
  4. Staphylococcus aureus
  5. Corynebacterium diphtheriae
  6. Streptococcus diphtheriae
  7. Haemophilus influenzae

The following transcriptomes would also have been needed but were not available:

  1. Neisseria species
  2. Staphylococcus epidermidis
  3. Streptococcus pyogenes

All data were retrieved from the Ensembl website (www.ensembl.org), transcriptome release 90.

##################################################################
####### Following possible off-targets have been identified ######
##################################################################
>seq 0
sequence:gnl|BL_ORD_ID|2 KJJ58724 cdna:annotated supercontig:ASM95397v1:scaffold_31:1584:1937:1 
gene:NG01_11520 gene_biotype:protein_coding 
transcript_biotype:protein_coding description:hypothetical protein
length:354
e value:2.42551e-24
identity:60
GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT... 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||... 
GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT... 
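The off-target search could be wired up roughly as below. This is a sketch, not the team's script. It assumes NCBI BLAST+ is installed and that the Ensembl cDNA FASTA files have been concatenated into a single file. It also uses the tabular output format (`-outfmt 6`) instead of the report shown above. The function names and the `max_evalue` and `min_identity` thresholds are hypothetical.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def makeblastdb_command(fasta: str, db_name: str) -> list[str]:
    """Command that builds a local nucleotide database from one FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_command(query_fasta: str, db_name: str, evalue: float = 10.0) -> list[str]:
    """Command for a short-query nucleotide search with tabular output."""
    return ["blastn", "-task", "blastn-short", "-query", query_fasta,
            "-db", db_name, "-evalue", str(evalue), "-outfmt", "6"]


@dataclass
class Hit:
    subject: str
    identity: float  # percent identity of the alignment
    length: int      # alignment length
    evalue: float


def parse_outfmt6(text: str) -> list[Hit]:
    """Turn tab-separated BLAST rows into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        hits.append(Hit(row["sseqid"], float(row["pident"]),
                        int(row["length"]), float(row["evalue"])))
    return hits


def off_targets(hits: list[Hit], max_evalue: float = 1e-3, min_identity: float = 90.0) -> list[Hit]:
    """Hits strong enough to count as possible off-targets (thresholds are illustrative)."""
    return [h for h in hits if h.evalue <= max_evalue and h.identity >= min_identity]


def run_search(query_fasta: str, db_name: str) -> list[Hit]:
    """Run blastn (requires BLAST+ on PATH) and parse its output."""
    result = subprocess.run(blastn_command(query_fasta, db_name),
                            capture_output=True, text=True, check=True)
    return parse_outfmt6(result.stdout)
```

`-task blastn-short` is the BLAST+ preset intended for queries shorter than about 50 nucleotides, which suits crRNA spacers better than the default megablast task.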

Database


The database program provides an interface to a MySQL database of crRNAs that have been shown experimentally to work. An example session is shown below:

###################################################################
##############        Welcome to CasCAID2GO      ##################
###################################################################


############          Target clarified           #################

[1] Virus
[2] Bacteria
[3] Resistance

[0] Go back one step

What would you like to detect?2

############          Target clarified           #################

[1] E. Coli

[0] Go back one step

What would you like to detect?1

############        Specific Target chosen               ################
	
	
	
[1] rRNA Ribosome

[0] Go back one step

What would you like to detect?1

###########      The sequence thou art looking for is : ################
	
GTGTGAGCTCCTAATACGACTCACTATAGGGACCACCCCAAAAATGAAGGGGACTAAAACAACTTTACTCCCTTCCTCCCCGCTGAAAGAT

[1] Order from IDT

[9] Exit
[0] Go back one step

However, these crRNAs still need to be tested experimentally for off-target effects, since in silico screening can establish specificity only with limited certainty.
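The menu session above walks a three-level hierarchy (target class, organism, specific target) down to a stored crRNA sequence. A minimal sketch of such a lookup follows. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL server. The table and column names (`crrna`, `target_class`, `organism`, `target`, `sequence`) are hypothetical, not the team's actual schema. Queries use `?` placeholders so user input never becomes part of the SQL string; with a MySQL driver the placeholder style differs (for example `%s`).

```python
import sqlite3

SCHEMA = """
CREATE TABLE crrna (
    target_class TEXT NOT NULL,   -- e.g. Virus, Bacteria, Resistance
    organism     TEXT NOT NULL,
    target       TEXT NOT NULL,
    sequence     TEXT NOT NULL
)
"""
MENU_COLUMNS = {"target_class", "organism", "target"}


def open_database(rows):
    """In-memory database filled with (target_class, organism, target, sequence) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO crrna VALUES (?, ?, ?, ?)", rows)
    return conn


def options(conn, column, **filters):
    """Distinct values of one menu level, given the choices made so far."""
    if column not in MENU_COLUMNS or not set(filters) <= MENU_COLUMNS:
        raise ValueError("unknown column")  # column names are whitelisted, never user text
    where = " AND ".join(f"{name} = ?" for name in filters) or "1"
    query = f"SELECT DISTINCT {column} FROM crrna WHERE {where} ORDER BY {column}"
    return [row[0] for row in conn.execute(query, tuple(filters.values()))]


def lookup_sequence(conn, target_class, organism, target):
    """Stored sequence for a fully specified target, or None."""
    row = conn.execute(
        "SELECT sequence FROM crrna WHERE target_class = ? AND organism = ? AND target = ?",
        (target_class, organism, target),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = open_database([(
        "Bacteria", "E. coli", "rRNA Ribosome",
        "GTGTGAGCTCCTAATACGACTCACTATAGGGACCACCCCAAAAATGAAGGGGACTAAAACAACTTTACTCCCTTCCTCCCCGCTGAAAGAT",
    )])
    print(options(conn, "organism", target_class="Bacteria"))
    print(lookup_sequence(conn, "Bacteria", "E. coli", "rRNA Ribosome"))
```

Each menu level corresponds to one `options` call with the choices made so far as filters, and the final choice maps to `lookup_sequence`.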