Difference between revisions of "Team:Munich/Software/Collaborations"

Latest revision as of 23:17, 1 November 2017


Software Collaboration
For TU Delft
We started collaborating with TU Delft in the wet lab shortly after the European iGEM Meet-Up. Since both teams employ Cas13a as the molecular heart of their projects, it made sense to exchange data and experience from wet lab experiments. During a Skype meeting, we realised that we were also doing similar things in the dry lab, and decided to collaborate and exchange ideas in that field as well.
Secondary Structure Prediction
In our dry lab collaboration with Team Delft, we used our software to verify the design of the crRNAs they used in their project. Since the secondary structure of the crRNA is essential for its affinity to Cas13a, secondary structure predictions allow assumptions about the post-binding RNase activity of the protein. We compared the predicted structures to secondary structures of crRNAs that have been shown to work experimentally. For four of the five crRNA sequences Team Delft provided, the predicted secondary structure matched one of the secondary structures in our databank. The only one that was recognized in neither the NUPACK nor the Mfold databank was crRNA 2. Its predicted secondary structure in Vienna notation is shown below:

GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACAAGACAGUCAUAAGUGCGGCGACGAU
..(((((((((.((((.........)))).((......))..)))).))))).((....))..

The backbone structure required in the first 35 bases does not appear to form completely. However, the graphical NUPACK output in Figure 1 shows that the base-pairing probability of the structures that interact with the terminal 28-base target sequence is rather low. Thus, crRNA 2 would most probably still be active in experiments, but might show decreased activity.

Figure 1: Graphical output of the secondary structure prediction of crRNA 2 in NUPACK.

Figure 2: Cleavage experiments with crRNA 2, crRNA 3 and crRNA 4, as provided by TU Delft.

Indeed, the experiments with these crRNA designs show decreased activity for crRNA 2 compared to the other crRNAs. This result has to be treated with caution, however, since the secondary structure of crRNA 4 was predicted to be correct, yet its Cas13a-crRNA complex does not show higher activity. We conclude that further testing is needed to show whether the software can predict the activity of Cas13a.
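The structural reasoning above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the script from our repository: the function names `parse_dot_bracket` and `crossing_pairs` are hypothetical, and the boundary of 35 bases is simply taken from the text. The sketch parses the Vienna string of crRNA 2 into base pairs and lists the pairs that connect the backbone to the target sequence.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of 0-based (i, j) base pairs from a Vienna string.

    Raises ValueError if the string contains unknown symbols or
    unbalanced brackets.
    """
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def crossing_pairs(pairs, boundary):
    """Return the pairs whose 5' base lies before `boundary` and whose
    3' base lies at or after it (backbone paired with target region)."""
    return [(i, j) for i, j in pairs if i < boundary <= j]


# Sequence and predicted structure of crRNA 2, copied from the text.
CRRNA2_SEQUENCE = "GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACAAGACAGUCAUAAGUGCGGCGACGAU"
CRRNA2_STRUCTURE = "..(((((((((.((((.........)))).((......))..)))).))))).((....)).."

# Assumption: the backbone is the first 35 bases, as stated in the text.
BACKBONE_LENGTH = 35

if __name__ == "__main__":
    pairs = parse_dot_bracket(CRRNA2_STRUCTURE)
    crossing = crossing_pairs(pairs, BACKBONE_LENGTH)
    print(f"{len(pairs)} base pairs in total")
    print(f"{len(crossing)} pairs link the backbone to the target region:")
    for i, j in crossing:
        print(f"  {CRRNA2_SEQUENCE[i]}{i + 1} - {CRRNA2_SEQUENCE[j]}{j + 1}")
```

Pairs that cross the boundary indicate that part of the backbone is predicted to fold onto the target sequence instead of forming the expected hairpin on its own.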
Off-target Effects
Furthermore, to check for off-target effects, we constructed a database from transcriptome data obtained from the Ensembl and Ensembl Bacteria databases. We tailored the search to the Delft project as closely as possible and therefore considered species relevant to their project of detecting bacterial resistances related to cattle and milk production. The included species were:

Bos taurus (cattle)
Clostridium perfringens
Corynebacterium diphtheriae
Fusobacterium necrophorum
Lactobacillus casei
Lactococcus lactis
Providencia stuartii
Staphylococcus aureus
Streptococcus pneumoniae

Three of the five crRNAs showed no off-targets in the constructed database. For crRNA 3 and crRNA 4, possible off-targets were detected, but only because the identity threshold was set rather low in the BLAST-short run. The results clearly show that these sequences are not similar enough to trigger any RNase activity of Cas13a: the 28-base target tolerates at most 2 mismatches, whereas all hits found match only 17 bases, far short of the full 28.

In conclusion, our software did not find any problems in the crRNA design of Team Delft. Their constructs will most probably show activity and, at least for the species listed above, should not show any off-target effects.
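The reasoning about the BLAST hits can be made explicit with a small filter. The sketch below assumes only the numbers given in the text (a 28-base target that tolerates at most 2 mismatches). The function names, the hit labels and the tuple layout are hypothetical and do not reflect the output format of our script.

```python
# Values taken from the text; treat them as assumptions of this sketch.
SPACER_LENGTH = 28
MAX_MISMATCHES = 2


def is_plausible_off_target(matched_bases,
                            spacer_length=SPACER_LENGTH,
                            max_mismatches=MAX_MISMATCHES):
    """Return True if a hit matches enough bases of the spacer to be
    considered a possible off-target under the mismatch tolerance."""
    if not 0 <= matched_bases <= spacer_length:
        raise ValueError("matched_bases must lie between 0 and spacer_length")
    unmatched = spacer_length - matched_bases
    return unmatched <= max_mismatches


def filter_hits(hits):
    """Keep only the (label, matched_bases) hits that pass the tolerance."""
    return [(label, matched) for label, matched in hits
            if is_plausible_off_target(matched)]


if __name__ == "__main__":
    # Hypothetical hit list: labels are placeholders, not real BLAST output.
    example_hits = [("hit_a", 17), ("hit_b", 27)]
    for label, matched in example_hits:
        verdict = "possible off-target" if is_plausible_off_target(matched) else "ignored"
        print(f"{label}: {matched}/{SPACER_LENGTH} bases matched -> {verdict}")
```

A hit with 17 matching bases leaves 11 positions unmatched, far above the tolerance of 2, which is why such hits are not counted as off-targets.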
From TU Delft
The TU Delft team developed simulation code for modelling the on- and off-target activities of sequences in all possible frames on the genome. For this, they employ the kinetic model of Klein et al. (2017) to determine whether off-target activities are high enough to give false-positive collateral RNase activity of the Cas13a protein. From this model, they derived cleavage probabilities for different crRNA sequences and could thus assess whether it is possible to distinguish between our two bacterial targets, the 16S rRNA of E. coli and B. subtilis. By running the crRNAs against the genomes of both species, they determined the off-target probability to be very low for all our crRNA designs, as shown in Figure 3. They therefore concluded that our crRNA designs can differentiate between E. coli and B. subtilis.

Figure 3: Off-target probability of activation of the Cas13a protein for our crRNAs sensing E. coli (1.3) and B. subtilis. crRNAs 1-3 are combined since the spacer sequence was identical for these constructs.

We would like to thank Team Delft for the fun collaboration and wish you all the best for your project. We are looking forward to seeing you in Boston.
References
Klein, M., Eslami-Mossallam, B., Gonzalez Arroyo, D., Depken, M. (2017). "The kinetic basis of CRISPR-Cas off-targeting rules." doi: 10.1101/143602.

@@ Line 24: / Line 24: @@
 #myContent *{
-  color: #919191;
+  color: #444444;
 }
@@ Line 76: / Line 76: @@
 </tr>
 <tr><td colspan=6 align=left valign=center>
-<font size=7 color=#51a7f9><b style="color: #51a7f9">Software</b></font>
+<font size=7 color=#51a7f9><b style="color: #51a7f9">Software Collaboration</b></font>
-</td>
-</tr>
-<tr class="lastRow">
-	<td  colspan = 6 align="left">
-		<p class="introduction">
-We mainly developed two branches of Software needed for our project. On the one hand, we developed Software to allow user's devices such as Computers and Smartphones to control our Hardware's devices, Heatbringer and Lightbringer. On the other hand, we used scripting in order to improve the performance of the Cas13a protein regarding a diagnostic device test. This involved the post-design verification of crRNA regarding secondary structure and transcriptomal uniqueness as well as the development of a database of crRNA designs that have already worked. We tried to make the latter as extensive as possible given the limited time, checking for collaboration with other teams working with Cas13a, mainly TU Delft. The repository can be found <a class="myLink" href="https://github.com/igemsoftware2017/igem_munich_2017">here</a>.
-                </p>
-	</td>
-</tr>
@@ Line 91: / Line 82: @@
+<tr><td colspan=6 align=center valign=center>
-<tr class="lastRow"><td colspan=6 align=center valign=center>
+<h3>For TU Delft</h3>
-<h3>crRNA Design Verification</h3>
 <p>
-There are two main problems regarding the crRNA design of Cas13a for a diagnostic device. First of all, one needs to make sure that the secondary structure of the crRNA needed for Cas13a activity is achieved. Second, one needs to make sure that the sequence targeted by the crRNA is specific, i.e. there is no off-target effects in the transcriptome of the organisms present in the sample. If this is not the case, false positive results will occur. The software we developed relies mainly on bioinformatic principles such as Secondary Structure Prediction and Basic Local Alignment Searches Tools (BLAST).
+We started collaborating with TU Delft in the wet lab shortly after the European iGEM Meet-Up.
+Since both teams employ Cas13a as the molecular heart of their projects, it made sense to
+exchange data and experience from wet lab experiments. During a Skype meeting,
+we realised that we were also doing similar things in the dry lab, and decided
+to collaborate and exchange ideas in that field as well. <br><br>
 </p>
 </td>
+</tr>
-</tr>
+<tr>
+<td colspan=6 align=center valign=center>
-<tr><td colspan=6 align=center valign=center>
 <h3>Secondary Structure Prediction</h3>
 <p>
-For secondary structure prediction of the crRNA we utilised the two most widely used program packages in the field, NUPACK and Mfold. With the help of these packages, we were able to compare newly designed crRNA with secondary structures of crRNAs that were already known to be active, either from actual crystallography data of crRNA in complex with Cas13a, or from structure prediction data of experimentally tested crRNAs. This allowed us to sort out, prior to experiments, crRNA designs that would not fit the secondary structures. We developed a script for the end user that automates this procedure.
+In our dry lab collaboration with Team Delft, we used our software to verify the design
+of the crRNAs they used in their project. Since the secondary
+structure of the crRNA is essential for its affinity to Cas13a, secondary structure predictions
+allow assumptions about the post-binding RNase activity of the protein.
+We compared the predicted structures to secondary structures of crRNAs that have
+been shown to work experimentally. For four of the five crRNA sequences Team Delft provided,
+the predicted secondary structure matched one of the secondary structures in our databank.
+The only one that was recognized in neither the NUPACK nor the Mfold databank was
+crRNA 2. Its predicted secondary structure in Vienna notation is shown below:
 </p>
-</td>
+<pre>
-</tr>
+GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACAAGACAGUCAUAAGUGCGGCGACGAU
+..(((((((((.((((.........)))).((......))..)))).))))).((....))..
-<tr class="lastRow"><td colspan=6 align=center valign=center>
-<h4>Mfold</h4>
-<br>
-<p>
-Mfold is a webserver for RNA secondary structure prediction developed by Michael Zuker based on his paper "Mfold web server for nucleic acid folding and hybridization prediction" that published in <i>Nucleic Acids Research</i>  in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a script that automatically requests a standardised RNA Fold job to the server, therefore making it available throughout all operating systems. Using the result obtained from this request, the secondary structure is checked via a string comparison in so-called "Vienna" notation. This notation gives base pairing as a string of dots and brackets where a dot represents a non-bonded base and brackets form the base-pairs, clarified by a opening bracket "(" at the 5'-end of the base-pair and a closing bracket ")" at the 3'-end. An example taken from the sample output of the program is given below:
-<pre style="text-align: left;">
-Example 1: Secondary Structure Prediction
-NICE! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE!
-YOUR SEQUENCE WAS:
-GAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACACUUUACUCCCUUCCUCCCCGCUGAAAGAU
-                     (.((((((.((((....)))).)))))).)                   ######## MATCHED SECONDARY STRUCTURE
-.....................(.((((((.((((....)))).)))))).)..............     ######## PREDICTED SECONDARY STRUCTURE
-YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK
-IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a
 </pre>
+<p>
+The backbone structure required in the first 35 bases does not appear to form completely.
+However, the graphical NUPACK output in Figure 1 shows that the base-pairing probability
+of the structures that interact with the terminal 28-base target sequence is rather low.
+Thus, crRNA 2 would most probably still be active in experiments, but might show decreased activity.
 </p>
+<div class="captionPicture" align=center>
+<img alt="LightbringerReal" src="https://static.igem.org/mediawiki/2017/b/bf/T--Munich--SoftwareCollaborationsPagePicture-mfe.png" width="500">
 <p>
-A more visual output from Mfold is in progress, though not needed for the preliminary usage of the program.
+<b>Figure 1</b>: Graphical output of the secondary structure prediction of crRNA 2 in NUPACK.
 </p>
-</td>
+</div>
-</tr>
+<div class="captionPicture" align=center>
+<img alt="LightbringerReal" src="https://static.igem.org/mediawiki/2017/c/c9/T--Munich--SoftwareCollaborationsPagePicture-activity.png" width="500">
-<tr class="lastRow"><td colspan=6 align=center valign=center>
+<p>
-<h4>NUPACK</h4>
+<b>Figure 2</b>: Cleavage experiments of crRNA 2, crRNA 3 and crRNA 4 as provided by TU Delft.
-<br>
-<p>
-For offline usage and second validation, we implemented NUPACK locally. This decision was made because we experienced that in certain cases, only one of the program packages was able to predict the secondary structure of crRNA as described in previous papers, predominantly the paper of Liu et al. published in <i>Cell</i> in 2017 "Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities". Also, it gives you the opportunity to use the program without access to the internet. NUPACK is an RNA Secondary Structure Prediction program package developed by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech). The source-code is available free-of-charge for academic usage. We implemented it on a Mac running macOS Sierra. NUPACK allows the analysis of the partition function, the minimum free energy and the equilibrium base-pairing probabilities of an RNA sequence. By the use of several of these parameters and the final structure prediction, we estimated whether the crRNA would be active in Cas13a. Furthermore, it is possible to predict more than just the most stable structure. This enables looking at less stable structures since the protein may compensate for non-ideal structures by giving the right environment for stabilisation. The output of a suboptimal prediction is given in Example 2:
-<pre style="text-align: left;">
-% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
--9.400
-.....................(.((((((.((((....)))).)))))).)...............
-      51
-      49
-      48
-      47
-      46
-      45
-      44
-      42
-      41
-      40
-      39
-% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
-% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
--9.300
-.....................(((((((..((((....)))).)))))))................
-      50
-      49
-      48
-      47
-      46
-      45
-      44
-      42
-      41
-      40
-      39
-% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
-</pre>
 </p>
+</div>
 <p>
+Indeed, crRNA 2 shows decreased activity compared to the other crRNAs in experiments using
-From this, one can extract the secondary structure in Vienna notation as well as the Free Energies of the RNA structure to predict the probability of formation in solution with help of the calculation of the full partition function. Using these, we predicted qualitative activity of the corresponding Cas13a-crRNA complex.
+these crRNA designs. This result has to be treated with caution, however, since the secondary structure of crRNA 4 was predicted
+to be correct, yet its Cas13a-crRNA complex does not show higher activity. We conclude that further testing
+of the software is needed to show whether it can predict the activity of Cas13a. <br><br>
 </p>
 </td>
 </tr>
+<tr><td colspan=6 align=center valign=center>
+<h3>Off-target Effects</h3>
-<tr class="lastRow"><td colspan=6 align=center valign=center>
-<h3>Off-Target Effects</h3>
 <p>
+Furthermore, to check for off-target effects, we constructed a database from transcriptome data obtained
+from the Ensembl and Ensembl Bacteria databases. We tailored the search to the Delft project as closely as
+possible and therefore considered species relevant to their project of detecting bacterial resistances related
+to cattle and milk production. The included species were:
-In order to rule out off-target effects for the designed crRNA in diagnostic applications, we developed a script that is able to blast the sequence against either whole databases online or a sub-database we created from human and bacterial transcriptomes that are commonly found inside the nose and from model organisms used in our project, including:
 <ol style="list-style-type:disc; list-style-position:left; text-align: left;">
-<li>Homo Sapiens</li>
+<li>Bos taurus (cattle)</li>
-<li>Escherichia Coli</li>
+<li>Clostridium perfringens</li>
-<li>Bacillus subtilis</li>
+<li>Corynebacterium diphtheriae</li>
+<li>Fusobacterium necrophorum</li>
+<li>Lactobacillus casei</li>
+<li>Lactococcus lactis</li>
+<li>Providencia stuartii</li>
 <li>Staphylococcus aureus</li>
-<li>Corynebacterium diphtheriae</li>
+<li>Streptococcus pneumoniae</li>
-<li>Streptococcus diphtheriae</li>
-<li>Haemophillus influenzae</li>
 </ol>
-</p>
+<br>
 <p>
-Transcriptomes that would be necessary but were not available are:
+Three of the five crRNAs showed no off-targets in the constructed database. For crRNA 3 and crRNA 4,
+possible off-targets were detected, but only because the identity threshold was set rather low
+in the BLAST-short run. The results clearly show that these sequences are not
+similar enough to trigger any RNase activity of Cas13a: the 28-base target tolerates at most 2 mismatches,
+whereas all hits found match only 17 bases, far short of the full 28.
 </p>
-<ol style="list-style-type:disc; list-style-position:left; text-align: left;">
-<li>Neisseria family</li>
-<li>Staphylococcus epidermidis</li>
-<li>Streptococcus pyogenes</li>
-</ol>
-<br>
 <p>
-All data was retrieved from the www.ensembl.org webpage from the Transcriptome Release #90.
+In conclusion, our software did not find any problems in the crRNA design of Team Delft.
+Their constructs will most probably show activity and, at least for the species listed above,
+should not show any off-target effects.
 </p>
 </p>
-<pre style="text-align: left;">
-##################################################################
-####### Following possible off-targets have been identified ######
-##################################################################
->seq 0
-sequence:gnl|BL_ORD_ID|2 KJJ58724 cdna:annotated supercontig:ASM95397v1:scaffold_31:1584:1937:1
-gene:NG01_11520 gene_biotype:protein_coding
-transcript_biotype:protein_coding description:hypothetical protein
-length:354
-e value:2.42551e-24
-identity:60
-GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT...
-||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...
-GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT...
-</pre>
 </td>
 </tr>
+<tr><td colspan=6 align=center valign=center>
-<tr class="lastRow"><td colspan=6 align=center valign=center>
+<h3>From TU Delft</h3>
-<h3>Database</h3>
-<br>
 <p>
-The database program gives you an interface to interact with the MySQL database created for crRNAs that have been shown experimentally to work.
+The TU Delft team developed simulation code for modelling the on- and off-target
+activities of sequences in all possible frames on the genome. For this, they employ the kinetic model
+of Klein et al. (2017) to determine whether off-target activities are high enough to give false-positive
+collateral RNase activity of the Cas13a protein. From this model, they derived cleavage probabilities
+for different crRNA sequences and could thus assess whether it is possible to distinguish
+between our two bacterial targets, the 16S rRNA of <i>E. coli</i> and <i>B. subtilis</i>. By running the crRNAs
+against the genomes of both species, they determined the off-target probability to be very low for all our crRNA
+designs, as shown in <b>Figure 3</b>. They therefore concluded that our crRNA designs can differentiate between <i>E. coli</i> and <i>B. subtilis</i>.
 </p>
-<pre style="text-align: left;">
-###################################################################
-##############        Welcome to CasCAID2GO      ##################
-###################################################################
+<div class="captionPicture" align=center>
+<img alt="LightbringerReal" src="https://static.igem.org/mediawiki/2017/4/40/T--Munich--SoftwareCollaborationsPagePicture-cleavage.png" width="700">
+<p>
+<b>Figure 3:</b> Off-target probability of activation of the Cas13a protein for our crRNAs sensing <i>E. coli (1.3)</i> and <i>B. subtilis</i>.
+crRNAs 1-3 are combined since the spacer sequence was identical for these constructs.
+</p>
+</div>
-############          Target clarified           #################
-[1] Virus
+<p>
-[2] Bacteria
+<br><br>
-[3] Resistance
+We would like to thank Team Delft for the fun collaboration and wish you all the best for your project.
+We are looking forward to seeing you in Boston.
+</p>
+</td>
+</tr>
-[0] Go back one step
-What would you like to detect?2
+<tr><td colspan=6 align=center valign=center>
+<h3>References</h3>
+<p>
+    <ol style="text-align: left">
+    	<li id="ref_1">Klein, M., Eslami-Mossallam, B., Gonzalez Arroyo, D., Depken, M. (2017).
+    	"The kinetic basis of CRISPR-Cas off-targeting rules." doi: 10.1101/143602.</li>
+    </ol>
+</p>
+</td>
+</tr>
-############          Target clarified           #################
-[1] E. Coli
-[0] Go back one step
-What would you like to detect?1
-############        Specific Target chosen               ################
-[1] rRNA Ribosome
-[0] Go back one step
-What would you like to detect?1
-###########      The sequence thou art looking for is : ################
-GTGTGAGCTCCTAATACGACTCACTATAGGGACCACCCCAAAAATGAAGGGGACTAAAACAACTTTACTCCCTTCCTCCCCGCTGAAAGAT
-[1] Order from IDT
-[9] Exit
-[0] Go back one step
-</pre>
-<p>
-However, these still need to be tested for off-target effects experimentally since <i>in silico</i> screening can only confirm specificity to a certain amount of certainty.
-</p>
-</td>
-</tr>
 <tr><td class="no-padding" colspan=6 align=right valign=center height=10>