Difference between revisions of "Team:Munich/Software"

Line 69: Line 69:
 
<!-- Head End -->
 
<!-- Head End -->
 
<!-- Content Begin -->
 
<!-- Content Begin -->
<img id="TopPicture" width="800" src="https://static.igem.org/mediawiki/2017/6/62/T--Munich--FrontPagePictures_Modeling.jpg">
+
<img id="TopPicture" width="960" src="https://static.igem.org/mediawiki/2017/7/78/T--Munich--FrontPagePictures_Software.svg">
 
<table width="960" border=0 cellspacing=0 cellpadding=10>
 
<table width="960" border=0 cellspacing=0 cellpadding=10>
 
<tr>
 
<tr>
Line 80: Line 80:
 
</tr>
 
</tr>
 
<tr><td colspan=6 align=left valign=center>
 
<tr><td colspan=6 align=left valign=center>
<font size=7 color=#51a7f9><b style="color: #51a7f9">Modelling</b></font>
+
<font size=7 color=#51a7f9><b style="color: #51a7f9">Software</b></font>
 +
<pre>
 +
###################################################################
 +
#                                                                #
 +
#                        CascAID V1.0                            #
 +
#                                                                #
 +
#                  Wed Nov  1 04:23:54 2017                      #
 +
#                                                                #
 +
#                      IGEM Munich 2017                          #
 +
#                                                                #
 +
#                                                                #
 +
#                                                                #
 +
#                                                                #
 +
#                  Please send bug reports to:                    #
 +
#                                                                #
 +
#                        Sven Klumpe                           #
 +
#                                                                #
 +
#                E-Mail: sven.klumpe@tum.de                      #
 +
#                                                                #
 +
###################################################################
 +
 
 +
 
 +
</pre>
 
</td>
 
</td>
 
</tr>
 
</tr>
 
<tr class="lastRow">
 
<tr class="lastRow">
<td  colspan=6 align="left">
+
<td  colspan = 6 align="left">
 
<p class="introduction">
 
<p class="introduction">
Modelling in Biosciences is a powerful tool that allows one to get a deeper understanding
+
CascAID is a potentially universal tool for nucleic acid detection.
of one's system. We mainly used Modelling to help with the design of our device.  
+
Fast adaptation of our platform to new targets requires <i>in silico</i> verification of the crRNA design.  
By this, we could avoid spending time on dead-end-designs that otherwise might have  
+
Crucial factors for the development of these crRNA designs are the binding of the crRNA to Cas13a
cost us a significant amount of time. Rather simple models can already give
+
(mainly determined by its secondary structure) and the uniqueness of the targeting sequence in the transcriptome
fair amount of information about one's system. That is why we decided at an early stage to incorporate
+
to rule out false positive results. To ensure the integrity of the Cas13a-crRNA complex, we developed
Modelling in our device design.  
+
a Python script that uses NUPACK and Mfold, two established program packages for secondary structure prediction.
        </p>
+
In order to verify the specificity of the targeting sequence, we used the BLASTN-short program to
 +
check for similar sequences in a transcriptome databank. Additionally, we created a database of crRNA designs
 +
that are known to work and made it
 +
as extensive as possible in the limited time available, seeking collaboration with other teams working with Cas13a,
 +
mainly TU Delft.
 +
The second branch of software we developed for our project is the hardware control software.
 +
It allows users' devices such as computers and smartphones to control
 +
our hardware devices, Heatbringer and Lightbringer.
 +
The repository for our software can be found <a class="myLink" href="https://github.com/igemsoftware2017/igem_munich_2017">here</a>.
 +
                </p>
 
</td>
 
</td>
 
</tr>
 
</tr>
  
  
<tr><td colspan=6 align=center valign=center>
+
 
<h2>Detection Limit</h2>
+
 
 +
 
 +
 
 +
<tr class="lastRow"><td colspan=6 align=center valign=center>
 +
<h3>crRNA Design Verification</h3>
 
<p>   
 
<p>   
One major concern when dealing with the problem of diagnostics on patients is obtaining the sample with which
+
There are two main problems regarding the design of crRNA for a diagnostic test.  
detection can actually be performed. Since we wanted our method to be non-invasive, one concern that we needed to
+
First, the secondary structure of the crRNA needed for Cas13a activity needs to be verified.  
deal with is the concentration of pathogens and thus detectable RNA in the patients mucus. First approximations from
+
Second, the sequence targeted by the crRNA has to be specific, i.e., there must be no identical sequence in the
different papers already showed that virological samples show concentrations no higher than low pM and can even go as low
+
reference transcriptome of a healthy patient. Otherwise, off-target effects will lead to
as fM. Thus, we characterised the theoretical detection limit of the Cas13a RNAse activity. In order to do this, we first
+
false positive results since Cas13a is activated even though the pathogen is not present.  
fitted parameters using experimental data to the model shown below and used these in target RNA concentration dependent
+
To address these issues, we developed software relying on bioinformatic methods such as
simulations. The results are shown in Figure 1. It shows that the detection limit in the time range of an hour is
+
secondary structure prediction and the Basic Local Alignment Search Tool (BLAST).
approximately one- to two-digit nM region. Due to this result, our initial design of applying the lysed and purified RNA sample
+
directly on the detection paper strip had to be discarded. Instead, we had to explore amplification methods we could
+
perform upstream in the detection process.
+
As a side note, the detection limit could most probably have been pushed a bit to lower concentrations by using higher
+
concentrations in Cas13a and crRNA, but by doing this production cost per paperstrip would have increased a lot. Also,
+
it is known from literature that Cas proteins at high concentrations show activity independent of their activation mechanism
+
which is why the concentration of Cas13a in the detection system could not be increased by higher orders of magnitude.
+
</p>
+
<div class="captionPicture">
+
<img alt="LightbringerReal" src="https://static.igem.org/mediawiki/2017/c/c5/T--Munich--ModellingPagePicture_Theoretical_Detection_Limit.png" width="600">
+
<p>
+
Figure 1: Theoretical Detection Limit determined for the Cas13a system using 20 nM concentrations of Cas13a and crRNA.  
+
 
</p>
 
</p>
</div>
 
<p>
 
Since our detector has shown to be sensitive enough to detect one-digit nM concentrations of RNase alert, the needed concentration of
 
protein and crRNA could be downscaled, as only few nM of cleaved RNase alert were needed to get a read-out. The concentrations
 
were then pushed from the initial model which included 20 nM Cas13a and 20 nM crRNA to 1 nM Cas13a and 10 nM crRNA.
 
The plot below shows that the Theoretical Detection Limit still stays equal, bearing in mind that detection occurs once ~6 nM RNase alert
 
have been cleaved.
 
</p>
 
<div class="captionPicture">
 
<img alt="LightbringerReal" src="https://static.igem.org/mediawiki/2017/4/46/T--Munich--ModellingPagePicture_Theoretical_Detection_Limit_lowCas.png" width="600">
 
<p>
 
Figure 2: Theoretical Detection Limit determined for the Cas13a system using concentrations of 1 nM Cas13a and 10 nM crRNA.
 
</p>
 
</div>
 
 
</td>
 
</td>
 +
 +
 
</tr>
 
</tr>
 +
 +
  
  
  
 
<tr><td colspan=6 align=center valign=center>
 
<tr><td colspan=6 align=center valign=center>
<h2>Lysis on Chip</h2>
+
<h3>Secondary Structure Prediction</h3>
 
<p>   
 
<p>   
We modelled the lysis process on chip to get an idea of how long lysis would need to take place
+
For secondary structure prediction of the crRNA we utilised the two established program packages
in order to release enough RNA for downstream amplification. For this, we constructed a very simplistic
+
in the field, NUPACK and Mfold, to compare newly designed crRNAs with secondary structures of crRNAs that
model for bacterial cell lysis. In this, we estimated the rate constants for cell lysis by common colony PCR
+
were already known to be active. These reference crRNA structures were either obtained from actual
protocols which use a 10 minute lysis step at 95 °C for thermolysis. Thus, we considered a half-time of Bacteria
+
crystallography data of crRNA in complex with Cas13a, or from structure prediction data of experimentally
of 2 minutes at 95 °C. This would result in a lysis efficiency of 96.875%. Starting from this estimation,
+
tested crRNAs. Using secondary structure verification we were able to rule out misfolding crRNA
we considered the rate constant of lysis and thus the half-time using Arrhenius equation:
+
designs prior to experiment. We developed a script that automates this procedure for the end user.
 
</p>
 
</p>
<p>
+
</td>
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/b/b0/T--Munich--Model_Equation_1.png"><span>(1)</span></div>
+
</tr>  
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/3/30/T--Munich--Model_Equation_2.png"><span>(2)</span></div>
+
 
</p>
+
<tr class="lastRow"><td colspan=6 align=center valign=center>
<p>
+
<h4>NUPACK</h4>
with rate constants k<sub>1</sub> and k<sub>2</sub> at temperature T<sub>1</sub> and T<sub>2</sub>
+
and Boltzmann constant R.
+
</p>
+
<p>
+
where R is the gas constant and k<sub>1</sub> and k<sub>2</sub> are the rate constant  at temperature T<sub>1</sub> and T<sub>2</sub>
+
The activation energy difference E_A was fitted to a barrier that follows the common rule of thumb that lysis should increase twice in
+
efficiency for every temperature increase of 10 °C. The model for lysis is shown derived in the following:
+
</p>
+
<p>
+
The full model can then be described by the coupled ordinary differential equations:<br>
+
</p>
+
<p>
+
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/7/7d/T--Munich--Model_Equation_3.png"><span>(3)</span></div>
+
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/3/35/T--Munich--Model_Equation_4.png"><span>(4)</span></div>
+
</p>  
+
<p>
+
with k<sub>lysis</sub> being the rate constant of bacterial lysis, k<sub>RNase</sub> the rate constant
+
of RNA degradation, count of target RNA [targetRNA] and count of bacteria Baks(t).
+
The solution to equation 3 is of course simply:
+
 
<br>
 
<br>
 +
<p> 
 +
NUPACK is an RNA secondary structure prediction program package developed
 +
by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech).
 +
The source code is available free of charge for academic use.
 +
NUPACK allows the analysis of the partition function, the minimum free energy and the equilibrium base-pairing
 +
probabilities of an RNA sequence.
 +
For offline use, we installed NUPACK locally, whereas Mfold is accessed through a webserver request.
 +
We use both packages because we found that, in certain cases, only one of them
 +
was able to predict the secondary structure of crRNA as described in previous papers, predominantly the paper of Liu et al. published in <i>Cell</i> in 2017,
 +
"Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities". A local run also gives you
 +
the possibility of using the full spectrum of NUPACK's programs.
 +
Using several of these programs to obtain the final structure prediction, we estimated whether the
 +
crRNA would be active in Cas13a.
 +
Furthermore, we found that NUPACK sometimes predicts the right secondary structure, but not as
 +
the most stable structure. With NUPACK's subopt, it is possible to predict more than just
 +
the most stable structure. This lets us examine less stable structures and compare them to the
 +
structure databank, since the protein may compensate for non-ideal structures by providing the
 +
right environment for stabilisation. The output of a suboptimal prediction
 +
is given in Example 2; explanations are added as comments after '#':
 +
 +
 +
<pre style="text-align: left;">
 +
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
 +
66 #### length of the sequence
 +
-9.400 #### free energy of the structure
 +
.....................(.((((((.((((....)))).)))))).)............... #### secondary structure in Vienna notation
 +
22      51 #### IDs of bases that form basepairs
 +
24      49 #### this would mean base 24 pairs with base 49
 +
25      48
 +
26      47
 +
27      46
 +
28      45
 +
29      44
 +
31      42
 +
32      41
 +
33      40
 +
34      39
 +
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
 +
 +
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
 +
66
 +
-9.300
 +
.....................(((((((..((((....)))).)))))))................
 +
22      50
 +
23      49
 +
24      48
 +
25      47
 +
26      46
 +
27      45
 +
28      44
 +
31      42
 +
32      41
 +
33      40
 +
34      39
 +
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
 +
 +
</pre>
 
</p>
 
</p>
 
<p>
 
<p>
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/8/81/T--Munich--Model_Equation_5.png"><span>(5)</span></div>
+
 
</p>
+
From this, we can extract the secondary structure in Vienna notation as well as the free energies
<p>
+
of the RNA structure to predict the probability of formation in solution with the help of the calculation
Plugging equation 5 into equation 4 gives
+
of the full partition function. Using these results, the user can make qualitative assumptions about
</p>
+
the activity of the corresponding Cas13a-crRNA complex.  
<p>
+
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/6/6d/T--Munich--Model_Equation_6.png"><span>(6)</span></div>
+
</p>
+
<p>
+
where ratio determines the copy number of a target RNA in a single cell. This differential equation has the form and thus the analytical solution:
+
</p>
+
<p>
+
Equation 7 + 8
+
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/b/b7/T--Munich--Model_Equation_7.png"><span>(7)</span></div>
+
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/e/e5/T--Munich--Model_Equation_8.png"><span>(8)</span></div>
+
</p>
+
<p>
+
with initial condition of
+
 
</p>
 
</p>
<p>
 
<div class="equationDiv"><img style="height: 40px;" src="https://static.igem.org/mediawiki/2017/4/4e/T--Munich--Model_Equation_10.png"><span>(10)</span></div>
 
</p>
 
<p>
 
we get the final solution to the lysis equation:
 
</p>
 
<p>
 
<div class="equationDiv"><img src="https://static.igem.org/mediawiki/2017/5/5d/T--Munich--Model_Equation_11.png"><span>(11)</span></div>
 
</p>
 
<p>
 
The full model at different temperatures looks as follows:
 
</p>
 
<div class="captionPicture">
 
<img width=800 align=center valign=center src="https://static.igem.org/mediawiki/2017/f/f1/T--Munich--ModellingPagePicture_Lysis_Temperature.png" alt="Lysis_Temperature">
 
<p>
 
Figure 3: Effect of lysis temperature on the lysis efficiency of bacterial cells
 
and determination of the released concentration of target RNA from lysis assuming a ratio of 30
 
RNA molecules per cell.
 
</p>
 
</div>
 
 
</td>
 
</td>
 
</tr>
 
</tr>
  
<tr><td colspan=6 align=center valign=center>
+
<tr class="lastRow"><td colspan=6 align=center valign=center>
<h2>Signal Amplification</h2>
+
<h4>Mfold</h4>
 +
<br>
 
<p>   
 
<p>   
For the simulation of an amplification system, we developed a model for a circuit amplifying an RNA. Therefore, 
+
Mfold is a webserver for RNA secondary structure prediction developed by Michael Zuker and described in his paper
we couple a Reverse Transcription to an isothermal PCR-like amplification called Recombinase Polymerase Amplification (RPA)
+
"Mfold web server for nucleic acid folding and hybridization prediction", published in <i>Nucleic Acids Research</i> 
and do In-Vitro Transcription from the build template. A scheme for the model is shown in Figure 4.
+
in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a  
For simplicity, we made assumptions to this model:<br>
+
script that automatically requests a standardised RNA Fold job from the server, thereby making it available
First, the RPA reaction is thought to be in the linear region, independent of Primer concentration since we
+
on all operating systems. Using the result obtained from this request, the secondary structure is 
work in an environment of very high primer and dNTP concentrations (up to 1000 nM) and only want to reach RNA concentration within the  
+
checked via a string comparison in so-called "Vienna" notation. This notation gives base pairing as a string
range of the detection limit of our Cas13a protein, which is in the nM region. Therefore, since we are amplifying the RNA by
+
of dots and brackets where a dot represents an unpaired base and brackets mark the base pairs, with 
Transcription from the cDNA, this assumption is reasonable. The same argument goes for the In-Vitro Transcription; since we
+
an opening bracket "(" at the 5'-end of the base pair and a closing bracket ")" at the 3'-end. An example of the output
are in an environment of excessive rNTP concentrations, thus first order approximation is valid. <br>
+
of the program is given below:
Rate constants were approximated by experiments or taken from literature. The only rate constant that was not available was
+
<pre style="text-align: left;">
the rate of Reverse Transcription. We, thus, took producer's information about commercial RT kits and estimated from these very
+
conservatively (two orders of magnitude less in reaction speed) to not be biased in the simulation by overfitting parameters. <br>
+
The rate constants are the following:  
+
COUNT ALL 4 RATE CONSTANTS
+
</p>
+
  
 +
#######################################################################################
 +
#################### CascAID Secondary Structure Verification #########################
 +
#######################################################################################
  
<p>
+
 
The coupled ODEs for the signal amplification circuit can be described simply by:
+
 
 +
#######################################################################################
 +
##################### NUPACK Secondary Structure Verification #########################
 +
#######################################################################################
 +
 +
 
 +
GOOD NEWS! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE!
 +
YOUR SEQUENCE WAS:
 +
 
 +
5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC  3'
 +
 
 +
      (((((....((((.........)))).)))))                                  ########  MATCHED SECONDARY STRUCTURE
 +
  ...(((((....((((.........)))).)))))..((((..........)))).........    ######## PREDICTED SECONDARY STRUCTURE
 +
___________________________________________________________________
 +
 
 +
YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK
 +
IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a
 +
______________________________________________________________________________________
 +
 
 +
Job ended normally. Sun Oct 29 23:46:28 2017
 +
Do you have internet connectivity? [yes/no]yes
 +
 
 +
 
 +
#######################################################################################
 +
#################### MFOLD SECONDARY STRUCTURE VERIFICATION ###########################
 +
#######################################################################################
 +
 
 +
 
 +
 
 +
#################### CAUTION! #####################
 +
mFOLD SECONDARY STRUCTURE DOES NOT FIT OUR DATA BANK
 +
#################### CAUTION! #####################
 +
 
 +
 
 +
YOUR SEQUENCE AND MOST STABLE PREDICTED STRUCTURE IS:
 +
 
 +
5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGG 3'
 +
  ..(((((((((.((((.........)))).......((.....)).....)))).)))))....
 +
______________________________________________________________________________________
 +
 
 +
Job ended normally. Sun Oct 29 23:47:06 2017
 +
 
 +
 
 +
</pre>
 
</p>
 
</p>
 
<p>
 
<p>
Equations 12 + 13
+
This example also shows that one program may recognize the
</p>
+
crRNA secondary structure while the other does not. In this case, NUPACK has predicted the structure
<div class="captionPicture">
+
while Mfold is not able to predict the structure. Even though this is an experimental construct
<img width=600 align=center valign=center src="https://static.igem.org/mediawiki/2017/d/dc/T--Munich--ModellingPagePicture_RT-RPA-TX_scheme.svg" alt="RT-RPA-TX_scheme">
+
that worked, we did not add its predicted secondary structure to the database for Mfold,
<p>
+
since Mfold was unable to predict the right structure.
Figure 4: Scheme for the RT-RPA-Tx Amplification system.
+
</p>
+
</div>
+
<div class="captionPicture">
+
<img width=800 align=center valign=center src="https://static.igem.org/mediawiki/2017/8/8c/T--Munich--ModellingPagePicture_RT-RPA-TX.png" alt="RT-RPA-TX">
+
<p>
+
Figure 5: Target RNA concentration dependent on initial concentrations to determine the cycle time in RT-RPA-Tx needed for reaching
+
the Cas13a detection limit of 10 nM (red line).
+
</p>
+
</div>
+
<p>
+
The overall dynamics of the RT-RPA-Tx system are shown below for several starting concentrations of RNA.  
+
 
</p>
 
</p>
 
</td>
 
</td>
</tr>
+
</tr>  
  
<tr><td colspan=6 align=center valign=center>
+
 
<h2>Theoretical Detection Limit using the Amplification Circuit and Cas13a Detection</h2>
+
 
 +
 
 +
 
 +
<tr class="lastRow"><td colspan=6 align=center valign=center>
 +
<h3>Off-Target Effects</h3>
 
<p>   
 
<p>   
Since the reasoning behind using an amplification method was to bring down the detection limit, a new theoretical
+
 
detection limit of the device may be determined combining model of lysis and isothermal amplification. For this,
+
In order to rule out off-target effects for the designed crRNA in diagnostic applications, we developed a script that can BLAST the sequence either against whole online databases or against a sub-database we created from transcriptome data of humans, of bacteria commonly found inside the nose, and of the model organisms used in our project, including:
a reasonable cycle time for point-of-care application of one hour was chosen.
+
<ol style="list-style-type:disc; list-style-position:left; text-align: left;">
 +
<li>Homo sapiens</li>
 +
<li>Escherichia coli</li>
 +
<li>Bacillus subtilis</li>
 +
<li>Staphylococcus aureus</li>
 +
<li>Corynebacterium diphtheriae</li>
 +
<li>Streptococcus diphtheriae</li>
 +
<li>Haemophilus influenzae</li>
 +
</ol>
 
</p>
 
</p>
<div class="captionPicture">
 
<img width=800 align=center valign=center src="https://static.igem.org/mediawiki/2017/4/40/T--Munich--ModellingPagePicture_Cycle_Times2.png" alt="RT-RPA-TX">
 
 
<p>
 
<p>
Determining Cycle times to reach 10 nM Detection Limit using Amplification Circuit. Red dashed line marks the end of the thermolysis
+
Transcriptomes that would be necessary but were not available are:
 
</p>
 
</p>
</div>
+
<ol style="list-style-type:disc; list-style-position:left; text-align: left;">
 +
<li>Neisseria family</li>
 +
<li>Staphylococcus epidermidis</li>
 +
<li>Streptococcus pyogenes</li>
 +
</ol>
 +
<br>
 
<p>
 
<p>
When comparing this to cycle times needed for reaching the detection limit at 95 °C, one sees that lysis temperatures is not very important
+
All data was retrieved from the www.ensembl.org website (transcriptome release 90).
to the amplification and only results in a slight shift to longer time scales. This is reasonable, since RPA, and PCR in general,
+
are enormously sensitive methods, and thus only need few templates to show a signal. Also, when comparing the concentrations
+
in the temperature screen above, one can observe that the concentrations of RNA within the sample only change insignificantly, all showing concentrations that range
+
within three-digit attomolar region or higher. Also, this model works with the statement in the literature that as little as 10 templates are enough to trigger amplification through RPA.
+
 
</p>
 
</p>
 +
</p>
 +
<pre style="text-align: left;">
 +
##################################################################
 +
####### Following possible off-targets have been identified ######
 +
##################################################################
 +
>seq 0
 +
sequence:gnl|BL_ORD_ID|2 KJJ58724 cdna:annotated supercontig:ASM95397v1:scaffold_31:1584:1937:1
 +
gene:NG01_11520 gene_biotype:protein_coding
 +
transcript_biotype:protein_coding description:hypothetical protein
 +
length:354
 +
e value:2.42551e-24
 +
identity:60
 +
GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT...
 +
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...
 +
GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT...
 +
</pre>
 +
 
</td>
 
</td>
 
</tr>
 
</tr>
  
<tr><td colspan=6 align=center valign=center>
+
 
<h2>Signal Amplification Measurement in RPATx</h2>
+
<tr class="lastRow"><td colspan=6 align=center valign=center>
 +
<h3>Database</h3>
 +
<br>
 
<p>   
 
<p>   
When we performed time-dependent measurements of crRNA in a RPATx Ansatz, we measured saturation of T7 RNA Polymerase already at 0.2 nM template DNA. The reaction
+
The database program gives you an interface to interact with the MySQL database created for crRNAs that have been shown experimentally to work.  
kinetics and thus the formation of RNA showed pseudo-first order dynamics with a rate constant of 97 ng/min transcribed RNA. Compared to the literature (https://www.biosciencetechnology.com/article/2003/09/maximizing-yield-full-length-rna-vitro-transcription-reaction) this is not even the bottleneck since <i>In-Vitro Transcription</i> reactions can yield up to 400 μg in 4 hours. This led us to try out a concentration series of different template concentration and try whether we could detect the extracted RNA with Cas13a.  
+
 
</p>
 
</p>
 +
<pre style="text-align: left;">
  
<tr><td colspan=6 align=center valign=center>
+
############        Available Detection Targets        #################
<h3>References</h3>
+
 
<p>
+
[1] Virus
    <ol style="text-align: left">
+
[2] Bacteria
      <li id="ref_1"></li>
+
 
      <li id="ref_2"></li>
+
[0] Go back one step
      <li id="ref_3"></li>
+
 
      <li id="ref_4"></li>
+
What would you like to detect?2
      <li id="ref_5"></li>
+
 
 
+
############        Available Detection Targets        #################
    </ol>
+
 
 +
[1] Escherichia coli
 +
[2] Bacillus subtillis
 +
 
 +
[0] Go back one step
 +
 
 +
What would you like to detect?1
 +
 
 +
############            Choose your Target              #################
 +
 +
[1] E. Coli 16s rRNA
 +
 
 +
[0] Go back one step
 +
 
 +
What would you like to detect?1
 +
 
 +
###########      The sequence thou art looking for is : ################
 +
 +
ACUUUACUCCCUUCCUCCCCGCUGAAA
 +
 
 +
 
 +
 
 +
[9] Exit
 +
[0] Go back one step
 +
 
 +
 
 +
</pre>
 +
<p>
 +
 
 +
However, these still need to be tested for off-target effects experimentally since <i>in silico</i>  
 +
screening can only confirm specificity with limited certainty.
 
</p>
 
</p>
 
</td>
 
</td>
</tr>
+
</tr>
  
 
<tr><td class="no-padding" colspan=6 align=right valign=center height=10>
 
<tr><td class="no-padding" colspan=6 align=right valign=center height=10>

Revision as of 04:30, 1 November 2017


Software
###################################################################
#                                                                 #
#                        CascAID V1.0                             #
#                                                                 #
#                   Wed Nov  1 04:23:54 2017                      #		
#                                                                 #
#                      IGEM Munich 2017                           # 
#                                                                 #
#                                                                 #
#                                                                 #
#                                                                 #
#                  Please send bug reports to:                    #
#                                                                 #
#                         Sven Klumpe	                          #
#                                                                 #
#                E-Mail: sven.klumpe@tum.de                       #
#                                                                 #
###################################################################


CascAID is a potentially universal tool for nucleic acid detection. Fast adaptation of our platform to new targets requires in silico verification of the crRNA design. Crucial factors for the development of these crRNA designs are the binding of the crRNA to Cas13a (mainly determined by its secondary structure) and the uniqueness of the targeting sequence in the transcriptome to rule out false positive results. To ensure the integrity of the Cas13a-crRNA complex, we developed a Python script that uses NUPACK and Mfold, two established program packages for secondary structure prediction. In order to verify the specificity of the targeting sequence, we used the BLASTN-short program to check for similar sequences in a transcriptome databank. Additionally, we created a database of crRNA designs that are known to work and made it as extensive as possible in the limited time available, seeking collaboration with other teams working with Cas13a, mainly TU Delft. The second branch of software we developed for our project is the hardware control software. It allows users' devices such as computers and smartphones to control our hardware devices, Heatbringer and Lightbringer. The repository for our software can be found here.

crRNA Design Verification

There are two main problems regarding the design of crRNA for a diagnostic test. First, the secondary structure of the crRNA needed for Cas13a activity needs to be verified. Second, the sequence targeted by the crRNA has to be specific, i.e., there must be no identical sequence in the reference transcriptome of a healthy patient. Otherwise, off-target effects will lead to false positive results since Cas13a is activated even though the pathogen is not present. To address these issues, we developed software relying on bioinformatic methods such as secondary structure prediction and the Basic Local Alignment Search Tool (BLAST).
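
The specificity requirement can be illustrated with a minimal Python sketch. It is not the BLAST-based script described here: it only scans for verbatim occurrences of the target sequence, whereas BLAST also reports near-identical hits. The function name `find_identical_hits` and the toy transcripts are hypothetical and chosen for illustration; the query is the E. coli 16S rRNA target that appears in the database example on this page.

```python
def find_identical_hits(target_rna, transcripts):
    """Return (transcript_id, 0-based position) for every verbatim occurrence."""
    # cDNA records use T while RNA sequences use U; normalise to the DNA alphabet.
    query = target_rna.upper().replace("U", "T")
    hits = []
    for name, seq in transcripts.items():
        seq = seq.upper().replace("U", "T")
        start = seq.find(query)
        while start != -1:
            hits.append((name, start))
            start = seq.find(query, start + 1)  # also report overlapping hits
    return hits


# Hypothetical toy transcriptome, invented for illustration only.
toy_transcripts = {
    "tx_A": "ATGGCC" + "ACTTTACTCCCTTCCTCCCCGCTGAAA" + "GGTTAA",
    "tx_B": "ATGCCCGGGTTTAAACCCGGGTTTAAATAG",
}

hits = find_identical_hits("ACUUUACUCCCUUCCUCCCCGCUGAAA", toy_transcripts)
for name, position in hits:
    print(f"identical sequence in {name} at offset {position}")
```

Any hit in a host transcript would flag the crRNA target as non-specific. A real screen would additionally need to tolerate mismatches, which is what the BLAST search provides.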

Secondary Structure Prediction

For secondary structure prediction of the crRNA, we utilised the two established program packages in the field, NUPACK and Mfold, to compare newly designed crRNAs with secondary structures of crRNAs that were already known to be active. These reference crRNA structures were obtained either from actual crystallography data of crRNA in complex with Cas13a or from structure prediction data of experimentally tested crRNAs. Using secondary structure verification, we were able to rule out misfolding crRNA designs prior to experiment. We developed a script that automates this procedure for the end user.
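
The comparison against known-active structures can be sketched as a substring search in dot-bracket (Vienna) notation, where "." is an unpaired base and matching brackets mark a base pair. The sketch below assumes the check works this way; the function names `is_balanced` and `find_motif` are hypothetical and not taken from our script. The reference motif and the two predictions are copied from the example run on this page, with the NUPACK prediction written as a concatenation so the embedded motif is visible.

```python
def is_balanced(structure):
    """True if every '(' has a matching ')' in a dot-bracket string."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket without an opening partner
                return False
    return depth == 0


def find_motif(predicted, motif):
    """Return the 0-based index of the reference motif in a prediction, or -1."""
    # A balanced motif found as a substring pairs only within itself,
    # so a plain string search is a valid structural comparison.
    if not is_balanced(motif):
        raise ValueError("reference motif must be a balanced dot-bracket string")
    return predicted.find(motif)


REFERENCE_MOTIF = "(((((....((((.........)))).)))))"
NUPACK_PREDICTED = "..." + REFERENCE_MOTIF + "..((((..........))))........."
MFOLD_PREDICTED = "..(((((((((.((((.........)))).......((.....)).....)))).)))))...."

for name, structure in [("NUPACK", NUPACK_PREDICTED), ("Mfold", MFOLD_PREDICTED)]:
    index = find_motif(structure, REFERENCE_MOTIF)
    verdict = f"motif found at base {index + 1}" if index >= 0 else "no match"
    print(name, verdict)
```

Requiring the motif to be balanced matters: an unbalanced fragment could match a region whose brackets pair with bases outside it, which would not be the same hairpin.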

NUPACK


NUPACK is an RNA secondary structure prediction program package developed by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech). The source code is available free of charge for academic use. NUPACK allows the analysis of the partition function, the minimum free energy and the equilibrium base-pairing probabilities of an RNA sequence. For offline use, we installed NUPACK locally, whereas Mfold is accessed through a webserver request. We use both packages because we found that, in certain cases, only one of them was able to predict the secondary structure of crRNA as described in previous papers, predominantly the paper of Liu et al. published in Cell in 2017, "Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities". A local run also gives you the possibility of using the full spectrum of NUPACK's programs. Using several of these programs to obtain the final structure prediction, we estimated whether the crRNA would be active in Cas13a. Furthermore, we found that NUPACK sometimes predicts the right secondary structure, but not as the most stable structure. With NUPACK's subopt, it is possible to predict more than just the most stable structure. This lets us examine less stable structures and compare them to the structure databank, since the protein may compensate for non-ideal structures by providing the right environment for stabilisation. The output of a suboptimal prediction is given in Example 2; explanations are added as comments after '#':

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66																	#### length of the sequence
-9.400																#### free energy of the structure
.....................(.((((((.((((....)))).)))))).)...............	#### secondary structure in Vienna notation
22      51															#### IDs of bases that form basepairs
24      49															#### this would mean base 24 pairs with base 49
25      48
26      47
27      46
28      45
29      44
31      42
32      41
33      40
34      39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66
-9.300
.....................(((((((..((((....)))).)))))))................
22      50
23      49
24      48
25      47
26      46
27      45
28      44
31      42
32      41
33      40
34      39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %

From this output, we can extract the secondary structure in Vienna notation as well as the free energy of each RNA structure. Together with the full partition function, the free energy gives the probability that the structure forms in solution. Using these results, the user can make qualitative assumptions about the activity of the corresponding Cas13a-crRNA complex.
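
The extraction step described above can be sketched in Python. This is a minimal illustration and not the CascAID code itself: the function names are hypothetical, the parser only assumes the record layout shown in Example 2 (separator lines starting with '%', then length, free energy, dot-bracket string and pair list), and the ensemble free energy passed to boltzmann_probability is assumed to come from a separate partition function calculation. Energies are taken to be in kcal/mol.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def pairs_from_dot_bracket(structure):
    """Return the 1-based (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def parse_subopt(text):
    """Split subopt-style output into records; '%' lines act as separators."""
    records, block = [], []
    for raw in text.splitlines() + ["%"]:   # sentinel flushes the last record
        line = raw.split("#")[0].strip()    # drop trailing '#' annotations
        if line.startswith("%"):
            if block:
                records.append({
                    "length": int(block[0]),
                    "energy": float(block[1]),
                    "structure": block[2],
                    "pairs": [tuple(int(x) for x in l.split())
                              for l in block[3:]],
                })
                block = []
        elif line:
            block.append(line)
    return records


def boltzmann_probability(dg, dg_ensemble, temp_c=37.0):
    """p = exp(-(dG - dG_ensemble) / RT), with energies in kcal/mol."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-(dg - dg_ensemble) / rt)


if __name__ == "__main__":
    # Toy record in the same layout as Example 2 (not real NUPACK output).
    demo = "% %%% %\n10\n-1.500\n..((..))..\n3      8\n4      7\n% %%% %\n"
    for record in parse_subopt(demo):
        consistent = record["pairs"] == pairs_from_dot_bracket(record["structure"])
        print(record["energy"], record["structure"], consistent)
```

The ratio of two such probabilities does not depend on the ensemble free energy, so the relative population of two suboptimal structures can be compared even when the partition function is not at hand.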

Mfold


Mfold is a webserver for RNA secondary structure prediction developed by Michael Zuker and described in his paper "Mfold web server for nucleic acid folding and hybridization prediction", published in Nucleic Acids Research in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a script that automatically requests a standardised RNA Fold job from the server, making it available on all operating systems. The secondary structure obtained from this request is checked via a string comparison in so-called "Vienna" notation. This notation gives the base pairing as a string of dots and brackets: a dot represents an unpaired base, and each base pair is marked by an opening bracket "(" at its 5'-end and a closing bracket ")" at its 3'-end. An example of the program's output is given below:


#######################################################################################
#################### CascAID Secondary Structure Verification #########################
#######################################################################################



#######################################################################################
##################### NUPACK Secondary Structure Verification #########################
#######################################################################################
	

GOOD NEWS! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE!
YOUR SEQUENCE WAS:

5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC  3'

      (((((....((((.........)))).)))))                                  ########   MATCHED SECONDARY STRUCTURE
   ...(((((....((((.........)))).)))))..((((..........)))).........     ######## PREDICTED SECONDARY STRUCTURE
___________________________________________________________________

		YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK
		IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a
______________________________________________________________________________________

Job ended normally. Sun Oct 29 23:46:28 2017
Do you have internet connectivity? [yes/no]yes


#######################################################################################
#################### MFOLD SECONDARY STRUCTURE VERIFICATION ###########################
#######################################################################################



		#################### CAUTION! ##################### 
		mFOLD SECONDARY STRUCTURE DOES NOT FIT OUR DATA BANK
		#################### CAUTION! #####################


YOUR SEQUENCE AND MOST STABLE PREDICTED STRUCTURE IS:

5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGG 3'
   ..(((((((((.((((.........)))).......((.....)).....)))).)))))....
______________________________________________________________________________________

Job ended normally. Sun Oct 29 23:47:06 2017


This is also a good example of a case in which one program recognizes the crRNA secondary structure while the other does not: NUPACK predicts the expected structure, whereas Mfold does not. Even though this is an experimental construct that worked, we did not add its Mfold prediction to the Mfold database, since Mfold was unable to predict the right structure.
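
The string comparison in Vienna notation can be illustrated with a short Python sketch. It is only an approximation of the idea and not the CascAID implementation: the reference motif is copied from the "MATCHED SECONDARY STRUCTURE" line of the output above, while the dictionary layout, the key name and the function names are assumptions made for this example.

```python
# Reference motif copied from the "MATCHED SECONDARY STRUCTURE" line above.
# Keying it by "lwaCas13a" is an assumption made for this sketch.
REFERENCE_STRUCTURES = {
    "lwaCas13a": "(((((....((((.........)))).)))))",
}


def is_balanced(structure):
    """Check that every '(' in a dot-bracket string has a matching ')'."""
    depth = 0
    for char in structure:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def match_reference(predicted, references=REFERENCE_STRUCTURES):
    """Return (name, offset) of the first reference motif found, else None."""
    for name, motif in references.items():
        offset = predicted.find(motif)   # plain substring search
        if offset != -1:
            return name, offset
    return None


if __name__ == "__main__":
    motif = REFERENCE_STRUCTURES["lwaCas13a"]
    predicted = "..." + motif + "..((((..........)))).........."
    hit = match_reference(predicted)
    if hit is not None:
        name, offset = hit
        print(" " * offset + motif)      # aligned under the prediction
        print(predicted)
        print("matched reference:", name)
```

A plain substring search is strict: a single shifted base pair counts as a mismatch, which is consistent with one program matching the databank while the other does not.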

Off-Target Effects

In order to rule out off-target effects for the designed crRNA in diagnostic applications, we developed a script that can BLAST the sequence against either whole online databases or a sub-database we created from transcriptome data. The sub-database contains the transcriptomes of humans, of bacteria commonly found inside the nose and of the model organisms used in our project, including:

  1. Homo sapiens
  2. Escherichia coli
  3. Bacillus subtilis
  4. Staphylococcus aureus
  5. Corynebacterium diphtheriae
  6. Streptococcus diphtheriae
  7. Haemophilus influenzae

Transcriptomes that would be necessary but were not available are:

  1. Neisseria family
  2. Staphylococcus epidermidis
  3. Streptococcus pyogenes

All data were retrieved from www.ensembl.org, Transcriptome Release #90.

##################################################################
####### Following possible off-targets have been identified ######
##################################################################
>seq 0
sequence:gnl|BL_ORD_ID|2 KJJ58724 cdna:annotated supercontig:ASM95397v1:scaffold_31:1584:1937:1 
gene:NG01_11520 gene_biotype:protein_coding 
transcript_biotype:protein_coding description:hypothetical protein
length:354
e value:2.42551e-24
identity:60
GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT... 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||... 
GTGTCCGTTGAGACCCTTGCCAGCAACCATGTCGATCCGCTCCCCGAATCCGTTGCGTCT... 
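
Filtering BLAST hits for possible off-targets could look like the Python sketch below. It does not reproduce the CascAID script. It assumes the hits are available in BLAST+ tabular format (-outfmt 6 with its default twelve columns), and both thresholds are hypothetical values chosen for illustration. Note that pident in this format is a percentage, whereas the report above appears to list the number of identical bases.

```python
import csv
import io
from collections import namedtuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()
Hit = namedtuple("Hit", FIELDS)


def read_hits(tabular_text):
    """Parse tab-separated BLAST hits into Hit tuples with numeric fields."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                      # skip blank and comment lines
        v = row[:12]
        hits.append(Hit(v[0], v[1], float(v[2]),
                        *[int(x) for x in v[3:10]],
                        float(v[10]), float(v[11])))
    return hits


def possible_off_targets(hits, max_evalue=1e-3, min_identity=90.0):
    """Keep hits that are both significant and highly identical.

    Both thresholds are illustrative assumptions, not values from the project.
    """
    return [h for h in hits
            if h.evalue <= max_evalue and h.pident >= min_identity]


if __name__ == "__main__":
    # Hypothetical hits, made up for this sketch.
    demo = ("crRNA1\ttranscriptA\t100.0\t60\t0\t0\t1\t60\t1\t60\t2.4e-24\t111\n"
            "crRNA1\ttranscriptB\t78.5\t28\t6\t0\t1\t28\t5\t32\t3.1\t20.3\n")
    for hit in possible_off_targets(read_hits(demo)):
        print(hit.sseqid, hit.evalue, hit.pident)
```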

Database


The database program provides an interface to the MySQL database of crRNAs that have been shown experimentally to work.


############         Available Detection Targets         #################

[1] Virus
[2] Bacteria

[0] Go back one step

What would you like to detect?2

############         Available Detection Targets         #################

[1] Escherichia coli
[2] Bacillus subtillis

[0] Go back one step

What would you like to detect?1

############             Choose your Target              #################
	
[1] E. Coli 16s rRNA

[0] Go back one step

What would you like to detect?1

###########      The sequence thou art looking for is : ################
	
ACUUUACUCCCUUCCUCCCCGCUGAAA



[9] Exit
[0] Go back one step


However, these crRNAs still need to be tested experimentally for off-target effects, since in silico screening can confirm specificity only to a limited degree of certainty.
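
The menu-driven lookup shown above could be backed by queries like those in the following Python sketch. To keep it self-contained it uses the standard-library sqlite3 module as a stand-in for MySQL, and the table name, column names and function names are all hypothetical. The single row is taken from the example session above.

```python
import sqlite3


def open_demo_database():
    """Create an in-memory database with one crRNA from the session above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crrna (kind TEXT, organism TEXT, target TEXT, sequence TEXT)"
    )
    conn.execute(
        "INSERT INTO crrna VALUES (?, ?, ?, ?)",
        ("Bacteria", "Escherichia coli", "E. Coli 16s rRNA",
         "ACUUUACUCCCUUCCUCCCCGCUGAAA"),
    )
    conn.commit()
    return conn


def list_organisms(conn, kind):
    """Organisms available for a detection class such as 'Bacteria'."""
    rows = conn.execute(
        "SELECT DISTINCT organism FROM crrna WHERE kind = ? ORDER BY organism",
        (kind,),
    )
    return [row[0] for row in rows]


def list_targets(conn, organism):
    """Targets stored for one organism."""
    rows = conn.execute(
        "SELECT target FROM crrna WHERE organism = ? ORDER BY target",
        (organism,),
    )
    return [row[0] for row in rows]


def get_sequence(conn, organism, target):
    """Return the stored crRNA sequence, or None if there is no entry."""
    row = conn.execute(
        "SELECT sequence FROM crrna WHERE organism = ? AND target = ?",
        (organism, target),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_demo_database()
    for number, organism in enumerate(list_organisms(conn, "Bacteria"), start=1):
        print("[%d] %s" % (number, organism))
    print(get_sequence(conn, "Escherichia coli", "E. Coli 16s rRNA"))
```

Parameterised queries (the "?" placeholders) keep user input from the menu out of the SQL text itself. MySQL client libraries for Python offer the same mechanism, though usually with a different placeholder style.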