Software
###################################################################
# #
# CascAID V1.0 #
# #
# Wed Nov 1 04:23:54 2017 #
# #
# IGEM Munich 2017 #
# #
###################################################################
CascAID is a potentially universal tool for nucleic acid detection.
Fast adaptation of our platform to new targets requires in silico verification of the crRNA design.
Two factors are crucial for the design of these crRNAs: their binding to Cas13a, which is
mainly determined by the crRNA secondary structure, and the uniqueness of the target sequence within the transcriptome,
which rules out false positive results. To ensure the integrity of the Cas13a-crRNA complex, we developed
a Python script that uses the established secondary structure prediction packages NUPACK and Mfold.
To verify the specificity of the target sequence, we used the BLASTN-short program to
search a transcriptome database for similar sequences. Additionally, we created a database of crRNA designs
that have already been shown to work and made it
as extensive as possible in the limited time, seeking collaboration with other teams working with Cas13a,
mainly TU Delft.
The second branch of software we developed for our project is the hardware control software.
It allows users' devices, such as computers and smartphones, to control
our hardware devices, Heatbringer and Lightbringer.
The repository for our software can be found here.
crRNA Design Verification
There are two main problems regarding the design of crRNA for a diagnostic test.
First, the crRNA's secondary structure, which is needed for Cas13a activity, has to be verified.
Second, the sequence targeted by the crRNA has to be specific, i.e., there must be no identical sequence in the
reference transcriptome of a healthy patient. Otherwise, off-target effects will lead to
false positive results, since Cas13a is activated even though the pathogen is not present.
To address these issues, we developed software based on bioinformatic methods such as
secondary structure prediction and the Basic Local Alignment Search Tool (BLAST).
Secondary Structure Prediction
For secondary structure prediction of the crRNA, we used two established program packages
in the field, NUPACK and Mfold, to compare newly designed crRNAs with the secondary structures of crRNAs that
were already known to be active. These reference crRNA structures were obtained either from
crystallography data of crRNA in complex with Cas13a or from structure predictions of experimentally
tested crRNAs. Secondary structure verification allowed us to rule out misfolding crRNA
designs prior to experiments. We developed a script that automates this procedure for the end user.
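A minimal sketch of the reference lookup, assuming the databank maps a Cas13a ortholog name to its crRNA backbone sequence and the reference structure of that backbone. The lwaCas13a values are taken from the example output in this section (the backbone is the full crRNA minus its target, the structure is the one reported as matched). The dictionary layout and the `find_backbone` helper are hypothetical and are not the script's actual code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical databank layout: ortholog name -> (backbone sequence, reference structure).
# The lwaCas13a entry reuses the values from the example output in this section.
REFERENCE_BACKBONES: Dict[str, Tuple[str, str]] = {
    "lwaCas13a": (
        "GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAAC",
        "(((((....((((.........)))).)))))",
    ),
}


def find_backbone(
    crrna: str, databank: Dict[str, Tuple[str, str]] = REFERENCE_BACKBONES
) -> Optional[Tuple[str, str, str]]:
    """Return (name, backbone, reference structure) for the first known backbone
    that the crRNA starts with, or None if no backbone matches."""
    seq = crrna.strip().upper().replace("T", "U")  # accept DNA or RNA input
    for name, (backbone, structure) in databank.items():
        if seq.startswith(backbone):
            return name, backbone, structure
    return None


hit = find_backbone("GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC")
print("Backbone corresponds to:", hit[0] if hit else "unknown")  # lwaCas13a
```

The returned reference structure is what a predicted structure would then be compared against.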
NUPACK
NUPACK is an RNA secondary structure prediction package developed
by several contributors under the guidance of Prof. Niles A. Pierce at the California Institute of Technology (Caltech).
The source code is available free of charge for academic use.
NUPACK allows the analysis of the partition function, the minimum free energy and the equilibrium base-pairing
probabilities of an RNA sequence.
For offline use, we installed NUPACK locally. In addition, we implemented Mfold as a web server request.
We used both packages because we found that in certain cases only one of them
was able to predict the crRNA secondary structure described in previous papers, predominantly Liu et al., "Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities", Cell (2017).
A local installation also gives access to
the full range of NUPACK's programs.
By combining several of these final structure predictions, we estimated whether the
crRNA would be active in complex with Cas13a.
Furthermore, we found that NUPACK sometimes predicts the correct secondary structure, but not as
the most stable one. NUPACK's subopt program can predict more than just
the most stable structure. This allows us to examine less stable structures, since the protein may compensate for
non-ideal structures by providing the right environment for stabilisation, and to compare them with the
structure databank. The output of a suboptimal prediction
is given in Example 2; explanations are added as comments after '#':
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66 # length of the sequence
-9.400 # free energy (kcal/mol)
.....................(.((((((.((((....)))).)))))).)............... # secondary structure
22 51 # 1-based indices of paired bases,
24 49 # one base pair per line;
25 48 # e.g. the first line means base 22
26 47 # pairs with base 51
27 46
28 45
29 44
31 42
32 41
33 40
34 39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
66
-9.300
.....................(((((((..((((....)))).)))))))................
22 50
23 49
24 48
25 47
26 46
27 45
28 44
31 42
32 41
33 40
34 39
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %
From this output, we can extract the secondary structure in Vienna notation as well as its free energy, which,
together with the full partition function, predicts the probability that the structure forms in solution.
Using these results, the user can make qualitative predictions about
the activity of the corresponding Cas13a-crRNA complex.
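The extraction step can be sketched as a small parser for the subopt format shown above, assuming each structure block is delimited by lines starting with `%` and contains the length, the free energy, the dot-bracket string and the 1-based pair list, in that order. All helper names are hypothetical. `relative_weights` normalises Boltzmann factors over the listed structures only, which overestimates their true equilibrium probabilities; `structure_probability` gives P(s) = exp(-(G_s - G_ens)/RT) when the ensemble free energy G_ens = -RT ln Q is available, for example from a NUPACK pfunc calculation. The 37 °C default is an assumption.

```python
import math
from typing import List, NamedTuple, Tuple

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


class SuboptStructure(NamedTuple):
    length: int                   # number of bases
    energy: float                 # free energy in kcal/mol
    structure: str                # dot-bracket (Vienna) notation
    pairs: List[Tuple[int, int]]  # 1-based indices of paired bases


def parse_subopt(text: str) -> List[SuboptStructure]:
    """Parse structure blocks separated by '%' lines, as in the example above."""
    blocks, current = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' annotations, if any
        if line.startswith("%"):             # separator or header comment line
            if current:
                blocks.append(current)
            current = []
        elif line:
            current.append(line)
    if current:
        blocks.append(current)
    result = []
    for block in blocks:
        if len(block) < 3:
            raise ValueError(f"incomplete structure block: {block}")
        pairs = [tuple(int(x) for x in row.split()) for row in block[3:]]
        result.append(SuboptStructure(int(block[0]), float(block[1]), block[2], pairs))
    return result


def dot_bracket_from_pairs(length: int, pairs: List[Tuple[int, int]]) -> str:
    """Rebuild the dot-bracket string from the pair list (a consistency check)."""
    chars = ["."] * length
    for i, j in pairs:
        chars[i - 1], chars[j - 1] = "(", ")"
    return "".join(chars)


def relative_weights(energies: List[float], temp_c: float = 37.0) -> List[float]:
    """Boltzmann weights normalised over the given structures only."""
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def structure_probability(energy: float, ensemble_energy: float, temp_c: float = 37.0) -> float:
    """P(s) = exp(-(G_s - G_ens) / RT), with G_ens = -RT ln Q of the full ensemble."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-(energy - ensemble_energy) / rt)


print(relative_weights([-9.400, -9.300]))  # ≈ [0.54, 0.46]
```

For the two suboptimal structures above, the 0.1 kcal/mol difference gives relative weights of roughly 0.54 and 0.46, so among themselves the two folds would be almost equally populated.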
Mfold
Mfold is a web server for RNA secondary structure prediction developed by Michael Zuker and described in his paper
"Mfold web server for nucleic acid folding and hybridization prediction", published in Nucleic Acids Research
in 2003. Since Mfold is not available as a locally buildable binary for every operating system, we developed a
script that automatically submits a standardised RNA folding job to the server, making it usable
on all operating systems. The secondary structure returned by this request is then
checked via a string comparison in the so-called "Vienna" notation. This notation represents base pairing as a string
of dots and brackets: a dot represents an unpaired base, and each base pair is written as
an opening bracket "(" at its 5' base and a closing bracket ")" at its 3' base. An example
of the program's output is given below:
#######################################################################################
#################### CascAID Secondary Structure Verification #########################
#######################################################################################
#######################################################################################
##################### NUPACK Secondary Structure Verification #########################
#######################################################################################
GOOD NEWS! YOU'VE GOT THE RIGHT SECONDARY STRUCTURE!
YOUR SEQUENCE WAS:
5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC 3'
(((((....((((.........)))).))))) #### MATCHED SECONDARY STRUCTURE
...(((((....((((.........)))).)))))..((((..........))))......... #### PREDICTED SECONDARY STRUCTURE
___________________________________________________________________
YOUR BACKBONE SEQUENCE HAS BEEN FOUND IN THE DATABANK
IT CORRESPONDS TO THE BACKBONE SEQUENCE OF: lwaCas13a
______________________________________________________________________________________
Job ended normally. Sun Oct 29 23:46:28 2017
Do you have internet connectivity? [yes/no]yes
#######################################################################################
#################### MFOLD SECONDARY STRUCTURE VERIFICATION ###########################
#######################################################################################
#################### CAUTION! #####################
mFOLD SECONDARY STRUCTURE DOES NOT FIT OUR DATA BANK
#################### CAUTION! #####################
YOUR SEQUENCE AND MOST STABLE PREDICTED STRUCTURE IS:
5' GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGG 3'
..(((((((((.((((.........)))).......((.....)).....)))).)))))....
______________________________________________________________________________________
Job ended normally. Sun Oct 29 23:47:06 2017
This is also a good example of a case in which one program recognises the
crRNA secondary structure while the other does not: NUPACK predicted the correct structure,
whereas Mfold did not. Even though this construct worked experimentally,
we did not add its Mfold secondary structure prediction to the Mfold reference database,
since Mfold was unable to predict the correct structure.
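A sketch of the Vienna-notation check, assuming the "string comparison" is a substring search for the reference backbone structure within the predicted structure. The example output above is consistent with this: the matched structure appears verbatim inside the NUPACK prediction but not inside the Mfold prediction. The function names are hypothetical; the stack-based parser also validates that the brackets are balanced before comparing.

```python
from typing import List, Tuple


def pairs_from_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Convert dot-bracket notation into sorted 1-based base pairs using a stack.

    Raises ValueError for unbalanced brackets or unexpected characters."""
    stack, pairs = [], []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)                 # remember the 5' partner
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # close the most recent open pair
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def contains_reference(predicted: str, reference: str) -> bool:
    """True if the reference motif occurs verbatim in the predicted structure."""
    pairs_from_dot_bracket(predicted)  # validate the notation first
    return reference in predicted


REFERENCE = "(((((....((((.........)))).)))))"
NUPACK_PRED = "...(((((....((((.........)))).)))))..((((..........)))).........."
MFOLD_PRED = "..(((((((((.((((.........)))).......((.....)).....)))).)))))...."
print(contains_reference(NUPACK_PRED, REFERENCE))  # True
print(contains_reference(MFOLD_PRED, REFERENCE))   # False
```

A plain substring test is strict: a single shifted base pair is reported as a mismatch, which matches the all-or-nothing verdicts in the output above.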
Off-Target Effects
In order to rule out off-target effects for the designed crRNA in diagnostic applications,
we developed a script that can BLAST the sequence either against entire online databases
or against a custom database of transcriptomes from humans, from bacteria common in the human nasal tract
and from model organisms used in our project, including:
- Homo sapiens
- Escherichia coli
- Bacillus subtilis
- Staphylococcus aureus
- Corynebacterium diphtheriae
- Streptococcus diphtheriae
- Haemophilus influenzae
Transcriptomes that would have been needed, since these organisms are common in the nasal tract,
but were not available include, among others:
- Neisseria family
- Staphylococcus epidermidis
- Streptococcus pyogenes
All data were retrieved from the www.ensembl.org website (Transcriptome Release #90).
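Before the search, the target has to be separated from the crRNA backbone and converted to the DNA alphabet, since cDNA and CDS entries such as those in the output below store T instead of U. This is a minimal sketch, assuming the backbone is a fixed 5' prefix; the backbone string is simply the difference between the two sequences printed in the example output. The command uses standard BLAST+ `blastn` options (`-task blastn-short` for queries shorter than 50 bases, `-outfmt 5` for XML, `-remote` for NCBI's servers). The database name and all helper functions are hypothetical.

```python
from pathlib import Path
from typing import List

# Assumption: the crRNA backbone is a known 5' prefix. This value is the full
# sequence minus the target sequence in the example output.
LWA_BACKBONE = "GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAAC"


def extract_target(crrna: str, backbone: str = LWA_BACKBONE) -> str:
    """Strip the backbone prefix and return the target (spacer) sequence."""
    crrna = crrna.strip().upper()
    if not crrna.startswith(backbone):
        raise ValueError("crRNA does not start with the expected backbone")
    return crrna[len(backbone):]


def to_dna(rna: str) -> str:
    """Transcript databases store T, so convert U -> T before searching."""
    return rna.upper().replace("U", "T")


def write_fasta(path: Path, name: str, seq: str) -> None:
    """Write a single-record FASTA file to be used as the BLAST query."""
    path.write_text(f">{name}\n{seq}\n")


def blast_command(query: Path, db: str, out_xml: str = "seq.xml", remote: bool = False) -> List[str]:
    """Build a BLAST+ blastn-short call that writes XML output (-outfmt 5)."""
    cmd = ["blastn", "-task", "blastn-short", "-query", str(query),
           "-db", db, "-outfmt", "5", "-out", out_xml]
    if remote:
        cmd.append("-remote")  # search NCBI's servers; db must then be an NCBI database such as nt
    return cmd


target = extract_target("GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC")
print(target)  # UGAUAAAGAAGACAGUCAUAAGUGCGGC
print(" ".join(blast_command(Path("query.fasta"), "nasal_transcriptomes")))
```

With BLAST+ installed, the query would be written with `write_fasta(Path("query.fasta"), "target", to_dna(target))` and the search started with `subprocess.run(cmd, check=True)`.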
##################################################################
####################### Input Sequence ###########################
##################################################################
Your Sequence was:
GGAUUUAGACUACCCCAAAAACGAAGGGGACUAAAACUGAUAAAGAAGACAGUCAUAAGUGCGGC
Your target sequence thus is:
UGAUAAAGAAGACAGUCAUAAGUGCGGC
Sun Oct 29 23:49:22 2017
##################################################################
##################################################################
####### Following possible off-targets have been identified ######
##################################################################
>seq 0
sequence:gnl|BL_ORD_ID|91933 ENSBTAT00000042836.3 cds chromosome:
UMD3.1:3:107697178:107698719:1 gene:ENSBTAG00000030338.3
gene_biotype:protein_coding transcript_biotype:protein_coding
gene_symbol:GJA9 description:gap junction protein alpha 9
[Source:HGNC Symbol;Acc:HGNC:19155]
length:1542
e value:1.50536
identity:18
ATAAAGAAGACAGTCATAA...
|||||||||||| ||||||...
ATAAAGAAGACACTCATAA...
>seq 1
sequence:gnl|BL_ORD_ID|69018 ENSBTAT00000042836.3 cdna
chromosome:UMD3.1:3:107697178:107698719:1 gene:ENSBTAG00000030338.3
gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:GJA9
description:gap junction protein alpha 9
[Source:HGNC Symbol;Acc:HGNC:19155]
length:1542
e value:1.50536
identity:18
ATAAAGAAGACAGTCATAA...
|||||||||||| ||||||...
ATAAAGAAGACACTCATAA...
___________________________________________________________
Job ended normally. Sun Oct 29 23:49:22 2017
These results have also been saved in: off_target.out
The full BLAST output can be found in: seq.xml
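A report like the one above can be rebuilt from `seq.xml` with the standard library alone. This sketch assumes BLAST+ XML output (`-outfmt 5`) and reads its standard `Hit_*` and `Hsp_*` elements; the E-value cut-off of 10 and the function names are hypothetical choices. The `sequence` field joins `Hit_id` and `Hit_def`, which reproduces titles such as `gnl|BL_ORD_ID|91933 ENSBTAT...` for a local database.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_blast_xml(xml_text: str, max_evalue: float = 10.0) -> List[Dict]:
    """Collect HSPs from BLAST XML whose E-value is at most max_evalue."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        for hsp in hit.iter("Hsp"):
            evalue = float(hsp.findtext("Hsp_evalue"))
            if evalue > max_evalue:
                continue  # too likely to be a chance match
            hits.append({
                "sequence": f"{hit.findtext('Hit_id')} {hit.findtext('Hit_def')}",
                "length": int(hit.findtext("Hit_len")),
                "e value": evalue,
                "identity": int(hsp.findtext("Hsp_identity")),
                "query": hsp.findtext("Hsp_qseq"),
                "match": hsp.findtext("Hsp_midline"),
                "subject": hsp.findtext("Hsp_hseq"),
            })
    return sorted(hits, key=lambda h: h["e value"])  # most significant first


def report(hits: List[Dict]) -> str:
    """Format hits in the style of the off-target report above."""
    lines = []
    for n, h in enumerate(hits):
        lines += [f">seq {n}", f"sequence:{h['sequence']}", f"length:{h['length']}",
                  f"e value:{h['e value']}", f"identity:{h['identity']}",
                  h["query"], h["match"], h["subject"]]
    return "\n".join(lines)
```

Usage would be `print(report(parse_blast_xml(Path("seq.xml").read_text())))` after `from pathlib import Path`.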
Database
The database program provides an interface to the MySQL database we created for
crRNAs that have been shown experimentally to work.
############ Available Detection Targets #################
[1] Virus
[2] Bacteria
[0] Go back one step
What would you like to detect?2
############ Available Detection Targets #################
[1] Escherichia coli
[2] Bacillus subtillis
[0] Go back one step
What would you like to detect?1
############ Choose your Target #################
[1] E. Coli 16s rRNA
[0] Go back one step
What would you like to detect?1
########### The sequence thou art looking for is : ################
ACUUUACUCCCUUCCUCCCCGCUGAAA
[9] Exit
[0] Go back one step
However, these sequences still need to be tested experimentally for off-target effects, since in silico
screening can establish specificity only with limited certainty.
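A sketch of the menu-driven lookup, with Python's built-in `sqlite3` swapped in for MySQL so that it runs without a server. The table and column names are hypothetical, since the actual schema is not described here; the single row reproduces the E. coli 16s rRNA entry from the session above. Each menu level corresponds to one `SELECT DISTINCT` query filtered by the choices made so far.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema standing in for the project's MySQL table.
SCHEMA = "CREATE TABLE crrna (category TEXT, organism TEXT, target TEXT, sequence TEXT)"


def open_demo_db() -> sqlite3.Connection:
    """In-memory database holding the entry shown in the session above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO crrna VALUES (?, ?, ?, ?)",
                 ("Bacteria", "Escherichia coli", "E. coli 16s rRNA",
                  "ACUUUACUCCCUUCCUCCCCGCUGAAA"))
    return conn


def options(conn: sqlite3.Connection, column: str, **filters: str) -> List[str]:
    """List the distinct values of `column` for one menu level.

    Column names come from the code; user choices are passed as parameters."""
    where = " AND ".join(f"{key} = ?" for key in filters) or "1"
    sql = f"SELECT DISTINCT {column} FROM crrna WHERE {where} ORDER BY {column}"
    return [row[0] for row in conn.execute(sql, tuple(filters.values()))]


def lookup(conn: sqlite3.Connection, category: str, organism: str, target: str) -> Optional[str]:
    """Return the stored crRNA sequence, or None if there is no entry."""
    row = conn.execute(
        "SELECT sequence FROM crrna WHERE category = ? AND organism = ? AND target = ?",
        (category, organism, target)).fetchone()
    return row[0] if row else None


conn = open_demo_db()
print(options(conn, "organism", category="Bacteria"))
print(lookup(conn, "Bacteria", "Escherichia coli", "E. coli 16s rRNA"))
```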
Hardware control
The software for reading out the fluorescence detector is described in the
Hardware section.
All software developed for hardware can be found in our
GitHub repository.