Design
The Cas9 system was the first thing that came to mind. But how could a Cas9 system be applied to detecting ctDNA? While searching for a solution, we drew a great deal of inspiration from the 2015 Peking University iGEM project.
Paired dCas9 design of Peking University and NUDT
In 2015, the Peking University iGEM team set out to build a new reporter that converts sequence-specific information into an easily readable signal such as bioluminescence.[1] Compared with earlier nucleic acid detection methods such as qPCR, probe-based methods and direct sequencing, their method offers faster, lower-risk and more specific detection.
Figure 3. Schematic of the paired dCas9 (PC) reporter system of Peking University. A complex is formed by dCas9, one half of a split luciferase (Nluc or Cluc) and an sgRNA. In the presence of target DNA, each complex binds to the specific site indicated by its sgRNA. When the two complexes come close to each other, the split luciferase reassembles and generates a bioluminescence signal.
To increase the specificity and make the results easier to visualize, Peking University invented a programmable paired-dCas9 report system (PC report system) based on luciferase. dCas9 is a catalytically dead variant of the Cas9 protein. In other words, like Cas9 it binds an sgRNA and is guided to the target site, but it cannot cleave the DNA strand. The Peking University system consists of a pair of dCas9 proteins, each fused to one half of a split luciferase (Nluc or Cluc). Two kinds of split luciferase-dCas9-sgRNA complexes can therefore form (Nluc-dCas9 and Cluc-dCas9). In the presence of target DNA, each complex binds its site on the target. When the two sgRNAs are designed to bind suitably close together on the target sequence, the two complexes come near each other, so the attached luciferase halves reassemble and generate a bioluminescence signal.
The 2016 NUDT iGEM team had a similar design.[2] Although it was intended to detect microRNA, their split-HRP-dCas9 design can also be used to detect ctDNA. When the split HRP fragments reassemble, the presence of target DNA is converted into a visible output signal after a substrate such as 3,3',5,5'-tetramethylbenzidine (TMB) is added.
Figure 4. Schematic representation of the split HRP PC report system of NUDT.[3] Each fragment of the split HRP (sHRPN or sHRPC) is attached to a dCas9 protein, and together with an sgRNA they form an sHRP-dCas9-sgRNA complex. As in the PC report system of Peking University, the split HRP reassembles in the presence of the target gene. It then produces a visible signal after a substrate such as 3,3',5,5'-tetramethylbenzidine (TMB) is added.
However, these systems leave room for several improvements. First, because the Peking University system relies on luciferase, it needs an additional substrate, luciferin, to function. Second, it can produce only one type of output, bioluminescence. Most importantly, the system cannot amplify the signal. To cope with the extremely low concentration of ctDNA in blood, we created a similar system built on T7 RNA polymerase instead.
Paired dCas9-T7 Report System
The paired dCas9-T7 report system is a more efficient and precise PC report system with flexible outputs and the ability to amplify the signal.
Like Peking University, we split the reporter protein, in our case T7 RNA polymerase, and connect each half to a dCas9 protein via a linker. We call the two fusion proteins NT7-dCas9 and CT7-dCas9.
Figure 5. Schematic illustration of the paired dCas9 system (1). Each dCas9 is connected to one piece of the split T7 RNA polymerase (NT7 or CT7). It combines with an sgRNA beforehand to form the split T7-dCas9-sgRNA complex.
Each complex is inactive on its own. When the two complexes bind the chosen target sites that identify the sequence of interest, the two halves reassemble into a complete, active T7 RNA polymerase, which starts transcription from a T7 promoter in a cell-free environment.
Figure 6. Schematic illustration of the paired dCas9 system (2). In the presence of target DNA, each complex binds its pre-designed sgRNA binding site on the sequence. When the split T7 polymerase halves come close enough to each other, they become active and start transcribing a reporter gene with a T7 promoter that has been added to the cell-free system. After transcription, the mRNA is translated into a reporter protein such as GFP or RFP, which generates the output signal.
Our design has two major advantages over Peking University's project. First, it makes various types of output possible. Because the reassembled protein is an RNA polymerase, it can convert the presence of ctDNA into different types of signal, including GFP, RFP, luciferase or LacZ. Second, in the presence of the target sequence, the reassembled T7 polymerase keeps transcribing the reporter DNA into mRNA, which is in turn continuously translated into protein. The signal can therefore grow over time even if the initial ctDNA concentration is very low.
Target sequence and sgRNA binding site
Once we had settled on the theoretical detection process, we began looking for a suitable gene locus. We finally chose the fusion gene EML4-ALK as our target sequence.
The echinoderm microtubule-associated protein-like 4-anaplastic lymphoma kinase (EML4-ALK) gene is a fusion gene that results from a chromosome inversion and was identified in non-small cell lung cancer (NSCLC).[4] The targeted drug crizotinib, an ALK inhibitor, has been reported to suppress the tumor and relieve related symptoms.[5]
Figure 7. Schematic representation of a fusion gene.
We decided to detect EML4-ALK for two reasons: 1) an effective targeted drug exists for it, and 2) a fusion gene effectively avoids the off-target effect of the Cas9 system.
First, many types of ctDNA have already been detected, but doctors have drugs for only a few of them. Detecting EML4-ALK, for which the targeted drug crizotinib is available, is therefore more beneficial to patients than detecting other ctDNA. When the detection result is positive, patients can receive precise treatment as soon as possible.
Second, detecting a fusion gene helps to avoid false-positive results. dCas9 sometimes recognizes a site that is very similar to its target. For example, the sgRNA may bind a nearly complementary sequence that contains one or two mismatched base pairs. The corresponding T7 polymerase halves would still reassemble and give a positive output. Our system would therefore generate many false-positive results when detecting a point mutation or SNP, because it could not distinguish the normal sequence from a cancer sequence that differs by only one or two bases. A fusion gene avoids this problem. Fusion ctDNA is composed of two normal DNA sequences joined together, so if we place our two sgRNA binding sites separately in the two sequences, or place one of them on the junction between them, neither normal sequence alone can bring both complexes together, and we avoid the off-target effect of the report system.
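The reasoning above can be illustrated with a small toy model. This is only a sketch under stated assumptions: binding is modeled as a simple mismatch count (here at most two mismatches, following the "one or two" mentioned above), the guides are 8 letters long instead of 20 to keep the example readable, and all sequences and function names are made up for illustration rather than taken from EML4 or ALK.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def binds(guide, template, max_mismatch=2):
    """Toy binding rule: some window of the template matches the guide
    with at most max_mismatch mismatches (an assumed tolerance)."""
    n = len(guide)
    return any(
        mismatches(guide, template[i:i + n]) <= max_mismatch
        for i in range(len(template) - n + 1)
    )

def reports(guide_a, guide_b, template, max_mismatch=2):
    """The paired reporter fires only if BOTH guides bind the same molecule."""
    return binds(guide_a, template, max_mismatch) and binds(guide_b, template, max_mismatch)

# Toy stand-ins for the two normal genes (not real sequences).
left_gene = "ACCAACACCA"
right_gene = "GTTGTGGTTG"
fusion = left_gene + right_gene

guide_a = left_gene[1:9]    # site entirely in the left partner
guide_b = right_gene[1:9]   # site entirely in the right partner

# A "wild-type" molecule that differs from the fusion by a single base
# inside guide_a's site, mimicking a point mutation / SNP target.
one_base_variant = "ACCACCACCA" + right_gene

print(reports(guide_a, guide_b, fusion))            # fusion molecule
print(reports(guide_a, guide_b, left_gene))         # normal left gene alone
print(reports(guide_a, guide_b, right_gene))        # normal right gene alone
print(reports(guide_a, guide_b, one_base_variant))  # single-base difference
```

In this toy model the single-base variant is reported just like the intended target, which is the false-positive problem for point mutations, while neither normal gene alone triggers the paired reporter.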
We targeted two forms of the EML4-ALK fusion gene, variant 3a and variant 3b, referred to below as variant A and variant B.
More details:
https://www.ncbi.nlm.nih.gov/nuccore/AB374361.1 EML4-ALK variant 3a
https://www.ncbi.nlm.nih.gov/nuccore/AB374362.1 EML4-ALK variant 3b
We designed our sgRNA binding sites according to the principle described above: the two binding sites of a pair are placed separately in the two parts of the fusion sequence. There are also other constraints on selecting an sgRNA binding site. First, it needs a PAM site (NGG) next to it. Second, the distance between a pair of sites (SpaceDistance) should be neither too short nor too long (5-35 is desirable, as suggested by Peking University), so the choice of sites is further limited.
To apply these constraints, we first wrote a computer program that finds all PAM sites and lists the possible combinations of sites together with the distance between them.
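A search of this kind can be sketched as follows. This is not the team's original program, which is not shown here; it is a minimal illustration that assumes a 20-nt protospacer immediately followed by an NGG PAM, scans the forward strand only, and defines the distance as the number of bases between the end of the upstream site (including its PAM) and the start of the downstream protospacer. The function names and the `junction` parameter (0-based index of the first base of the downstream partner) are hypothetical.

```python
import re
from itertools import combinations

PROTOSPACER_LEN = 20   # assumed guide length
PAM_LEN = 3            # NGG

def find_sites(seq):
    """0-based starts of 20-nt protospacers followed directly by an NGG PAM
    (forward strand only; the reverse strand is ignored in this sketch)."""
    seq = seq.upper()
    # Zero-width lookahead so overlapping candidates are all reported.
    pattern = r"(?=[ACGT]{%d}[ACGT]GG)" % PROTOSPACER_LEN
    return [m.start() for m in re.finditer(pattern, seq)]

def candidate_pairs(seq, junction, min_gap=5, max_gap=35):
    """Pairs (upstream_start, downstream_start, gap) whose gap lies in
    [min_gap, max_gap] and which together bracket the fusion junction."""
    site_len = PROTOSPACER_LEN + PAM_LEN
    pairs = []
    for a, b in combinations(find_sites(seq), 2):
        gap = b - (a + site_len)
        # Upstream site starts before the junction, downstream site ends after it,
        # so the sites lie in different partners or one of them spans the junction.
        brackets_junction = a < junction < b + site_len
        if min_gap <= gap <= max_gap and brackets_junction:
            pairs.append((a, b, gap))
    return pairs

# Synthetic demo sequence (not EML4-ALK): two sites separated by 10 bases.
demo = "A" * 20 + "TGG" + "A" * 10 + "T" * 20 + "AGG" + "A" * 5
print(find_sites(demo))
print(candidate_pairs(demo, junction=28))
```

For a real run, the sequence would be one of the variant records listed above and `junction` the position at which the EML4 part ends. A fuller version would also scan the reverse complement and might use a different definition of distance.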
Computer simulation for Variant A
Table 1. Computer simulation results for possible sgRNA binding sites of EML4-ALK variant A and the distances between them.
Computer simulation for Variant B
Table 2. Computer simulation results for possible sgRNA binding sites of EML4-ALK variant B and the distances between them.
We chose four sgRNA binding sites from variant A (23a, 33a, 83a, 93a) and three sites from variant B (70b, 126b, 147b), with 7 combinations in total.
Figure 8. The four sgRNA binding sites of the fusion gene EML4-ALK variant A. We designed four sgRNA binding sites in the target gene: sgRNA23, sgRNA33, sgRNA83 and sgRNA93. The number in each name indicates the position of the site's first base pair in the target gene. Binding sites are combined according to the rule that the two sgRNAs of a pair should lie separately in the EML4 and ALK regions, or that one of them should lie on the junction of the EML4-ALK fusion gene.
Figure 9. The three sgRNA binding sites of the fusion gene EML4-ALK variant B. We designed three sgRNA binding sites in the target gene: sgRNA70, sgRNA126 and sgRNA147. The number in each name indicates the position of the site's first base pair in the target gene. Binding sites are combined according to the rule that the two sgRNAs of a pair should lie separately in the EML4 and ALK regions, or that one of them should lie on the junction of the EML4-ALK fusion gene.
We intended to transcribe the sgRNAs in vitro. This requires a cassette consisting of a T7 promoter, the sgRNA coding sequence and a T7 terminator. We planned to insert each cassette into the pSB1C3 plasmid and submit the resulting plasmids to the iGEM Registry as our new BioBricks.
Figure 10. The plasmid of the sgRNA generator. The sgRNA generator BioBrick is composed of a T7 promoter, the coding sequence for the sgRNA and a T7 terminator.
We planned to use polymerase cycling assembly (PCA) to construct those sequences.
The construction scheme was acquired from DNAWorks, which is a website for “Automatic oligonucleotide design for PCR-based gene synthesis.”[6]
Figure 11. Schematic illustration of polymerase cycling assembly (PCA).[7]
Since the sgRNA cassettes differ only in a 20 bp region, we speculated that the experiments could be simplified by choosing “overlapping” solutions among those given by DNAWorks. We tried to find as many identical oligos as possible across the solutions for the seven sgRNA cassettes. We wrote a program to do this and ultimately reduced the number of oligos to order from 42 to 24.
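One way such a selection could be automated is sketched below. The original program is not described in detail, so this is an assumed approach rather than the team's method: a simple greedy heuristic that, for each cassette in turn, picks the DNAWorks solution adding the fewest new oligos to the order. The function name and the toy oligo labels are hypothetical, and each solution is represented simply as a collection of oligo sequences.

```python
def pick_solutions(solutions_per_cassette):
    """Greedy heuristic: for each cassette choose the solution that adds the
    fewest new oligos to the pool already selected for earlier cassettes."""
    pool = set()      # distinct oligos that would need to be ordered
    chosen = []       # one selected solution per cassette
    for solutions in solutions_per_cassette:
        best = min(solutions, key=lambda oligos: len(set(oligos) - pool))
        chosen.append(best)
        pool |= set(best)
    return chosen, pool

# Toy data: two cassettes, two candidate solutions each, three oligos per solution.
# Labels stand in for oligo sequences; "P"/"T" mimic shared promoter/terminator oligos.
toy = [
    [("P1", "x1", "T1"), ("P2", "x1", "T2")],
    [("P1", "y1", "T1"), ("P3", "y1", "T3")],
]
chosen, pool = pick_solutions(toy)
print(chosen)
print(len(pool), "oligos to order instead of", sum(len(s) for s in chosen))
```

A greedy pass like this depends on the order in which the cassettes are processed and is not guaranteed to find the smallest possible set; an exhaustive search over every combination of solutions would be far more expensive with hundreds of solutions per cassette.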
Figure 12. Cartoon explanation of how the number of ordered oligos is reduced. Each colored rectangle represents a particular oligo. By finding shared oligos among the more than 200 solutions provided by DNAWorks, we can reduce the number of oligos that need to be ordered.
1. Yihao, Z. et al. Paired Design of dCas9 as a Systematic Platform for the Detection of Featured Nucleic Acid Sequences in Pathogenic Strains. ACS Synth. Biol., 2017, 6 (2), pp 211–216.
2. Development of A Novel Blood-MicroRNA Handy Detection System with CRISPR. https://2016.igem.org/Team:NUDT_CHINA
3. https://2016.igem.org/Team:NUDT_CHINA/Design
4. Wong DW, et al. The EML4-ALK fusion gene is involved in various histologic types of lung cancers from nonsmokers with wild-type EGFR and KRAS. Cancer. 2009 Apr 15;115(8):1723-33.
5. Solomon BJ, Mok T, Kim DW, et al. First-line crizotinib versus chemotherapy in ALK-positive lung cancer. N Engl J Med. 2014;371:2167-77.
6. https://hpcwebapps.cit.nih.gov/dnaworks/
7. https://en.wikipedia.org/wiki/File:PCA_polymerase_cycling_assembly.jpg