Team:Bielefeld-CeBiTec/Project/unnatural base pair/preservation system

Retention and Preservation System

Short Summary

The retention and preservation of unnatural nucleoside triphosphates and of unnatural bases in the DNA require a cellular system that acts on multiple levels. On the one hand, degradation of the unnatural nucleoside triphosphates by enzymes such as cytosine deaminase must be prevented to allow efficient incorporation into the DNA. On the other hand, once the unnatural bases are incorporated into the DNA, their loss through mutation has to be counteracted. This can be achieved by adapting CRISPR/Cas9 to detect and eliminate DNA in which the unnatural bases have mutated.

Preservation system using Cas9

Due to the tautomerisation of isoG and the hydrolysis of isoC^m, a system is needed to preserve the unnatural base pair (UBP) on the plasmid. In 2017, Zhang et al. successfully deployed a CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 system for retention of a UBP. We adapted this conservation system to our UBP and thus used a bacterial immune system to eliminate all plasmid DNA that had lost the UBP.
The nuclease Cas9 is part of the adaptive immune system of Streptococcus pyogenes, where it introduces double-strand breaks into invading foreign DNA. This enzyme is recruited by a CRISPR RNA (crRNA). The precursor crRNA array consists of direct repeats interspaced by variable sequences called spacers. These spacers are derived from foreign DNA, where the corresponding sequence is called the protospacer, and encode the Cas9 guiding sequence (guide RNA). An auxiliary transactivating crRNA (tracrRNA) helps process the precursor crRNA array into an active crRNA that contains the 20-nucleotide guide RNA. The guide RNA binds to the complementary target DNA sequence via Watson-Crick base pairing. For this binding, the target sequence needs to be located directly upstream of a type II CRISPR-specific 5′-NGG protospacer adjacent motif (PAM). To combine crRNA and tracrRNA, a chimeric single guide RNA (sgRNA) was designed. Therefore, only the 20-nucleotide guiding sequence needs to be exchanged to target any genomic sequence followed by a PAM sequence (Ran et al., 2013 a, b). A double-strand break introduced by Cas9 leads to DNA degradation by exonucleases in prokaryotic cells (Simmon and Lederberg, 1972).
The UBP of isoG and isoC^m is orthogonal to the natural base pairs. A UBP inside the target DNA causes a mismatch with the sgRNA, which generally reduces the cleavage activity of Cas9 (Zhang et al., 2017). Accordingly, Cas9 can be programmed with sgRNAs to cleave all plasmids that have lost the UBP due to point mutations (Figure 1). Consequently, the UBP is retained in the plasmid population.

Figure 1: UBP conservation system using Cas9.
sgRNAs are targeted against every possible DNA sequence that results from loss of the UBP incorporated on a plasmid. A: The UBP is lost, which results in a point mutation. One of the sgRNAs can bind to the DNA target sequence. Cas9 is recruited and cleaves the plasmid, which is then degraded. B: Plasmids that contain a UBP in the DNA target sequence produce a mismatch with every sgRNA. Cas9 does not cleave the plasmid, so the UBP is retained.
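The selection logic of Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the design procedure used in the project: the 20 nt target sequence, the marker `X` for the unnatural base, and both function names are hypothetical, and cleavage is simplified to an exact match between guide and target (real Cas9 tolerates some mismatches and additionally requires a PAM).

```python
from typing import Dict, Optional

NATURAL_BASES = "ACGT"


def guides_for_ubp_loss(target: str, ubp_marker: str = "X") -> Dict[str, str]:
    """Return one guide per natural base that could replace the UBP."""
    if target.count(ubp_marker) != 1:
        raise ValueError("target must contain exactly one UBP marker")
    return {base: target.replace(ubp_marker, base) for base in NATURAL_BASES}


def matching_guide(plasmid_site: str, guides: Dict[str, str]) -> Optional[str]:
    """Return the guide that matches the site exactly, or None if none does."""
    for guide in guides.values():
        if guide == plasmid_site:
            return guide
    return None


# Hypothetical 20 nt target; "X" marks the position of the unnatural base.
target = "GATTACAGATXACAGATTAC"
guides = guides_for_ubp_loss(target)

mutated_site = target.replace("X", "T")       # plasmid that lost the UBP
print(matching_guide(mutated_site, guides))   # a guide matches: Cas9 would cleave
print(matching_guide(target, guides))         # None: the UBP plasmid is retained
```

One guide per natural base is enough in this simplified picture, because any point mutation that replaces the unnatural base yields one of the four natural variants.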

Because the iGEM competition is based on plasmids that can be submitted to the parts registry, we abandoned the idea of creating a novel strain for UBP retention. Our retention system therefore consists of two plasmids. The first plasmid (BBa_K2201010) contains the nucleotide transporter gene PtNTT2 and cas9 on a pSB1K3 backbone. This plasmid was transformed into E. coli BL21(DE3). After chemically competent cells had been prepared from the transformants, they were transformed with the second plasmid, which carries the UBP and the sgRNAs for recruiting Cas9 on a pSB3C5 backbone.
Since two large plasmids impose considerable metabolic stress on an organism, a more efficient UBP retention could be achieved with a newly created E. coli strain that carries the genes PtNTT2 and cas9 in its genomic DNA. We provide the parts that are required to create a repair template for a knock-in strategy using a CRISPR/Cas9 system. These parts were designed for genomic integration into E. coli BL21(DE3). The genomic knock-in can be done according to the protocol by Cobb et al., 2014 using the pCRISPomyces plasmid system. An sgRNA was designed by searching for a reverse sequence with the constraint N(16)R(4)NGG (Cobb et al., 2014) inside the coding sequence of the arsB gene (Zhang et al., 2017) of the E. coli genome. arsB codes for an arsenic efflux pump membrane protein. As a result, 5′ TATTGTTCATAATAGAAGAG 3′ was the suggested guiding sequence with the highest on-target activity score while being unique within the complete genome. The required repair template is designed with 1 kb long flanking sequences and two BBa_B0015 terminators flanking the gene of interest, for example cas9 (Figure 2). To be able to strictly regulate the expression of cas9, we rationally designed an optimized IPTG-inducible promoter, P_lac‑tight.
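A search for candidate guiding sequences under the constraint N(16)R(4)NGG can be sketched as follows. The sketch assumes that the constraint reads as 16 arbitrary bases, 4 purines (R = A or G) and an NGG PAM; the function names and the test fragment (the reported guide followed by a made-up TGG PAM and filler bases) are hypothetical. On-target scoring and the genome-wide uniqueness check mentioned above are not part of it.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 16 arbitrary bases, 4 purines, then an NGG PAM. The lookahead makes
# overlapping candidate sites visible to finditer as well.
SITE = re.compile(r"(?=([ACGT]{16}[AG]{4})[ACGT]GG)")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(seq: str):
    """Yield (strand, guide) for every N(16)R(4)NGG site on both strands."""
    seq = seq.upper()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in SITE.finditer(strand_seq):
            yield strand, match.group(1)


# Hypothetical fragment: the reported guide, an assumed PAM (TGG) and filler.
fragment = "CC" + "TATTGTTCATAATAGAAGAG" + "TGG" + "TTTT"
candidates = list(find_guides(fragment))
print(candidates)
```

Candidates found this way would still have to be ranked by an on-target activity score and checked for uniqueness in the complete genome.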

Figure 2: The repair template (BBa_K2201028) for genomic integration of cas9 into E. coli BL21(DE3).
The 1 kb left flanking sequence (LFS: BBa_K2201021) and right flanking sequence (RFS: BBa_K2201022) are taken from the genome of E. coli BL21(DE3). Inside the genome, LFS and RFS directly flank the coding sequence of arsB. The strong terminators from BBa_B0015 were used to avoid basal expression and to stop transcription right after cas9. There are composite parts that consist of LFS + terminator + P_lac‑tight (BBa_K2201024) and terminator + RFS (BBa_K2201025) and can be assembled with any coding sequence of interest to create a repair template. The coding sequence of cas9 is taken from the pCRISPomyces plasmid system by Cobb et al., 2014 and originates from S. pyogenes. Its expression is negatively regulated via the IPTG‑inducible promoter P_lac‑tight.

This composite BioBrick from Figure 2 needs to be digested with NotI to separate the linear repair template from the pSB1C3 backbone. The linear repair template, the target plasmid containing the sgRNA, and the pCRISPomyces plasmid containing cas9 were co-transformed into E. coli BL21(DE3). The genomic integration was verified by sequencing.
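Because the repair template is released by a NotI digest, an assembled template should not contain an internal NotI site, or the digest would cut it into fragments. The following sketch assembles the parts in the order described for Figure 2 and reports internal sites. The part sequences are short placeholders and the function names are hypothetical; only the NotI recognition sequence GCGGCCGC is a fixed fact.

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence (palindromic)


def assemble_repair_template(lfs, terminator, promoter, cds, rfs):
    """Concatenate LFS + terminator + promoter + CDS + terminator + RFS."""
    return lfs + terminator + promoter + cds + terminator + rfs


def internal_noti_sites(template: str):
    """Return the 0-based start positions of all NotI sites in the template."""
    template = template.upper()
    last_start = len(template) - len(NOTI_SITE)
    return [i for i in range(last_start + 1)
            if template.startswith(NOTI_SITE, i)]


# Placeholder sequences, far shorter than the real parts.
clean = assemble_repair_template("AAAA", "TTTT", "CCCC", "ATGAAATAA", "GGGG")
problematic = assemble_repair_template(
    "AAAA", "TTTT", "CCCC", "ATGGCGGCCGCTAA", "GGGG")

print(internal_noti_sites(clean))        # no internal site
print(internal_noti_sites(problematic))  # one site inside the coding sequence
```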

Optimization of the negatively regulated promoter P_lac

We designed a lac promoter with tight repression, called P_lac‑tight, to achieve a low basal transcription rate. The induction of the wild type lac operon increases the level of β‑galactosidase 1000‑fold (Müller et al., 1996). As part of the lac operon, the lac repressor was the first repressor isolated and sequenced in 1966 by Gilbert and Müller‑Hill. The wild type lac operon consists of the genes lacZ, lacY and lacA, which are transcribed from the lac promoter P_lac into a polycistronic mRNA (Figure 3). These genes code for the proteins β‑galactosidase, Lac permease and Lac transacetylase (Oehler et al., 1994).

Figure 3: DNA map of the wild type lac operon and its transcriptional and translational products (Oehler et al., 1994).

The transcription is constitutively activated by the CAP protein. lacI codes for the tetrameric Lac repressor and is expressed from the promoter P_i. The Lac repressor can bind to the lac operators O1, O2 or O3. O3 is located 92 bp upstream of O1, and O2 is located 401 bp downstream of O1. The inter‑operator distances are counted from the center of O1 to the center of the distal operator. By binding simultaneously to O1 and O2 or to O1 and O3, the Lac repressor forms a DNA loop that negatively controls expression from P_lac (Oehler et al., 1994).

Figure 4: The wild type lac operators.
The tetrameric Lac repressor can bind either the lac operators O1 and O3 or O1 and O2 to form a DNA loop. The DNA loops efficiently inhibit the transcription activated by the CAP protein (Oehler et al., 1994).

In 1994, Oehler et al. showed that an inactivated O2 in its natural position does not decrease the repression by low amounts of tetrameric Lac repressor. Based on this result, we designed our P_lac‑tight without the lac operator O2. It was also shown that two weak operators result in a tighter repression than a single strong operator. This can be explained by the thermodynamic concept that a second operator increases the local concentration of the Lac repressor at the neighboring operator. As a consequence, both operators are more likely to be occupied by the Lac repressor, which leads to a tighter repression (Oehler et al., 1996). In 1983, Sadler et al. proposed an ideal lac operator O_id that binds the Lac repressor 10‑fold tighter than the natural strong lac operator O1. O_id is an inverted repeat of the left half of O1 (Figure 5).

Figure 5: The lac operator O_id.
The inversion is indicated by the arrow for the perfectly symmetric lac operator O_id. It is the inverted repeat of the left half of O1 (Sadler et al., 1983).
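The construction of O_id as an inverted repeat can be reproduced with a short sketch. The 21 bp O1 sequence used here is the commonly cited one and is an assumption, since it is not given on this page; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat(half_site: str) -> str:
    """Join a half site with its reverse complement into a symmetric site."""
    return half_site + reverse_complement(half_site)


# Commonly cited 21 bp O1 sequence (assumption, not taken from this page).
O1 = "AATTGTGAGCGGATAACAATT"
left_half = O1[:10]                # 10 bp to the left of the central base

o_id = inverted_repeat(left_half)
print(o_id)                        # 20 bp, perfectly symmetric operator
print(o_id == reverse_complement(o_id))
```

Under this assumption the resulting operator is one base pair shorter than O1, because the central base pair of O1 has no counterpart in a perfect inverted repeat.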

The operator‑DNA‑operator complex requires energy for the bending process in order to form a DNA loop. Additional energy for torsion is required when the two lac operators lie on opposite sides of the helical DNA surface. Therefore, DNA loop formation is energetically favoured for lac operators that are in phase (Müller et al., 1996). In 1996, Müller et al. investigated the strength of repression for inter‑operator distances between O_id and O1 from 57.5 bp up to 1493.5 bp. The repression values were compared to the repression by a single O1 at its natural position. A spacing shorter than 57.5 bp could not be examined due to the −35 box of the promoter. Phase dependency of the repression was observed for spacings of around 200 bp, which results in periodic maxima of the repression values (Figure 6).

Figure 6: Repression values dependent on inter‑operator distances between O1 and O_id.
The repression values refer to the repression of the chromosomal lacZ gene under the control of O1 at its natural position and O_id at the indicated position. With 50 tetrameric Lac repressors per cell, the repression value is calculated as the specific activity of β‑galactosidase in the absence of active Lac repressor divided by the specific activity of β‑galactosidase in the presence of active Lac repressor. The dashed line shows the repression value for a single natural O1 operator (Müller et al., 1996).
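The repression value defined in the caption, and the fold change relative to the single natural O1 operator, can be written as two small functions. The activity numbers below are made up for illustration and are not measurements from Müller et al.; the function names are hypothetical.

```python
def repression_value(activity_without_repressor: float,
                     activity_with_repressor: float) -> float:
    """Specific activity without active repressor divided by that with it."""
    if activity_with_repressor <= 0:
        raise ValueError("activity with repressor must be positive")
    return activity_without_repressor / activity_with_repressor


def fold_over_reference(repression: float, reference_repression: float) -> float:
    """Repression of a construct relative to a reference construct."""
    return repression / reference_repression


# Made-up specific activities in arbitrary units, not measured data.
two_operators = repression_value(6000.0, 3.0)
single_o1 = repression_value(6000.0, 12.0)

print(two_operators)                                  # repression value
print(fold_over_reference(two_operators, single_o1))  # fold over single O1
```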

An inter‑operator distance of 70.5 bp gave the strongest repression, with a value 50‑fold higher than the repression by a single natural O1. Repression drops sharply to 15‑fold at a 150.5 bp spacing and to threefold at around 600 bp. All inter‑operator distances beyond 600 bp retained a twofold increased repression value (Müller et al., 1996).
According to these results, our P_lac‑tight consists of the auxiliary operator O_id with a 70.5 bp spacing to O1 at its natural position. The remaining sequence, including P_lac, was kept as in the natural lac operon of E. coli BL21(DE3) (Figure 7).
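The half-integer spacing follows from counting between operator centers. A minimal sketch, assuming a 21 bp O1 and a 20 bp O_id (lengths not stated on this page), a hypothetical start coordinate for O1, and O_id placed upstream of O1:

```python
from fractions import Fraction


def center(start: int, length: int) -> Fraction:
    """Center of a feature covering positions start .. start + length - 1."""
    return Fraction(2 * start + length - 1, 2)


def inter_operator_distance(o1_start: int, o1_length: int,
                            aux_start: int, aux_length: int) -> Fraction:
    """Center-to-center distance between O1 and an auxiliary operator."""
    return abs(center(o1_start, o1_length) - center(aux_start, aux_length))


O1_LENGTH = 21    # assumed operator length
O_ID_LENGTH = 20  # assumed operator length

o1_start = 200               # hypothetical coordinate of the first O1 base
o_id_start = o1_start - 70   # assumed upstream placement of O_id

distance = inter_operator_distance(o1_start, O1_LENGTH, o_id_start, O_ID_LENGTH)
print(float(distance))
```

With an odd-length O1 the center falls on a base, while the center of an even-length O_id falls between two bases, which is why the distances carry a half base pair.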

Figure 7: Tight lac operon P_lac‑tight.
The figure with its annotations was created with the software Geneious 10.0.8.

Deletion of codA

To retain the unnatural base pair, the degradation of the unnatural nucleoside triphosphates must be minimized. In E. coli, the gene codA codes for cytosine deaminase, an enzyme of the pyrimidine metabolism. Cytosine deaminase catalyzes the conversion of cytosine to uracil. Furthermore, it can catalyze the deamination of isoguanosine and isocytosine. Isoguanosine is formed during oxidative stress by reaction of the reactive oxygen species (ROS) •OH with adenine. Other products from reactions of adenine with ROS include 8-oxoadenine and 6-N-hydroxyaminopurine.

Figure 8: Reactions catalyzed by the cytosine deaminase.
A) The conversion of cytosine to uracil is a normal reaction step within the pyrimidine metabolism. B) Isocytosine is also converted into uracil by the cytosine deaminase. C) Isoguanine is formed during oxidative stress. Cytosine deaminase catalyzes the reaction to the non-mutagenic xanthine.

In E. coli, the codA gene is part of the codBA operon. Cytosine permease is encoded by codB, while codA encodes cytosine deaminase. The only way cytosine can be metabolized is via hydrolytic deamination, a reaction that yields ammonia and uracil and is catalyzed by cytosine deaminase (Danielsen et al., 1992). When isoguanosine and isocytosine are provided to the cell, they are deaminated in the wild type, isocytosine to uracil and isoguanine to xanthine (Figure 8). Therefore, it is necessary to knock out or delete the codA gene to keep isoguanosine and isocytosine stably available.

Figure 9: Arrangement of the codBA operon.
codA is located downstream of codB and is 1.3 kb in size. The two genes overlap by 11 bases. codB codes for cytosine permease, while codA codes for cytosine deaminase.

codA is 1281 bp in size, resulting in a protein composed of 427 amino acids after translation. The protein CodA is located in the cytosol and has a molecular mass of 47.5 kDa.
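The relation between gene size, protein length and molecular mass can be checked with a short calculation. This sketch follows the arithmetic of the text and treats all 1281 bp as coding codons; the average residue mass of 110 Da is a common rule of thumb and an assumption, so the result is only a rough estimate next to the stated 47.5 kDa. The function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110  # rule-of-thumb average per amino acid (assumption)


def protein_length(coding_bp: int) -> int:
    """Number of codons in a coding region given in base pairs."""
    if coding_bp % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    return coding_bp // 3


def approximate_mass_kda(n_residues: int) -> float:
    """Rough molecular mass estimate from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000


n_residues = protein_length(1281)
print(n_residues)                        # number of amino acids
print(approximate_mass_kda(n_residues))  # rough estimate in kDa
```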

Figure 10: Crystal structure of cytosine deaminase from Escherichia coli complexed with zinc and phosphono-cytosine.
The structure was determined by X-ray crystallography with a resolution of 1.71 Å (Hall et al., 2011).

References