Team:IISc-Bangalore/Design

  1. T7 expression system
  2. sfGFP-SpyCatcher
  3. mCherry-SpyTag
  4. GvpC
  5. Miscellaneous

T7 expression backbone

Aggregating gas vesicles using SpyCatcher-SpyTag binding involves protein overexpression, and no system is better for protein expression than E. coli BL21 (DE3) as its lon and ompT protease deficiency yields a huge amount of protein.

BL21 (DE3) is a lysogenic strain that has the T7 RNA polymerase gene integrated into its genome under the lac operon; adding IPTG induces expression of T7 RNA polymerase, which recognizes the T7 promoter sequence. Any gene inserted downstream of the T7 promoter can thus be expressed.

Using BBa_K525998 (T7 promoter+RBS) and BBa_K731721 (T7 terminator), we have designed a T7 expression backbone that can be used to assemble and express fusion proteins easily.

Choice of BioBricks

BBa_K525998 (T7 promoter+RBS) was chosen as the strong RBS B0034 used allows for maximal protein expression. BBa_K731721 (T7 terminator) was chosen instead of the standard B0015 double terminator as its in vivo termination efficiency is greater, as characterized by BBa_K731700.

Our First Modification — BBa_K2319001 (HindIII+ATG+AgeI scar)

A HindIII restriction site, a start codon and an AgeI restriction site (BBa_K2319001) are added immediately downstream of the T7 promoter+RBS. A number of design considerations motivated the choice of these restriction sites. The HindIII site (A\AGCTT) — sandwiched between the RBS and the start codon — has a sequence very similar to the optimal sequence predicted by the sequence logo of E. coli ribosome binding sites. In fact, the HindIII sequence is closer to the optimal sequence than the typical 5'-TACTAG-3' mixed SpeI-XbaI restriction site formed by BioBrick assembly, improving the initial ribosome-mRNA binding and thereby increasing the rate of translation.


Figure 1: the amino acids glycine (left), serine (center) and threonine (right)

The AgeI site was chosen to simplify assembly of fusion proteins in this backbone: by inserting a protein coding sequence at the N-terminus of the existing protein using the HindIII and AgeI sites, a fusion protein can be formed with a benign scar. The AgeI site (A\CCGGT) is translated in-frame to Thr-Gly, amino acids commonly used in linker sequences for fusion proteins. Threonine's hydroxyl group makes it hydrophilic — allowing stabilizing interactions with the aqueous cellular environment — while glycine's small size makes the linker more flexible, allowing both protein domains to fold independently. In addition, the AgeI site is useful if the user wishes to transfer an RFC25-compatible fusion protein (Freiburg format) into our expression system.

Our Second Modification — BBa_K2319004 (TAAG)

A stop codon (TAA) and the nucleotide G are added immediately upstream of BBa_K731721 (T7 terminator). This extra stop codon (assuming the fusion protein sequence has its own) ensures that translation is halted and prevents any translational read-through. The extra nucleotide G is added for a more subtle purpose: when placed just before the T7 terminator sequence (5'-CTAGC...TTTTG-3'), it forms the NheI restriction site (G\CTAGC). This allows any fusion protein to be inserted into our expression backbone using the HindIII and NheI sites.

Using the T7 expression backbone


Figure 2: T7 expression backbone showing HindIII, AgeI and NheI sites

Expressing any protein of interest

This T7 expression backbone can be used to express any protein if its coding sequence (with a start codon) is inserted using the HindIII and NheI sites. These sites can be added to the coding sequence using PCR with primers having 5'-overhangs.

Fusing a protein domain at the N-terminus of an existing protein

By inserting the coding sequence of a protein domain (including the start codon) using the HindIII and AgeI sites into the T7 expression backbone (which already contains a protein coding sequence), an N-terminal fusion can be performed.

sfGFP-SpyCatcher

Using BBa_K1321337 (sfGFP in Freiburg format) and BBa_K1650037 (SpyCatcher), we make the fusion protein sfGFP-SpyCatcher to be expressed on the gas vesicle surface after fusion to GvpC.

Choice of BioBricks

BBa_K1321337 (sfGFP in Freiburg format) was chosen because superfolder GFP exhibits intense bright green fluorescence (making it easy to assay) and folds easily into an extremely stable structure — its half-life for unfolding at room temperature is estimated at 28 years! BBa_K1650037 (SpyCatcher) was chosen as it possesses a 6xHis tag near the N-terminus, which allows the fusion protein to be easily purified using a Ni-NTA column.

Our Modification — BBa_K2319002 (GGSGSGSS linker)

Between the sfGFP and SpyCatcher protein domains, we have inserted a short (8 aa), flexible, hydrophilic linker comprising Gly and Ser residues that allows sfGFP and SpyCatcher to fold independently of each other. Serine, like threonine, has a hydroxyl group that contributes to stability by interacting with the aqueous cellular environment. Glycine, again due to its small size, maintains flexibility of the linker.

This linker region also contains a BamHI restriction site (G\GATCC) whose in-frame translation is Gly-Ser. This BamHI site is used to link these two protein domains during assembly of the fusion construct.

mCherry-SpyTag

Using BBa_J18932 (mCherry RFP) and two well-designed oligos, we plan to produce mCherry-SpyTag.

Choice of BioBricks

BBa_J18932 (mCherry RFP) has an interesting flaw: an internal ATG near the N-terminus has a RBS-like sequence preceding it; this hidden translation start site leads to ~50% truncation of the produced mCherry protein!

Our First Modification — Improved mCherry

Using an in silico analysis of RBS strengths using an online RBS Calculator, we modified the nucleotide sequence preceding the translation start site to become a far weaker RBS while maintaining the same amino acid sequence. This inhibits translation initiation at that position by almost 75% (predicted) and so reduces the truncation of the protein.

BBa_J18932_mCherry      1 GTGAGCAAAGGCGAGGAAGATAACATG     27
                   |||...|||||.||.||||||||.|||
Improved_mCherry        1 GTGTCTAAAGGTGAAGAAGATAATATG     27

By exploiting a natural NdeI site (CA\TATG) occurring right after the modified sequence, we insert BBa_K2319006 comprising a HindIII site (for insertion into the T7 expression backbone), a start codon (ATG), and a 6xHis-tag (for easy Ni-NTA column-based protein purification), and the modified sequence preceding the hidden translation start site.

Our Second Modification — SpyTag Linker

Using another oligo, we insert BBa_K2319008 comprising a BamHI site (for insertion after the mCherry sequence), a GSGGGGS linker (for independent folding) and the SpyTag peptide sequence AHIVMVDAYKPTK. By doing this, we add SpyTag functionality to the mCherry protein, allowing it to bind to our sfGFP-SpyCatcher.

GvpC fusion

After assembling our sfGFP-SpyCatcher and mCherry-SpyTag fusion proteins, we have to express them on the gas vesicle surface: this is where our AgeI sites come in handy — by using HindIII and AgeI, we can insert the gvpC sequence into this expression backbone to make the GvpC fusion proteins. Our project involves gas vesicles extracted from Anabaena flos-aquae and Halobacterium salinarum NRC-1 — we need the gvpC genes from both these organisms, but as always, there are design considerations to keep in mind.

Our Modification — GvpC of Halobacterium salinarum NRC-1

H. salinarum is a halophile that tolerates hypersaline environments with ease — using a "salt-in" strategy, its cellular environment has evolved to be hypersaline itself! The vast majority of proteins synthesized by H. salinarum have negatively-charged acidic side chains and require 4-5 M salt concentrations to function. GvpC is one such protein. GvpC contains seven internal repeats of the domain that interlocks between GvpA "ribs" and strengthens the gas vesicle, followed by an acidic tail at the C-terminus which stabilizes the protein through effective solvation.

GvpC protein sequence from Halobacterium salinarum NRC-1, showing seven internal repeats and acidic tail

                         MSVTDKRDEMSTARDKFAESQQEFESYADEFAADITAKQDDVSDLVDAITDFQAEMTNTT
                                                |  | |||||       |     | |   |       
                                              DAFHTYGDEFAAEVDHLRADIDAQRDVIREMQ       
                                              |||  | | ||        ||      |           
                                              DAFEAYADIFATDIADKQ-DIGNLLAAIEALRTEMNSTH
                                               ||||||| || | |    ||  | |||     |     
                                              GAFEAYADDFAADVAALR-DISDLVAAIDDFQEEFIAVQ
                                               ||  || || |        |  | ||| |    | |  
                                              DAFDNYAGDFDAE-------IDQLHAAIADQHDSFDATA
                                              |||  |   |                              
                                              DAFAEYRDEFYRIEVEALLEAINDFQQDIGDFRAEFETTE
                                              |||      ||  |  |   |                  |
                                              DAFVAFARDFYGHEITAEEGAAEAEAEPVEADADVEAEAE*
                                             #VSPDEAGGESAGTEEEETEPAEVETAAPEVEGSPADTADE
                                              AEDTEAEEETEEEAPEDMVQCRVCGEYYQAITEPHLQTHD
                                              MTIQEYRDEYGEDVPLRPDDKT

* denotes the truncation site (denoted C3 truncation site)
# denotes the start of the acidic tail (note the abundance of aspartate and glutamate residues)
Adapted from DasSarma et al (2013)

For our proposed fusions, this is a problem: haloarchaeal gas vesicles "shrug off" their surface GvpC if high salt concentrations are not maintained. As a result, we removed this acidic tail that destabilizes the GvpC-gas vesicle binding at low salt concentrations, and used this truncated GvpC for our fusions.

Our Modification — (GGGGS)2 linker + AgeI

Our second modification to both gvpC sequences is an addition of a C-terminal (GGGGS)2 linker (for proper folding of the GvpC domain) and an AgeI site (for insertion into our sfGFP-SpyCatcher and mCherry-SpyTag backbones).

Miscellaneous

Codon optimization

As far as possible, we have manually codon-optimized the fusion proteins for the amino acids we are adding using our primers — His, Gly, Ser, Ala — using the codon table of E. coli and an online codon usage tool.


Figure 3: Codon usage in E. coli genes

Choice of restriction enzymes

Note that all the restriction sites we have chosen to insert are supplied by NEB as High Fidelity (HF) versions for optimal double digestions and downstream processes in NEB's CutSmart buffer.