Latest revision as of 02:20, 2 November 2017

One of these sequences encodes a toxin.

Do you know which?

ACAGTTACACGGACAACAAGGTGTTCCAGGCTTCTTTCCTCCCTTCGACGATGTATTTCCAATAGGTGTAATCGTCGGCGAAATCGTGTTCGGTTCTCCCGACGACTGCGAAGGAAACGGATTGTTCGGTGTATTCGGCGTGTACAGCAGAATTTCACCCGACAGCGGTCCTTCTTCACCAAATCCCAGCGGCGGCGG
CTGCGCGATGCTCGACGAGTCAATTCCGCTGTAGTACAGGAAGTCGTACGAGTCATTGCTATTGTATCAGCTAGAGATAATCGCGTACGCGCTCGAGCTCGAGCTATTTCGTCCTGAGCTGATGTCTCCGTCGATAATGAAAAATCCTCCGCTGATGTCCAGGTACAGACCCAGTCCGTCGTATCCTCCAATCGCTGA
GGTGCGAAGATTGACGACCCGCTCTATATTCATCATGTGTGGCCGCATGACCCGACAATTACACATTTCATTTTAAAGCTCGCGCATGCGATTGACATTGACATTACACTATATGAAATTAAGCCGTATCCGAAGCTCTGGCGTTATTATATTAAGCCGGTGCATGTGTCTGTGTATCCGCATTATTATCTCGCGGAA
AGGCACTTCCTACTTCTTAAGAAACGGCTAAGCAGCAGAGTTAAGAGCCTTAAGTCACTATCAAGCCCGCTAGTATTCAAACACAGCCACCTACTTCTACTTCTATCATGGCGGATGCTATTCAAGCGGAAGTTCAAAGTTTGCCGGCGGCTATTCAAGAGAAGCAGACCAAGACGGAAGAGCCGGCGGAAACACATG

The Next vivo Connection

Rapid Cell-free Systems

The guiding vision behind our project is to be able to provide an easy-to-purify cell-free system in order to bring the benefits of synthetic biology to the masses. To achieve this, we have provided methods to easily purify all of the necessary transcriptional and translational components. This includes essential proteins and RNAs, with a strong emphasis on transfer RNAs (tRNAs). In addition, because the Next vivo system lacks genomic DNA it cannot replicate or regenerate energy. It is essentially a simple protein production machine that translates a transcribed messenger RNA (mRNA). Because of these characteristics, Next vivo is highly amenable to genetic recoding.

For a more comprehensive look at our system, check out our design page.

NEXT VIVO DESIGN

Genetic Recoding

The conversion from an RNA message to a protein is mediated by a set of evolutionarily conserved tRNAs that make up what we know as the "Universal Genetic Code." Genetic recoding, then, is a process by which the conventionally understood relationships between codon-anticodon and tRNA-amino acid are altered. For example, the amber stop codon (UAG) can be reassigned to instead incorporate a standard or non-standard amino acid into a growing peptide. [1]

Accordingly, it follows that modifying the relationship between codon and amino acid incorporation is equivalent to the creation of a novel genetic code.

Genetic Recoding vs. Codon Reassignment

There is some discussion surrounding the use of the terms “Genetic Recoding” and “Codon Reassignment.” Because our system falls between two proposed definitions, we have chosen to refer to the practice as “Genetic Recoding” throughout our project.

You can read more about the distinction at the link below:

The Distinction Between Recoding and Codon Reassignment

Disrupting this relationship has numerous benefits, including expanding the available codon space to allow for the incorporation of non-standard amino acids into a system, and biocontainment by designing orthogonal genes that are fundamentally incompatible with ordinary organisms. Read more about biocontainment on our Design Page.

BIOCONTAINMENT

Methods for Genetic Recoding

As with all problems in biology, there is more than one way to achieve a goal. In the case of genetic recoding, however, the constant is the manipulation of the tRNA.

Recoding can be accomplished via:

Introducing orthogonal tRNA-aaRS pairs [2]

Mutating tRNA-aaRS pairs [3]

tRNA misacylation by promiscuous RNA enzymes (Flexizymes) [4]

Other iGEM Teams are also working on codon reassignment and recoding for alternative purposes. Check out the awesome project at Bielefeld where they focus on expanding the genetic code!

Bielefeld-CeBiTec

Encrypted Sequences

Novel Genetic Codes

Though genetic recoding is a developing technology, the field will only grow as our scientific understanding and computational ability to design RNAs and proteins improve. In the same way that we currently have access to large libraries of promoters, ribosome binding sites, and protein coding sequences, it is not hard to imagine the construction of a library of tRNAs that can be charged with non-canonical amino acids. Whether this is achieved via flexizymes, mutant pairs, or orthogonal introduction, computational selection of internally consistent sets of tRNAs and charging machinery will make it trivially easy to design a novel genetic code, and cell-free options like the Next vivo system would make deploying a novel genetic code relatively uncomplicated and achievable for individuals who possess basic technical ability.

Despite the benefits of genetic recoding, we should be careful and consider the unintended consequences of this technology. Undermining the faithful reproduction of a protein by a universally conserved genetic code strikes down a cornerstone used in many fields within biology, particularly within genomics and bioinformatics.

The apparent risk of this technology is that genetic recoding may allow harmful sequences to be “encrypted”, thus masking the information contained within while retaining the ability to faithfully produce the encoded protein.

When the available sample space provided by the genetic code and our understanding of the translational machinery is analyzed, it becomes apparent that recoding allows for a potential to generate numerous genetic codes. A lower-bound estimate of the total set of non-redundant genetic codes can be found according to the following formula:

$$\binom{n^{l}}{a} \cdot a! = \frac{(n^{l})!}{(n^{l} - a)!}$$

where n is the number of nucleic acid bases, l is the length of the codon, and a is the number of amino acids that need to be assigned a unique codon.

When this space is calculated with conventional parameters (n=4, l=3, and a=20), we estimate that there are (64 Choose 20)*20! possible combinations. Put another way, there are 4.77 x 10^34 entirely novel genetic codes within which to encrypt a harmful DNA sequence.
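The arithmetic behind this lower bound can be checked with a few lines of Python. This is only a minimal sketch of the calculation described above, not the team's GRecoS program; the function name `lower_bound_codes` and its defaults are assumptions made for illustration.

```python
from math import factorial


def lower_bound_codes(n=4, l=3, a=20):
    """Count the ways to give each of `a` amino acids its own distinct codon.

    Equivalent to C(n**l, a) * a!, which simplifies to (n**l)! / (n**l - a)!.
    """
    codons = n ** l  # 64 codons for four bases read as triplets
    if a > codons:
        raise ValueError("more amino acids than available codons")
    # Integer division is exact here because (codons - a)! divides codons!
    return factorial(codons) // factorial(codons - a)


if __name__ == "__main__":
    total = lower_bound_codes()
    # Scientific notation with two decimals, for comparison with 4.77 x 10^34
    print("{:.2e}".format(total))
```

With the conventional parameters this reduces to 64!/44!, the product of the integers from 45 to 64.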

GRecoS (Genetic Recoding Space)

That's 47 decillion, or 47 million billion billion billion genetic codes. Even at the lower bound, this is an extremely large sample space to search iteratively. Despite the size of the sample space, it remains to be seen whether or not this relationship is cryptographically strong. The program used to perform this calculation can be found on our GitHub page.

Check out GRecoS on the CODONxCHANGE GitHub repository!

Our estimation of the full genetic code sample space, including functionally redundant genetic codes, is as above. However, it should be noted that this is an estimate of the upper limit of the coding space and may overestimate its true diversity.

Preliminary Testing

The potential for harm as a result of this technology is not to be underestimated. If recoded systems become as prevalent and easy to obtain as we expect them to be, control over where DNA containing functional toxin sequences is distributed greatly diminishes. Accordingly, we reached out to gene synthesis companies to determine whether or not current bioinformatic technologies can detect radically recoded toxin sequences. For most of our analysis, we selected alpha conotoxin because it is short, lethal, and previously described as a potential threat in the paper Conotoxins: Potential Weapons from the Sea in the Journal of Bioterrorism and Biodefense.

Detecting Encrypted Sequences

Five of the 11 possible companies from the International Gene Synthesis Consortium (IGSC) agreed to test our sequences (n=5). Two control sequences were sent along with the encrypted sequences: unencrypted GFP and unencrypted conotoxin. The remaining 10 sequences consisted of equal numbers of encrypted GFP and conotoxin.

	Unencrypted Sequences (n=2)		Encrypted Sequences (n=10)
Sequence Identity	Green Fluorescent Protein (n=1)	Conotoxin (n=1)	Green Fluorescent Protein (n=5)	Conotoxin (n=5)
Company 1	100% (±0%)	100% (±0%)	0% (±0%)	0% (±0%)
Company 2	100% (±0%)	100% (±0%)	0% (±0%)	0% (±0%)
Company 3	100% (±0%)	100% (±0%)	0% (±0%)	0% (±0%)
Company 4	100% (±0%)	100% (±0%)	0% (±0%)	0% (±0%)
Company 5	100% (±0%)	100% (±0%)	0% (±0%)	0% (±0%)

Emails were sent to all current members of the IGSC asking them to screen twelve sequences for us. Of the five companies that were willing to help us, all of them correctly identified the unencrypted toxic proteins.

However, no organization could correctly identify the encrypted toxins.

While this result is unsettling, it is not unexpected. All genomic science relies heavily on the assumption that there is a known relationship between DNA and protein. Though the technology for large-scale recoding does not presently exist, it is prudent to be prepared instead of ignoring a potentially dangerous problem.

This experiment was repeated using each variation of the BLAST software hosted on the NCBI website. Again, the software could not identify any of the completely recoded sequences. If you are curious about the results, the sequences that we sent are available for you to try as well.

Get Data

Since the initial testing, we have maintained correspondence with individuals at these companies and are looking forward to working closely with them to ensure that DNA synthesis remains a safe and secure practice. We would also like to thank the companies and the wonderful individuals that we had the chance to interact with for their tremendous assistance in identifying and dealing with this problem before it becomes a pressing security issue. Synthetic biologists need DNA, and DNA synthesis needs new bioinformatic screening tools.

Beating BLAST

Basic Local Alignment Search Tool

To the best of our knowledge, the only tool maintaining the safety and security of DNA synthesis is BLAST. We have shown above that complete genetic recoding nullifies the ability of BLAST to detect a sequence, but it remains to be seen how much recoding BLAST can tolerate before a sequence becomes totally unrecognizable. BLAST works by breaking a query sequence into small ‘words’ of a specified length. Words that match a sequence within the database seed alignments, which BLAST then extends into ‘high-scoring segment pairs’ that contribute to a positive scoring alignment. In essence, the more word matches between a query sequence and a database sequence, the better the alignment score will be.
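The word-matching step described above can be illustrated with a toy sketch. This is not BLAST itself: it only counts exact word matches between two short, made-up strings, and the function names, the sequences, and the word length of 4 are all assumptions chosen for illustration. Real BLAST goes on to extend and score word matches, which this sketch omits.

```python
def words(seq, w):
    """Return every overlapping word of length w in seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def shared_word_count(query, subject, w):
    """Count the query words that occur exactly somewhere in subject."""
    subject_words = set(words(subject, w))
    return sum(1 for word in words(query, w) if word in subject_words)


if __name__ == "__main__":
    subject = "ACGTACGTTAGC"   # hypothetical database sequence
    identical = "ACGTACGTTAGC"  # query identical to the subject
    one_change = "ACGTACCTTAGC"  # same query with a single base changed
    print(shared_word_count(identical, subject, 4))   # all 9 words match
    print(shared_word_count(one_change, subject, 4))  # only 5 words match
```

Here a single base change removes four of the nine shared words, which is why alignment scores fall as more positions differ.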

However, it is not intuitively obvious what degree of genetic recoding is required to evade detection by BLAST. To test this, we developed a software tool within the CODONxCHANGE suite, written in Python 2.7, to test the integrity of the BLAST platform against sequences that have been partially encrypted via a set number of recoding events. This tool can also be used to prepare genes for implementation in an orthogonal cell-free system for biocontainment purposes.

SeCReT (Sequential Codon Reassignment Tool)

The tool is designed to take a nucleic acid coding sequence or protein sequence as an input, and return an ‘encrypted’ version of the sequence based upon translation via a novel genetic code. It achieves this by translating a DNA sequence into a protein sequence, and then sequentially assigning a random codon to each unique amino acid required within the protein. The resultant sequence is returned in a newly encrypted state.

Check out SeCReT on the CODONxCHANGE GitHub repository!

How Robust is BLAST?

The sequence of an alpha conotoxin was randomly encrypted 29 times for each of the possible 17 recoding events. The sequences were then analyzed via BLAST using a locally curated database containing only the original sequence. The percent identity was averaged with non-hits counting as a 0 value. The percent identity of each sequence and the number of matches returned per group were then plotted as a function of the number of recoding events. The orange trace represents the percent identity of the query to the subject sequence, and the blue trace represents the proportion of recoded sequences that BLAST found a match for.
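The averaging rule described above, where a sequence with no BLAST hit contributes 0 to the mean, can be written out as a short sketch. The function name `summarize_group` and the example values are hypothetical and are not taken from the team's data; `None` is used here as an assumed marker for "no hit".

```python
def summarize_group(identities):
    """Summarize one group of recoded sequences.

    `identities` holds a percent identity for each sequence, or None when
    no match was returned. Returns (mean percent identity with non-hits
    counted as 0, proportion of sequences that returned a match).
    """
    if not identities:
        raise ValueError("group is empty")
    scored = [0.0 if value is None else float(value) for value in identities]
    hits = sum(1 for value in identities if value is not None)
    return sum(scored) / len(scored), hits / len(identities)


if __name__ == "__main__":
    # Made-up group: two hits and two non-hits
    mean_identity, hit_fraction = summarize_group([100.0, 80.0, None, None])
    print(mean_identity, hit_fraction)  # 45.0 0.5
```

Counting non-hits as 0 means the mean identity drops both when matches get weaker and when fewer sequences match at all, so the two plotted traces are related but not redundant.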

The results of this analysis suggest that the most effective number of switches to beat BLAST is 4. This level of recoding is generally sufficient to bring the similarity of a protein to below 80%, which is the consensus value reported in the HHS guidelines and adhered to by the majority of gene synthesis companies.

After 9 recoding events, a number of sequences become able to evade BLAST entirely.

This suggests that rational recoding to disrupt the generation of high scoring words in BLAST may improve the cryptographic ability of genetic recoding even further. Furthermore, having more than 12 recoding events significantly improves the chances that the encrypted sequence is not detected by BLAST at all. Again, the raw data generated from the experiments are available for public analysis and more results can be found on the GitHub page.

Get Data

Rational Evasion of BLAST

Curiously, the number of recoding events does not necessarily always indicate whether a read is detected by BLAST or not. A small section of data from one SeCReT run is presented to indicate that in some cases, recoding can be especially disruptive to determining the identity of a protein product.

Get Data

Building Solutions

Changes on the Horizon

Although we have shown that the power of the BLAST program to identify proteins arising from genetically recoded sequences is extremely limited, there are initiatives to develop new biosecurity tools. For example, the Intelligence Advanced Research Projects Activity (IARPA), the cousin of DARPA, has a program called Functional Genomic and Computational Assessment of Threats (Fun GCAT), which aims to catalyze the development of tools to improve DNA screening capabilities. Several of the synthesis companies that we spoke to are involved in this program. It is our hope that we see more mobilization on this front to ensure that the benefits that arise from Synthetic Biology continue to greatly outweigh the risks.

Check out IARPA!

Check out FunGCAT!

DeToxIT (Decryption and Toxin Identification Tool)

Here we throw our own hat into the ring with a simple tool to decrypt DNA sequences that have been radically recoded and compare them to a database of known select agents.

Test it out with a trial data set found on our GitHub page!

Get Data

Check out DeToxIT in the CODONxCHANGE GitHub repository!

Summary

Biosecurity Analysis

We determined that current biosecurity protocols and tools (BLAST) are ineffective at identifying proteins that could arise from recoded sequences.

We also determined that the minimum degree of recoding required to evade BLAST is approximately 9 recoding events.

We also determined that 4 recoding events are sufficient to reduce the percent identity between a toxin and its closest reference sequence to below 80%.

Lastly, we have built relationships with members of the IGSC and are excited to continue working to keep DNA synthesis safe.

CODONxCHANGE Software Suite

GRecoS (Genetic Recoding Space)

SeCReT (Sequential Codon Reassignment Tool)

DeToxIT (Decryption and Toxin Identification Tool)

Looking for the Source Code?

While most of our software is freely available on our GitHub repository, you will notice that some of the tools do not have their source distributed. For the interim, we have been advised not to publish the source code for the fully functional encryption software.

Until we know the ethical and legal standing surrounding the distribution of the software, it will be available via direct contact only.

Instead, a feature-reduced version of the software is available as a pre-compiled binary to give a taste of how it works and a tangible idea of just how effective recoding is at hiding the identity of a sequence. If you would like to get access to the source code, you can get in touch with us by following the link and verifying that you are affiliated with an academic institution or other trusted party. We love how safe and accessible Synthetic Biology is, and we are excited to continue developing tools to keep it that way. Thank you for your understanding.

Get Access

References

[1] Young, T. S. and P. G. Schultz, Beyond the Canonical 20 Amino Acids: Expanding the Genetic Lexicon. Journal of Biological Chemistry, 2010. 285: 11039-11044.

[2] Javahishvili, T., A. Manibusan, S. Srinagesh, D. Lee, S. Ensari, M. Shimazu, and P. G. Schultz, Role of tRNA Orthogonality in an Expanded Genetic Code. ACS Chemical Biology, 2014. 9(4): 874-879.

[3] Chatterjee, A., H. Xiao, and P. G. Schultz, Evolution of multiple, mutually orthogonal prolyl-tRNA synthetase/tRNA pairs for unnatural amino acid mutagenesis in Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America, 2012. 109(37): 14841-14846.

[4] Ohuchi, M., H. Murakami, and H. Suga, The flexizyme system: a highly flexible tRNA aminoacylation tool for the translation apparatus. Current Opinion in Chemical Biology, 2007. 11(5): 537-542.

@@ Line 70: / Line 70: @@
 <body>
 <img src="https://static.igem.org/mediawiki/2017/archive/0/05/20171031004733%21Banner_HPbiosecurity.png" class="bannerImg" />
+<br><br>
+<center>
+<form>
+    <input class="tealButton" type="button" value="The Next vivo Connection" onclick="window.location.href='#anchor1'" />
+</form>
+<form>
+    <input class="tealButton" type="button" value="Encrypted Sequences" onclick="window.location.href='#anchor2'" />
+</form>
+<form>
+    <input class="tealButton" type="button" value="Beating Blast" onclick="window.location.href='#anchor3'" />
+</form>
+<form>
+    <input class="tealButton" type="button" value="Building Solutions" onclick="window.location.href='#anchor4'" />
+</form>
+<form>
+    <input class="tealButton" type="button" value="Summary" onclick="window.location.href='#anchor5'" />
+</form>
+</center>
    <br><br>
@@ Line 96: / Line 118: @@
    </div>
 </div>
+<div id="anchor1"></div><br><br>
 <div class="container">
    <div class="row">
      <div class="col s12">
-       <h1 class="segmentHeader"><span style="font-weight:normal;">The Next vivo Connection</h1>
+       <h2 class="segmentHeader">The N<i>ex</i>t <i>vivo</i> Connection</h2>
      </div>
    </div>
    <div class="row">
      <div class="col s12 m6">
-       <h5>Rapid Cell-Free Systems</h5>
+       <h5>Rapid Cell-free Systems</h5>
        <p class="text12">
-           In essence, our project is a rapidly purifiable cell-free system to bring the benefits of synthetic biology to as many people as possible. To do so, we provide methods to easily purify all of the necessary transcriptional and translational components. This includes proteins and RNAs such as functional tRNAs. Furthermore, the N<i>ex</i>t <i>vivo</i> system lacks genomic DNA and is instead a minimal simple DNA input and protein output system. Because of these characteristics, N<i>ex</i>t <i>vivo</i> is highly amenable to genetic recoding.
+           The guiding vision behind our project is to be able to provide an easy-to-purify cell-free system in order to bring the benefits of synthetic biology to the masses. To achieve this, we have provided methods to easily purify all of the necessary transcriptional and translational components. This includes essential proteins and RNAs, with a strong emphasis on transfer RNAs (tRNAs). In addition, because the N<i>ex</i>t <i>vivo</i> system lacks genomic DNA it cannot replicate or regenerate energy. It is essentially a simple protein production machine that translates a transcribed messenger RNA (mRNA). Because of these characteristics, N<i>ex</i>t <i>vivo</i> is highly amenable to genetic recoding.
          </p>
-      <p class="text12">
-        For a more comprehensive look at the system, check out our design page.
-        </p>
-        <div align="center">
-          <a class="waves-effect waves-light btn cyan darken-3" href="https://2017.igem.org/Team:Lethbridge/Design">N<i>EX</i>T <i>VIVO</i> DESIGN</a>
-        </div>
        </div>
        <div class="col s12 m6">
@@ Line 121: / Line 137: @@
          <img class="responsive-img" style="max-height:300px"src="https://static.igem.org/mediawiki/2017/d/d6/T--Lethbridge--cellfreepic.png" />
          </div>
+        <p class="text12">
+          For a more comprehensive look at our system, check out our design page.
+          </p>
+          <div align="center">
+            <a class="waves-effect waves-light btn cyan darken-3" href="https://2017.igem.org/Team:Lethbridge/Design">NEXT VIVO DESIGN</a>
+          </div>
        </div>
      </div>
@@ Line 127: / Line 149: @@
          <h5>Genetic Recoding</h5>
          <p class="text12">
-           Genetic recoding is a process by which the conventional relationships between codon-anticodon and tRNA-amino acid are altered. For instance, the amber stop codon (UAG) can be reassigned to instead incorporate a standard or non-standard amino acid into a growing peptide. [1]</p>
+           The conversion from an RNA message to a protein is mediated by a set of evolutionarily conserved tRNAs that make up what we know as the "Universal Genetic Code." Genetic recoding, then, is a process by which the conventionally understood relationships between codon-anticodon and tRNA-amino acid are altered. For example, the amber stop codon (UAG) can be reassigned to instead incorporate a standard or non-standard amino acid into a growing peptide. [1]</p>
            <blockquote class="grey lighten-2">
-             Modifying the relationship between codon and amino acid incorporation is equivalent to the creation of a novel genetic code.
+             Accordingly, it follows that modifying the relationship between codon and amino acid incorporation is equivalent to the creation of a novel genetic code.
            </blockquote>
-          <p class="text12">This has numerous benefits including the incorporation of non-standard amino acids, biocontainment, and protein engineering.
-        </p>
-        <div align="center">
-          <a class="waves-effect waves-light btn cyan darken-3" href="https://2017.igem.org/Team:Lethbridge/HP/Gold_Integrated">BIOCONTAINMENT</a>
-        </div>
        </div>
        <div class="col s12 m6">
@@ Line 141: / Line 158: @@
            <div class="card-content white-text">
              <span class="card-title">Genetic Recoding vs. Codon Reassignment</span>
-               <p>Though there is some discussion surrounding the use of the term “Genetic Recoding” and “Codon Reassignment.” Becuase our system falls in between two proposed definitions, we have chosen to refer to the practice as “Genetic Recoding” in the context of our project and will refer to it accordingly.</p>
+               <p class="text12">There is some discussion surrounding the use of the terms “Genetic Recoding” and “Codon Reassignment.” Because our system falls between two proposed definitions, we have chosen to refer to the practice as “Genetic Recoding” throughout our project.</p>
+              <p class="text12">
+                You can read more about the distinction at the link below:
+              </p>
              </div>
              <div class="card-action">
@@ Line 148: / Line 168: @@
          </div>
        </div>
-  </div>
+    </div>
+    <div class="row">
+      <div class="col s12 m6">
+        <p class="text12">
+          Disrupting this relationship has numerous benefits, including expanding the available codon space to allow for the incorporation of non-standard amino acids into a system, and biocontainment by designing orthogonal genes that are fundamentally incompatible with ordinary organisms. Read more about biocontainment on our Design Page.
+        </p>
+        <div align="center">
+          <a class="waves-effect waves-light btn cyan darken-3" href="https://2017.igem.org/Team:Lethbridge/Design#anchor8">BIOCONTAINMENT</a>
+        </div>
+      </div>
+      <div class="col s12 m6">
+        <h5>Methods for Genetic Recoding</h5>
+        <p class="text12">
+          As with all problems in biology, there is more than one way to achieve a goal. In the case of genetic recoding, however, the constant is the manipulation of the tRNA.
+        </p>
+        <p class="text12">
+          Recoding can be accomplished via:
+        </p>
+          <p class="text12">Introducing orthogonal tRNA-aaRS pairs [2]</p>
+          <p class="text12">Mutating tRNA-aaRS pairs [3]</p>
+          <p class="text12">tRNA misacylation by promiscuous RNA enzymes (Flexizymes) [4]</p>
+          <br />
+      </div>
    <div class="row">
-     <div class="col s12 m6 l4">
+     <div class="col s12 m8">
        <img class="responsive-img" src="https://static.igem.org/mediawiki/2017/1/1d/T--Lethbridge--trnaswitch1met.png" />
      </div>
-     <div class="col s12 m6 l4">
+     <div class="col s12 m4">
-      <h5>
-        Recoding can be accomplished via:
-      </h5>
-        <p class="text12">Introducing orthogonal tRNA-aaRS pairs [2]</p>
-        <p class="text12">Mutating tRNA-aaRS pairs [3]</p>
-        <p class="text12">tRNA misacylation by promiscuous RNA enzymes (Flexizymes) [4]</p>
-        <br />
-    </div>
-    <div class="col s12 m6 l4">
        <div class="card pink darken-3">
          <div class="card-content white-text">
@@ Line 168: / Line 202: @@
              <img class="responsive-img" style="max-width:75%" src="https://static.igem.org/mediawiki/2017/6/6e/T--Bielefeld-CeBiTec--expand_monochrome_white_2_collapse.svg" />
            </div>
-             <p>Other iGEM Teams are also working on codon reassignment for alternative purposes. Check out the awesome project at Bielefeld where they focus on expanding the genetic code!</p>
+             <p class="text12">Other iGEM Teams are also working on codon reassignment and recoding for alternative purposes. Check out the awesome project at Bielefeld where they focus on expanding the genetic code!</p>
            </div>
            <div class="card-action">
@@ Line 177: / Line 211: @@
    </div>
 </div>
+</div>
+<div id="anchor2"></div><br><br>
 <div class="container">
    <div class="row">
      <div class="col s12">
-       <h1 class="segmentHeader"><span style="font-weight:normal;">Encrypted Sequences</h1>
+       <h2 class="segmentHeader">Encrypted Sequences</h2>
      </div>
    </div>
@@ Line 188: / Line 223: @@
        <h5>Novel Genetic Codes</h5>
        <p class="text12">
-         Though this is a developing field, genetic recoding will only develop as scientific understanding and computational design improve. It is not hard to imagine the construction of a library of tRNAs that can be charged with non-canonical amino acids. Whether this is achieved via flexizymes or mutant pairs, selecting internally consistent sets of tRNAs and charging machinery will make it trivially easy to design a novel genetic code, and the N<i>ex</i>t <i>vivo</i> system would make it readily obtainable.
+         Though genetic recoding is a developing technology, the field will only grow as our scientific understanding and computational ability to design RNAs and proteins improve. In the same way that we currently have access to large libraries of promoters, ribosome binding sites, and protein coding sequences, it is not hard to imagine the construction of a library of tRNAs that can be charged with non-canonical amino acids. Whether this is achieved via flexizymes, mutant pairs, or orthogonal introduction, computational selection of internally consistent sets of tRNAs and charging machinery will make it trivially easy to design a novel genetic code, and cell-free options like the N<i>ex</i>t <i>vivo</i> system would make deploying a novel genetic code relatively uncomplicated and achievable for individuals who possess basic technical ability.
+      </p>
+      <p class="text12">
+        Despite the benefits of genetic recoding, we should be careful and consider the unintended consequences of this technology. Undermining the faithful reproduction of a protein by a universally conserved genetic code strikes down a cornerstone used in many fields within biology, particularly within genomics and bioinformatics.
        </p>
        <blockquote class="grey lighten-2">
@@ Line 194: / Line 232: @@
        </blockquote>
        <p class="text12">
-         When the available sample space provided by the genetic code is analyzed, recoding allows for a potential to generate numerous genetic codes according to the following formula:</br>
+         When the available sample space provided by the genetic code and our understanding of the translational machinery is analyzed, it becomes apparent that recoding allows for a potential to generate numerous genetic codes. A lower-bound estimate of the total set of non-redundant genetic codes can be found according to the following formula:<br />
        </p>
        <div align="center">
-         <img class="responsive-img" style="max-width:300px" src="https://static.igem.org/mediawiki/2017/2/2a/T--Lethbridge--codonspace2.png" />
+         <img class="responsive-img" style="max-width:200px" src="https://static.igem.org/mediawiki/2017/3/3b/T--Lethbridge--NRsamplespace.png" />
        </div>
        <p class="text12">
@@ Line 203: / Line 241: @@
        </p>
          <blockquote class="grey lighten-2">
-           When all codons are reassigned, a simplistic estimation (64!/44!) suggests that there are <b>4.77 x 10^34</b> possible combinations available.
+           When this space is calculated with conventional parameters (n=4, l=3, and a=20), we estimate that there are (64 Choose 20)*20! possible combinations. Put another way, there are <b>4.77 x 10^34</b> entirely novel genetic codes within which to encrypt a harmful DNA sequence.
          </blockquote>
+      </div>
+    </div>
+    <div class="row">
+      <div class="col s12 m8">
+        <h5>GRecoS (Genetic Recoding Space)</h5>
        <p class="text12">
-         That's 47 decillion, or 47 million billion billion billion genetic codes. This is an extremely large sample space to search combinatorially. Despite the size of the sample space, it remains to be seen whether or not this relationship is cryptographically strong. The program used to perform this calculation can be found on our GitHub page.
+         That's 47 decillion, or 47 million billion billion billion genetic codes. Even at the lower bound, this is an extremely large sample space to search iteratively. Despite the size of the sample space, it remains to be seen whether or not this relationship is cryptographically strong. The program used to perform this calculation can be found on our GitHub page.
        </p>
-       <div align="center">
+       </div>
-        <a class="waves-effect waves-light btn orange darken-3" href="https://github.com/chrisaac/CODONxCHANGE">Get Tool</a>
+      <div class="col s12 m4">
+        <div class="card orange darken-3">
+          <div class="card-content white-text">
+            <div align="center">
+              <a href="https://github.com/chrisaac/CODONxCHANGE"><img class="responsive-img" style="max-width:200px" href="https://github.com/chrisaac/CODONxCHANGE" src="https://static.igem.org/mediawiki/2017/7/70/T--Lethbridge--GrecoS_logo.png"/></a>
+            </div>
+            <a href="https://github.com/chrisaac/CODONxCHANGE" class="align-center">Check out GRecoS on the <b>CODONxCHANGE</b> GitHub repository!</a>
+          </div>
+        </div>
        </div>
      </div>
      <div class="row">
-       <div class="col s12 m4">
+       <div class="col s12">
-         <h5>Preliminary Testing</h5>
+         <div align="center">
+          <img class="responsive-img" style="max-width:400px" src="https://static.igem.org/mediawiki/2017/6/69/T--Lethbridge--FullEquationV1.png" />
+        </div>
          <p class="text12">
-           The potential for harm as a result of this technology is not to be underestimated. If recoded systems become as prevalent and easy to obtain as we expect them to be, control over where toxin sequences are sent greatly diminishes. Accordingly, we reached out to gene synthesis companies to determine whether or not current bioinformatic technologies can detect radically recoded toxin sequences.
+           <br />Our estimate of the full genetic code sample space, including functionally redundant genetic codes, is shown above. Note, however, that this is an estimate of the upper limit of the coding space and may overestimate its true diversity.
          </p>
+      </div>
+    </div>
+    <div class="row">
+      <div class="col s12">
+        <h5>Preliminary Testing</h5>
          <p class="text12">
-           Emails were sent to all current members of the IGSC asking them to screen twelve sequences for us. Of the five companies that were willing to help us, all of them correctly identified the un-encrypted toxic proteins. However, no organization could correctly identify the encrypted toxins.
+           The potential for harm as a result of this technology is not to be underestimated. If recoded systems become as prevalent and easy to obtain as we expect them to be, control over where DNA containing functional toxin sequences is distributed greatly diminishes. Accordingly, we reached out to gene synthesis companies to determine whether or not current bioinformatic technologies can detect radically recoded toxin sequences. For most of our analysis, we selected alpha conotoxin because it is short, lethal, and previously described as a potential threat in the paper <a href="https://www.omicsonline.org/conotoxins-potential-weapons-from-the-sea-2157-2526.1000120.pdf" id="pageLink">Conotoxins: Potential Weapons from the Sea</a> in the Journal of Bioterrorism and Biodefense.
          </p>
        </div>
-       <div class="col s12 m8">
+    </div>
+    <div class="row">
+       <div class="col s12">
          <h5>Detecting Encrypted Sequences</h5>
          <p class="text12">
-           A total of five companies from the IGSC (n=5) of the 11 possible agreed to test our sequences. Two control sequences were sent along with the encrypted sequences: unencrypted GFP and unencrypted conotoxin. The remaining 10 sequences consisted of equal numbers of encrypted GFP and conotoxin. Because BLAST relies on the Universal Genetic Code, no company was able to detect the encrypted sequences.
+           A total of five companies from the IGSC (n=5) of the 11 possible agreed to test our sequences. Two control sequences were sent along with the encrypted sequences: unencrypted GFP and unencrypted conotoxin. The remaining 10 sequences consisted of equal numbers of encrypted GFP and conotoxin.
          </p>
          <table class="">
@@ Line 245: / Line 307: @@
            </tr>
            <tr>
-             <td class="text12"><b>Identification Rate</b></td>
+             <td class="text12"><b>Company 1</b></td>
+            <td class="text12">100% (&#177;0%)</td>
+            <td class="text12">100% (&#177;0%)</td>
+            <td class="text12">0% (&#177;0%)</td>
+            <td class="text12">0% (&#177;0%)</td>
+          </tr>
+          <tr>
+            <td class="text12"><b>Company 2</b></td>
+            <td class="text12">100% (&#177;0%)</td>
+            <td class="text12">100% (&#177;0%)</td>
+            <td class="text12">0% (&#177;0%)</td>
+            <td class="text12">0% (&#177;0%)</td>
+          </tr>
+          <tr>
+            <td class="text12"><b>Company 3</b></td>
+            <td class="text12">100% (&#177;0%)</td>
+            <td class="text12">100% (&#177;0%)</td>
+            <td class="text12">0% (&#177;0%)</td>
+            <td class="text12">0% (&#177;0%)</td>
+          </tr>
+          <tr>
+            <td class="text12"><b>Company 4</b></td>
+            <td class="text12">100% (&#177;0%)</td>
+            <td class="text12">100% (&#177;0%)</td>
+            <td class="text12">0% (&#177;0%)</td>
+            <td class="text12">0% (&#177;0%)</td>
+          </tr>
+          <tr>
+            <td class="text12"><b>Company 5</b></td>
              <td class="text12">100% (&#177;0%)</td>
              <td class="text12">100% (&#177;0%)</td>
@@ Line 253: / Line 343: @@
          </tbody>
        </table>
-      <p class="text12">
-        While this result is unsettling, it is not unexpected. All of genomic science relies heavily on the assumption that there is a known relationship between DNA and protein. Though the technology for large-scale recoding does not presently exist, it is prudent to be prepared instead of ignoring a potentially dangerous problem.
-      </p>
        </div>
      </div>
      <div class="row">
        <div class="col s12">
          <p class="text12">
-           This experiment was repeated using each variation of the <a href="https://blast.ncbi.nlm.nih.gov/Blast.cgi" id="pageLink">BLAST</a> software hosted on the NCBI website.[5] Again, the software could not identify any of the completely recoded sequences. If you are curious about the results, the sequences that we sent are available for you to try as well.</p>
+          Emails were sent to all current members of the IGSC asking them to screen twelve sequences for us. All five of the companies that were willing to help us correctly identified the unencrypted toxic proteins. </p>
+          <blockquote class="grey lighten-2">
+            However, no organization could correctly identify the encrypted toxins.
+          </blockquote>
+          <p class="text12">
+            While this result is unsettling, it is not unexpected. All genomic science relies heavily on the assumption that there is a known relationship between DNA and protein. Though the technology for large-scale recoding does not presently exist, it is prudent to be prepared instead of ignoring a potentially dangerous problem.</p>
+      </div>
+    </div>
+    <div class="row">
+      <div class="col s12">
+        <p class="text12">
+           This experiment was repeated using each variation of the <a href="https://blast.ncbi.nlm.nih.gov/Blast.cgi" id="pageLink">BLAST</a> software hosted on the NCBI website. Again, the software could not identify any of the completely recoded sequences. If you are curious about the results, the sequences that we sent are available for you to try as well.</p>
            <div align="center">
              <a class="waves-effect waves-light orange darken-3 btn" href="https://static.igem.org/mediawiki/2017/e/e5/T--Lethbridge--seqtesting.txt">Get Data</a>
            </div>
          <p class="text12">
-           Following the initial testing, we have maintained correspondence with individuals at these companies and are looking forward to working closely with them to ensure that DNA synthesis remains a safe and secure practice. We would also like to thank them for their tremendous assistance in identifying and dealing with this problem before it becomes a pressing security issue. Synthetic biologists need DNA, and DNA synthesis needs new bioinformatic screening tools.
+           Since the initial testing, we have maintained correspondence with individuals at these companies and are looking forward to working closely with them to ensure that DNA synthesis remains a safe and secure practice. We would also like to thank the companies and the wonderful individuals that we had the chance to interact with for their tremendous assistance in identifying and dealing with this problem before it becomes a pressing security issue. Synthetic biologists need DNA, and DNA synthesis needs new bioinformatic screening tools.
          </p>
        </div>
      </div>
    </div>
-</div>
+<!-- </div>-->
+<div id="anchor3"></div><br><br>
 <div class="container">
    <div class="row">
      <div class="col s12">
-       <h1 class="segmentHeader"><span style="font-weight:normal;">Beating BLAST</h1>
+       <h2 class="segmentHeader">Beating BLAST</h2>
      </div>
    </div>
    <div class="row">
      <div class="col s12 m6">
        <h5>Basic Local Alignment Search Tool</h5>
        <p class="text12">
-         Currently the only tool maintaining the safety and security of DNA synthesis is BLAST. We have shown earlier that recoding completely nullifies the ability of BLAST to detect a sequence, but it remains to be seen how much recoding of the sequence BLAST can tolerate before a sequence becomes totally unmatchable to a reference. BLAST works by breaking a query sequence into small ‘words’ of a specified length. Words that exactly match a sequence within the database are ‘high-scoring pairs’ and contribute to a positive scoring alignment. In essence, the more exact word matches in a query sequence to a database sequence, the better the alignment score will be.
+         To the best of our knowledge, the only tool maintaining the safety and security of DNA synthesis is BLAST. We have shown earlier that complete genetic recoding nullifies the ability of BLAST to detect a sequence, but it remains to be seen how much recoding BLAST can tolerate before a sequence becomes totally unrecognizable. BLAST works by breaking a query sequence into small ‘words’ of a specified length. Words that exactly match a sequence within the database seed local alignments, which are extended into ‘high-scoring segment pairs’ that contribute to a positive alignment score. In essence, the more exact word matches between a query sequence and a database sequence, the better the alignment score will be.
        </p>
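The word-matching idea described above can be sketched in a few lines of Python. This is a simplified illustration only, not BLAST's actual implementation: it counts exact word matches and skips the extension and scoring steps entirely. The function names `words` and `shared_word_fraction`, and the default word length `k=11`, are assumptions made for this example.

```python
def words(seq, k):
    """Return every overlapping word of length k found in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_word_fraction(query, subject, k=11):
    """Fraction of the query's words that occur exactly in the subject.

    A rough proxy for how many seeds a word-based search could find;
    real BLAST goes on to extend and score each seed.
    """
    subject_words = set(words(subject, k))
    query_words = words(query, k)
    if not query_words:
        # The query is shorter than one word, so no seed is possible.
        return 0.0
    hits = sum(1 for w in query_words if w in subject_words)
    return hits / len(query_words)
```

For example, with `k=4`, the query `ATGCTTGCAT` shares 3 of its 7 words with the subject `ATGCATGCAT`: a single substitution removes every word that overlaps the changed position, which is why scattered codon changes erode matches so quickly.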
      </div>
@@ Line 290: / Line 391: @@
      </div>
    </div>
    <div class="row">
      <div class="col s12">
        <p class="text12">
-         However, it is not intuitively obvious what degree of genetic recoding is required to evade detection via BLAST. To test this, we developed a software tool within the <b>CODONxCHANGE</b> suite, written in Python 2.7 to test the integrity of the BLAST platform against sequences that have been partially encrypted with a set number of recoding events. This tool can also be used to prepare genes for implementation in an orthogonal cell-free system for biocontainment purposes.
+         However, it is not intuitively obvious what degree of genetic recoding is required to evade detection by BLAST. To test this, we developed a software tool within the <b>CODONxCHANGE</b> suite, written in <a href="https://www.python.org/#python-network" id="pageLink">Python 2.7</a>, to test the integrity of the BLAST platform against sequences that have been partially encrypted via a set number of recoding events. This tool can also be used to prepare genes for implementation in an orthogonal cell-free system for biocontainment purposes.
        </p>
      </div>
    </div>
    <div class="row">
-     <div class="col s12 m6">
+     <div class="col s12 m8">
        <h5>SeCReT (Sequential Codon Reassignment Tool)</h5>
        <p class="text12">
-         The tool is designed to take a nucleic acid coding sequence or protein sequence as an input, and return an ‘encrypted’ version of the sequence. It achieves this by translating a DNA sequence into a protein sequence, and then sequentially assigning a random codon to each unique amino acid required within the protein. The resultant sequence is returned in a newly encrypted state.
+         The tool is designed to take a nucleic acid coding sequence or protein sequence as an input, and return an ‘encrypted’ version of the sequence based upon translation via a novel genetic code. It achieves this by translating a DNA sequence into a protein sequence, and then sequentially assigning a random codon to each unique amino acid required within the protein. The resultant sequence is returned in a newly encrypted state.
        </p>
      </div>
-     <div class="col s12 m6">
+     <div class="col s12 m4">
        <div class="card orange darken-3">
          <div class="card-content white-text">
            <div align="center">
-             <img class="responsive-img" style="max-width:100px" src="https://diversity.github.com/assets/svg/mark-github.svg"/>
+             <a href="https://github.com/chrisaac/CODONxCHANGE"><img class="responsive-img" style="max-width:100px" href="https://github.com/chrisaac/CODONxCHANGE" src="https://static.igem.org/mediawiki/2017/d/d7/T--Lethbridge--SeCReT_logo.png"/></a>
            </div>
            <a href="https://github.com/chrisaac/CODONxCHANGE" class="align-center">Check out SeCReT on the <b>CODONxCHANGE</b> GitHub repository!</a>
@@ Line 315: / Line 418: @@
      </div>
    </div>
    <div class="row">
      <div class="col s12">
       <h5>How Robust is BLAST?</h5>
       <p class="text12">
-        The sequence of Cholear Toxin A was randomly encrypted N times for each N possible recoding events. The sequences were then analyzed via BLAST and the percent identity of each sequence was plotted as a function of the number of recoding events.
+        The sequence of an alpha conotoxin was randomly encrypted 29 times for each of the 17 possible recoding events. The sequences were then analyzed via BLAST using a locally curated database containing only the original sequence. The percent identity was averaged, with non-hits counted as 0. The percent identity of each sequence and the number of matches returned per group were then plotted as a function of the number of recoding events. The orange trace represents the percent identity of the query to the subject sequence, and the blue trace represents the proportion of recoded sequences for which BLAST found a match.
       </p>
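The averaging described above, where a sequence with no BLAST hit counts as 0% identity, can be sketched as follows. This is a minimal sketch and not the CODONxCHANGE code: the function name `summarize_group` and the use of `None` to mark a sequence with no hit are assumptions made for illustration.

```python
def summarize_group(identities):
    """Summarize one group of recoded sequences.

    identities holds one entry per recoded sequence: the percent identity
    reported by BLAST, or None when BLAST returned no hit (an assumed
    convention for this sketch).

    Returns (mean percent identity with non-hits counted as 0,
             proportion of sequences that produced a hit).
    """
    if not identities:
        raise ValueError("group is empty")
    n = len(identities)
    total = sum(0.0 if pid is None else pid for pid in identities)
    hits = sum(1 for pid in identities if pid is not None)
    return total / n, hits / n
```

For instance, `summarize_group([100.0, 80.0, None, None])` returns `(45.0, 0.5)`: the two missing hits pull the mean identity down to 45% while the hit proportion is one half. The two returned values correspond to the two traces in the plot.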
-      <table class="">
+     <div align="center">
-       <thead>
+       <img class="responsive-img" style="max-width:500px" src="https://static.igem.org/mediawiki/2017/2/24/T--Lethbridge--InternalBLASTresults1.png"/>
-        <tr>
+     </div>
-            <th>Name</th>
-            <th>Item Name</th>
-            <th>Item Price</th>
-        </tr>
-      </thead>
-      <tbody>
-        <tr>
-          <td>Alvin</td>
-          <td>Eclair</td>
-          <td>$0.87</td>
-        </tr>
-        <tr>
-          <td>Alan</td>
-          <td>Jellybean</td>
-          <td>$3.76</td>
-        </tr>
-        <tr>
-          <td>Jonathan</td>
-          <td>Lollipop</td>
-          <td>$7.00</td>
-        </tr>
-      </tbody>
-     </table>
    </div>
 </div>
@@ Line 353: / Line 433: @@
    <div class="col s12">
      <p class="text12">
-       The results of this analysis suggests that the most effective number of switches to BEAT Blast is X. More results can be found on the GitHub page. Again, the raw data generated from the experiments are available for public analysis.
+       The results of this analysis suggest that the most effective number of switches to beat BLAST is 4. This level of recoding is generally sufficient to bring the similarity of a protein to below 80%, which is the consensus value reported in the HHS guidelines and adhered to by the majority of gene synthesis companies.</p>
+      <blockquote class="grey lighten-2">After 9 recoding events, a number of sequences become able to evade BLAST entirely.</blockquote>
+      <p class="text12">
+        This suggests that rational recoding to disrupt the generation of high scoring words in BLAST may improve the cryptographic ability of genetic recoding even further. Furthermore, having more than 12 recoding events significantly improves the chances that the encrypted sequence is not detected by BLAST at all. Again, the raw data generated from the experiments are available for public analysis and more results can be found on the GitHub page.
      </p>
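Reading a threshold crossing off per-group averages like those discussed above can be sketched as follows. This is an illustrative helper under stated assumptions, not part of the published tools: the function name `first_below_threshold`, the dictionary layout (number of recoding events mapped to mean percent identity), and the default threshold of 80 are choices made for this example.

```python
def first_below_threshold(mean_identity_by_events, threshold=80.0):
    """Return the smallest number of recoding events whose mean percent
    identity falls below threshold, or None if no group does.

    mean_identity_by_events maps a number of recoding events (int) to
    the mean percent identity observed for that group (float).
    """
    for events in sorted(mean_identity_by_events):
        if mean_identity_by_events[events] < threshold:
            return events
    return None
```

With made-up values such as `{1: 95.0, 2: 88.0, 3: 79.5, 4: 60.0}` (not experimental data), the function returns `3`. Applied to real per-group means, the same lookup gives the point at which a screening cutoff stops flagging recoded sequences on average.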
      <div align="center">
@@ Line 360: / Line 443: @@
    </div>
 </div>
+  <div class="row">
+    <div class="col s12 m6">
+      <h5>Rational Evasion of BLAST</h5>
+        <p class="text12">
+          Curiously, the number of recoding events does not necessarily always indicate whether a read is detected by BLAST or not. A small section of data from one SeCReT run is presented to indicate that in some cases, recoding can be especially disruptive to determining the identity of a protein product.
+        </p>
+        <div align="center">
+          <a class="waves-effect waves-light btn orange darken-3" href="https://github.com/chrisaac/CODONxCHANGE">Get Data</a>
+        </div>
+    </div>
+    <div class="col s12 m6">
+      <div align="center">
+      <img src="https://static.igem.org/mediawiki/2017/6/6f/T--Lethbridge--NCBI_Rational.png" class="responsive-img" />
+      </div>
+    </div>
+  </div>
 </div><!--END CONTAINER -->
+<div id="anchor4"></div><br><br>
 <div class="container">
    <div class="row">
      <div class="col s12">
-       <h1 class="segmentHeader"><span style="font-weight:normal;">Building Solutions</h1>
+       <h2 class="segmentHeader">Building Solutions</h2>
      </div>
    </div>
@@ Line 372: / Line 471: @@
        <h5>Changes on the Horizon</h5>
        <p class="text12">
-         Though the power of the BLAST program to detect genetically recoded sequences has been shown to be incredibly limited, there are initiatives to develop new biosecurity tools. Intelligence Advanced Research Projects Activity (IARPA), the cousin of DARPA, has a program called Functional Genomic and Computational Assessment of Threats (Fun GCAT) which aims to catalyze the development of tools to improve DNA screening capabilites. Several of the synthesis companies that we spoke to are involved in this program.
+         Although we have shown that the power of the BLAST program to identify proteins arising from genetically recoded sequences is extremely limited, there are initiatives to develop new biosecurity tools. For example, Intelligence Advanced Research Projects Activity (IARPA), the cousin of DARPA, has a program called Functional Genomic and Computational Assessment of Threats (Fun GCAT), which aims to catalyze the development of tools to improve DNA screening capabilities. Several of the synthesis companies that we spoke to are involved in this program. It is our hope that we see more mobilization on this front to ensure that the benefits that arise from Synthetic Biology continue to greatly outweigh the risks.
        </p>
      </div>
@@ Line 379: / Line 478: @@
          <div class="card-content white-text">
            <div align="center">
-             <img class="responsive-img" src="https://static.igem.org/mediawiki/2017/5/5b/T--Lethbridge--IARPA.png"/>
+             <a href="https://www.iarpa.gov"><img class="responsive-img" src="https://static.igem.org/mediawiki/2017/5/5b/T--Lethbridge--IARPA.png"/></a>
            </div>
            <a href="https://www.iarpa.gov" class="align-center">Check out <b>IARPA</b>!</a>
@@ Line 389: / Line 488: @@
        <div class="card-content white-text">
          <div align="center">
-           <img class="responsive-img" src="https://static.igem.org/mediawiki/2017/9/99/T--Lethbridge--FunGCAT.png"/>
+           <a href="https://www.iarpa.gov/index.php/research-programs/fun-gcat"><img class="responsive-img" src="https://static.igem.org/mediawiki/2017/9/99/T--Lethbridge--FunGCAT.png"/></a>
          </div>
          <a href="https://www.iarpa.gov/index.php/research-programs/fun-gcat" class="align-center">Check out <b>FunGCAT</b>!</a>
@@ Line 413: / Line 512: @@
          <div class="card-content white-text">
            <div align="center">
-             <img class="responsive-img" style="max-width:100px" src="https://diversity.github.com/assets/svg/mark-github.svg"/>
+             <a href="https://github.com/chrisaac/CODONxCHANGE"><img class="responsive-img" style="max-width:200px" href="https://github.com/chrisaac/CODONxCHANGE" src="https://static.igem.org/mediawiki/2017/9/97/T--Lethbridge--DeToxIT.png"/></a>
            </div>
            <a href="https://github.com/chrisaac/CODONxCHANGE" class="align-center">Check out DeToxIT in the <b>CODONxCHANGE</b> GitHub repository!</a>
@@ Line 421: / Line 520: @@
    </div>
 </div>
+<div id="anchor5"></div><br><br>
 <div class="container">
    <div class="row">
      <div class="col s12">
-       <h1 class="segmentHeader"><span style="font-weight:normal;">Summary</h1>
+       <h2 class="segmentHeader">Summary</h2>
      </div>
    </div>
@@ Line 431: / Line 530: @@
      <div class="col s12 m6">
        <h5>Biosecurity Analysis</h5>
-       <p class="text12">We determined that current biosecurity protocols are ineffective at detecting recoded sequences.</p>
+       <p class="text12">We determined that current biosecurity protocols and tools (BLAST) are ineffective at identifying proteins that could arise from recoded sequences.</p>
-       <p class="text12">We also determined that the minimum degree of recoding required to evade BLAST is X recoding events.</p>
+       <p class="text12">We also determined that the minimum degree of recoding required to evade BLAST is approximately 9 recoding events.</p>
-       <p class="text12">Lastly, we have been in contact with members of the IGSC and are excited to continue working to keep DNA synthesis safe.</p>
+      <p class="text12">We also determined that 4 recoding events are sufficient to reduce the percent identity between a toxin and its closest reference sequence to approximately 80%.</p>
+       <p class="text12">Lastly, we have built relationships with members of the IGSC and are excited to continue working to keep DNA synthesis safe.</p>
      </div>
      <div class="col s12 m6">
        <h5>CODONxCHANGE Software Suite</h5>
-         <p class="text12">GRecoS (Genetic Recoding Space)</p>
+         <p class="text12"><b>GRecoS</b> (Genetic Recoding Space)</p>
-         <p class="text12">SeCReT (Sequential Codon Reassignment Tool)</p>
+         <p class="text12"><b>SeCReT</b> (Sequential Codon Reassignment Tool)</p>
-         <p class="text12">DeToxIT (Decryption and Toxin Identification Tool)</p>
+         <p class="text12"><b>DeToxIT</b> (Decryption and Toxin Identification Tool)</p>
      </div>
    </div>
    <div class="row">
      <div class="col s12">
@@ Line 449: / Line 550: @@
      </div>
        <div align="center">
-         <a class="waves-effect waves-light btn orange darken-3" href="https://github.com/chrisaac/CODONxCHANGE">Get Access</a>
+         <a class="waves-effect waves-light btn orange darken-3" href="https://goo.gl/forms/WFMgcBHVFg8cI13o2">Get Access</a>
        </div>
    </div>
@@ Line 457: / Line 558: @@
    <div class="row">
      <div class="col s12">
-       <h1 class="segmentHeader"><span style="font-weight:normal;">References</h1>
+       <h2>References</h2>
      </div>
    </div>

Difference between revisions of "Team:Lethbridge/Software"
