Difference between revisions of "Team:Lethbridge/Software"


Revision as of 01:33, 29 October 2017



One of these sequences is a toxin.

Do you know which?

ACAGTTACACGGACAACAAGGTGTTCCAGGCTTCTTTCCTCCCTTCGACGATGTATTTCCAATAGGTGTAATCGTCGGCGAAATCGTGTTCGGTTCTCCCGACGACTGCGAAGGAAACGGATTGTTCGGTGTATTCGGCGTGTACAGCAGAATTTCACCCGACAGCGGTCCTTCTTCACCAAATCCCAGCGGCGGCGG

CTGCGCGATGCTCGACGAGTCAATTCCGCTGTAGTACAGGAAGTCGTACGAGTCATTGCTATTGTATCAGCTAGAGATAATCGCGTACGCGCTCGAGCTCGAGCTATTTCGTCCTGAGCTGATGTCTCCGTCGATAATGAAAAATCCTCCGCTGATGTCCAGGTACAGACCCAGTCCGTCGTATCCTCCAATCGCTGA

GGTGCGAAGATTGACGACCCGCTCTATATTCATCATGTGTGGCCGCATGACCCGACAATTACACATTTCATTTTAAAGCTCGCGCATGCGATTGACATTGACATTACACTATATGAAATTAAGCCGTATCCGAAGCTCTGGCGTTATTATATTAAGCCGGTGCATGTGTCTGTGTATCCGCATTATTATCTCGCGGAA

AGGCACTTCCTACTTCTTAAGAAACGGCTAAGCAGCAGAGTTAAGAGCCTTAAGTCACTATCAAGCCCGCTAGTATTCAAACACAGCCACCTACTTCTACTTCTATCATGGCGGATGCTATTCAAGCGGAAGTTCAAAGTTTGCCGGCGGCTATTCAAGAGAAGCAGACCAAGACGGAAGAGCCGGCGGAAACACATG




THE NEXT VIVO CONNECTION

In essence, our project is a rapidly purifiable cell-free system that brings the benefits of synthetic biology to as many people as possible. To do so, we provide methods to easily purify all of the necessary transcriptional and translational components, including proteins and RNAs such as functional tRNAs. Furthermore, the Next Vivo system lacks genomic DNA; it is instead a minimal system with simple DNA input and protein output. Because of these characteristics, Next Vivo is highly amenable to genetic recoding.

There is some debate surrounding the use of the terms “Genetic Recoding” and “Codon Reassignment.” CITATION (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2921827/) Our system falls between two proposed definitions, and we have chosen to refer to the practice as “Genetic Recoding” in the context of our project. (SIDEBAR?)

Genetic recoding is a process that alters the conventional relationships between codon and anticodon, and between tRNA and amino acid. For instance, the amber stop codon (UAG) can be reassigned to incorporate a natural or unnatural amino acid into a growing peptide. CITATION Modifying which amino acid a codon specifies is equivalent to creating a novel genetic code. This has numerous benefits, including the incorporation of unnatural amino acids, biocontainment, and protein engineering. (LINK OUT INTERNALLY, and then also BIELEFELD?)

Recoding can be accomplished via:

  • Introducing orthogonal tRNA/aminoacyl-tRNA synthetase (tRNA-aaRS) pairs CITATION
  • Mutating tRNA-aaRS pairs CITATION
  • tRNA misacylation by promiscuous RNA enzymes (Flexizymes) CITATION

Genetic recoding is still a developing field, and it will mature as scientific understanding and computational design improve. It is not hard to imagine the construction of a library of tRNAs that can be charged with non-canonical amino acids. Whether this is achieved via flexizymes or mutant tRNA-aaRS pairs, selecting internally consistent sets of tRNAs and charging machinery would make designing a novel genetic code easy, and the Next Vivo system would make such a code readily obtainable.

Analyzing the sample space offered by the genetic code shows that recoding can potentially generate a very large number of genetic codes, according to the following formula:

Where n is the number of nucleic acid bases, l is the length of the codon, s is the number of switches, and a is the number of amino acids that need a codon.

At minimum, a single switch yields 64 potential internally consistent genetic codes. When all codons are reassigned, a simplistic estimate (64!/20!) suggests approximately 5.2 × 10^70 possible combinations. This is an enormous sample space to search combinatorially. However, it remains to be seen whether this relationship is cryptographically strong.
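The simplistic estimate quoted above can be checked directly. The short Python sketch below evaluates the ratio 64!/20! exactly using integer arithmetic. The function name `simplistic_code_count` and its default arguments are hypothetical, chosen only for illustration, and the sketch reproduces only that ratio, not the page's general formula in n, l, s and a.

```python
from math import factorial


def simplistic_code_count(codons: int = 64, amino_acids: int = 20) -> int:
    """Hypothetical helper: evaluate the simplistic estimate codons!/amino_acids!.

    Integer division is exact here because amino_acids <= codons,
    so amino_acids! divides codons! evenly.
    """
    return factorial(codons) // factorial(amino_acids)


count = simplistic_code_count()
print(len(str(count)))  # number of decimal digits: 71
print(f"{count:.3e}")   # scientific notation: 5.215e+70
```

The exact integer has 71 digits and rounds to about 5.2 × 10^70, matching the order of magnitude given in the text.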

The apparent risk of this technology is that genetic recoding may allow harmful sequences to be “encrypted,” masking the information they contain while retaining the ability to faithfully produce the encoded protein.

The potential for harm from this technology should not be underestimated. If recoded systems become as prevalent and easy to obtain as we expect, control over where toxin sequences are sent will be greatly diminished. Accordingly, we reached out to gene synthesis companies to determine whether current bioinformatic technologies can detect radically recoded toxin sequences.

Following this, we reached out to all current members of the International Gene Synthesis Consortium (IGSC), asking them to screen twelve sequences for us. All five companies that were willing to help correctly identified the unencrypted toxic proteins. However, none of them identified the encrypted toxins. The data is available here to try for yourself: LINK TO DATA SET?

TABLE OF RESULTS 0%

We repeated this experiment using all of the BLAST programs hosted on the NCBI website (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Again, none of them identified any of the completely recoded sequences.

Following the initial testing, we have maintained correspondence with individuals at these companies and are looking forward to working closely with them to ensure that DNA synthesis remains a safe and secure practice. Synthetic biologists need DNA, and DNA synthesis needs new bioinformatic screening tools.

BEATING BLAST

Currently, the only tool maintaining the safety and security of DNA synthesis is BLAST. We showed above that complete recoding nullifies BLAST's ability to detect a sequence, but it remains to be seen how much recoding BLAST can tolerate before a sequence becomes totally unmatchable to a reference. BLAST works by breaking a query sequence into short ‘words’ of a specified length. Words that match a database sequence serve as seeds, which BLAST extends into ‘high-scoring segment pairs’ (HSPs) that contribute to a positive alignment score. In essence, the more word matches a query sequence shares with a database sequence, the better the alignment score will be.