Latest revision as of 01:26, 2 November 2017

By modifying the Translationally Controlled Tumor Protein (TCTP) homologue in P. falciparum, we designed two novel synthetic proteins. In the first, the areas of interest (including the two binding sites) are removed and replaced with partly randomized residues. In the second, these areas are replaced with a single, self-contained binding site. These proteins will serve as the basis for our negative control and our artemisinin binding device, respectively. In iGEM 2018, the Owlgems plan to test the synthetic protein proposed by the machine learning model in the wet lab and to develop a colorimetric assay for detecting counterfeit artemisinin drugs.
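The two sequence edits described above can be sketched in a few lines of Python. This illustrates the idea under stated assumptions and is not the procedure we used. The placeholder sequence, the region coordinates (0-based, end-exclusive) and the motif are all hypothetical. The sketch also keeps the protein length unchanged, which the text does not state.

- `randomize_regions` produces a negative-control-style variant by re-drawing about half of the residues in each region.
- `replace_regions` produces a binding-device-style variant by filling each region with copies of a binding-site motif (compare Figure 3).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def randomize_regions(seq, regions, fraction=0.5, seed=0):
    """Negative-control-style variant: inside each (start, end) region,
    replace about `fraction` of the residues with a different random residue."""
    rng = random.Random(seed)
    residues = list(seq)
    for start, end in regions:
        for i in range(start, end):
            if rng.random() < fraction:
                choices = [aa for aa in AMINO_ACIDS if aa != residues[i]]
                residues[i] = rng.choice(choices)
    return "".join(residues)

def replace_regions(seq, regions, motif):
    """Binding-device-style variant: overwrite each region with copies of
    `motif`, truncated so the region (and the protein) keeps its length."""
    residues = list(seq)
    for start, end in regions:
        length = end - start
        repeats = length // len(motif) + 1
        residues[start:end] = list((motif * repeats)[:length])
    return "".join(residues)

# Hypothetical example: placeholder sequence and coordinates, not real TCTP data.
wild_type = "MKVYKDIFTNDEVCSDSYKQEE"
regions = [(3, 8), (14, 19)]
print(randomize_regions(wild_type, regions))
print(replace_regions(wild_type, regions, "HWR"))  # MKVHWRHWTNDEVCHWRHWQEE
```

Keeping the length fixed means residues outside the edited regions stay at their original positions. This makes the two variants easy to compare with the unaltered protein.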

Figure 1: Unaltered TCTP

Figure 2: TCTP with no binding sites.

Figure 3: TCTP with copied binding site repeats.

Machine Learning Results

The most notable results from our machine learning model are shown and briefly described below.

Artemisinin Binding

The model was trained to predict whether a novel amino acid sequence would bind to the target drug, artemisinin. It predicts binding with reasonable accuracy (Figure A). We then showed the model three novel amino acid sequences it had never seen before, gathered from the literature and from the protein we propose to use in the genetic engineering component (Figure B). The results matched our predictions (Figure C).
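A neural network such as the LSTM mentioned under Future Applications cannot read letters directly. Each sequence first has to be turned into numbers, and the output probability then has to be turned into a yes/no call.

The sketch below shows one common scheme for both steps. Sequences are one-hot encoded and padded or truncated to a fixed length, and a threshold converts the predicted probability into a label. The encoding, the 0.5 threshold and the function names are assumptions for illustration, not documented details of our model.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq, length):
    """Encode `seq` as `length` rows of 20 zeros/ones, one row per residue.
    Longer sequences are truncated; shorter ones are padded with all-zero rows.
    Non-standard letters (e.g. 'X') also become all-zero rows."""
    rows = []
    for aa in seq[:length]:
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    while len(rows) < length:
        rows.append([0] * len(AMINO_ACIDS))
    return rows

def call_binding(probability, threshold=0.5):
    """Convert a predicted binding probability into a bind / no-bind call."""
    return "binds" if probability >= threshold else "does not bind"

encoded = one_hot("MKVY", 6)
print(len(encoded), len(encoded[0]))  # 6 20
print(call_binding(0.8))              # binds
```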

Figure A

Figure B

Figure C

Artemisinin Consensus Sequence

A sequence length of 150, among others, performed well, with a validation accuracy of around 90% (Figure D). The loss/validation graph (Figure E) shows the value of the error (loss) function on the validation set; a lower value means more accurate predictions. The model was also able to determine the sequence composition most likely to be associated with binding (Figure F), with the shortest sequence length parameter being only 40 amino acids. Because binding to artemisinin is governed by multiple unknown mechanisms, we also introduced a new dataset built around a known control. Its purpose was to show that the model can recover a consensus sequence when one is present (see Homeobox Consensus Sequence).
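A "most probable composition" like the one in Figure F can be read as a table giving, for each position, the probability of each amino acid. A consensus sequence is then the most likely residue at each position.

The sketch below shows only that last step and assumes such a profile is already available. It does not reconstruct how our model produced Figure F, and the profile values are invented.

```python
def consensus(profile, min_prob=0.0):
    """Build a consensus string from a per-position probability profile.

    `profile` holds one dict per position, mapping residue -> probability.
    The most probable residue is kept at each position; if its probability
    is below `min_prob`, the position is written as 'X' (no clear preference)."""
    letters = []
    for column in profile:
        residue, prob = max(column.items(), key=lambda item: item[1])
        letters.append(residue if prob >= min_prob else "X")
    return "".join(letters)

# Hypothetical three-position profile, not taken from Figure F.
profile = [
    {"W": 0.7, "F": 0.3},
    {"F": 0.1, "Q": 0.9},
    {"K": 0.4, "R": 0.35, "N": 0.25},
]
print(consensus(profile))                # WQK
print(consensus(profile, min_prob=0.5))  # WQX
```

The `min_prob` cut-off is one simple way to avoid reporting a residue at positions where no amino acid is clearly preferred. This is relevant for a variably conserved motif.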

Figure D

Figure E

Figure F

Homeobox Consensus Sequence

A sequence length of 100 gave the best and most consistent results with our model. We expected this because the homeodomain is around 60 amino acids long. With a sequence length of 100, the sub-sequences the model cuts (the parts of the sequence it views) are more likely to contain the entire 60-amino-acid domain. With a sequence length of 60, the model would often see only the first or second half of it. The model also predicted a sequence composition very close to the accepted theoretical homeodomain consensus sequence (Figure H). The accuracy graph (Figure I) shows a highest score of around 80%, which is what one would expect for a variably conserved sequence (100% would mean the sequence is exactly the same). The loss/validation graph (Figure G) shows the value of the error function on the validation set (lower number = more accurate prediction). The accuracy is averaged over all proteins in the dataset the model was tested on, for the task of predicting whether the sequence it was viewing contained the theoretical homeodomain sequence. This model served as the control against which the artemisinin binding results were compared. Once the model was trained, we ran the theoretical homeodomain consensus sequence through it, and it detected the sequence as present with a probability of 94% (see software page).
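The argument for a sequence length of 100 can be checked with a little arithmetic. Assume, as a simplification we have not confirmed against our pipeline, that each protein is cut into consecutive, non-overlapping windows of length W. Assume also that the roughly 60-residue homeodomain can start at any position.

The domain is seen whole only when its start offset within a window is at most W - 60. That happens for W - 59 of every W possible offsets: 1 in 60 for W = 60, but 41 in 100 for W = 100. The sketch counts these offsets directly.

```python
from fractions import Fraction

def fraction_fully_visible(window, domain_length=60):
    """Fraction of domain start offsets (taken modulo `window`) for which a
    domain of `domain_length` residues lies entirely inside one of the
    non-overlapping windows [0, window), [window, 2*window), ..."""
    visible = sum(
        1 for offset in range(window)
        if offset + domain_length <= window  # domain ends before the window boundary
    )
    return Fraction(visible, window)

for w in (60, 80, 100, 150):
    print(w, fraction_fully_visible(w))
# 60 1/60
# 80 21/80
# 100 41/100
# 150 91/150
```

Longer windows keep raising this fraction. However, they also pack more residues unrelated to the domain into each view, which is presumably why the longest window is not automatically the best choice.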

Figure H

Figure I (left) and Figure G (right)
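The accuracy and loss curves reported for both models track related but different quantities:

- Accuracy counts how many validation sequences are classified correctly at a fixed threshold.
- Loss scores the predicted probabilities themselves.

The sketch below assumes binary cross-entropy, a standard loss for yes/no predictions; the text does not state which loss our model used. The labels and probabilities are made up for illustration.

```python
import math

def accuracy(labels, probs, threshold=0.5):
    """Fraction of sequences whose thresholded prediction matches the label
    (1 = contains the domain / binds, 0 = does not)."""
    correct = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return correct / len(labels)

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; lower means the probabilities sit closer to the labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

labels = [1, 1, 0, 0, 1]                 # hypothetical validation labels
confident = [0.95, 0.9, 0.05, 0.1, 0.8]  # hypothetical model outputs
hesitant = [0.6, 0.55, 0.45, 0.4, 0.51]
print(accuracy(labels, confident), binary_cross_entropy(labels, confident))
print(accuracy(labels, hesitant), binary_cross_entropy(labels, hesitant))
```

Both sets of predictions are 100% accurate at the 0.5 threshold, but the hesitant one has a higher loss. This is why the loss curve is worth reading alongside the accuracy curve.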

Future Applications

Using the LSTM (long short-term memory) model, prospective artemisinin-binding proteins can be screened in silico before they are produced in vitro. This lets us quickly determine which sequences are likely to bind, saving time and money. However, only a limited number of proteins are known to bind artemisinin, and manually generating and testing random mutations is tedious; the next logical step is to have the LSTM generate likely binding proteins autonomously. The same approach could be applied to functions other than artemisinin binding, allowing researchers to design novel synthetic proteins for many different applications.
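The proposed next step, generating likely binders automatically, can be pictured as a propose-and-screen loop. The sketch below is a simple random-mutation hill climb, not a generative LSTM. It mutates one residue at a time and keeps a variant only if a scoring function rates it strictly higher.

`score` stands in for a trained binding predictor. The toy scorer used here, which counts aromatic residues, is a hypothetical placeholder with no claimed link to artemisinin binding.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rng):
    """Return a copy of `seq` with one random position changed to a different residue."""
    i = rng.randrange(len(seq))
    new = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[i]])
    return seq[:i] + new + seq[i + 1:]

def hill_climb(start, score, steps=500, seed=0):
    """Greedy search: accept a mutant only if the scorer rates it strictly higher."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(steps):
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

def toy_score(seq):
    """Hypothetical placeholder for a trained binding predictor."""
    return sum(seq.count(aa) for aa in "FWY")

result, value = hill_climb("A" * 20, toy_score)
print(result, value)
```

Swapping `toy_score` for a real predictor's probability output would turn this into a basic in silico screen. A generative model would instead propose candidates directly rather than by blind mutation.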

Using the SnapGene software, we designed a device that constitutively expresses a protein or set of proteins and then inducibly lyses E. coli cells. The system could be used in other applications to release proteins from cells more effectively, since larger proteins can be difficult for bacteria to transport out of the cell. It can also serve as a “kill switch” that lyses bacteria once they have served their purpose, allowing cell cultures to be disposed of rapidly.

Team:Florida Atlantic/Results