Revision as of 20:31, 1 November 2017

By modifying the Translationally Controlled Tumor Protein (TCTP) homologue in P. falciparum, we created two novel synthetic proteins. In the first, the areas of interest, including the two binding sites, are removed and replaced with partly randomized residues. In the second, those areas are replaced with a single, self-contained binding site. These proteins will serve as the basis for our negative control and our artemisinin binding device, respectively.

Figure 1: Unaltered TCTP

Figure 2: TCTP with no binding sites.

Figure 3: TCTP with copied binding site repeats.

Machine Learning Results

The most notable results from our machine learning model are shown and briefly described below.

Artemisinin Binding

The model was trained to determine whether a novel sequence would or would not bind the target drug, artemisinin. The model can predict the probability of binding with decent accuracy (Figure A). We showed the model three novel amino acid sequences it had never seen before, gathered from the literature and from our proposed protein for the genetic engineering component (Figure B). The results were as predicted (Figure C).

Figure A

Figure B

Figure C
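
As a rough illustration of how an amino acid sequence could be turned into numeric input for a binding classifier like the one described above, the sketch below integer-encodes each residue and pads to a fixed length. The residue ordering, the padding index 0, the function name `encode_sequence`, and the example peptide are all assumptions made for illustration; this section does not describe the team's actual preprocessing.

```python
# Hypothetical preprocessing sketch; not the team's actual pipeline.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Index 0 is reserved for padding, so residues map to 1..20.
RESIDUE_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD = 0


def encode_sequence(seq, length):
    """Map a protein sequence to a fixed-length list of integer indices.

    Sequences longer than `length` are truncated; shorter ones are
    right-padded with PAD. Unknown letters raise ValueError.
    """
    seq = seq.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue(s): {sorted(unknown)}")
    indices = [RESIDUE_TO_INDEX[aa] for aa in seq[:length]]
    return indices + [PAD] * (length - len(indices))


# Made-up four-residue peptide, padded to length 6.
example = encode_sequence("MKVL", 6)
print(example)
```

A fixed-length integer encoding of this kind is a common input format for sequence models, which is why it is used here; other encodings (for example one-hot vectors) would serve equally well.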

Artemisinin Consensus Sequence

After running this experiment many times, the model did not show any specific sequence length to be more favorable among sequences that bind artemisinin. All sequence lengths from 3 to 100 clustered around 62% accuracy (Figures D and E), suggesting that the sequence may be involved in binding but that a preferred consensus sequence length cannot be determined. One possible explanation for the variation in accuracy between sequence lengths is the stochastic nature of the initial network weights. The loss/validation graph (Figure F) shows the value of the error function on the validation set (lower number = more accurate prediction). Since binding to artemisinin is controlled by multiple unknown mechanisms, we proposed a new dataset to show that the model could find a consensus sequence if one was present (see results below).

Figure D

Figure E

Figure F
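
To make the relationship between accuracy and the error function more concrete, here is a minimal sketch that computes both for a two-class (binds / does not bind) problem. The section does not say which error function the model used; binary cross-entropy is assumed here because it is a common choice for this kind of task, and the labels and predicted probabilities are made up.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


# Made-up validation labels (1 = binds, 0 = does not bind) and predictions.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.55, 0.45]

loss_confident = binary_cross_entropy(labels, confident)
loss_hesitant = binary_cross_entropy(labels, hesitant)
print(accuracy(labels, confident), loss_confident)
print(accuracy(labels, hesitant), loss_hesitant)
```

Both sets of predictions classify every example correctly, yet the more confident set has the lower loss. This is one reason a loss curve can carry information that an accuracy curve alone does not.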

Homeobox Consensus Sequence

A sequence length of 100 gave the best and most consistent results with our model. This result was expected because the homeo-domain is around 60 amino acids in length. A sequence length of 100 lets the model cut the sub-sequence (the part of the sequence it views) more often, so that the entire 60 amino acid sequence falls in view (rather than only the first or second half, as can happen when viewing just 60 amino acids at a time). The accuracy graph (Figure H) showed a highest score of around 80%, which is what would be expected for a variably conserved sequence (100% would mean the sequence is exactly the same). The accuracy is an average, over all the proteins in the data set the model was tested on, of how well it predicted whether the sequence it was looking at contained the theoretical homeo-domain sequence (Figure G) or not. The loss/validation graph (Figure I) shows the value of the error function on the validation set (lower number = more accurate prediction). This model was used as the control for comparison with the previous artemisinin binding set. Once the model was trained, we ran the theoretical homeo-domain consensus sequence through the model, which detected that the sequence was present with a probability of 94% (see software page).

Figure H

Figure I

Figure G
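
The argument about window length can be checked with a little arithmetic. The sketch below counts how many sliding windows of a given length contain an entire domain. It assumes windows are taken at every position (stride 1), which this section does not state, and the protein length and domain position are made-up values; `windows_containing` is a hypothetical helper name.

```python
def windows_containing(protein_len, window_len, domain_start, domain_len):
    """Count stride-1 sliding windows that contain the whole domain.

    Positions are 0-based; the domain occupies
    [domain_start, domain_start + domain_len).
    """
    if window_len > protein_len or window_len < domain_len:
        return 0
    domain_end = domain_start + domain_len
    # A window starting at s covers [s, s + window_len).
    first = max(0, domain_end - window_len)  # earliest start still reaching the domain end
    last = min(domain_start, protein_len - window_len)  # latest start still covering the domain start
    return max(0, last - first + 1)


# Made-up protein: 300 residues with a 60-residue domain starting at position 120.
count_60 = windows_containing(300, 60, 120, 60)
count_100 = windows_containing(300, 100, 120, 60)
print(count_60, count_100)
```

Under these assumptions a 60-residue window contains the whole domain only when it is aligned exactly, while a 100-residue window does so at 41 different positions, which is consistent with the explanation given for why the longer sequence length performed better.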

Future Applications

Using the LSTM learning protocol, prospective artemisinin binding proteins can be screened preliminarily without needing to generate them in vitro. This allows us to rapidly determine which sequences are likely to bind, saving time and money. However, only a limited number of proteins have known artemisinin binding functionality, and manually generating and testing random mutations is tedious; the next logical step is to have the LSTM protocol generate likely binding proteins autonomously. This approach could also be applied to functions other than binding to artemisinin, allowing researchers to create novel synthetic proteins for many different applications.
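
As a minimal sketch of how the manual mutation step might be automated, the code below generates random single-residue variants of a parent sequence and ranks them with a scoring function. Everything here is hypothetical: `point_mutants`, `rank_candidates`, and the parent sequence are illustrative names and values, and `placeholder_score` merely stands in for a trained model's predicted binding probability.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def point_mutants(parent, n, rng):
    """Return n variants of `parent`, each with exactly one residue substituted."""
    variants = []
    for _ in range(n):
        pos = rng.randrange(len(parent))
        choices = [aa for aa in AMINO_ACIDS if aa != parent[pos]]
        variants.append(parent[:pos] + rng.choice(choices) + parent[pos + 1:])
    return variants


def rank_candidates(candidates, score):
    """Sort candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


def placeholder_score(seq):
    # Stand-in for a trained model's predicted binding probability.
    # It only counts cysteines and has no biological meaning.
    return seq.count("C") / len(seq)


rng = random.Random(0)  # fixed seed so the run is repeatable
parent = "MKVLAGCH"  # made-up sequence, not a real protein
ranked = rank_candidates(point_mutants(parent, 5, rng), placeholder_score)
for candidate in ranked:
    print(candidate, placeholder_score(candidate))
```

Swapping `placeholder_score` for a call into a trained classifier would turn this into a simple generate-and-screen loop; a generative model, as proposed above, would replace the random mutation step as well.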

Using the SnapGene visualization software, we designed a device that can constitutively express a protein or set of proteins and then inducibly lyse E. coli cells. This system can be used in other applications to recover proteins from cells more effectively, as larger proteins can be difficult for bacteria to transport. It can also be used as a “kill switch” to lyse bacteria that have served their purpose, allowing for rapid disposal of cell cultures.

@@ Line 27: / Line 27: @@
 </p>
 <h5> Figure 1: Unaltered TCTP</h5>
-<img src="https://static.igem.org/mediawiki/2017/f/fa/T--Florida_Atlantic--TCTP.png" width= 200; height= 200px;>
+<img src="https://static.igem.org/mediawiki/2017/f/fa/T--Florida_Atlantic--TCTP.png" width= 200; height= 200px; border="5">
 <h5> Figure 2: TCTP with no binding sites.</h5>
-<img src="https://static.igem.org/mediawiki/2017/1/1b/T--Florida_Atlantic--BrokenTCTP.png" width= 200; height= 200px;>
+<img src="https://static.igem.org/mediawiki/2017/1/1b/T--Florida_Atlantic--BrokenTCTP.png" width= 200; height= 200px; border="5">
 <h5> Figure 3: TCTP with copied binding site repeats.</h5>
-<img src="https://static.igem.org/mediawiki/2017/c/ce/T--Florida_Atlantic--PurposedTCTP.png" width= 200; height= 200px;>
+<img src="https://static.igem.org/mediawiki/2017/c/ce/T--Florida_Atlantic--PurposedTCTP.png" width= 200; height= 200px; border="5">
 </br>
 </br>

Difference between revisions of "Team:Florida Atlantic/Results"