By modifying the Translationally Controlled Tumor Protein (TCTP) homologue in P.
falciparum, we were able to create two novel synthetic proteins. In the first protein, the areas of
interest, including the two binding sites, are removed and replaced with partly randomized
residues. In the second, these areas are replaced with a single, self-contained binding site. These will
serve as the basis for our negative control and our artemisinin binding device, respectively. In the next iGEM season (2018), the Owlgems will test the new synthetic protein proposed by the machine learning model in the wet lab and develop a colorimetric assay for detecting counterfeit artemisinin drugs.
The most notable results from our machine learning model are shown and briefly described below.
The model was trained to determine whether or not a novel sequence would bind the target drug, artemisinin. The model can predict the probability of binding with decent accuracy (Figure A). We showed the model three novel amino acid sequences it had never seen before, drawn from the literature and from our proposed protein for the genetic engineering component (Figure B). The results were as predicted (Figure C).
A sequence length of 150, among others, performed rather well, with a validation accuracy of around 90 percent (Figure D). The loss/validation graph (Figure E) shows the value of the error function on the validation set (a lower number means a more accurate prediction). The model was also able to determine the sequence composition most likely to be associated with binding (Figure F), with the lowest sequence length parameter being only 40 amino acids. Since binding to artemisinin is controlled by multiple unknown mechanisms, we introduced a second dataset as a known control experiment, to show that the model can recover a consensus sequence when one is present (Homeobox Consensus Sequence).
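To make the accuracy and loss figures mentioned above concrete, here is a minimal sketch of how the two quantities can be computed for a binder/non-binder classifier. It assumes a binary cross-entropy error function and a 0.5 decision threshold; neither is stated on this page, and the labels and probabilities are invented for illustration only.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; lower means predictions closer to the labels.
    Assumption: the page does not say which error function the model used."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def accuracy(labels, probs, threshold=0.5):
    """Fraction of sequences whose thresholded prediction matches the label."""
    correct = sum(1 for y, p in zip(labels, probs) if (p >= threshold) == bool(y))
    return correct / len(labels)

# Hypothetical validation set: 1 = binds, 0 = does not bind
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1]  # illustrative predicted binding probabilities

print("validation accuracy:", accuracy(labels, probs))
print("validation loss:", binary_cross_entropy(labels, probs))
```

Accuracy only counts which side of the threshold a prediction falls on, while the loss also rewards confident correct predictions, which is why the two curves are reported separately.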
A sequence length of 100 showed the best and most consistent results on our model. This result was expected because the homeo-domain is around 60 amino acids in length. A sequence length of 100 allows the machine to cut the sub-sequence (the part of the sequence it views) more often, so that the entire 60 amino acid sequence is more likely to be in view (instead of only the first or second half, as can happen when the sequence length is itself only 60 amino acids). The model also predicted a sequence composition very close to the accepted theoretical homeo-domain consensus sequence (Figure H). The accuracy graph (Figure I) showed a highest score of around 80%, which is what you would expect for a variably conserved sequence (100% would mean the sequence is exactly the same). The loss/validation graph (Figure G) shows the value of the error function on the validation set (a lower number means a more accurate prediction). The accuracy is an average over all the proteins in the data set on which the model was tested, measuring whether it correctly predicted that the sequence it was looking at did or did not contain the theoretical homeo-domain sequence. This model was used as the control for comparison with the previous artemisinin binding set. Once the model was trained, we ran the theoretical homeo-domain consensus sequence through the model, which detected that the sequence was present with a probability of 94% (see software page).
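The sub-sequence argument above can be illustrated with a small counting sketch. It assumes the model views fixed-length windows that slide along the protein one residue at a time; the page does not describe the actual cutting scheme, and the protein length and domain position below are hypothetical.

```python
def windows_containing_domain(seq_len, dom_start, dom_len, window, stride=1):
    """Count the sub-sequence windows that contain the whole domain."""
    count = 0
    for start in range(0, seq_len - window + 1, stride):
        end = start + window
        # The window must begin at or before the domain and end at or after it
        if start <= dom_start and end >= dom_start + dom_len:
            count += 1
    return count

# Hypothetical protein: 300 residues, 60-residue homeo-domain starting at index 120
for window in (60, 100):
    n = windows_containing_domain(300, 120, 60, window)
    print(f"window={window}: {n} windows contain the full domain")
```

Under these assumptions a 60-residue window sees the complete domain in exactly one position, whereas a 100-residue window sees it in 41 positions, which is consistent with the longer sequence length giving more consistent results.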
Using the LSTM learning protocol, prospective artemisinin-binding proteins can be validated
preliminarily without needing to produce them in vitro. This allows us to rapidly determine which
sequences are likely to bind, saving time and money. However, only a limited number
of proteins have known artemisinin-binding functionality, and manually generating and testing
random mutations is tedious; the next logical step is to have the LSTM protocol generate likely
binding proteins autonomously. This could also be applied to other functions besides binding to
artemisinin, allowing researchers to create novel synthetic proteins for many different
Using the SnapGene visualization software, we created a device that can constitutively
express a protein or set of proteins and then inducibly lyse E. coli cells. This system can be used
in other applications to recover proteins from cells more effectively, as larger proteins can be
difficult for bacteria to transport. It can also be used as a “kill switch” to lyse bacteria
that have served their purpose, allowing for rapid disposal of cell cultures.