Experiments
Wet Lab Protocols
LB Broth
25g LB Broth, Miller
1L Water
-Mix LB powder into water and autoclave for 15 minutes. Supplement with antibiotics
(Chloramphenicol, 35 μg/mL) as needed.
LB Agar
25g LB Broth, Miller
15g Granulated Agar
1L Water
-Mix LB powder and agar into water. Heat until the agar is dissolved and the solution is clear, taking
care to avoid boiling over. Autoclave for 15 minutes. Supplement with antibiotics (Chloramphenicol,
35 μg/mL) as needed. Pour 20 mL into each sterile culture plate and let cool.
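The two media recipes are written for 1 L batches, so smaller or larger batches come down to simple proportional arithmetic. The following is a minimal sketch of that arithmetic, not part of the original protocol. It assumes the intended working concentration of Chloramphenicol is 35 μg/mL, and the 35 mg/mL stock concentration and all function names are hypothetical choices for illustration.

```python
def scale_recipe(volume_ml, with_agar=False):
    """Scale the 1 L recipe to an arbitrary batch volume (in mL)."""
    factor = volume_ml / 1000.0
    recipe = {
        "LB Broth, Miller (g)": 25.0 * factor,
        "Water (mL)": float(volume_ml),
    }
    if with_agar:
        recipe["Granulated Agar (g)"] = 15.0 * factor
    return recipe


def antibiotic_stock_volume_ul(volume_ml, working_ug_per_ml=35.0, stock_mg_per_ml=35.0):
    """Stock volume (in uL) needed for the batch, from C1 * V1 = C2 * V2.

    The stock concentration is an assumed value; substitute the real one.
    """
    total_ug = working_ug_per_ml * volume_ml
    stock_ug_per_ul = stock_mg_per_ml  # 1 mg/mL is the same as 1 ug/uL
    return total_ug / stock_ug_per_ul


def plates_from_batch(volume_ml, ml_per_plate=20.0):
    """Number of full plates that can be poured at 20 mL per plate."""
    return int(volume_ml // ml_per_plate)


if __name__ == "__main__":
    print(scale_recipe(500, with_agar=True))
    print(antibiotic_stock_volume_ul(500), "uL of stock")
    print(plates_from_batch(500), "plates")
```

With these assumed values, a 500 mL agar batch needs half of each dry ingredient and yields 25 plates.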
Transformation of Competent Cells
50μL Competent Cells (DH5α E. coli)
10ng Plasmid DNA
450μL LB
LB/Chloramphenicol Plate
-Thaw competent cells on ice and add plasmid DNA. Let sit for 20 minutes on ice. Heat shock
cells at 45 °C for 30 seconds and then return to ice for 2 minutes. Add LB and incubate at 37 °C
for 2-3 hours. Plate 100 μL of the cells on an LB/Chloramphenicol plate and incubate at 37 °C
overnight.
Dry Lab Protocols
Protein Reverse Translation
-Isolate the protein sequence of interest and reverse translate it using the E. coli preferred codon
library in SnapGene. After reverse translation, look for out-of-frame coding regions and alter the
codons so that these regions are unlikely to be expressed. Finally, run a BLASTX search to confirm that
the nucleotide sequence still encodes the protein of interest.
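The reverse translation and checking steps can be illustrated outside SnapGene with a short script. This is a rough sketch under stated assumptions, not the SnapGene procedure: the `PREFERRED` table is an illustrative set of commonly cited high-frequency E. coli codons (not SnapGene's library), the function names are hypothetical, and the out-of-frame scan only covers the two shifted frames of the forward strand. Translating the result back stands in for the BLASTX check as a local sanity test only.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed preferred codons for E. coli (illustrative, one per amino acid).
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def reverse_translate(protein):
    """Map each amino acid to its assumed preferred codon."""
    return "".join(PREFERRED[aa] for aa in protein.upper())


def translate(dna, frame=0):
    """Translate complete codons starting at the given frame offset."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TO_AA[c] for c in codons)


def out_of_frame_orfs(dna, min_codons=10):
    """Find ATG-initiated open regions in the two shifted forward frames.

    Returns (frame, start_index, length_in_codons) tuples. A region that
    runs to the end of the sequence without a stop codon is also reported.
    """
    hits = []
    for frame in (1, 2):
        protein = translate(dna, frame)
        pos = 0
        for segment in protein.split("*"):
            m = segment.find("M")
            if m != -1 and len(segment) - m >= min_codons:
                hits.append((frame, frame + 3 * (pos + m), len(segment) - m))
            pos += len(segment) + 1  # skip past the stop codon
    return hits


if __name__ == "__main__":
    dna = reverse_translate("MKTAY")
    print(dna, translate(dna) == "MKTAY", out_of_frame_orfs(dna))
```

Any hit reported by `out_of_frame_orfs` marks a region where a synonymous codon swap could be tried, followed by re-running the round-trip translation.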
Machine Learning Protocols
An LSTM model was coded using the TensorFlow library and run in Jupyter notebooks for three main experiments.
Artemisinin Binding
- Created LSTM model.
- Collected positive and negative datasets by hand from NCBI and the literature (proteins that bind Artemisinin and proteins that do not), then imported them.
- Trained model to learn binding vs. not binding.
- Established consistent parameters for learning process (number of proteins per dataset, learning rate, network size, and sub-sequence length viewed).
- Ran test and repeated to establish consistency.
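As a rough illustration of what such a binary sequence classifier might look like, the sketch below builds a small LSTM with the Keras API in TensorFlow. It is not the model used in these experiments: the integer encoding, the embedding layer, the layer sizes, and the names `encode` and `build_model` are all assumptions chosen for brevity. The `units` and `learning_rate` arguments correspond to the network size and learning rate listed above.

```python
import numpy as np
import tensorflow as tf

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Index 0 is reserved for padding and unknown residues (an assumed convention).
AA_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq, length):
    """Integer-encode a protein sequence, truncated or zero-padded to `length`."""
    ids = [AA_INDEX.get(aa, 0) for aa in seq[:length]]
    return ids + [0] * (length - len(ids))


def build_model(units=32, learning_rate=1e-3):
    """Small LSTM classifier: residues -> embedding -> LSTM -> P(binds)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=len(AMINO_ACIDS) + 1, output_dim=8),
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    # Toy, made-up sequences purely to show the expected array shapes.
    x = np.array([encode("MKTAYIAK", 10), encode("GGSGGS", 10)])
    y = np.array([1, 0])
    model = build_model()
    model.fit(x, y, batch_size=2, epochs=1, verbose=0)
    print(model.predict(x, verbose=0).shape)
```

Repeating the run with fixed hyperparameters, as in the last step above, would mean calling `build_model` and `fit` again with the same arguments and comparing the resulting accuracies.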
Artemisinin Consensus Sequence
- Created LSTM.
- Imported dataset (the same one used for the Artemisinin Binding model above).
- Trained model to look at specified sequence length for binding vs. not binding.
- Established consistent parameters for learning process (number of proteins per dataset, learning rate, network size, and sub-sequence length viewed).
- Held the remaining variables constant (batch size, epochs, and the sequence lengths iterated over in the range loop).
- Ran test and repeated to establish consistency.
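The idea of training on a specified sub-sequence length and sweeping that length in a range loop can be sketched as follows. This is a hedged outline only: `windows` and `sweep_lengths` are hypothetical helpers, `train_and_score` is a placeholder callback for whatever training routine is used, and labeling every window of a positive protein as positive is an assumption that the source does not confirm.

```python
def windows(seq, length, step=1):
    """Return all contiguous sub-sequences of the given length."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def sweep_lengths(positives, negatives, lengths, train_and_score):
    """Run one training pass per sub-sequence length and collect the scores.

    `train_and_score(examples, length)` is a user-supplied callback that
    receives (sub_sequence, label) pairs and returns a score.
    """
    results = {}
    for length in lengths:
        examples = [(w, 1) for s in positives for w in windows(s, length)]
        examples += [(w, 0) for s in negatives for w in windows(s, length)]
        results[length] = train_and_score(examples, length)
    return results


if __name__ == "__main__":
    # Dummy scorer that just counts examples, to show the loop structure.
    counts = sweep_lengths(["ABCDE"], ["ABCD"], range(3, 5), lambda ex, n: len(ex))
    print(counts)
```

A length at which the score peaks across repeated runs would be a candidate for the size of the consensus region.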
Homeobox Consensus Sequence
- Created LSTM.
- Imported dataset from the UniProt website: proteins containing a homeodomain sequence and proteins not containing a homeodomain sequence.
- Trained model to look at specified sequence length for homeodomain vs. no homeodomain.
- Established consistent parameters for learning process (number of proteins per dataset, learning rate, network size, and sub-sequence length viewed).
- Held the remaining variables constant (batch size, epochs, and the sequence lengths iterated over in the range loop).
- Ran test and repeated to establish consistency.
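Importing sequence sets and fixing the number of proteins per dataset could look roughly like the sketch below. It assumes the sequences were downloaded as FASTA text, which the source does not state, and the helper names `parse_fasta` and `balanced_dataset` are hypothetical. A fixed random seed is used so that repeated runs draw the same proteins.

```python
import random


def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def balanced_dataset(positives, negatives, n_per_class, seed=0):
    """Draw the same number of proteins from each class and shuffle them."""
    rng = random.Random(seed)
    pos = rng.sample(positives, n_per_class)
    neg = rng.sample(negatives, n_per_class)
    data = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    # Made-up records purely to show the expected input format.
    fasta = ">example_1\nMKTAYI\nAKQR\n>example_2\nGGSGGS\n"
    seqs = list(parse_fasta(fasta).values())
    print(balanced_dataset(seqs[:1], seqs[1:], n_per_class=1))
```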