Difference between revisions of "Team:Heidelberg/Notebook test"

Line 149: Line 149:
 
<h2 id="references-">References:</h2>
 
<h2 id="references-">References:</h2>
 
<p>Carlson, J.C., Badran, A.H., Guggiana-Nilo, D.A., and Liu, D.R. (2014). Negative selection and stringency modulation in phage-assisted continuous evolution. Nat Chem Biol 10, 216-222.<br>Pu, J., Zinkus-Boltz, J., and Dickinson, B.C. (2017). Evolution of a split RNA polymerase as a versatile biosensor platform. Nat Chem Biol 13, 432-438.</p>
 
<p>Carlson, J.C., Badran, A.H., Guggiana-Nilo, D.A., and Liu, D.R. (2014). Negative selection and stringency modulation in phage-assisted continuous evolution. Nat Chem Biol 10, 216-222.<br>Pu, J., Zinkus-Boltz, J., and Dickinson, B.C. (2017). Evolution of a split RNA polymerase as a versatile biosensor platform. Nat Chem Biol 13, 432-438.</p>
}}
+
}}|week_id-0}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
28|#tag:html|
 
28|#tag:html|
Line 502: Line 502:
 
<h2 id="reference-">Reference:</h2>
 
<h2 id="reference-">Reference:</h2>
 
<p>Esvelt, K.M., Carlson, J.C., and Liu, D.R. (2011). A system for the continuous directed evolution of biomolecules. Nature 472, 499-503.</p>
 
<p>Esvelt, K.M., Carlson, J.C., and Liu, D.R. (2011). A system for the continuous directed evolution of biomolecules. Nature 472, 499-503.</p>
}}
+
}}|week_id-1}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
30|#tag:html|
 
30|#tag:html|
Line 680: Line 680:
 
<li><em>Sunday</em>: The Qiagen Midi Prep protocol for M13 phages was implemented to purify our selection phage. Following this, the Oxford Nanopore standard protocol for MinION sequencing was tested for feasibility. Several kits supplied by NEB were ordered to prepare the test library. A cooperation with AG Conrad of the Eils Labs should be started: they work with single-cell sequencing and should therefore know a lot about library preparation and NGS.</li>
 
<li><em>Sunday</em>: The Qiagen Midi Prep protocol for M13 phages was implemented to purify our selection phage. Following this, the Oxford Nanopore standard protocol for MinION sequencing was tested for feasibility. Several kits supplied by NEB were ordered to prepare the test library. A cooperation with AG Conrad of the Eils Labs should be started: they work with single-cell sequencing and should therefore know a lot about library preparation and NGS.</li>
 
</ul>
 
</ul>
}}
+
}}|week_id-2}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
31|#tag:html|
 
31|#tag:html|
Line 1,106: Line 1,106:
 
<p>The fractions obtained from silica column chromatography were first analysed by thin-layer chromatography (DC), and on that basis the most promising fraction was sent to the cooperating Grebe group for GC-MS. The spectra were analysed with the help of a chemist from the group, Fabian Ebner, and revealed that the reaction had not taken place, as only the starting material (educt) was detected by GC-MS. After reading further about the exact reaction mechanism and the conditions published in the technical literature, we arrived at two hypotheses for the failure of the reaction. One was inactivation of the protein's catalytic activity through carbene transfer to a protein side chain, as described in “Identification of mechanism-based inactivation in P450-catalyzed cyclopropanation facilitates engineering of improved enzymes” (H. Renata, 2016). There the author states that “inactivation of whole-cell catalyst was previously found to be slower than purified enzyme”.<br>The other hypothesis was that our protein was in its oxidised state, because oxygen was not sufficiently excluded and no reducing agent was added to the samples; this assumes that the carbene intermediate can only form with the reduced heme group.</p>
 
<p>The fractions obtained from silica column chromatography were first analysed by thin-layer chromatography (DC), and on that basis the most promising fraction was sent to the cooperating Grebe group for GC-MS. The spectra were analysed with the help of a chemist from the group, Fabian Ebner, and revealed that the reaction had not taken place, as only the starting material (educt) was detected by GC-MS. After reading further about the exact reaction mechanism and the conditions published in the technical literature, we arrived at two hypotheses for the failure of the reaction. One was inactivation of the protein's catalytic activity through carbene transfer to a protein side chain, as described in “Identification of mechanism-based inactivation in P450-catalyzed cyclopropanation facilitates engineering of improved enzymes” (H. Renata, 2016). There the author states that “inactivation of whole-cell catalyst was previously found to be slower than purified enzyme”.<br>The other hypothesis was that our protein was in its oxidised state, because oxygen was not sufficiently excluded and no reducing agent was added to the samples; this assumes that the carbene intermediate can only form with the reduced heme group.</p>
 
<p>For this reason, a set of test reactions was performed with the commercially available dimethylphenylsilane as a replacement for 4-(dimethylsilyl)aniline, which always requires custom synthesis. We followed the procedure “Lysate-small scale biocatalytic reactions”, which differs from the previous experiment mainly in the addition of dithionite solution as a reducing agent and the use of GC septum vials as reaction vials. Besides four samples with purified cyt c under various conditions, two samples still contained the whole-cell solution.</p>
 
<p>For this reason, a set of test reactions was performed with the commercially available dimethylphenylsilane as a replacement for 4-(dimethylsilyl)aniline, which always requires custom synthesis. We followed the procedure “Lysate-small scale biocatalytic reactions”, which differs from the previous experiment mainly in the addition of dithionite solution as a reducing agent and the use of GC septum vials as reaction vials. Besides four samples with purified cyt c under various conditions, two samples still contained the whole-cell solution.</p>
}}
+
}}|week_id-3}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
32|#tag:html|
 
32|#tag:html|
Line 1,368: Line 1,368:
 
</table>
 
</table>
 
<p>The Golden Gate product is purified using the ZymoResearch Kit and then 1 µl is<br>electroporated into 25 µl of S1059 E. coli. After 4 hours of incubation in 3 ml SOC medium, the cell culture is used for a plaque assay. A plaque PCR showed that the Tet(X) insert is not included in the Golden Gate product;<br>therefore, new Golden Gate overhangs are designed.</p>
 
<p>The Golden Gate product is purified using the ZymoResearch Kit and then 1 µl is<br>electroporated into 25 µl of S1059 E. coli. After 4 hours of incubation in 3 ml SOC medium, the cell culture is used for a plaque assay. A plaque PCR showed that the Tet(X) insert is not included in the Golden Gate product;<br>therefore, new Golden Gate overhangs are designed.</p>
}}
+
}}|week_id-4}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
33|#tag:html|
 
33|#tag:html|
Line 2,089: Line 2,089:
 
</table>
 
</table>
 
<p>PCR confirmed that the inserts are present in AP_destructase, and a glycerol stock is created.</p>
 
<p>PCR confirmed that the inserts are present in AP_destructase, and a glycerol stock is created.</p>
}}
+
}}|week_id-5}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
34|#tag:html|
 
34|#tag:html|
Line 2,441: Line 2,441:
 
<h1 id="-ap-"><strong>AP</strong></h1>
 
<h1 id="-ap-"><strong>AP</strong></h1>
 
<p>To test under which conditions the selection phage propagates best, 12 variations of the destructase accessory plasmid are assembled by Gibson assembly.<br>The variations include different RBS and origins of replication. Because time was running out, we decided to end this project at that point and concentrated on other PACE and PREDCEL experiments.</p>
 
<p>To test under which conditions the selection phage propagates best, 12 variations of the destructase accessory plasmid are assembled by Gibson assembly.<br>The variations include different RBS and origins of replication. Because time was running out, we decided to end this project at that point and concentrated on other PACE and PREDCEL experiments.</p>
}}
+
}}|week_id-6}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
35|#tag:html|
 
35|#tag:html|
Line 2,566: Line 2,566:
 
<p>Each run lasted 10 h, following the supplied ONT script. Live basecalling was performed, yielding 2000 reads and 11 million events. The accuracy of both runs was approximately 65 %, which is not sufficient for sequencing our amplicons. Possible reasons are the age of the flow cells and deviations from the standard protocol, which normally uses AMPure beads instead of our DNA Dynabeads from ThermoFisher. For this reason, another experiment with a newer flow cell will be carried out. In this experiment, we will use PCR products from supernatant taken at different timepoints of Dickinson-PACE to investigate how MinION sequencing behaves in this case. Further knowledge about the sequencing should be acquired through the ONT community.</p>
 
<p>Each run lasted 10 h, following the supplied ONT script. Live basecalling was performed, yielding 2000 reads and 11 million events. The accuracy of both runs was approximately 65 %, which is not sufficient for sequencing our amplicons. Possible reasons are the age of the flow cells and deviations from the standard protocol, which normally uses AMPure beads instead of our DNA Dynabeads from ThermoFisher. For this reason, another experiment with a newer flow cell will be carried out. In this experiment, we will use PCR products from supernatant taken at different timepoints of Dickinson-PACE to investigate how MinION sequencing behaves in this case. Further knowledge about the sequencing should be acquired through the ONT community.</p>
 
<p>In analogy to the Bt toxin paper, it should be possible to implement a sequencing workflow that achieves reliable basecalling accuracy. A scheme of this workflow is provided as a picture.</p>
 
<p>In analogy to the Bt toxin paper, it should be possible to implement a sequencing workflow that achieves reliable basecalling accuracy. A scheme of this workflow is provided as a picture.</p>
}}
+
}}|week_id-7}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
36|#tag:html|
 
36|#tag:html|
Line 2,822: Line 2,822:
 
<h2 id="embedding">Embedding</h2>
 
<h2 id="embedding">Embedding</h2>
 
<p>The embedding script did not properly support restoring from a checkpoint; this was fixed. The first embedding on UniProt that reached the second epoch was trained. Finding the best hyperparameters, especially the embedding dimension and the length of the k-mers the embedding is based on, still remains to be done. Checkpoints are now saved as pickle files, making them accessible to applications without TensorFlow.</p>
 
<p>The embedding script did not properly support restoring from a checkpoint; this was fixed. The first embedding on UniProt that reached the second epoch was trained. Finding the best hyperparameters, especially the embedding dimension and the length of the k-mers the embedding is based on, still remains to be done. Checkpoints are now saved as pickle files, making them accessible to applications without TensorFlow.</p>
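<p>As a rough illustration of why pickle checkpoints are usable without TensorFlow, the sketch below stores a k-mer to vector mapping with the standard library only. The dictionary layout and the function names are assumptions made for this example, not the layout used by the actual script.</p>

```python
import pickle
import tempfile
from pathlib import Path


def save_embedding(path, kmer_to_vector):
    """Write a {k-mer: vector} mapping to a pickle file (hypothetical layout)."""
    with open(path, "wb") as handle:
        pickle.dump(kmer_to_vector, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_embedding(path):
    """Read the mapping back; only the standard library is needed."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def roundtrip_demo():
    """Save a tiny made-up embedding to a temporary file and load it again."""
    embedding = {"MKT": [0.1, -0.3], "KTA": [0.7, 0.2]}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "embedding.pkl"
        save_embedding(path, embedding)
        return load_embedding(path)
```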
}}
+
}}|week_id-8}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
37|#tag:html|
 
37|#tag:html|
Line 2,907: Line 2,907:
 
<h2 id="embedding">Embedding</h2>
 
<h2 id="embedding">Embedding</h2>
 
<p>Training of the embedding on UniProt continues. First attempts to optimize the current multilabel classifier by adding parameters to it, either in depth or in width, have started.<br>The concept of a generator that models natural evolution was designed. There, a starting sequence is altered by randomly chosen point mutations, and the resulting variants are scored by the classifier, which has a sigmoid output for each label. Sequences that score higher are more likely to be chosen for the next round, where they are again changed by point mutations. In this scenario, selection is modeled by a function that uses the classifier. However, it remains unclear whether the classifier supports gradual improvements in a direction that improves the function both in reality and in the classifier.</p>
 
<p>Training of the embedding on UniProt continues. First attempts to optimize the current multilabel classifier by adding parameters to it, either in depth or in width, have started.<br>The concept of a generator that models natural evolution was designed. There, a starting sequence is altered by randomly chosen point mutations, and the resulting variants are scored by the classifier, which has a sigmoid output for each label. Sequences that score higher are more likely to be chosen for the next round, where they are again changed by point mutations. In this scenario, selection is modeled by a function that uses the classifier. However, it remains unclear whether the classifier supports gradual improvements in a direction that improves the function both in reality and in the classifier.</p>
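<p>The generator idea can be sketched in a few lines, assuming score-proportional sampling as the selection rule. The function <code>toy_score</code> is a stand-in for the classifier's sigmoid output and is purely hypothetical; all names here are chosen for illustration.</p>

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def point_mutate(sequence, rng):
    """Replace one randomly chosen residue by a different amino acid."""
    pos = rng.randrange(len(sequence))
    choices = [aa for aa in AMINO_ACIDS if aa != sequence[pos]]
    return sequence[:pos] + rng.choice(choices) + sequence[pos + 1:]


def toy_score(sequence):
    """Hypothetical placeholder for a classifier output in [0, 1]."""
    return sequence.count("A") / len(sequence)


def evolve_step(population, score, rng):
    """Mutate every sequence once, then sample the next population.

    Higher-scoring mutants are more likely to be drawn (assumed rule:
    probability proportional to score, with a small floor).
    """
    mutants = [point_mutate(seq, rng) for seq in population]
    weights = [score(m) + 1e-6 for m in mutants]
    return rng.choices(mutants, weights=weights, k=len(population))
```

<p>Repeated calls to <code>evolve_step</code> with the same random generator model successive rounds of mutation and selection.</p>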
}}
+
}}|week_id-9}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
38|#tag:html|
 
38|#tag:html|
Line 3,059: Line 3,059:
 
<h2 id="gaia">GAIA</h2>
 
<h2 id="gaia">GAIA</h2>
 
<p>The concept of a genetic artificial intelligent algorithm (GAIA) based on the classifier was developed. Starting from a given sequence, it maximizes the scores of given labels and minimizes those of other given labels. At the same time, the garbage score of the sequence is minimized. The model starts with one sequence but always works with B sequences; the starting sequence is therefore mutated in silico B times to yield B different sequences. These are then classified by DeeProtein, and the outputs are used to calculate a sequence score. A given number of the best sequences is selected and again mutated in silico to yield B sequences. This cycle of mutation and selection repeats and optimizes the starting sequence.<br>Additionally, recombination can be applied after the mutation step to combine good subsequences.</p>
 
<p>The concept of a genetic artificial intelligent algorithm (GAIA) based on the classifier was developed. Starting from a given sequence, it maximizes the scores of given labels and minimizes those of other given labels. At the same time, the garbage score of the sequence is minimized. The model starts with one sequence but always works with B sequences; the starting sequence is therefore mutated in silico B times to yield B different sequences. These are then classified by DeeProtein, and the outputs are used to calculate a sequence score. A given number of the best sequences is selected and again mutated in silico to yield B sequences. This cycle of mutation and selection repeats and optimizes the starting sequence.<br>Additionally, recombination can be applied after the mutation step to combine good subsequences.</p>
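<p>The mutation and selection cycle described above might look roughly like the following sketch. It is not the GAIA implementation: the scoring function is a hypothetical toy replacement for DeeProtein, duplicates within a batch are not filtered out, and all parameter names are assumptions.</p>

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate_once(sequence, rng):
    """Introduce one point mutation to a different amino acid."""
    pos = rng.randrange(len(sequence))
    choices = [aa for aa in AMINO_ACIDS if aa != sequence[pos]]
    return sequence[:pos] + rng.choice(choices) + sequence[pos + 1:]


def toy_score(sequence):
    """Hypothetical stand-in for the classifier-derived sequence score."""
    return sequence.count("A") / len(sequence)


def gaia_sketch(start, score, batch_size, n_survivors, n_generations, seed=0):
    """Cycle of in silico mutation (to B sequences) and top-k selection."""
    rng = random.Random(seed)
    survivors = [start]
    for _ in range(n_generations):
        # Refill the batch to B sequences by mutating the survivors in turn.
        batch = [
            mutate_once(survivors[i % len(survivors)], rng)
            for i in range(batch_size)
        ]
        # Keep the given number of best-scoring sequences.
        batch.sort(key=score, reverse=True)
        survivors = batch[:n_survivors]
    return survivors[0]
```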
}}
+
}}|week_id-10}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
39|#tag:html|
 
39|#tag:html|
Line 3,081: Line 3,081:
 
<h2 id="embedding">Embedding</h2>
 
<h2 id="embedding">Embedding</h2>
 
<p>The embedding on UniProt was calculated over 15 epochs; vectors that are close to each other in the high-dimensional space seem, at first sight, to be similar at the sequence level.<br>The genetic artificial intelligent algorithm is implemented. Features are mutation of up to a given maximum number of residues and mutation of specific residues. The latter are passed to the model in lowercase letters while the rest of the sequence is written in capital letters. Furthermore, arbitrary numbers of goal and avoid GO terms can be weighted. The variance of the given classes, a measure for the authenticity of a protein, is subtracted from the overall score. Recombination of sequences was not implemented, as it interferes with keeping the number of mutations in a sequence within the allowed maximum.</p>
 
<p>The embedding on UniProt was calculated over 15 epochs; vectors that are close to each other in the high-dimensional space seem, at first sight, to be similar at the sequence level.<br>The genetic artificial intelligent algorithm is implemented. Features are mutation of up to a given maximum number of residues and mutation of specific residues. The latter are passed to the model in lowercase letters while the rest of the sequence is written in capital letters. Furthermore, arbitrary numbers of goal and avoid GO terms can be weighted. The variance of the given classes, a measure for the authenticity of a protein, is subtracted from the overall score. Recombination of sequences was not implemented, as it interferes with keeping the number of mutations in a sequence within the allowed maximum.</p>
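<p>A small sketch of the lowercase convention and the weighted score may help. The exact formula is not given in the notebook, so the one below (weighted goal outputs minus weighted avoid outputs minus the population variance of the outputs) is an assumption, and the term names are placeholders rather than real GO identifiers.</p>

```python
from statistics import pvariance


def mutable_positions(sequence):
    """Indices of residues written in lowercase, i.e. allowed to mutate."""
    return [i for i, residue in enumerate(sequence) if residue.islower()]


def count_mutations(start, candidate):
    """Number of positions that differ, ignoring letter case."""
    return sum(a.upper() != b.upper() for a, b in zip(start, candidate))


def sequence_score(class_outputs, goal_weights, avoid_weights):
    """Assumed scoring rule: weighted goals - weighted avoids - variance."""
    goal = sum(w * class_outputs[term] for term, w in goal_weights.items())
    avoid = sum(w * class_outputs[term] for term, w in avoid_weights.items())
    variance = pvariance(list(class_outputs.values()))
    return goal - avoid - variance
```

<p>For example, with outputs of 0.8 for a goal term and 0.2 for an avoid term, both weighted 1.0, the variance is 0.09 and the score is 0.8 - 0.2 - 0.09 = 0.51.</p>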
}}
+
}}|week_id-11}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
40|#tag:html|
 
40|#tag:html|
Line 3,104: Line 3,104:
 
<h2 id="gaia">GAIA</h2>
 
<h2 id="gaia">GAIA</h2>
 
<p>GAIA was expanded to allow multiple mutations before selection takes place, with the number of mutations per selection round decaying linearly. Results look slightly better than with one mutation before selection. In this mode, epistatic mutations can be found, that is, mutations that are beneficial together but decrease the score of a protein when occurring alone.</p>
 
<p>GAIA was expanded to allow multiple mutations before selection takes place, with the number of mutations per selection round decaying linearly. Results look slightly better than with one mutation before selection. In this mode, epistatic mutations can be found, that is, mutations that are beneficial together but decrease the score of a protein when occurring alone.</p>
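<p>One possible reading of the linear decay is a schedule that starts with several mutations per round and ends at a single one. The function below is a sketch under that assumption; the parameter names and the end value of 1 are not taken from the GAIA code.</p>

```python
def mutations_per_round(round_index, n_rounds, start_mutations, end_mutations=1):
    """Linearly interpolate the number of mutations applied before selection.

    round_index runs from 0 to n_rounds - 1; the result never drops below
    end_mutations.
    """
    if n_rounds <= 1:
        return start_mutations
    fraction = round_index / (n_rounds - 1)
    value = start_mutations + fraction * (end_mutations - start_mutations)
    return max(end_mutations, round(value))
```

<p>With five rounds and five starting mutations this yields 5, 4, 3, 2, 1, so early rounds can cross score valleys that single mutations cannot.</p>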
}}
+
}}|week_id-12}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
41|#tag:html|
 
41|#tag:html|
Line 3,279: Line 3,279:
 
<h2 id="gaia">GAIA</h2>
 
<h2 id="gaia">GAIA</h2>
 
<p>GAIA was used to generate mutations in a glucuronidase sequence that are beneficial for galactosidase activity. A subset of the generated sequences was ordered as oligonucleotides for kinetic assays. To show that the score from the classifier correlates with real-world activity, a set of less active beta-lactamase sequences was predicted by GAIA. Those were also bought as oligonucleotides and will be tested for their activity.<br>A randomization function was implemented to examine how the different scores develop as the number of mutations increases. The results look sigmoid: the score shows almost no change for the first mutations, then drops rapidly below the threshold for positive classification, after which it only decreases slowly.<br>A combination function that examines all possible combinations of a set of mutations was implemented to aid decisions on which sequences to test in the lab.</p>
 
<p>GAIA was used to generate mutations in a glucuronidase sequence that are beneficial for galactosidase activity. A subset of the generated sequences was ordered as oligonucleotides for kinetic assays. To show that the score from the classifier correlates with real-world activity, a set of less active beta-lactamase sequences was predicted by GAIA. Those were also bought as oligonucleotides and will be tested for their activity.<br>A randomization function was implemented to examine how the different scores develop as the number of mutations increases. The results look sigmoid: the score shows almost no change for the first mutations, then drops rapidly below the threshold for positive classification, after which it only decreases slowly.<br>A combination function that examines all possible combinations of a set of mutations was implemented to aid decisions on which sequences to test in the lab.</p>
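<p>The combination function can be pictured as enumerating every non-empty subset of a mutation set. The sketch below assumes mutations are given as (0-based position, new residue) pairs at distinct positions; this representation is chosen for the example and may differ from the original script.</p>

```python
from itertools import combinations


def apply_mutations(sequence, mutations):
    """Return the sequence with all (position, new_residue) pairs applied."""
    residues = list(sequence)
    for position, new_residue in mutations:
        residues[position] = new_residue
    return "".join(residues)


def all_combinations(sequence, mutations):
    """Map every non-empty subset of mutations to the resulting variant."""
    variants = {}
    for size in range(1, len(mutations) + 1):
        for subset in combinations(mutations, size):
            variants[subset] = apply_mutations(sequence, subset)
    return variants
```

<p>For n mutations this produces 2<sup>n</sup> - 1 variants, so scoring them in silico first helps to narrow down what is worth ordering.</p>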
}}
+
}}|week_id-13}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
42|#tag:html|
 
42|#tag:html|
Line 3,417: Line 3,417:
 
<h2 id="gaia">GAIA</h2>
 
<h2 id="gaia">GAIA</h2>
 
<p>For further insight into the properties of a sequence, a function was implemented that scores every possible single mutant and plots the changes in the score. Potential active sites and other crucial sequence regions can be predicted this way.</p>
 
<p>For further insight into the properties of a sequence, a function was implemented that scores every possible single mutant and plots the changes in the score. Potential active sites and other crucial sequence regions can be predicted this way.</p>
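<p>A single-mutant scan of this kind could be sketched as follows. The scoring function is a hypothetical toy, and the result is returned as a dictionary of score differences instead of being plotted.</p>

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(sequence):
    """Hypothetical stand-in for the classifier score: number of alanines."""
    return sequence.count("A")


def single_mutant_scan(sequence, score):
    """Score every single-residue exchange relative to the input sequence.

    Returns {(position, new_residue): score(mutant) - score(sequence)}.
    """
    baseline = score(sequence)
    deltas = {}
    for pos, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # not a mutation
            mutant = sequence[:pos] + aa + sequence[pos + 1:]
            deltas[(pos, aa)] = score(mutant) - baseline
    return deltas
```

<p>Positions where most exchanges lower the score strongly are candidates for functionally important residues.</p>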
}}
+
}}|week_id-14}}
 
{{Heidelberg/accord|
 
{{Heidelberg/accord|
 
43|#tag:html|
 
43|#tag:html|
Line 3,619: Line 3,619:
 
<h2 id="gaia">GAIA</h2>
 
<h2 id="gaia">GAIA</h2>
 
<p>The usability of the script was improved by adding support for passing parameters via flags. The plotting of score heatmaps, with the sequence on the x-axis and the amino acids to which exchanges were scored on the y-axis, was improved.</p>
 
<p>The usability of the script was improved by adding support for passing parameters via flags. The plotting of score heatmaps, with the sequence on the x-axis and the amino acids to which exchanges were scored on the y-axis, was improved.</p>
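<p>Passing parameters via flags is commonly done with <code>argparse</code>. The flag names below are invented for illustration and are not the flags of the actual GAIA script.</p>

```python
import argparse


def build_parser():
    """Build a command-line parser with hypothetical GAIA-style flags."""
    parser = argparse.ArgumentParser(description="Sequence optimization sketch")
    parser.add_argument("--sequence", required=True,
                        help="starting sequence; lowercase residues may mutate")
    parser.add_argument("--goal", action="append", default=[],
                        help="label to maximize (may be given several times)")
    parser.add_argument("--avoid", action="append", default=[],
                        help="label to minimize (may be given several times)")
    parser.add_argument("--max-mutations", type=int, default=5,
                        help="maximum number of mutated residues")
    return parser
```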
}}
+
}}|week_id-15}}
 
}}
 
}}
 
}}
 
}}

Revision as of 04:03, 31 October 2017

Notebook
Weekly Reports