Line 2: | Line 2: | ||
{{Heidelberg/header}} | {{Heidelberg/header}} | ||
Weekly summary 14.-20.08.2017 CG | Weekly summary 14.-20.08.2017 CG | ||
==== | ==== | ||
Phage propagation of the **unevolved** Dickinson phage | Phage propagation of the **unevolved** Dickinson phage | ||
Line 101: | Line 91: | ||
}} | }} | ||
{{Heidelberg/boxopen| | {{Heidelberg/boxopen| | ||
− | Week 35 | + | Week 35| |
{{#tag:html| | {{#tag:html| | ||
KW35 | KW35 | ||
Line 292: | Line 182: | ||
Experimentation was started with elements of a PredCel model based on distributions instead of scalars. The idea is that a population of phages does not have a single fitness between 0 and 1 but consists of individuals with different fitness values. In this more complex model the concentrations have to be calculated for each phage fitness value, depending on the amount of phages that have that fitness value. The fitness distribution is changed by mutation and by selection. A first naive approach for mutation was programmed: for each fitness value, it subtracts a given percentage of the difference between the amount of phages at that fitness value and the mean amount from the amount at that fitness value. This is oversimplified and will therefore be replaced by a model in which every sequence that mutates gets better or worse by a normally distributed change.
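The naive mutation step described above can be sketched as follows. This is only an illustration under assumptions: the function name `naive_mutation`, the five fitness bins and the rate of 0.1 are hypothetical and are not taken from the actual script.

```python
import numpy as np

def naive_mutation(amounts, rate):
    """Move every fitness bin toward the mean amount by the fraction `rate`.

    amounts: phage amount per fitness value (one entry per bin).
    rate:    the "given percentage", expressed as a fraction (assumed name).
    """
    amounts = np.asarray(amounts, dtype=float)
    # Subtract a fraction of the difference between each bin and the mean.
    return amounts - rate * (amounts - amounts.mean())

# Hypothetical example: all phages start in the middle fitness bin.
fitness_values = np.linspace(0.0, 1.0, 5)
amounts = np.array([0.0, 0.0, 100.0, 0.0, 0.0])
mutated = naive_mutation(amounts, 0.1)
```

With this rule the total amount of phages stays constant, and the distribution flattens toward a uniform one regardless of fitness, which is one way to see why the approach is oversimplified.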
+ | }} | ||
+ | }} | ||
+ | {{Heidelberg/boxopen| | ||
+ | Week Icon | ||
+ | | | ||
+ | {{#tag:html| | ||
+ | <h2>Optopace</h2> | ||
+ | |||
+ | |||
+ | No entry for this subproject this week.<h2>Software</h2> | ||
+ | |||
+ | |||
+ | No entry for this subproject this week.<h2>Modeling</h2> | ||
+ | |||
+ | |||
+ | No entry for this subproject this week. | ||
}} | }} | ||
}} | }} |
Revision as of 20:39, 20 October 2017
Weekly summary 14.-20.08.2017 CG
==
Phage propagation of the **unevolved** Dickinson phage
=
Phage supernatant of the unevolved Dickinson phage target_133_N-term_T7-C was received from the Dickinson group. To propagate the phages, 4 ml of *E. coli* culture (Stock ID: 47) was grown to an OD600 of 0.6 (in LB medium + 25 mM glucose + Amp) and infected with 4 µl of the phage supernatant. The culture was shaken at 37 °C overnight. The next morning the culture was centrifuged at 6,000 g for 5 min and the supernatant, which contains the phages, was stored at 4 °C.
A Blue Plaque Assay was performed to determine the phage titer of the supernatant. 143 plaques were counted at the 10^-10 dilution, which corresponds to a phage titer of 1.43 × 10^15 PFU/ml.
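The titer arithmetic can be reproduced with a short sketch. The plated volume is not stated in this entry; the value of 1 µl used here is an assumption, chosen only because it is consistent with the reported result. The function name `phage_titer` is hypothetical.

```python
def phage_titer(plaques, dilution, plated_volume_ml):
    """Return the titer in PFU/ml from a plaque count.

    plaques:          number of plaques counted on the plate
    dilution:         dilution factor of the plated sample (e.g. 1e-10)
    plated_volume_ml: volume of diluted sample on the plate, in ml
    """
    return plaques / (dilution * plated_volume_ml)

# 143 plaques at the 10^-10 dilution; 1 µl (1e-3 ml) plated is an ASSUMPTION.
titer = phage_titer(143, 1e-10, 1e-3)
print(f"{titer:.2e} PFU/ml")
```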
A plaque from this plate was picked to infect a 4 ml *E. coli* culture (Stock ID: 47) with an OD600 of 0.4. This culture was grown for two hours shaking at 37 °C and subsequently transferred to 100 ml fresh 2xYT medium. After 1 hour, carbenicillin (1000x) was added.
On the next day, the culture was centrifuged at 3640 g for 20 min. The supernatant was stored at 4 °C.
A Blue Plaque Assay was performed to determine the phage titer of the supernatant. The monoclonal Dickinson phage target_133_N-term_T7-C exhibited a phage titer of 1.85 × 10^9 PFU/ml.
Software
KW34
=
Word2Vec Embeddings on Protein Sequences
We rewrote a word2vec implementation from TensorFlow's tutorials that implements Efficient Estimation of Word Representations in Vector Space, ICLR 2013 (Mikolov et al.). The model is a skip-gram model with negative sampling that uses custom ops written in C. The code was adapted to our needs, mainly by changing datatypes in the C kernels and by writing a different evaluation function that predicts the nearest words to the most frequent words instead of using analogies. Two new datasets were generated, based on Swiss-Prot and UniProt respectively. Training of 4mer embeddings in 50, 100 and 200 dimensions was started but has not finished yet. Visualisation of the first checkpoints is possible via TensorBoard [Visualisation of an example embedding via tensorboard](170820ai-vistestemb).
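To illustrate what "words" and training pairs look like for protein sequences, here is a minimal sketch. It assumes overlapping 4mers with a stride of 1 and a symmetric context window; neither choice is stated in this entry, and the helper names `kmers` and `skipgram_pairs` are hypothetical. Negative sampling and the custom ops are not shown.

```python
def kmers(sequence, k=4, stride=1):
    """Split a protein sequence into k-mers (overlapping if stride < k)."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical example sequence, not taken from the datasets.
tokens = kmers("MKTAYIAK")
pairs = skipgram_pairs(tokens, window=1)
```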
Implementation of SqueezeNet Architecture
By implementing a new architecture based on SqueezeNet (Iandola et al., 2016), which relies on 1x1 convolutions, we were able to train on both the 299-class and the 637-class dataset. The new model architecture is as follows:
- InputLayer model_valid/input_layer_valid: (64, 20, 1000, 1)
- PadLayer model_valid/block1/pad_layer_valid: paddings:[[0, 0], [0, 0], [3, 3], [0, 0]] mode:CONSTANT
- Conv2dLayer model_valid/block1/cnn_layer_valid: shape:[20, 7, 1, 128] strides:[1, 5, 1, 1] pad:VALID act:prelu
- Conv1dLayer model_valid/block2/cnn_layer_valid: shape:[6, 128, 128] stride:1 pad:SAME act:prelu
- Conv1dLayer model_valid/1x1_I/1x1_valid: shape:[1, 128, 64] stride:1 pad:SAME act:prelu
- BatchNormLayer model_valid/1x1_I/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/block3/cnn_layer_valid: shape:[5, 64, 256] stride:1 pad:SAME act:prelu
- PoolLayer model_valid/block3/pool_layer_valid: ksize:[2] strides:[2] padding:VALID pool:pool
- BatchNormLayer model_valid/block3/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/block4/cnn_layer_valid: shape:[5, 256, 256] stride:1 pad:SAME act:prelu
- PoolLayer model_valid/block4/pool_layer_valid: ksize:[2] strides:[2] padding:VALID pool:pool
- BatchNormLayer model_valid/block4/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/1x1_II/1x1_valid: shape:[1, 256, 128] stride:1 pad:SAME act:prelu
- BatchNormLayer model_valid/1x1_II/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/block5/cnn_layer_valid: shape:[5, 128, 256] stride:1 pad:SAME act:prelu
- PoolLayer model_valid/block5/pool_layer_valid: ksize:[2] strides:[2] padding:VALID pool:pool
- BatchNormLayer model_valid/block5/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/block6/cnn_layer_valid: shape:[5, 256, 512] stride:1 pad:SAME act:prelu
- PoolLayer model_valid/block6/pool_layer_valid: ksize:[2] strides:[2] padding:VALID pool:pool
- BatchNormLayer model_valid/block6/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/1x1_III/1x1_valid: shape:[1, 512, 256] stride:1 pad:SAME act:prelu
- BatchNormLayer model_valid/1x1_III/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/block7/cnn_layer_valid: shape:[5, 256, 516] stride:1 pad:SAME act:prelu
- PoolLayer model_valid/block7/pool_layer_valid: ksize:[2] strides:[2] padding:VALID pool:pool
- BatchNormLayer model_valid/block7/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/block8/cnn_layer_valid: shape:[5, 516, 1024] stride:1 pad:SAME act:prelu
- PoolLayer model_valid/block8/pool_layer_valid: ksize:[2] strides:[2] padding:VALID pool:pool
- BatchNormLayer model_valid/block8/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/1x1_IV/cnn_layer_valid: shape:[1, 1024, 512] stride:1 pad:SAME act:prelu
- BatchNormLayer model_valid/1x1_IV/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/block9/cnn_layer_valid: shape:[5, 512, 1024] stride:1 pad:SAME act:prelu
- PoolLayer model_valid/block9/pool_layer_valid: ksize:[2] strides:[2] padding:VALID pool:pool
- BatchNormLayer model_valid/block9/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- Conv1dLayer model_valid/outlayer/cnn_layer_valid: shape:[1, 1024, 637] stride:1 pad:SAME act:prelu
- BatchNormLayer model_valid/outlayer/batchnorm_layer_valid: decay:0.900000 epsilon:0.000010 act:identity is_train:False
- MeanPool1d global_avg_pool: filter_size:[7] strides:1 padding:valid
The architecture is fully convolutional, ending in an average pooling layer as outlayer, with the channels dimension corresponding to the number of classes. All inputs were 1-hot encoded and zero padded to a boxsize of 1000 positions.
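The input encoding can be sketched as below. This is a minimal illustration, not the project's preprocessing code: the ordering of the 20 amino acids, the truncation of sequences longer than the box, and the skipping of non-standard residues are all assumptions, and the name `one_hot_encode` is hypothetical. The resulting shape of (20, 1000) per sequence matches the input layer shape (64, 20, 1000, 1) once a batch and a channel dimension are added.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed row order, 20 standard residues
BOX_SIZE = 1000                       # boxsize stated in the text

def one_hot_encode(sequence, box_size=BOX_SIZE):
    """Return a (20, box_size) array, one-hot per position, zero padded."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = np.zeros((len(AMINO_ACIDS), box_size), dtype=np.float32)
    # Longer sequences are truncated here (assumption); the rest stays zero.
    for pos, aa in enumerate(sequence[:box_size]):
        row = index.get(aa)
        if row is not None:  # non-standard residues are left as all-zero columns
            encoded[row, pos] = 1.0
    return encoded

# Hypothetical example sequence.
x = one_hot_encode("MKTAYIAK")
```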
| Model | lr | classes | Comment | restored | maxstep | boxsize | ACC |
|-------|------|---------|---------|----------|---------|---------|--------------|
| | 0.01 | 299 | | NO | 220000 | 1000 | 0.8 (valid) |
| | 0.01 | 637 | | NO | 180000 | 1000 | 0.55 (valid) |
| | 0.01 | 637 | | YES | 35000 | 1000 | 0.75 (valid) |
References:
1. Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.
2. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119).
Modeling
A first numeric model of PredCel works without oscillations. The graphs look reasonable so far. Modelling was performed on three levels. The lowest level covers one step of PredCel and monitors the phage concentration as well as the concentrations of uninfected, infected and phage-producing *E. coli* [Graph of level 1 Predcel model](170820mod-lvl1.png). One level above, all concentrations were tracked over 100 iterations of PredCel [Graph of level 2 Predcel model](170820mod-lvl2.png). On the third level, different sets of values for starting fitness, starting phage concentration and starting *E. coli* concentration were tested. In this case we only monitored how long the phage titer at the end of each iteration of PredCel stayed above 1 pfu/mL and below 1e8 pfu/mL [Graph of level 3 Predcel model](170820mod-lvl3.png).
At least the two higher levels will probably only work in Python, but an interactive version of what happens during one iteration may be possible. Work on a final, more user-friendly version of the script was started.
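The quantity monitored on the third level can be sketched as a small helper that counts how many consecutive iterations the end-of-iteration titer stays inside the window. Only the bounds of 1 pfu/mL and 1e8 pfu/mL come from the text; the function name `iterations_in_window` and the example titers are hypothetical, and the simulation that would produce the titers is not shown.

```python
def iterations_in_window(titers, lower=1.0, upper=1e8):
    """Count leading iterations whose final titer lies strictly between the bounds.

    titers: phage titer (pfu/mL) at the end of each iteration, in order.
    """
    count = 0
    for titer in titers:
        if not (lower < titer < upper):
            break  # the titer left the window; later iterations are ignored
        count += 1
    return count

# Made-up titers for illustration only.
example = [1e5, 3e6, 5e7, 2e8, 1e6]
survived = iterations_in_window(example)
```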
}} }}
Optopace
No entry for this subproject this week.
Software
No entry for this subproject this week.
Modeling
No entry for this subproject this week.