Difference between revisions of "Team:Heidelberg/InteractiveToolsTest"


Revision as of 19:33, 26 October 2017

Modeling.

Interactive tools

Glucose Concentration

Mutagenesis plasmids are important because they enable the rapid mutation that makes continuous evolution possible on short time scales. The mutagenesis plasmids we used have a \(P_{BAD}\) promoter that is arabinose-inducible but repressed by glucose [RN159]. Controlling the glucose concentration is therefore important for achieving a high mutation rate. This model uses ODEs to describe both the glucose and the E. coli concentration, assuming the two are independent of each other. This is plausible because the medium used in the experiments contained carbon sources other than glucose. The glucose consumption rate per E. coli is assumed to be independent of the glucose concentration. The tool calculates the ideal glucose concentration in the medium for either a turbidostat when working with PACE or a single flask when working with PREDCEL.

The glucose concentration in the turbidostat \(c_{G_{T} }\) is increased by the incoming medium, which has a flow rate of \(\Phi_{T}\) and a glucose concentration of \(c_{G_{M} }\). It is decreased by the medium that leaves the turbidostat at the same flow rate but with a glucose concentration of \(c_{G_{T} }\). Additionally, E. coli at a concentration of \(c_{E}\) take up glucose at a rate of \(q\).
$$ \frac{\partial c_{G_{T} }(t)}{\partial t} = \Phi_{T} \cdot c_{G_{M} } - \Phi_{T} \cdot c_{G_{T} } - c_{E} \cdot q $$
In the case of a turbidostat we can assume a dynamic equilibrium:
$$ \frac{\partial c_{G_{T} }(t)}{\partial t} = 0 $$
This results in
$$ c_{G_{T} } = c_{G_{M} } - \frac{c_{E} \cdot q}{\Phi_{T} } $$
$$ \Leftrightarrow c_{G_{M} } (c_{G_{T} }) = c_{G_{T} } + \frac{c_{E} \cdot q}{\Phi_{T} } $$

When a lagoon with volume \(V_{L}\) and a flow rate of \(\Phi_{L}\) is supplied by the turbidostat, the glucose consumption in that lagoon can be modeled the same way. Because the E. coli titer, the glucose concentration and the flow rate into the lagoon are constant, a steady state can be assumed:
$$ c_{G_{L} } = c_{G_{T} } - \frac{c_{E} \cdot q}{\Phi_{L} } $$
In the context of PACE, mutagenesis plasmids are induced in the lagoons, which stops the growth of E. coli; hence the E. coli titer is assumed to be the same as in the turbidostat. Substituting the turbidostat result for \(c_{G_{T} }\) gives
$$ c_{G_{L} } = c_{G_{M} } - \frac{c_{E} \cdot q}{\Phi_{T} } - \frac{c_{E} \cdot q}{\Phi_{L} } $$
$$ \Leftrightarrow c_{G_{M} } (c_{G_{L} }) = c_{G_{L} } + \frac{c_{E} \cdot q}{\Phi_{T} } + \frac{c_{E} \cdot q}{\Phi_{L} } $$
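The two steady-state relations for the turbidostat and the lagoon can be evaluated directly. The following is a minimal sketch, not the code behind the interactive tool. The function names and the example numbers are hypothetical, and it assumes consistent units: concentrations in g/l, the E. coli titer in g dry weight per l, \(q\) in g glucose per g dry weight per hour, and flow rates in volumes per hour.

```python
def medium_glucose_for_turbidostat(c_gt, c_e, q, phi_t):
    """Medium glucose c_GM needed for a target turbidostat concentration c_GT.

    Implements c_GM = c_GT + c_E * q / Phi_T (steady state).
    """
    return c_gt + c_e * q / phi_t


def medium_glucose_for_lagoon(c_gl, c_e, q, phi_t, phi_l):
    """Medium glucose c_GM needed for a target lagoon concentration c_GL.

    Implements c_GM = c_GL + c_E * q / Phi_T + c_E * q / Phi_L, assuming the
    E. coli titer in the lagoon equals the titer in the turbidostat.
    """
    return c_gl + c_e * q / phi_t + c_e * q / phi_l


# Illustrative values only (not taken from the experiments):
# titer 0.36 g_DW/l, q = 0.183 g/(g_DW h), Phi_T = 1 vol/h, Phi_L = 2 vol/h
c_gm_turbidostat = medium_glucose_for_turbidostat(1.0, 0.36, 0.183, 1.0)
c_gm_lagoon = medium_glucose_for_lagoon(1.0, 0.36, 0.183, 1.0, 2.0)
```

Because the lagoon is fed from the turbidostat, the medium has to cover the consumption in both vessels, so the lagoon target always requires at least as much glucose in the medium as the same target in the turbidostat.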
If the concentration of glucose in a flask, \(c_{G_{F} }\), needs to be determined, the functional dependencies are as follows. As there is no incoming medium and no medium leaving the flask, the glucose concentration changes only through E. coli consuming it.
$$ \frac{\partial c_{G_{F} }(t)}{\partial t} = - q \cdot c_{E}(t) $$
Exponential growth of the E. coli is assumed, resulting in
$$ c_{G_{F} }(t) = c_{G_{F} }(t_{0}) - q \cdot \int_{t_{0} }^{t} c_{E}(t) \: dt $$
$$ = c_{G_{F} }(t_{0}) - q \cdot \int_{t_{0} }^{t} c_{E}(t_{0}) \cdot exp\left(\frac{ln(2) \cdot t}{t_{E} }\right) dt $$
$$ = c_{G_{F} }(t_{0}) - q \cdot c_{E}(t_{0}) \cdot \frac{t_{E} }{ln(2)} \cdot \left(exp\left(\frac{ln(2) \cdot t}{t_{E} }\right) - exp\left(\frac{ln(2) \cdot t_{0} }{t_{E} }\right)\right) $$
So the glucose starting concentration \(c_{G_{F} }(t_{0})\) needed to reach a concentration of \(c_{G_{F} }(t)\) after a duration of \(t\) is calculated by
$$ c_{G_{F} }(t_{0}) = c_{G_{F} }(t) + q \cdot c_{E}(t_{0}) \cdot \frac{t_{E} }{ln(2)} \cdot \left(exp\left(\frac{ln(2) \cdot t}{t_{E} }\right) - exp\left(\frac{ln(2) \cdot t_{0} }{t_{E} }\right)\right) $$
If logistic growth is assumed, the term for \(c_{E}(t)\) changes. Here \(c_{c}\) is the capacity, the maximum concentration of E. coli under the present conditions.
$$ c_{G_{F} }(t) = c_{G_{F} }(t_{0}) - q \cdot \int_{t_{0} }^{t} c_{E}(t) \: dt $$
$$ = c_{G_{F} } (t_{0}) - q \int_{t_{0} }^{t} \frac{c_{E}(t_{0}) \: exp \big(ln(2) \cdot \frac{t}{t_{E} } \big)}{1+ \frac{c_{E}(t_{0})}{c_{c} } \: exp \big(ln(2) \cdot \frac{t}{t_{E} }\big)} \: dt $$
$$ = c_{G_{F} } (t_{0}) - q \: \frac{t_{E} \cdot c_{c} }{ln(2)} \cdot ln \Bigg( \frac{1 + \frac{c_{E}(t_{0})}{c_{c} } exp\big(ln(2) \: \frac{t}{t_{E} } \big)}{1 + \frac{c_{E}(t_{0})}{c_{c} } } \Bigg) $$
So the glucose starting concentration \(c_{G_{F} }(t_{0})\) needed to reach a concentration of \(c_{G_{F} }(t)\) after a duration of \(t\) is calculated by
$$ c_{G_{F} } (t_{0}) = c_{G_{F} } (t) + q \: \frac{t_{E} \cdot c_{c} }{ln(2)} \cdot ln \Bigg( \frac{1 + \frac{c_{E}(t_{0})}{c_{c} } exp\big(ln(2) \: \frac{t}{t_{E} } \big)}{1 + \frac{c_{E}(t_{0})}{c_{c} } } \Bigg) $$
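The flask calculations for both growth models can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, \(t_{0} = 0\) is used as the default start time, and the exponential case integrates \(c_{E}(t_{0}) \cdot exp(ln(2) \cdot t / t_{E})\) directly, which yields the prefactor \(t_{E} / ln(2)\). The logistic case uses the growth term exactly as written in the text.

```python
import math


def glucose_start_exponential(c_gf_end, c_e0, q, t_e, t, t0=0.0):
    """Starting glucose needed to end at c_gf_end under exponential growth."""
    k = math.log(2) / t_e  # growth rate from the generation time t_E
    # Integral of c_e0 * exp(k * t) from t0 to t, multiplied by q
    consumed = q * c_e0 / k * (math.exp(k * t) - math.exp(k * t0))
    return c_gf_end + consumed


def glucose_start_logistic(c_gf_end, c_e0, c_cap, q, t_e, t):
    """Starting glucose needed to end at c_gf_end under logistic growth.

    c_cap is the capacity c_c; the integral starts at t0 = 0.
    """
    k = math.log(2) / t_e
    a = c_e0 / c_cap
    consumed = q * c_cap / k * math.log((1 + a * math.exp(k * t)) / (1 + a))
    return c_gf_end + consumed


# Illustrative values only: 0.36 g_DW/l inoculum, 0.5 h generation time, 3 h run
start_exp = glucose_start_exponential(1.0, 0.36, 0.183, 0.5, 3.0)
start_log = glucose_start_logistic(1.0, 0.36, 2.0, 0.183, 0.5, 3.0)
```

With a finite capacity the logistic model predicts less total consumption than the exponential model, so it asks for a lower starting concentration. For a very large capacity the two results converge.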
Further calculations to simplify data entry:
$$ c_{E. coli_{DW} } = c_{E. coli_{OD600} } \cdot 0.36 $$
according to Milo et al. [Milo2009].
$$ q = 0.183 \: g_{Glucose} \: g_{DW}^{-1} \: h^{-1} $$
according to Neubauer et al. [Neubauer2001]. Because turbidostats are operated at a constant cell density, the flow rate \(\Phi\) can be calculated from the generation time \(t_{E}\).
$$ \Phi = \frac{ln(2)}{t_{E} } $$
If the E. coli titer in \(g_{DW}/l\) is zero, it is calculated from the OD; otherwise the dry weight value is used. If the glucose concentration in \(mmol/l\) is not zero, it is used for the calculation. If the generation time \(t_{E}\) is not zero, it is used to calculate the flow rate \(\Phi\).
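The input rules above amount to a few small helper functions. The sketch below is hypothetical: the function and constant names are not taken from the tool, and the molar mass of glucose (180.156 g/mol) used for the mmol/l conversion is an added assumption that does not appear in the text.

```python
import math

OD600_TO_G_DW_PER_L = 0.36      # conversion factor quoted from Milo et al.
GLUCOSE_G_PER_MOL = 180.156     # assumption: molar mass of glucose


def titer_dry_weight(dw_g_per_l, od600):
    """Use the dry weight if it is nonzero, otherwise convert from OD600."""
    return dw_g_per_l if dw_g_per_l != 0 else od600 * OD600_TO_G_DW_PER_L


def glucose_g_per_l(mmol_per_l, g_per_l):
    """Use the mmol/l value if it is nonzero, otherwise the g/l value."""
    if mmol_per_l != 0:
        return mmol_per_l * GLUCOSE_G_PER_MOL / 1000.0
    return g_per_l


def flow_rate(phi, t_e):
    """Use the generation time if it is nonzero, otherwise the given flow rate."""
    return math.log(2) / t_e if t_e != 0 else phi
```

For example, `titer_dry_weight(0.0, 2.0)` converts an OD600 of 2 to 0.72 g dry weight per litre, while `flow_rate(1.5, 0.0)` passes the entered flow rate through unchanged.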

Table 1: Variables and parameters used for the calculation of the glucose and E. coli concentrations. List of all parameters and variables used in the numeric solution of this model.

Symbol | Value and Unit | Explanation
\(c_{G_{T} }\) | [g/ml] or [mmol/ml] | Glucose concentration in turbidostat
\(c_{G_{M} }\) | [g/ml] or [mmol/ml] | Glucose concentration in medium
\(c_{G_{L} }\) | [g/ml] or [mmol/ml] | Glucose concentration in lagoon
\(t\) | [min] | Time
\(\Phi_{T}\) | [ml/min] | Flow rate through turbidostat
\(\Phi_{L}\) | [ml/min] | Flow rate through lagoon
\(c_{E}\) | [cfu/ml] or OD600 | E. coli concentration
\(q\) | \([g_{glucose} \: g_{DW}^{-1} h^{-1}]\) | Glucose consumption by E. coli [Neubauer2001]
\(t_{E}\) | [min] | E. coli generation time
Get the ideal concentration
Changes in E. coli and glucose concentration over time

Number of mutations and mutated sequences

Directed evolution experiments are essentially a search for a set of mutations. Consequently, a few plaques from a PACE or PREDCEL experiment are regularly sequenced to monitor the current state of the experiment and to make sure the mutagenesis plasmids work. To minimize the required time, materials and costs, it is helpful to estimate how many clones must be sequenced so that, with a given probability, at least one clone contains a mutation. To calculate this probability, or the number of clones needed to reach it, we used a basic mutation model. The mutation rate is assumed to be the same for each sequence and each position of a sequence. Mutations that revert previous mutations are ignored, since we usually expected only a few mutations in hundreds of basepairs.

Expected number of mutations per basepair \(p_{m}\):
$$p_{m} = \frac{N_{M} }{L_{S} } = N_{g} \cdot r_{M} = \frac{t \cdot \Phi_{L} }{2} \cdot r_{M}$$
\(N_{M}\) is the number of mutations, while \(L_{S}\) is the length of the sequence that can mutate free of selection and is sequenced. \(N_{g}\) is the number of generations that occurred before sequencing. According to Esvelt et al. [RN66], the number of generations per hour corresponds to half the flow rate \(\Phi_{L}\) in volumes per hour. The basic assumption is a steady state of phage replication and dilution, which is reasonable for PACE. For PREDCEL experiments an arbitrary number of generations can be set.

The expected share of sequences that show at least one mutation in \(L_{S}\) basepairs is the complement of the probability that all \(L_{S}\) basepairs stay unchanged when \(p_{m}\) mutations per basepair are expected:
$$p_{M} = \frac{N_{M} }{N_{S} } = 1 - p_{(N_{M}=0)} = 1 - (1-p_{m})^{L_{S} } $$
With this equation it is possible to calculate the number of sequences \(N_{S}\) that have to be sequenced in order to find a mutated one with a probability of \(p_{(N_{M} > 0)}\).
The number of sequences \(N_{S}\) that need to be sequenced follows from the relation between the probability to find at least one mutated sequence, \(p_{(N_{M}>0)}\), and the probability of a single sequence to be mutated, \(p_{M}\). The probability to find at least one mutated sequence under the given conditions can be calculated using the complementary probability, which is the probability to find exactly zero mutated sequences among \(N_{S}\) sequences.
$$p_{(N_{M}>0)} = 1 - (1-p_{M})^{N_{S} }$$
Solving this equation for \(N_{S}\) leads to
$$ (1-p_{M})^{N_{S} } = 1 - p_{(N_{M}>0)} $$
$$ \Leftrightarrow N_{S} = \frac{ln(1-p_{(N_{M}>0)})}{ln(1-p_{M})}$$
$$ \Leftrightarrow N_{S} = \frac{ln(1-p_{(N_{M}>0)})}{ln\Big((1 - p_{m})^{L_{S} } \Big)}$$
Set \(\Phi_{L}\) to zero to use the number of generations for the calculation. If \(\Phi_{L}\) and the number of generations are both given, \(\Phi_{L}\) is used. Consider \(L_{Sequence}\) as the number of basepairs that are expected to be able to mutate. If half of the sequence you are interested in is highly conserved, choose a lower \(L_{Sequence}\).
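The mutation model can be written as three short functions. This is a minimal sketch with hypothetical function names, and the numbers in the example are illustrative, not measured values. Rounding \(N_{S}\) up to the next whole clone is an added assumption, since the formula itself returns a real number.

```python
import math


def mutations_per_bp(r_m, n_generations=0.0, t=0.0, phi_l=0.0):
    """Expected mutations per basepair p_m = N_g * r_M.

    If phi_l is nonzero it takes precedence and N_g = t * phi_l / 2.
    """
    if phi_l != 0:
        n_generations = t * phi_l / 2
    return n_generations * r_m


def share_mutated(p_m, l_s):
    """Share of sequences with at least one mutation: p_M = 1 - (1 - p_m)^L_S."""
    return 1 - (1 - p_m) ** l_s


def sequences_needed(p_target, p_m, l_s):
    """Clones to sequence so that at least one is mutated with probability p_target.

    Requires 0 <= p_target < 1 and 0 < p_M < 1; the result is rounded up.
    """
    p_mut = share_mutated(p_m, l_s)
    if not (0 <= p_target < 1) or not (0 < p_mut < 1):
        raise ValueError("p_target must be in [0, 1) and p_M in (0, 1)")
    return math.ceil(math.log(1 - p_target) / math.log(1 - p_mut))


# Illustrative values only: 24 h in the lagoon at 2 vol/h, 1000 bp sequenced
p_m = mutations_per_bp(r_m=1e-5, t=24.0, phi_l=2.0)
n_clones = sequences_needed(0.95, p_m, 1000)
```

As a quick plausibility check, if every second sequence carries a mutation (\(p_{M} = 0.5\)), five clones are needed to find at least one mutated sequence with 95 % probability, because \(0.5^{4} = 0.0625\) is still above 0.05 while \(0.5^{5} \approx 0.031\) is below it.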

Table 2: Additional variables and parameters used for the calculation of the number of mutated sequences. List of all additional parameters and variables used in the numeric solution of this model. Values are given where possible.

Symbol | Value and Unit | Explanation
\(t\) | [h] | Total time in lagoon
\(p_{m}\) | [bp/bp] | Expected number of mutations per basepair
\(p_{M}\) | [bp/sequences] | Expected share of sequences with at least one mutation
\(N_{M}\) | [bp] | Number of mutated basepairs
\(L_{S}\) | [bp] | Length of sequence that is considered
\(N_{g}\) | [generations] | Number of generations
\(r_{M}\) | \([\frac{1}{bp \cdot generation}]\) | Mutation rate
\(\Phi_{L}\) | [Vol/h] | Flow rate through lagoon
\(N_{S}\) | [sequences] | Number of sequences
\(p_{(N_{M} > 0)}\) | | Probability to find at least one mutated sequence in a pool of sequences
\(p_{(N_{M} = 0)}\) | | Probability to find no mutated sequences in a pool of sequences
Get your probabilities
The tool reports \(p_{m}\) in % (bp/bp), \(N_{mutations}\) in bp per sequence, and the share of sequences that show at least one mutation in \(L_{Sequence}\) bp, \(p_{M}\), in % of sequences.

Diff tool

Since a convenient tool for marking differences between two aligned strings was not available online, we implemented one. Case sensitivity can be enabled if needed. Whitespace and newlines are ignored, which makes handling FASTA files easy.
Insert strings to compare
Comparison:
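The comparison described for the diff tool can be sketched as a position-by-position check of two already aligned strings. This is a hypothetical illustration and not the tool's own code: the function names, the `^` marker and the `-` padding for unequal lengths are assumptions, and FASTA header lines are not treated specially.

```python
def normalize(text, case_sensitive=False):
    """Remove all whitespace and newlines; fold case unless case_sensitive."""
    cleaned = "".join(text.split())
    return cleaned if case_sensitive else cleaned.upper()


def mark_differences(a, b, case_sensitive=False, marker="^"):
    """Return both normalized strings and a line marking differing positions."""
    a_norm = normalize(a, case_sensitive)
    b_norm = normalize(b, case_sensitive)
    # Pad the shorter string so that trailing extra characters are marked too
    length = max(len(a_norm), len(b_norm))
    a_pad = a_norm.ljust(length, "-")
    b_pad = b_norm.ljust(length, "-")
    marks = "".join(" " if x == y else marker for x, y in zip(a_pad, b_pad))
    return a_pad, b_pad, marks


# Line breaks and case are ignored; only the seventh base differs
seq_a, seq_b, marks = mark_differences("ACGT\nACGT", "acgt acct")
print(seq_a)
print(seq_b)
print(marks)
```

Because the strings are compared position by position, an insertion or deletion shifts every following character and marks all of them as different, so the inputs need to be aligned beforehand.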

References