Modeling
With Interactive Modelling, iGEM Heidelberg provides a comprehensive set
of tools that not only facilitate the implementation of PACE but also
give an intuitive understanding of the underlying mechanisms. Controlling
highly complex processes such as PACE or PALE in a near-ideal way makes
it possible to exploit as much of their potential as possible. The most
important parameters were determined and examined with ODE systems,
solved analytically or numerically, and with [stochastic and
distributional] models. As far as possible, the models are available
online to make them accessible to anyone interested. Where useful, a [tool
for comparison of experimental data and the model] is available.
In addition, the Interactive Modelling helps to monitor parameters that
cannot easily be interpreted from raw data, such as [], and combines
different parameters to make useful statements about an experiment.
Introduction
iGEM Heidelberg provides a comprehensive set of models that allows for both control and evaluation of continuous and discontinuous directed evolution. The interactive models facilitate regular use in everyday lab work and are easier to understand, because the user can observe how a model behaves when its parameters are changed. Predictions from the models helped to design the novel method Predcel to be both reliable and time efficient. To obtain accurate modelling results for the setup used, a selection of parameters was determined experimentally and included in the models. As models for different levels of abstraction were needed, a variety of approaches was applied, ranging from ordinary differential equations and delay differential equations through stochastic simulations to molecular dynamics, to obtain valuable information on the different aspects of directed evolution.
Table 1: Variables and Parameters used in this model
List of all parameters and variables used in this model. Values are given where possible.
Symbol | Name in source code | Value and Unit | Explanation |
---|---|---|---|
\(c \) | - | [cfu] or [pfu] | colony forming units for E. coli [cfu] or plaque forming units [pfu] for M13 phage |
\( _u\) | - | - | Subscript for uninfected E. coli |
\( _i\) | - | - | Subscript for infected E. coli |
\( _p\) | - | - | Subscript for phage-producing E. coli |
\( _e\) | - | - | Subscript for any one of the E. coli populations on its own |
\( _E\) | - | - | Subscript for all populations of E. coli together |
\( _P\) | - | - | Subscript for M13 phage |
\(c_{c} \) | capacity | [cfu/ml] | Maximum concentration of E. coli possible under given conditions, important for logistic growth |
\(t\) | t | [min] | Duration since the experiment modeled was started |
\(t_{u} \) | tu | \(20\) min | Duration of one division of uninfected E. coli |
\(t_{i} \) | ti | \(30\) min | Duration of one division of infected E. coli |
\(t_{p} \) | tp | \(40\) min | Duration of one division of phage-producing E. coli |
\( t_{P}\) | tpp | [min] | Duration between an E. coli being infected by an M13 phage and releasing the first new phage |
\(g_{e} \) | e_growth_rate | [cfu/min] | Growth rate of E. coli, depending on the type of growth (either logistic or exponential), the current concentration \(c_{e}\), the maximum concentration \(c_{c}\), and the division time \(t_{e}\) |
\( k\) | k | \(3 \cdot 10^{-11}\frac{1}{cfu \cdot pfu \cdot ml \cdot min}\) | Affinity of M13 phage for E. coli |
\( \mu_{max}\) | mumax | \(16.67 \frac{cfu}{min \cdot ml \cdot cfu}\) | Wildtype M13 phage production rate |
\( f\) | f | ? | Fitness value, ratio of the actual \(\mu\) to \(\mu_{max}\) |
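The growth rate \(g_{e}\) from Table 1 can be illustrated with a minimal sketch. The exact formula is not stated on this page, so the standard exponential and logistic growth laws with rate constant \(\ln(2)/t_{e}\) are an assumption here. Only the name `e_growth_rate` is taken from the table; the signature is hypothetical.

```python
import math


def e_growth_rate(c_e, t_e, c_c=None):
    """Growth rate of one E. coli population (assumed textbook form).

    c_e: current concentration, t_e: division time in minutes,
    c_c: capacity; if None, exponential growth is used instead of logistic.
    """
    rate_constant = math.log(2) / t_e  # one doubling per division time
    if c_c is None:
        return rate_constant * c_e  # exponential growth
    # Logistic growth: slows down as c_e approaches the capacity c_c.
    return rate_constant * c_e * (1 - c_e / c_c)
```

With the division times from Table 1, uninfected cells (\(t_{u} = 20\) min) grow faster than phage-producing cells (\(t_{p} = 40\) min) at the same concentration, and the logistic variant returns zero once the capacity is reached.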
Modelling concentrations over multiple Lagoons
When a transfer from one volume to the next is performed, the new lagoon can be modelled with starting values calculated from the end values of the previous lagoon. For each concentration \(c_{t}\) from the previous lagoon, the concentration in the next lagoon \(c_{t+1}\) is calculated as $$ c_{t+1} = \frac{v_{t} }{v_{l} } \cdot c_{t} $$ with \(v_{l}\), the volume of a lagoon, and \(v_{t}\), the volume that is transferred.
If the transferred volume is spun down before it is added to the new lagoon, only the initial value for \(c_{P}\) is calculated this way. The initial concentration of uninfected E. coli is set to the initial cell density. The initial concentrations of infected and phage-producing E. coli are set to zero, because before the transfer no phages are present in the new lagoon. If the transfer volume is not spun down, the concentrations of infected and phage-producing E. coli are also calculated using the above formula. The initial concentration of uninfected E. coli is calculated the same way, but the initial cell density is added.
In directed evolution, the fitness should increase over time. A linear increase in fitness between two given values was implemented to reflect this. The problem with this approach is its basic assumption that all phage-producing E. coli are infected by phages with the same fitness. To make the model more plausible, a distribution of fitness was introduced. For a set of discrete fitness values, the share of the phage-producing E. coli population that belongs to each fitness value is calculated. That changes the equation for the change in the phage concentration \(c_{P}\) to $$ \frac{\partial c_{P} (t)}{\partial t} = -k \cdot c_{u}(t) \cdot c_{P} (t) + \sum_{i = 1}^N f_{i} \cdot s_{i} \cdot \mu \cdot c_{p} (t) $$ The sum runs over the \(N\) different fitness values \(f_{i}\) and their shares \(s_{i}\) of the total phage-producing E. coli population.
Numeric solutions
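As a minimal sketch, the transfer rule and the fitness-weighted phage production term from the section on multiple lagoons could be written as follows. The function names, the dictionary keys and the `spun_down` flag are hypothetical and chosen for illustration; they are not taken from the original source code.

```python
def transfer(c_prev, v_t, v_l):
    """c_{t+1} = v_t / v_l * c_t for a single concentration."""
    return v_t / v_l * c_prev


def initial_values(end, v_t, v_l, c_u0, spun_down):
    """Starting concentrations for the next lagoon.

    end: end concentrations of the previous lagoon with keys
         'u', 'i', 'p' (E. coli populations) and 'P' (M13 phage).
    c_u0: initial cell density of uninfected E. coli in a fresh lagoon.
    """
    start = {"P": transfer(end["P"], v_t, v_l)}
    if spun_down:
        # Only phage is carried over, so no infected or
        # phage-producing cells are present at the start.
        start.update(u=c_u0, i=0.0, p=0.0)
    else:
        # Cells are carried over as well; fresh uninfected cells are added.
        start.update(
            u=transfer(end["u"], v_t, v_l) + c_u0,
            i=transfer(end["i"], v_t, v_l),
            p=transfer(end["p"], v_t, v_l),
        )
    return start


def phage_production(fitness, shares, mu, c_p):
    """Production term: sum over f_i * s_i * mu * c_p."""
    return sum(f * s for f, s in zip(fitness, shares)) * mu * c_p
```

If all shares sit on a single fitness value, `phage_production` reduces to the single-fitness term \(f \cdot \mu \cdot c_{p}\), so the distribution is a generalisation of the simpler approach.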
The problem described above is a system of four differential equations, of which two ( \(\frac{\partial c_{i} }{\partial t} \:, \: \frac{\partial c_{p} }{\partial t}\) ) are so-called delay differential equations. They contain a term that needs to be evaluated at a time point in the past, \(t - t_{P}\). A custom script was used to solve the problem numerically, using the explicit Euler method. The basic idea is that, from a point in time at which all values and all derivatives are given, the next point in time can be calculated by assuming linear progress between the two points. $$ f(t_{n+1}) = f(t_{n}) + (t_{n+1} - t_{n}) \cdot f'(t_{n}) $$ This is performed for \(c_{u}(t)\), \(c_{i}(t)\), \(c_{p}(t)\) and \(c_{P}(t)\) in turn, so that the values needed from \(t_{n}\) are always available for \(t_{n+1}\).
To explore how imprecise parameters and noise influence the outcome of the model, a mode was implemented that adds Gaussian noise to all parameters. It uses the function \(n\), which makes a value \(v\) noisy with a random parameter \(r\). $$ n(v) = \big(1 - 2r\big) \cdot \sigma_{G} \cdot \sigma_{v} \cdot v, \quad r \in (0, 1) $$ Here, \(\sigma_{G}\) is a factor that is the same for all \(v\), while \(\sigma_{v}\) is specific to \(v\). This way, one parameter can be noisier than another, while the noise can still be tuned globally. [Results]
Table 2: Additional Variables and Parameters used in the numeric solution of the model
List of all additional parameters and variables used in the numeric solution of this model. Values are given where possible.
Symbol | Name in Source code | Value and Unit | Explanation |
---|---|---|---|
\(v_{l}\) | vl | [ml] | Volume of lagoon |
\(t_{l} \) | tl | [min] | Duration until transfer to the next lagoon |
\(c_{u}(t_{0})\) | ceu0 | [cfu] | Concentration of E. coli in a lagoon when M13 phages are transferred to it |
\(c_{P}(t_{0})\) | cp0 | [pfu] | Initial concentration of M13 phage in the first lagoon |
\(n\) | epochs | - | Number of epochs that are modelled, one epoch being everything that happens in one particular lagoon |
\(s\) | tsteps | - | Number of time steps for which numeric solutions are calculated, counted per epoch |
\(c_{P}^{min}\) | min_cp | [pfu] | Lower threshold for valid phage titers |
\(c_{P}^{max}\) | max_cp | [pfu] | Upper threshold for valid phage titers |
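The explicit Euler update and the noise function \(n(v)\) from the numeric solution can be sketched as follows. This is not the custom script itself: the toy equation \(y'(t) = -y(t - \tau)\) stands in for the full four-equation system, the constant history before \(t = 0\) is an assumption, and \(r\) is drawn uniformly from the unit interval because the page does not state how it is sampled. Only the name `tsteps` is taken from Table 2.

```python
import random


def euler_step(f_n, dfdt_n, dt):
    """f(t_{n+1}) = f(t_n) + (t_{n+1} - t_n) * f'(t_n)."""
    return f_n + dt * dfdt_n


def solve_delayed(rhs, y0, delay, t_end, tsteps):
    """Integrate y'(t) = rhs(y(t), y(t - delay)) with explicit Euler.

    Assumption: the history before t = 0 is constant and equal to y0.
    Returns the list of tsteps + 1 values, including the start value.
    """
    dt = t_end / tsteps
    lag = round(delay / dt)  # delay expressed in whole time steps
    ys = [y0]
    for n in range(tsteps):
        # Look up the stored value from one delay ago, or the history.
        y_past = ys[n - lag] if n >= lag else y0
        ys.append(euler_step(ys[n], rhs(ys[n], y_past), dt))
    return ys


def noise(v, sigma_g, sigma_v, rng=random):
    """n(v) = (1 - 2r) * sigma_G * sigma_v * v.

    Assumption: r is drawn uniformly from the unit interval.
    """
    r = rng.random()
    return (1 - 2 * r) * sigma_g * sigma_v * v
```

A call such as `solve_delayed(lambda y, y_past: -y_past, 1.0, delay=1.0, t_end=5.0, tsteps=500)` integrates the toy equation over five time units. Storing every step in a list is what makes the delayed term cheap to evaluate, since \(y(t - \tau)\) is a simple index lookup. How \(n(v)\) is combined with \(v\) is not spelled out above; one plausible reading is that the noisy parameter is `v + noise(v, sigma_g, sigma_v)`.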