Model - Overview

Why models are essential for our project?

1. Our Models help to explain the phenomenon in experiments

What does the small drop of florescence signal/OD600 at the beginning of bacteria growth means? What causes the actual response curve deviate from the original rough model? With the improvement of our mathematical model, we've found some potential answers.

2. Measuring working efficiency of Bioblocks

Do the bioblocks work efficiently enough to keep a robust signal processing? To determine the efficiency of signal processing, we simulated AHL production dynamics with experimental data and then made prediction with an expression model.To determine the efficiency of signal processing, we simulated AHL production dynamics with experimental data and then made prediction with an expression model.

3. Predicting best working parameters for Bioblocks

How much time is needed for a bioblock to finish its task and when would the system proceed to the next layer? Our model can give some suggestions before determining these parameters by experiment.

4. Modeling bacteria population inside a MagicBlock

Adding AHLs to bacteria culture could cause growth suppression. Will this phenomenon interfere with the function of bio-blocks? To give a precise kinetic model of bioblocks, we had to test whether and to what level the toxicity of AHLs or the pressure of expressing exogenous protein causes grow suppression in our model.

Introduction

In our models, we simplify the actual biology process into a basic model that only remains input molecule, promoter, transcription gene, mRNA, goal protein and output molecule for both dynamic perspective and responding ability. In developed model, we consider different conditions including the population growth, diffusion of signal and decay of signal molecules in cells. Finally, we completely construct the model of our bioblocks, which will instruct in using our system and guide our experiment. In addition, we also adapt new measurement methods to monitor fluorescence in bacteria in real time and do some basic researches on growth suppression by AHLs.

Aim

Our Models help to explain the observed phenomenon in experiments
Parameters fitting for Bioblocks
Measuring working efficiency of Bioblocks
Modeling bacteria population inside a MagicBlock

Basic and Developed kinetic Model

What does the small drop of florescence signal/OD600 at the beginning of bacteria growth means? What causes the actual response curve deviate from the original rough model? With the improvement of our mathematical model, we've found some potential answers.

Basic Model

$$ \begin{aligned} \frac{d([mRNA])}{dt}&=v_{generate}-\phi_{mRNA}[mRNA]\\ \frac{d([protein])}{dt}&=g_{protein}[mRNA]-\phi_{protein}[protein]\end{aligned} $$

In these equations, $v_{generate}$ refers to the efficiency of mRNA transcription. $\phi$ refers to the degradation rate of mRNA and protein.

The property of $v_{generate}$ depends on the promoter and the concentration of inducer molecule. If the promoter is constitutive promoter, $v_{generate}$ is a constant. Otherwise, it will have a sensitive response to different concentration of inducer molecule. This reponse can be expressed as following form:

$$ v_{generate}([x])=V_{max}·(\frac{(1-\epsilon)·x^n}{k^n+x^n}+\epsilon) $$

$k$ refers to the dissociation constant and $x$ refers to the concentration of inducer concentration. $\epsilon$ refers to the leakage of genetic expression. Following diagram shows the intuitive relation between input signal concentration and expression efficiency.

hill equation

In comparision, for NOR GATE, the repression of inducer molecule can be expressed as similar form:

$$ v_{generate}([x])=V_{max}·(\frac{1-\epsilon}{1^n+(\frac{x}{k})^n}+\epsilon) $$

For specific concerntration, $v_{generate}$ is a constant, otherwise it is a function of $[x]$

The generated protein is used to produce new signal molecule, which play a role as enzyme. Different from Michaelis-Menten equation, our protein (in other words, enzyme) will degradate while producing new siginal molecule, So this fact should be considered into our fundmental model.

Mathematical expression for producing new signal molecule:

$$ \frac{d[EAB]}{dt}=k_1[E][A][B]-(k_1+k_{-1})[EAB]\\ \frac{d[M_{signal}]}{dt}=k_2[EAB] $$

Developed Model

Growth of E.coli

In the developed model, we first take the growth of E.coli into consideration. The growth of E.coli can not only fluctuate the concentration of both reactants and products, but also an important variable in calculate final concentration of products. This model is based on this two fundamental relation:

$$ Total=Concentration·Volumel\\ Volume=N_{E.coli}·V_{E.coli}\\ \frac{d([protein]·Volume)}{dt}=g_{protein}[mRNA]·Volume-\phi_{protein}[protein]·Volume $$

Correspondingly, it is same to equation for mRNA expression:

$$ \frac{d([mRNA])·Volume}{dt}=v_{generate}·Volume-\phi_{mRNA}[mRNA]·Volume\\ $$

$N_{E.coli}$ is a function used to show the population of E.coli, $V_{E.coli}$ refers to the volume of every E.coli, as a constant. So we can divide out the constant $V_{E.coli}$ on both sides of every equations, and take derivative formula:

$$ \frac{d[protein]}{dt}·N_{E.coli}+\frac{dN_{E.coli}}{dt}·[protein]=g_{protein}[mRNA]·N_{E.coli}-\phi_{protein}[protein]·N_{E.coli} $$

Simplify this equation into following form:

$$ \frac{d[protein]}{dt}=g_{protein}[mRNA]-(\phi_{protein}+\frac{N_{E.coli}'}{N_{E.coli}})[protein]\\ N_{E.coli}'=\frac{dN_{E.coli}}{dt} $$

$N_{E.coli}$ is satisfied to following equation:

$$ N_{E.coli}=\frac{N_{\max}}{1+(\frac{N_{\max}}{N_{t=0}}-1)·e^{-rt}} $$

$r$ refers to growth rate of E.coli and $N_{\max}$ refers to the limits of E.coli population. Since $N_{\max}$ and $N_{t=0}$are constants, so we define following parameter:

$$ \frac{N_{\max}}{N_{t=0}}-1=N_{c} $$

From our experiments, we find there are another two possible factors affecting the production of our system. First one is diffusion of signal molecule at initial time, the other one is the decay of signal molecule with the time going.

Diffusion of signal molecule at initial time

The concentration of signal is always considered to diffuse into E.coli very rapidly. But from our data, we find that the initial part of our dynamic curve is not fitting to our basic model. Our basic model indicates that the rate of generating will decrease with the time flying, but the experiment shows that the velocity will have a short rise at initial time and then decrease as the way predicted by basic model. Therefore, we take process of diffusion into consideration. Because at very beginning, the concentration of signal in E.coli is very low, and then it will rise by diffusion, so the efficiency of production will rise according to time in a short time period.

We suppose the initial concentration difference between inside of E.coli and outside is $\Delta c(0)$, also we know the time for E.coli to balence this difference:

$$ c(t)= C_{saturated} -\Delta c(0)·e^{-\frac{t}{\tau}} $$

So the generating efficency comes to:

$$ v_{generate} = \frac{V_{\max}}{1+(\frac{k}{ C_{saturated} -\Delta c(0)·e^{-\frac{t}{\tau}}})^n} $$

And we will use this formula to simulate initial state.

扩散浓度对时间的响应

The demo is shown above which is a Log linear plot. X-axis refers to the time, Y-axis refers to the generating efficiency. We can easily figure out the concentration will rapidly get to steady state and remains to a constant. Therefore, it will only affect the inital transcription efficiency.

Following diagram shows the modified dynamic curve

simulation

We can see the initial slope of the curve is rasing to a point and then decrease gradually which is highly fixed to the experiment result we get.

Decay of signal molecule

In basic model, we consider the decay of signal can be neglected because we found there's no significant difference between concentration in vitro. But actually when we meature the rough concentration in the LB with E.coli, we found that the concentration has a linear deacrease through time, which we should take consideration into our model.

The decay can be shown as following equation:

$$ [S]_t=[S]_{initial}-k_{decay}t $$

And the $v_{generate}$ becomes to:

$$ v_{generate}= V_{\max}\frac{([S]_t)^n}{k^n+([S]_t)^n} $$

To illustrate the change taken by the decompose of signal molecule, we can see following simulation curves:

1-10

X-axis refers to time. We find the efficiency will not be disturbed greatly at initial time, and will have a rapid decrease when the concentration equals to the half of origin. This property shows that we should control the reaction time otherwise the production will decay without production with the time going by. So the main purpose of this model is to predict when we dilute the input signal solution to obtain the maximum of protein to convert out signal.

We use matlab to obtain a rough curve of protein expression. X-axis refers to time.

decay

This is a important result because it indicates that the production will not always increase with the time going. Actually, there exists a so-called "best time" to process next step in our system. For example, this peak can determine when we dilute input signal to get output signal as much as possible.

合成酶

Red stars refers to "best time" according to different input concentration from upstream block.

*matlab code:

 
xxxxxxxxxx
n = [];
fn = [];
for i=1:T/dt
n = [n i];
t = exp(-a*i*dt);
sum=0;
for j=0:i
sum = sum + (Vm-(j*dt)^n)*exp(a*dt*j)*dt/(k^n+(Vm-(dt*j)^n));
end
y = t*sum+\phi* (Vm - dt*i)^(n-1)/(k^n + (Vm - dt*i)^n)^2;
fn = [fn y];
end
plot(n,fn);
max(fn);

This matlab code shows how we draw the curves and how to find maximum.

Parameter Fitting and Simulation

Hill equation

To get the parameter of Hill equation through our data, we tranfer Hill equation to following form:

$$ Hill\quad equation:y=V_{max}\times\frac{x^n}{k^n+x^n} $$ $$ New\quad form:\log{\frac{\frac{y}{V_{max}}}{1-\frac{y}{V_{max}}}}=n\log{x}-n\log{k} $$

In this form, we can get easily get a linear relation between our input concerntration and output GFP. The question is how to find out $V_{max}$ in this equation because this value determine the reprocessed data of output. Another question is, due to the large scale of our data, to ease the workload of proceesing such data. To meet the needs of these two question, first we let each output data substract the minimum among all output data, and define the ratio between each processed output data and the maximum of all output data as the standard output. (NOTICE: The minimum data of this output data set can be the control.)As following shows:

$$ {output}={y_1,y_2,···,y_n} $$ $$ SY_{output}=\{y_1',y_2',···,y_n'\}\quad which\quad y_i=\frac{y_i-\min{Y_{output}}}{\max{Y_{output}}-\min{Y_{output}}} $$

The elements in $SY_{output}$ fit following equation:

$$ \log{\frac{{y_i'}\frac{\max{Youtput}-\min{Y_{output}}}{V_{max}}}{1-{y_i'}\frac{\max{Youtput}-\min{Y_{output}}}{V_{max}}}}=n\log{x_i}-n\log{k} $$

We define the value of $\frac{V_{max}}{\max{Youtput}-\min{Y_{output}}}$ as a parameter $PV_{max}$. So the equation we actually simulate is following one:

$$ \log{\frac{y_i'}{PV_{max}-y_i'}}=n\log{x_i}-n\log{k} $$

We use Mathematica as fitting tools, the following code is shown:

 
outputdata = {output1, output2, output3, output4, output5,output6};
Processeddata = (outputdata - Min[outputdata])/(Max[outputdata] - 
      Min[outputdata]) // N;
data' = {{Log10[10^(-9)], Processeddata[[1]]}, {Log10[10^(-8)], 
    Processeddata[[2]]}, {Log10[10^(-7)], 
    Processeddata[[3]]}, {Log10[10^(-6)], 
    Processeddata[[4]]}, {Log10[10^(-5)], Processeddata[[5]]},{Log10[10^(-4)], Processeddata[[6]]}};
data = {{data'[[1, 1]], data'[[1, 2]]}, {data'[[2, 1]], 
    data'[[2, 2]]}, {data'[[3, 1]], data'[[3, 2]]}, {data'[[4, 1]], 
    data'[[4, 2]]}, {data'[[5, 1]], data'[[5, 2]]}, {data'[[6, 1]], data'[[6, 2]]}};
solu = Flatten[
   Solve[Log10[(y*PVmax)/(1 - (y*PVmax))] == n*x - n*logk, y]];
fitparameter = (FindFit[data, y /. solu, {PVmax, logk, n}, x])
fit = y /. solu /. fitparameter;
Show[ListPlot[data, PlotStyle -> Red], Plot[fit, {x, -10, 0}]]

Example and its output is shown (NOTICE: This example is the fitting curve of the Tra with its limited five data. Actually most of our data, except for tra, has six inputs and outputs, so the original code, which is shown above, has six outputs. When we use this code, we can just import outputs into "outputdata" list and run this programm. ):

 
xxxxxxxxxx
outputdata = {16141, 6812, 32977, 362525, 959405};
Processeddata = (outputdata - Min[outputdata])/(Max[outputdata] - 
      Min[outputdata]) // N;
data' = {{Log10[10^(-9)], Processeddata[[1]]}, {Log10[10^(-8)], 
    Processeddata[[2]]}, {Log10[10^(-7)], 
    Processeddata[[3]]}, {Log10[10^(-6)], 
    Processeddata[[4]]}, {Log10[10^(-5)], Processeddata[[5]]}};
data = {{data'[[1, 1]], data'[[1, 2]]}, {data'[[2, 1]], 
    data'[[2, 2]]}, {data'[[3, 1]], data'[[3, 2]]}, {data'[[4, 1]], 
    data'[[4, 2]]}, {data'[[5, 1]], data'[[5, 2]]}};
solu = Flatten[
   Solve[Log10[(y*PVmax)/(1 - (y*PVmax))] == n*x - n*logk, y]];
fitparameter = (FindFit[data, y /. solu, {PVmax, logk, n}, x])
fit = y /. solu /. fitparameter;
Show[ListPlot[data, PlotStyle -> Red], Plot[fit, {x, -10, 0}]]

Then we can get the meaningful parameter from these data quickly and easily.

Measurement of Signal Converter

How we can measure the working efficiency of our signal converter is an important question for us. As we all know, the reason why we use GFP to reflect the efficiency of promoter is that we can measure fluoresence easily and establish the quantity relationship between GFP expression and input signal concentration. But when it comes to some other products such as small molecule, they are hard to measure exactly. We use LC-MS to indicate the production of our signal converter approximately, but this data is too rough to instruct our following work. So we will use our model to obtain the parameter of converter indirectly by following experiments and deduction from model.

We symbol $S_,S_2$ as the concentrations of two signal molecules, signal one and signal two, $GFP$ as the result of fluroesence intensity.

We propose two experiments. First one is using signal two to induce the expression of GFP. We take its results as standard curve. The other experiment is using signal one to obtain signal two, and we use signal two to induce the expression of gene. Also we will have following data:

$$ S_1=\{c_1,c_2,···,c_n\}\\ GFP=\{F_1,F_2,···,F_n\} $$

From our model we know the relationship among $S_1,S_2$ and $GFP$ at steady state as following:

$$ GFP=V_{max}·(\frac{(1-\epsilon_1)·{S_2}^n}{k_1^n+{S_2}^n}+\epsilon_1)\\ S_2=V_{max}·(\frac{(1-\epsilon_2)·{S_1}^m}{k_2^m+{S_1}^m}+\epsilon_2) $$

From the parameter fitting model, we can determine all parameters in $GFP-S_2$ curve. Therefore, we can use this curve and data of GFP from second experiment to obtain the input signal two concentration.

$$ F_i=V_{max}·(\frac{(1-\epsilon_1)·{[S_2]_i}^n}{k_1^n+{[S_2]_i}^n}+\epsilon_1)\\ F'_i=\frac{F_i-\epsilon_1}{1-\epsilon_1} \\\Longleftrightarrow\log{[S_2]_i}=\frac{\log{\frac{F_i'}{V_1-F_i'}}}{n}+\log{k_1} $$

So we have the data $[S_2]_i$ related to input concentration of signal one, so we can get the relation through using parameter-fitting model would get the parameter of $S_1-S_2$ curve finally.

Final test of simulation 2

x-axis refers to Log of input signal molecular concentration; y-axis refers to the relative GFP expression.

measure

x-axis refers to Log of signal one molecular concentration; y-axis refers to signal two molecular concentration. This curve indicates the effciency of signal converter, which low concentrations of input signal generate less output signal and high concentrations of input signal generate high output signal concentrations of input signal. And there exists a significant drop between low expression and high expression. It is absolutely what we want!

Model on E.coli Growth

This part will discuss an interesting model on how the signal molecule affect the growth and population. The reason why we care about this question is that we measured the OD600 under different circumstance and found some special relation between the concentration and the population. In breif, with the rise of concentration, the population will decrease. We wonder the mechanism and propse two hypothesis:

The signal molecule is toxic to E.coli, so the population will decrease related to the increase of concentration linearly.
The signal molecule induce the synthesis of GFP which occupy the substance that is originally used for growth. It indicates that if the GFP is produced, then the population will be at low level, otherwise the population will be at normal level.

In our model, we indicates the second hypothesis is more realistic.

Toxic model (Y-axis refers to relative population (OD) and X-axis refers to time(hour)) :

Consumption model (Y-axis refers to relative population (OD) and X-axis refers to time(hour)) :

 
x
Manipulate[Plot[0.4 + (0.5 a^2 (1 - E^-x))/(0.01^2 + a^2), {x, 0, 10},PlotRange -> {0, 1}], {a, 0, 0.11, Appearance -> "Labeled"}]

The experiment shows an obvious difference between low concentration and high concentration which fitts to the hypothesis two.

But we also cannot eliminate the hypthesis one, because the curves of low concentration go to steady state but the high concentration go slightly down. If we use hypothesis two to explain this phenomemon, that is: The production of GFP highly occupy the resource and leave little resource for the growth of E.coli even cannot mantain the population at the steady state. If we use hypothesis one, then the result is obvious that signal molecule is toxic to E.coli which causes unavoidable death of E.coli. So further study is required.

About full article

This page (https://2017.igem.org/Team:Shanghaitech/Model/full) is for people who want to learn more details about our model from a mathematical perspective. We believe both intuitive illustration and strict deduction are required for model work, so entire mathematical calculation is given on "full article" page.