Team:Tsinghua/Model

Yeasy AFT project page

Chemical kinetics and Logistic Growth Model

Chemical kinetics model

The goal of our modeling section is to work out how the concentration of glucose changes over time for a given aflatoxin (AFT) concentration. The model proceeds in two steps. We first calculate the amount of HXT (the glucose transporter) on the surface of each yeast cell as a function of time and aflatoxin concentration (in other words, we obtain [HXT] = HXT(t, [AFT]), where [HXT] denotes the HXT level per cell). We then evaluate how the proliferation of yeast cells and the synthesis of HXT together determine glucose consumption.

Kinetics of Transcription Signal

To begin with, consider Figure 1 below. The activation domain (AD) can drive expression of our reporter gene only when both ScFv1 and ScFv2 bind to AFT, bringing the AD and the DNA-binding domain (BD) into close proximity. For simplicity, we assume that the two ScFvs bind AFT independently of each other, so the total concentration of the ternary complex in Figure 1 is proportional to the product of

  • the fraction of ScFv1 bound to AFT

  • the fraction of ScFv2 bound to AFT


Figure 1. Initiation of HXT Transcription

The two binding reactions have the same form:
$$ ScFv_i + AFT \rightleftharpoons ScFv_i\text{-}AFT, \quad i = 1, 2 \tag{2} $$
with association constant K_i:
$$ K_i = \frac{[ScFv_i\text{-}AFT]}{[ScFv_i][AFT]} \tag{3} $$
We also assume that the expression level of each ScFv, denoted by ScFv_i^{expressed}, is unaffected by [AFT]; this would fail only if AFT caused mutations that interfere with the transcription process, which is unlikely. AFT comes from the environment, which is vast compared with the yeast cell, so [AFT] barely changes as reaction (2) proceeds. The amount of ScFv, however, is limited, so the concentration of unbound ScFv equals its expression level minus the portion bound to AFT:
$$ K_1 = \frac{[ScFv_1\text{-}AFT]}{(ScFv_1^{expressed} - [ScFv_1\text{-}AFT])[AFT]} \tag{4} $$
Solving for the bound fraction of ScFv1 gives
$$ \frac{[ScFv_1\text{-}AFT]}{ScFv_1^{expressed}} = \frac{K_1[AFT]}{K_1[AFT] + 1} \tag{5} $$
and similarly
$$ \frac{[ScFv_2\text{-}AFT]}{ScFv_2^{expressed}} = \frac{K_2[AFT]}{K_2[AFT] + 1} \tag{6} $$
The transcription initiation rate of the HXT reporter gene, denoted by T([AFT]), is directly proportional to the concentration of the ternary complex, so
$$ T([AFT]) = k_1 \times \frac{K_1[AFT]}{K_1[AFT] + 1} \times \frac{K_2[AFT]}{K_2[AFT] + 1} \tag{7} $$
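As a quick numerical illustration of equations (5) to (7), the sketch below evaluates the two bound fractions and their product as [AFT] increases. The association constants K1 = 2.0 and K2 = 0.5 and the proportionality constant k1 = 1.0 are arbitrary, hypothetical values chosen only for illustration, not measured parameters.

```python
def bound_fraction(K, aft):
    """Equilibrium fraction of an ScFv bound to AFT, eq. (5)/(6): K[AFT] / (K[AFT] + 1)."""
    return K * aft / (K * aft + 1.0)


def transcription_initiation(aft, K1, K2, k1=1.0):
    """Eq. (7): rate proportional to the product of the two independent bound fractions."""
    return k1 * bound_fraction(K1, aft) * bound_fraction(K2, aft)


# Hypothetical association constants (arbitrary units), not fitted values.
K1, K2 = 2.0, 0.5
for aft in [0.0, 0.5, 1.0, 2.0, 10.0]:
    print(f"[AFT]={aft:5.1f}  T={transcription_initiation(aft, K1, K2):.4f}")
```

Because each factor saturates at 1, T([AFT]) approaches k1 at high [AFT], while at low [AFT] it grows roughly as k1·K1·K2·[AFT]², a consequence of requiring both ScFvs to be bound at once.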

Derivation of the rate equations

To estimate the amount of HXT protein over time, denoted by [HXT], we first have to work out the synthesis of HXT mRNA, denoted by [RNA]. This part of the model is illustrated in Figure 2 below; k2 to k5 are rate constants.


Figure 2. Transcription model of HXT



The time derivative of [RNA] is given by
$$ \frac{d[RNA]}{dt} = k_2 \times \frac{K_1[AFT]}{K_1[AFT] + 1} \times \frac{K_2[AFT]}{K_2[AFT] + 1} - k_3[RNA] \tag{9} $$
Since the first term on the right is independent of t, we denote it by T_1 when solving for [RNA] as a function of t:
$$ \frac{d[RNA]}{dt} = T_1 - k_3[RNA], \quad \text{where } T_1 = k_2 \times \frac{K_1[AFT]}{K_1[AFT] + 1} \times \frac{K_2[AFT]}{K_2[AFT] + 1} \tag{10} $$
The two variables can be separated:
$$ \frac{d[RNA]}{T_1 - k_3[RNA]} = dt \tag{11} $$
Integrating both sides of the equation, we have
$$ -\frac{\ln(T_1 - k_3[RNA])}{k_3} = t + c_1 \tag{12} $$
We need a boundary condition to determine c_1. If we assume the transcription of the reporter gene is not leaky, the amount of HXT mRNA before AFT induction is approximately zero, which gives
$$ \text{restriction 1: when } t = 0, [RNA] = 0 \tag{13} $$
Thus
$$ c_1 = -\frac{\ln T_1}{k_3} \tag{14} $$
Substituting c_1 back into (12), we have
$$ [RNA] = \frac{T_1 - e^{-k_3(t + c_1)}}{k_3} = \frac{T_1(1 - e^{-k_3 t})}{k_3} \tag{15} $$
We have now worked out [RNA] as a function of time and can finally turn to [HXT]. Taking the translation and degradation of HXT into consideration, the time derivative of [HXT] can be written as
$$ \frac{d[HXT]}{dt} = k_4[RNA] - k_5[HXT] \tag{16} $$
$$ = \frac{k_4 T_1(1 - e^{-k_3 t})}{k_3} - k_5[HXT] \tag{17} $$
This is a non-homogeneous first-order linear ordinary differential equation, which can be written in the standard form
$$ \frac{d[HXT]}{dt} + k_5[HXT] = k_4[RNA] \tag{18} $$
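The closed form for [RNA] can be checked numerically. The sketch below treats T_1 as the full constant production term (i.e. with k_2 absorbed into it) and compares T_1(1 - e^{-k_3 t})/k_3 with a forward-Euler integration of d[RNA]/dt = T_1 - k_3[RNA] starting from [RNA] = 0. The values T_1 = 0.8 and k_3 = 0.3 are hypothetical.

```python
import math


def rna_analytic(t, T1, k3):
    """Closed form [RNA](t) = T1 (1 - exp(-k3 t)) / k3, with [RNA](0) = 0."""
    return T1 * (1.0 - math.exp(-k3 * t)) / k3


def rna_euler(t_end, T1, k3, dt=1e-4):
    """Forward-Euler integration of d[RNA]/dt = T1 - k3 [RNA] from [RNA](0) = 0."""
    rna = 0.0
    for _ in range(int(round(t_end / dt))):
        rna += dt * (T1 - k3 * rna)
    return rna


# Hypothetical values; T1 already contains k2 and the two binding fractions.
T1, k3 = 0.8, 0.3
for t in [1.0, 5.0, 20.0]:
    print(f"t={t:5.1f}  analytic={rna_analytic(t, T1, k3):.5f}  euler={rna_euler(t, T1, k3):.5f}")
print(f"steady state T1/k3 = {T1 / k3:.5f}")
```

The two columns agree to within the Euler discretisation error, and both approach the steady state T_1/k_3 as t grows.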

Solution of [HXT]

Using the integrating factor e^{k_5 t}, the general solution is
$$ [HXT] = e^{-k_5 t}\left[c_2 + \int \frac{k_4 T_1}{k_3}(1 - e^{-k_3 t})e^{k_5 t}\,dt\right] $$
$$ = c_5e^{-k_5t} + c_6 + c_7e^{-k_3t} \tag{19} $$
where c_5, c_6 and c_7 are constants to be determined (we assume k_3 ≠ k_5). Since there are three constants, three conditions are needed. First, taking the time derivative of both sides of (19), we have
$$ \frac{d[HXT]}{dt} = -k_5c_5e^{-k_5t} - k_3c_7e^{-k_3t} \tag{20} $$
From (20), since k_3 and k_5 are positive, the time derivative of [HXT] approaches zero as t approaches infinity:
$$ \text{restriction 2: } \lim_{t\to\infty}\frac{d[HXT]}{dt} = 0 \tag{21} $$
In addition, from equation (15),
$$ [RNA] = \frac{T_1}{k_3}(1 - e^{-k_3t}) \tag{22} $$
so we can also obtain the limit of [RNA] as t approaches infinity:
$$ \text{restriction 3: } \lim_{t\to\infty}[RNA] = \frac{T_1}{k_3} \tag{23} $$
This indicates that the system eventually reaches a steady state in which the concentrations of HXT mRNA and HXT no longer change over time. Letting t approach infinity on both sides of equation (16), we have
$$ \lim_{t\to\infty}\frac{d[HXT]}{dt} = \frac{k_4 T_1}{k_3} - k_5 \lim_{t\to\infty}[HXT] $$
The left-hand side is zero by restriction 2, and by (19) the limit of [HXT] is c_6, so
$$ c_6 = \frac{k_4 T_1}{k_3 k_5} \tag{24} $$
Finally, at t = 0 no HXT mRNA or HXT has been synthesized, so [HXT] = 0, and by (16) d[HXT]/dt = 0 as well. Setting the left-hand sides of (19) and (20) to zero at t = 0 gives a linear system, which we solve with Cramer's rule:
$$ \begin{cases} c_5 + c_7 = -c_6 \\ -k_5c_5 - k_3c_7 = 0 \end{cases} $$
$$ A\mathbf{c} = \mathbf{b}, \quad A = \begin{bmatrix} 1 & 1 \\ -k_5 & -k_3 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} c_5 \\ c_7 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} -c_6 \\ 0 \end{bmatrix}, \quad \det A = k_5 - k_3 $$
$$ c_5 = \frac{k_3 c_6}{k_5 - k_3} = \frac{k_4 T_1}{k_5 (k_5 - k_3)}, \qquad c_7 = -\frac{k_5 c_6}{k_5 - k_3} = -\frac{k_4 T_1}{k_3 (k_5 - k_3)} \tag{25} $$
With all three constants determined, the concentration of HXT is
$$ [HXT] = \frac{k_4 T_1}{k_5 (k_5 - k_3)}e^{-k_5t} + \frac{k_4 T_1}{k_3 k_5} - \frac{k_4 T_1}{k_3 (k_5 - k_3)}e^{-k_3t} \tag{26} $$
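A numerical cross-check of the HXT solution is sketched below. It evaluates [HXT](t) = c_5 e^{-k_5 t} + c_6 + c_7 e^{-k_3 t} with c_5 = k_4T_1/(k_5(k_5 - k_3)), c_6 = k_4T_1/(k_3k_5) and c_7 = -k_4T_1/(k_3(k_5 - k_3)), the constants that follow from [HXT](0) = 0 and d[HXT]/dt(0) = 0, and compares it with a fourth-order Runge-Kutta (RK4) integration of the coupled equations d[RNA]/dt = T_1 - k_3[RNA] and d[HXT]/dt = k_4[RNA] - k_5[HXT]. All rate constants are hypothetical, and k_3 ≠ k_5 is required.

```python
import math


def hxt_analytic(t, T1, k3, k4, k5):
    """Closed form of [HXT](t) with [RNA](0) = [HXT](0) = 0; requires k3 != k5."""
    c6 = k4 * T1 / (k3 * k5)
    c5 = k4 * T1 / (k5 * (k5 - k3))
    c7 = -k4 * T1 / (k3 * (k5 - k3))
    return c5 * math.exp(-k5 * t) + c6 + c7 * math.exp(-k3 * t)


def hxt_rk4(t_end, T1, k3, k4, k5, dt=1e-2):
    """RK4 integration of d[RNA]/dt = T1 - k3[RNA], d[HXT]/dt = k4[RNA] - k5[HXT]."""
    def f(y):
        rna, hxt = y
        return (T1 - k3 * rna, k4 * rna - k5 * hxt)

    y = (0.0, 0.0)  # ([RNA], [HXT]) at t = 0
    for _ in range(int(round(t_end / dt))):
        a = f(y)
        b = f(tuple(yi + 0.5 * dt * ai for yi, ai in zip(y, a)))
        c = f(tuple(yi + 0.5 * dt * bi for yi, bi in zip(y, b)))
        d = f(tuple(yi + dt * ci for yi, ci in zip(y, c)))
        y = tuple(yi + dt / 6.0 * (ai + 2 * bi + 2 * ci + di)
                  for yi, ai, bi, ci, di in zip(y, a, b, c, d))
    return y[1]


T1, k3, k4, k5 = 0.8, 0.3, 1.2, 0.1  # hypothetical constants, k3 != k5
for t in [0.0, 5.0, 20.0, 100.0]:
    print(f"t={t:6.1f}  closed form={hxt_analytic(t, T1, k3, k4, k5):.6f}"
          f"  RK4={hxt_rk4(t, T1, k3, k4, k5):.6f}")
```

Both columns start at zero with zero slope and level off at the steady state k_4T_1/(k_3k_5).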

Logistic Growth Model

Now we come to the second part of the model: evaluating the concentration of glucose, denoted by [Glu], as a function of time and AFT concentration. We deliberately keep this model very simple.
We assume that the consumption rate of glucose is roughly proportional to the number of yeast cells, the amount of HXT on the surface of each cell, and the concentration of glucose.

Yeast growth and glucose consumption Model

Strictly speaking, the uptake rate of glucose is proportional to the concentration gradient across the membrane, because glucose transport via HXT is facilitated transport. The intracellular glucose concentration is, however, difficult to measure, so we assume it is low compared with the extracellular concentration; the concentration difference across the membrane then roughly equals the extracellular concentration. Hence the glucose uptake rate can be taken as proportional to [Glu] itself.

The logistic model in ecology is
$$ \frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) $$
where r is the intrinsic growth rate, N is the population size, and K is the carrying capacity. Analogously, we write
$$ \begin{cases} \frac{d[Glu]}{dt} = -k_6[Yeast][HXT][Glu] \\ \frac{d[Yeast]}{dt} = k_7[Yeast][HXT](\eta [Glu] - [Yeast]) \end{cases} \tag{27} $$
where k_6, k_7 > 0 are rate constants and η[Glu] plays the role of the environmental carrying capacity. We solve the ODE system (27) numerically in MATLAB.
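The project solved system (27) in MATLAB. As a rough Python analogue, the sketch below integrates the two equations with a hand-written classic Runge-Kutta (RK4) scheme. It writes glucose consumption with an explicit minus sign (k_6 > 0) and takes [HXT] as a user-supplied function of time. The parameter values and the saturating HXT time courses standing in for a high and a low AFT level are hypothetical; none of these numbers come from the project.

```python
import math


def simulate(hxt, glu0, yeast0, k6, k7, eta, t_end, dt=1e-2):
    """Integrate system (27) with the classic fourth-order Runge-Kutta method.

    hxt    -- callable returning [HXT] at time t (any per-cell HXT time course)
    k6, k7 -- positive rate constants; glucose is consumed, hence the minus sign
    eta    -- scales [Glu] into a carrying capacity for the yeast population
    Returns a list of (t, [Glu], [Yeast]) tuples, one per time step.
    """
    def rhs(t, glu, yeast):
        h = hxt(t)
        d_glu = -k6 * yeast * h * glu
        d_yeast = k7 * yeast * h * (eta * glu - yeast)
        return d_glu, d_yeast

    t, glu, yeast = 0.0, glu0, yeast0
    trajectory = [(t, glu, yeast)]
    for _ in range(int(round(t_end / dt))):
        g1, y1 = rhs(t, glu, yeast)
        g2, y2 = rhs(t + dt / 2, glu + dt / 2 * g1, yeast + dt / 2 * y1)
        g3, y3 = rhs(t + dt / 2, glu + dt / 2 * g2, yeast + dt / 2 * y2)
        g4, y4 = rhs(t + dt, glu + dt * g3, yeast + dt * y3)
        glu += dt / 6 * (g1 + 2 * g2 + 2 * g3 + g4)
        yeast += dt / 6 * (y1 + 2 * y2 + 2 * y3 + y4)
        t += dt
        trajectory.append((t, glu, yeast))
    return trajectory


# Hypothetical saturating HXT time courses standing in for a high and a low AFT level.
for label, h_max in [("high AFT", 1.0), ("low AFT", 0.2)]:
    traj = simulate(lambda t, h=h_max: h * (1.0 - math.exp(-0.5 * t)),
                    glu0=1.0, yeast0=0.01, k6=1.0, k7=5.0, eta=1.0, t_end=10.0)
    t_last, glu_last, yeast_last = traj[-1]
    print(f"{label}: t={t_last:.1f}  [Glu]={glu_last:.3f}  [Yeast]={yeast_last:.3f}")
```

Passing [HXT] as a function keeps the glucose/yeast integrator independent of how HXT is modelled, so the closed-form HXT expression from the previous section could be plugged in directly.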

Results

The theoretical relationship between yeast growth and glucose is shown below:


Figure 1. Relationship of yeast population growth under AFT signal.


Figure 2. Relationship of [Glu] under AFT signal.

Time series prediction using LSTM

Introduction of LSTM Model

In recent years, recurrent neural networks (RNNs) have been applied at large scale in everyday tasks such as speech recognition and weather forecasting, and as components of image recognition algorithms, playing a vital role in sequence prediction and classification.
An RNN can be seen as a directed chain of cells, in which each cell makes its prediction according to the context provided by the cells before it. However, the further a context cell is from the current one, the less weight it carries in the overall context. Intuitively, this mechanism resembles the way people forget things that happened long ago.
To solve this problem, Hochreiter & Schmidhuber (1997) introduced a new type of RNN called Long Short-Term Memory (LSTM). LSTMs are a special kind of RNN capable of learning long-term dependencies; the model was explicitly designed to retain important long-term information.
As shown in the figure below, in addition to the input and output gates, an LSTM cell has a cell state, controlled by the gates, which decides what information enters or leaves long-term memory. Because the cell state is carried forward to the next step and is not forced to decay with distance, the LSTM avoids the loss of long-term information.

LSTM

Detailed formulas for each cell:
$$ f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \tag{1} $$
$$ i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i) \tag{2} $$
$$ o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o) \tag{3} $$
$$ c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + U_c h_{t-1} + b_c) \tag{4} $$
$$ h_t = o_t \circ \sigma_h(c_t) \tag{5} $$
Variables:
$$ x_t\text{: input vector} $$
$$ h_t\text{: output vector} $$
$$ c_t\text{: cell state vector} $$
$$ W, U, b\text{: parameter matrices and vectors} $$
$$ f_t, i_t \text{ and } o_t\text{: gate vectors} $$
$$ \quad f_t\text{: forget gate vector, weight of remembering old information} $$
$$ \quad i_t\text{: input gate vector, weight of acquiring new information} $$
$$ \quad o_t\text{: output gate vector, weight of exposing the cell state as output} $$
$$ \sigma_g\text{: sigmoid function; } \sigma_c, \sigma_h\text{: usually the hyperbolic tangent; } \circ\text{: element-wise product} $$
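Equations (1) to (5) translate directly into array operations. The NumPy sketch below performs single LSTM steps with randomly initialised, hypothetical parameters and a hidden size of 20, matching the 20-cell LSTM used for the glucose predictions. It assumes σ_g is the logistic sigmoid and σ_c = σ_h = tanh; the helper names `lstm_step` and `init_params` are illustrative, not part of any library.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following equations (1)-(5), with sigma_c = sigma_h = tanh."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])  # forget gate, eq. (1)
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])  # input gate, eq. (2)
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])  # output gate, eq. (3)
    c_hat = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    c_t = f_t * c_prev + i_t * c_hat  # eq. (4); '*' is the element-wise product
    h_t = o_t * np.tanh(c_t)          # eq. (5)
    return h_t, c_t


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases for the four W, U, b groups (f, i, o, c)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fioc":
        p[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p


n_in, n_hidden = 1, 20
p = init_params(n_in, n_hidden)
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x in [0.1, 0.2, 0.4]:  # a toy input sequence, one scalar per time step
    h, c = lstm_step(np.array([x]), h, c, p)
print(h.shape, c.shape)
```

In practice a framework such as TensorFlow implements the same recurrence and learns W, U and b by backpropagation through time; the sketch only shows the forward computation.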

Prediction of glucose consumption through time

To find the correlation between glucose consumption and time under different conditions, we collected growth curves and trained a model on nearly 3,000 data points.
The network consists of an LSTM with 20 cells followed by a dense layer, trained for 100,000 steps with a batch size of 10 using TensorFlow. The training results are shown below.
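The text does not specify how the growth curves were turned into training examples. One common framing, assumed here rather than taken from the project, is to cut each curve into sliding windows of past values and predict the next value. The sketch below builds such windows in the (samples, time steps, features) layout that Keras LSTM layers expect, draws shuffled mini-batches of size 10, and computes the mean squared error used to report accuracy. The window length of 5, the linear example curve, and the helper names are hypothetical.

```python
import numpy as np


def make_windows(series, window):
    """Split one curve into (past `window` values, next value) pairs; needs len(series) > window."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y  # shape (samples, time steps, features=1)


def batches(X, y, batch_size=10, seed=0):
    """Yield shuffled mini-batches (batch size 10 as in the text)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


curve = np.linspace(1.0, 2.0, 30)  # hypothetical normalised growth curve
X, y = make_windows(curve, window=5)
print(X.shape, y.shape)
print([len(xb) for xb, _ in batches(X, y)])
```

With inputs shaped this way, a network like the one described could be written in Keras as, for example, `tf.keras.Sequential([tf.keras.layers.LSTM(20), tf.keras.layers.Dense(1)])` compiled with an MSE loss; the project's exact configuration is not given, so this is only an illustration.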


Figure 2. Prediction on the test set.
The x axis shows the predicted value and the y axis shows the index of the test data.
As a measure of accuracy, the mean squared error (MSE) is 0.00758278.


Figure 3. Yeast population - time plot under AFT 0.2 unit.


Figure 4. Yeast population - time plot under AFT 0.1 unit.


Figure 5. Yeast population - time plot under AFT 0.05 unit.


Figure 6. Yeast population - time plot under AFT 0.025 unit.

In Figures 3 to 6 above, the x axis is in units of the population size at time 0, and the y axis is in hours.

Reference and source code

References

Zhang Q, Andersen M E, Conolly R B. Binary gene induction and protein expression in individual cells. Theoretical Biology & Medical Modelling, 2006, 3(1): 1-15.
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735-1780.
