Modeling

Overview

In this section we present the results of our model. A dose-response relationship between the salicylate and estrogen inputs and the GFP output is provided, and the gene regulatory components are reviewed. The GFP rise time is investigated using global sensitivity analysis and we show that the model indicate a sensitive biosensor with long response times. In addition, the rise time was shown to be highly insensitive with respect to the first order interactions between the model parameters. However, as higher order interactions were shown to be significant, this led to the conclusion that multiple parameter has to be modified in order to improve the responsiveness. Finally, the section ends with a discussion on the validity on the presented model and procedures in which it could be improved.

Purpose

An important part in engineering is the simulation of systems in order to yield an insight of the expected behaviours before actual implementations are carried out. This process can be both resource and time saving as practical implementations are often expensive and tedious to carry out. The same principle can be applied in synthetic biology where biological constructs are tested in silica prior to wet lab implementations. By simulating the behaviours, it is possible in an early stage of the development to evaluate whether the construct will suffice the intended application. This is especially important for complex systems where basic heuristics no longer can give neither sufficient predictions or mechanistic understandings of the underlying components.

The purpose of this model is to study the time dynamics of a lac regulated AND-gate based on a tripartite association of green fluorescent proteins induced by salicylate and estrogen. The intentions are to study the dose-response behaviours with respect to two inputs. Interesting parameters are the dose-response sensitivities and the dose-response times as these play an important part in real life applications of biosensors. Additionally, it is interesting to study the underlying biochemical mechanics in order to establish an understanding of the behaviours. Also, would a biosensor of this kind be sufficient for the detection of two concurrently present molecules?

Dose-response analysis

Detection limit

In this section we study the biosensors dose-response characteristics. Fig. 1 illustrates the dose-response landscape for the system. The surface was generated by letting the system evolve until steady state (5000 min) without any inducers, followed by the simultaneous addition of IPTG, salicylate and estrogen. A concentration of 1 mM IPTG was added and the salicylate and estrogen concentrations were varied in the range 0-1000 nM and 0-1000 nM for 40 steps respectively. The system was then further simulated until equilibrium (5000 min) and the steady state concentrations of the associated GFP were saved. From the figure, it can be seen that for low concentrations of salicylate and estrogen, a low response is obtained. As the concentrations are increased, the corresponding response is increased until reaching a plateau were the response is saturated for increased concentrations of pollutants, giving the landscape the characteristic mountain look. This is reasonable as the amount of GFP expressed is limited by the amount of GFP1-9 and GFP10-ER-GFP11. For instance, adding more salicylate will eventually make all the Psal transcriptionally active and the addition of more will thus not increase the expression of GFP1-9. In a similar manner, if all GFP10-ER-GFP11 has been activated and associated with GFP1-9, the addition of estrogen will not cause more GFP10-ER-GFP11 to be expressed. Consequently, if either one of proteins are abscent, a complementation cannot occur and the response remains unchanged.

Fig. 2 shows the dose-response heat map for the system obtained by looking at the surface level of fig. 1 in the salicylate-estrogen plane. From the figure, the sensitivity with respect to salicylate and estrogen can be studied. It can be seen that no response is generated for low concentrations of both inputs. In addition, in the abscence of either molecule, no output is generated regardless if the other exists in excess or not. This clearly illustrates the AND-behaviour and it can be seen that the detection limits are approximately 1 nM and 100 nM for salicylate and estrogen respectively. Concentrations lower than those yield either no or a weak response relative to the maximal response. Clearly, this indicates a very sensitive system with respect to the inputs. Since the detection limit of extracellular salicylate is 1 nM and since the model assumes free diffusion, this would correspond to roughly 1 nM intracellular salicylate which is almost equal to one molecule. This means that it would be possible to obtain a response based of one single molecule entering the cell. This seem to be an overestimate of sensitivity since we have not found any literature on whole cell biosensors where subnanomolar detection limits have been attained. Nevertheless, the results indicate that the sensor could be highly sensitive in the codetection of salicylate and estrogen.

Moreover, since the model manages to measure quantities in the order of single molecules, one might question the validity on using a continuous scheme. For example, there is a very weak response obtained for a salicylate concentration of 0.5 nM, corresponding to one half molecule. It might thus seem more reasonable to model the biosensor using discrete and stochastic methods in the single cell level by using the stochastic simulation algorithm. However, since we are considering a deterministic scheme interpreted as the average response over a large population, we make no assumption as to the homogenity of the molecular distribution among the cells. That is, even though the concentration of a molecule in one cell corresponds to one half molecule, it does not mean that the cell has a half molecule in the cytoplasm, but rather that the average amount of molecules over the population is one half salicylate. This means that we are studying the deterministic limit for one cell and the results should be interpreted as an expected average of a large population. We thus consider the choice to model the cell deterministic reasonable as we do no expect a biosensor to consist of one single cell.

lac regulation

With the use of lac inducible promoters, it is possible to have an on/off switch regulating the AND-gate. The dose-response landscape and heat map obtained without any IPTG inducer but otherwise same conditions as in fig. 1 and fig. 2 is shown in fig. 3. A slight leak response can be observed which is because the lac repression does not inactivate the expression entirely. Since the repressing dynamics is modeled as a sequence of reversible reactions, there will always be a small amount of unbound operators allowing for transcription to occur. Nevertheless, when comparing the expression levels with the induced case in fig. 1 we see that the difference is at least four orders of magnitude, clearly indicating a working switch.

Time characteristics

Sample simulations

Fig. 4 illustrates four simulations showing the time evolution of the associated GFP, GFP1-9 and estrogen bound GFP10-ER-GFP11. The simulations were run as previously described, using salicylate and estrogen concentrations in the range 1-100 nM and 10-300 nM respectively. It can be seen that the time evolutions are smooth as the species converges to steady state concentrations and that no oscillations are present. Thus, there are no observed undershoots or overshoots as a response to salicylate and estrogen. The rise time for the associated GFP, defined as the time it takes to go from 10% to 90% of the steady state value, was measured at approximately 650 min for all realisations. The settling time, defined as the time it takes to fall within 2% of the steady state concentration, was measured at roughly 1200 min for all simulations.

Gene regulation

The dynamics of the lac operators prior to the addition of IPTG, salicylate and estrogen is shown in fig. 5. The upper part of the figure shows the repression of the lac operators regulating the expression of NahR and GFP10-ER-GFP11. It can be seen that all operators are repressed within a couple of minutes. However, within this time frame both proteins are transcribed until completely repressed. This is noted in the lower part of the figure where expressed NahR binds to the Psal promoter, inverting the ratio between the protein bound promoter NPsal and Psal. However, as the operators are repressed, the expression of NahR alters and the equilibrium is displaced back to Psal as the NahR is decomposed. Worth noting is that the entire process takes almost 1000 min before reaching equilibrium. This can be traced to the half-life of NahR which was set to 2 hours and the slow Psal dynamics.

The dynamics of the lac operators and the Psal promoter after the addition of IPTG, salicylate and estrogen is shown in fig. 6. Here, 100 nM and 300 nM of salicylate and estrogen was used and the IPTG concentration was set at 1 mM. In the upper part of the figure, it can be seen that all operators are repressed at time zero, completely inhibiting the transcription of NahR and GP10-ER-GFP11. After the addition of IPTG, the operators are derepressed which consquently increases the transcription rates. It can be seen that not all repressors are inhibited at steady state, indicating that higher expression levels can be achieved by the addition of more IPTG. However, we chose to limit the extracellular concentration to 1 mM to keep the simulation within realistic boundaries set by wet-lab working conditions [1]. In addition, it is also worth mentioning that we did have some issues with numerical instability for higher concentrations of IPTG. This can be traced to the shock effect which occurs when a high concentration of IPTG is added momentaneously at time zero. However, it was observed that this problem could be circumvented by gradually increasing the concentration over a short time span instead of adding everything simultaneously.

The dynamics of the Psal promoter is illustrated in the lower part of fig. 6. At time zero, only Psal is present due to the abscence of NahR and salicylate. As time progresses, NahR is expressed, forming the intermediate NahR bound promoter NPsal. This is followed by salicylate simultaneously diffusing over the cell membrane and binding to NPsal. What is observed is then the Psal promoter slowly decreasing in concentration while NPsal is formed, followed by a lagged increase of the active promoter SNPsal. It can be seen that all Psal is converted into the active form SNPsal within the range of approximately 200 min. This indicates that the expression of GFP1-9 does not reach the maximal rate until 200 min after induction, which corresponds to almost a third of the total rise time.

Response time sensitivity analysis

From the simulations carried out in the preceeding section, it was noted that rise times in the order of 650 min were observed for a variety of doses and it was seen that the time varied little with respect to the applied doses. Due to the long rise time of almost 11 hours, it would be of great interest to reduce the time to allow for a rapid detection for practical uses. Hence, to investigate the role of the parameters we chose to do global sensitivity analysis with respect to the rise time. Global sensitivity analysis is a method used to study the effect that large variations in the input parameters have on the output. This is slightly different from local sensitivity analysis which studies the effect on the output caused by small pertubtations on the input. One can view the local variant as a gradient of the output with respect to the parameter space at the point in the parameter space considered, whereas the global variant gives an overall effect on the output from each parameter with respect to a specified domain in the parameter space [2].

The parameters for the sensitivity analysis were chosen with respect to their expected effect on the rise time as well as their capability of being modified in the lab and are illustrated in table 1. We chose to investigate the transcription and translation rates for NahR, GFP1-9 and GFP10-ER-GFP11, the Psal dynamics, the ligand binding dynamics of estrogen to GFP10-ER-GFP11 and the tripartite GFP association rate. The parameter space was chosen by logarithmic sampling around the estimated values within two orders of magnitude, that is the interval with bounds specified by one hundred times and one hundredth times the parameters. These intervals were chosen somewhat arbitrarly based on the belief that it is not possible (or at least very hard) to alter the parameters more than one hundred times their original values. In total, 42 000 samples from the 13-dimensional parameter space were generated using Saltelli's sampling scheme (see implementation). The sensitivities can be found in table 1 and fig. 7. We chose to only consider the first and total order sensitivity indices and present them along with 95% confidence intervals. Here, S1 denotes the first order sensitivity index which is a measure of a single parameters contribution to the total variance in the output. The total sensitivity index, ST is similarly a measure of a parameters contribution to the output variance, but including both the first order effect and all higher interactions with other parameters. We interpret it as the contribution to the output variance where interparameter dependencies are included. We think it's worth mentioning that we initially had issues when sampling over the normal parameter space as it resulted in very small sensitivities for all parameters. This was solved by sampling over the corresponding logarithmic parameter space.

It can be seen from table 1 and fig. 7 that almost all first order sensitivity indices are weak. This means that most parameters in the pathway has a weak individual effect on the rise time. In addition, the confidence intervals indicate that almost all first order sensitivities are insignificant which implies that we cannot rule out the possibility that the parameters do not affect the output individually at all. In fact, the only strong first order sensitive parameter is $\text{k}_\text{aSNP}$, which also is the only significant one. Thus, this is the only parameter that strongly affects the rise time individually. However, when looking at the total sensitivity indices, it can be seen that their values are higher than almost all first order indices. In addition, most of them are significant in contrast to the first orders. This indicates a strong coupling in the system where parameters must co-vary in order to significantly affect the rise time of the biosensor. Clearly, this is just a rediscovery of the principle of the rate determining step commonly seen in chemical kinetics and we have simply found the parameter governing the rise time, namely $\text{k}_\text{aSNP}$. It is worth noting that in retrospect, this result might seem highly trivial and that this can simply be seen by observing $\text{k}_\text{aSNP}$ being the smallest rate constant in the network. However, when we carried out this analysis we did not intuitively realize this until after analyzing the sensitivity results. In fact, our hypothesis was that there would exist some key parameters strongly affecting the rise time and we were thus looking to find these in order to optimize for rapid detection. Nevertheless, the analysis at least gave a quantitative justification of the rate determining step principle and we are very happy to have performed such an analysis.

Table 1: Sensitivity indices for 13 parameters with respect to the rise time generated from the global sensitivity analysis. The sensitivities were obtained by using the Sobol method by sampling 42000 points in the parameter space using Saltelli's sampling scheme. The parameter space used is defined by the values within two orders of magnitude from each estimated parameter. S1 = first order sensitivity index, ST = total order sensitivity index, 95% CI = 95% confidence interval.
Parameter Value S1 95% CI ST 95% CI
$\ce{k_\text{aGFP}}$ $1.8 \cdot 10^{-5}$ $0.0035$ $0.011$ $0.036$ $0.013$
$\ce{k_\text{aEEs}}$ $0.0078$ $0.0013$ $0.0022$ $0.00075$ $0.00037$
$\ce{k_\text{dEEs}}$ $0.072$ $0.0011$ $0.0019$ $0.00044$ $0.00019$
$\ce{k_\text{aNP}}$ $4.7 \cdot 10^{-3}$ $0.019$ $0.031$ $0.18$ $0.034$
$\ce{k_\text{dNP}}$ $11.5$ $0.029$ $0.039$ $0.22$ $0.044$
$\ce{k_\text{aSNP}}$ $4.61 \cdot 10^{-4}$ $0.24$ $0.070$ $0.55$ $0.067$
$\ce{k_\text{dSNP}}$ $3.81 \cdot 10^{-6}$ $0.0010$ $0.0027$ $0.00082$ $0.00039$
$\ce{k_\text{MN}}$ $3.09$ $0.0077$ $0.037$ $0.18$ $0.033$
$\ce{k_\text{ME}}$ $3.18$ $0.00094$ $0.019$ $0.083$ $0.026$
$\ce{k_\text{MG}}$ $0.168$ $0.031$ $0.036$ $0.28$ $0.055$
$\ce{k_\text{N}}$ $3.0$ $0.027$ $0.038$ $0.21$ $0.040$
$\ce{k_\text{E}}$ $2.86$ $-0.0018$ $0.019$ $0.074$ $0.022$
$\ce{k_\text{G}}$ $4.59$ $0.035$ $0.037$ $0.23$ $0.043$
Parameter sweeps

To illustrate the effect of the sensitivite and insensitive parameters within the scanned ranges, we performed four parameter sweeps. All simulations were carried out under normal conditions as explained previously, using 1 mM IPTG and 100 nM and 300 nM salicylate and estrogen respectively. A total of 30 realisations were computed for each parameter in a range of two orders around the estimated value. Fig. 8 illustrates the results. It can be seen that the only parameter affecting the rise time strongly is $\text{k}_\text{aSNP}$ which is in line with the sensitivity analysis. However, what is interesting is that tuning the parameter does not improve the rise time. In fact, the high sensitivity affects the output by drastically increasing the rise time when lowering the parameter value, whereas increasing it does not impose any effect. We have thus found that $\text{k}_\text{aSNP}$ is not only the rate determining parameter, but also the time limiting parameter. It is here important to make the statement that the sensitivities only give the contribution to the total variance in the output, but does not give any information on the direction of the change. Similarly, studying the rest of the parameters reveal that they have a relatively small effect on the rise time, with the association rate $\text{k}_\text{aGP}$ being the only one improving the rise time in a noteworthy way.

Practical interpretation of the sensitivities and parameter sweeps

Generally, sensitivity indices can be interpreted in two ways depending on the design phase. If the purpose of the model is to recreate a real system and describe it as carefully as possible, then the indices gives the parameters that are the most important to estimate accurately. This is because an error on these parameters will give a larger contribution to the output than less sensitive parameters. On the other hand, if the system has been constructed and the parameters are estimated to a satisfactory precision, the sensitivities gives the most important parameters to modify in order to achieve a certain behaviour.

In our case, the observations on the sensitivities yielded important information for avoiding unnecessary work in the wet lab. Since almost all parameters are first order insensitive, individual parameter optimizations will not yield any significant effect on the performance in terms of waiting times. In addition, the only significant first order sensitive paramater turned out to be the rate limiting step. It was seen that an increase in the parameter value would not give any improved rise time, whereas a decrease would rapidly increase it. The only possible way left to efficiently lower the rise time would then be to consider higher order interactions. But this means modifying many parameters simulatenously and such procedures would be very tiresome and thus not feasible in reality. The conclusion based on the model is thus that the biosensor, albeit very sensitive with regards to the inputs, is very slow at generating an output. The model thus indicates that if a sensitive biosensor is desired, this kind of system might work whereas if a quick and responsive biosensor is needed, a different biochemical network topology should probably be considered.

Conclusion

To conclude, the model indicates:

• High dose sensitivity
• Long response times
• Well defined on/off states
This was done by fitting the model to
• iGEM registry data
• Experimental results from published material
• Average kinetic parameters
We provided
• A full derivation of the network topology
• A determenistic representation of ordinary differential equations
• Implementation details
The contribution to our project was
• An insight into the biochemical mechanics
• Order of magnitudes of response times and dose sensitivities
• An indication of the unfeasability of improving response times

A note on the validity of the model

With mathematical models, abstractions of real world systems can be constructed in order to infer information about reflected behaviours. A lot can go wrong under such processes and it is practically impossible to include all details in a model. This makes the process of questioning, experimentalily validating and proposing alternative models ever more important before final conclusions about the real world behaviour can be infered. In this section we want to draw attention on some ambiguitities in the presented model.

The first ambiguity is the estimation of the reaction rates in the binding of salicylate to NPsal. This was estimated by using the induction ratios from a dose-response diagram of salicylate induced GFP from the registry. Here, we proposed an ad hoc base level response of 100 nM GFP and assumed linearity in the induction ratios. Clearly, there is no reason to believe that the concentration of GFP is exactly 100 nM at base level, more than it seemed reasonable. However, despite this, the model managed to catch the temporal dose-response behaviour, so although the absolute concentrations might be off, the time to equilibrium should at least be within the right order.

The estimation of the association rates between GFP1-9 and GFP10-ER-GFP11, and NahR and Psal were based on experimental in vivo measurments from published litterature. There is obviously no reason to believe that the behaviour in vitro gives a complete description of the in vivo behaviour as intracellular conditions are usually much more complex. Nevertheless, they should still be indicative and hint towards a true behavour. Other parameters based on average reaction rates are only valid under the assumption that the cell is working in an "average mode". This is not always true however. For example, overexpressing a protein will alter the expression rates of other proteins in the cell [3]. In our case, we do not know the effect that multiple genetic modules have on the total cell behaviour. Thus, the system should ideally be implemented in the wet lab and all parameters reestimated based on experimental observations.

Finally, it is not obvious that the proposed network topology is the most informative one. Reactions of higher order with more side reactions or intermediate steps could give a better description of the behaviour. Clearly, a comparision using models with other networks topologies should be made to find the best description, balancing intricacy and accuracy.

Nevertheless, despite all of this, we believe the model at least gave us indications of the real world behaviour in our biosensor.

References

1. [1] Larentis, A., Nicolau, J., Esteves, G., Vareschini, D., de Almeida, F., dos Reis, M., Galler, R. and Medeiros, M. (2014). Evaluation of pre-induction temperature, cell growth at induction and IPTG concentration on the expression of a leptospiral protein in E. coli using shaking flasks and microbioreactor. BMC Research Notes, 7(1), p.671.
2. [2] Zi, Z. (2011). Sensitivity analysis approaches applied to systems biology models. IET Systems Biology, 5(6), pp.336-346.
3. [3] Kafri, M., Metzl-Raz, E., Jona, G. and Barkai, N. (2016). The Cost of Protein Production. Cell Reports, 14(1), pp.22-31.