Modeling Paratransgenesis


One of the most important aspects of approaching a problem from the point of view of Synthetic Biology is to look at the biological problem in an “engineering” way: you know the inputs, you design (or redesign) the system you want, and you predict the output. But sometimes the behaviour of your system is not intuitive, so you need a non-intuitive way to predict it. Mathematical modeling can teach you about the system you are working on in ways you might never have imagined.

What to Model (And Predictions you may want)

When you intend to model your system, you need to know what you want to understand and predict with it.

In paratransgenesis there is a wide range of aspects you may look at. You are not only modifying an organism to do something you want, which is complicated by itself, but also working with it under in natura conditions, which means that many factors will influence your data. These are some of the “variables” you can find in this problem:

At the molecular level, proteins and RNA are generally the protagonists of our models, since in synthetic biology we basically work with changes in RNA or protein expression.

For protein expression, you can quantify the expression over time, check whether the protein is expressed under certain conditions, or estimate the probability of expression. You can also include variables such as protein degradation, the translation rate, or the probability of a post-translational modification that the protein needs in order to work. The same applies to RNA, for example its level over time or under the influence of an activator or repressor. The same logic also holds for a system with several proteins or circuits interacting; you just need to capture the relations among them to build a trustworthy model.

At the ecological level there is a whole new set of variables to work with. In ecological (or epidemiological) modeling, your protagonists are populations, and you look at how they fluctuate. The parameters are generally the carrying capacity, that is, the maximum population size that an environment can sustain; the intrinsic growth rate of the population, which includes both natural birth and death rates; and, sometimes, an additional death rate, as in the classical Lotka-Volterra predator-prey models. A good example in paratransgenesis, where the objective is to stop the pathogen life cycle, is a toxicity term for the protein produced by the system, placed in the equation that describes the pathogen itself.

Another very interesting problem to model is the transmission of the modified organism. This problem is still within the ecological scope, but you may need to better understand the relationship between the two species, how horizontal and vertical transmission occur (if they do), and even the micro-ecological system that already lives within the insect. If the final objective really is to release a modified organism into the environment, you need to try to predict its ecological behaviour: whether it can successfully colonize the insects, and whether it could spread unintentionally into unwanted territories or hosts.

Working with all of these aspects at once can be really hard, so a good way to get an accurate but simple model is to focus on just some of them. Nevertheless, attempting to integrate this whole complexity can be a challenging, but extremely interesting and even fun, endeavor.

How to model:

A good start is to design a simplified representation of the problem, such as a scheme, in which you draw your inputs, the quantities whose variation the model will describe, their outputs and the relations between them.

With this simplified view of the system, you can look at it objectively and then decide what type of model you want to build.

Differential Equations

Differential equations (D.E.) are a very interesting way to construct a mathematical representation of a system, and especially so for paratransgenesis, where the problem has many different aspects. Differential equations make it easier to relate these aspects to each other.

Unlike thermodynamic models, differential equations describe the variation of a system, i.e., how a quantity of interest changes with respect to some variable, for example protein expression over time. They describe this variation continuously: we can follow our quantity, such as population size or protein in the medium, over steps of the independent variable (time, in our example) that are as small as we want.

Given these properties, D.E. are classically used for ecological modeling, since they can describe, for instance, population dynamics in time or in space. One of the most important models in Ecology, logistic growth, was constructed using differential equations.

To start building the model, look at your biological system so that you can determine your variables and parameters and define their relations (a good way to do so is to build the scheme suggested before, because it simplifies your view of the model). All assumptions in the model must be very clear, for example whether some population is treated as constant or whether you are considering mutations in the system. Then, knowing these assumptions and having the parameters, variables and their relations, you can already write the model! (Some basic differential equations are shown later.)

An interesting point to highlight is that so far we have illustrated equations that depend on only one independent variable, such as time, but they can also depend on two or more variables. In this case they are called partial differential equations, and you can write them in terms of, for instance, space and time. Thinking about proteins, you could describe both the expression and its dynamics in the medium.

To clarify these ideas, let’s set an example. Consider this simple circuit:

Here we want to understand the expression of this protein over time. The first equation therefore describes the protein over time, written here as dP/dt. Protein expression depends basically on the translation rate, θ, the amount of RNA itself, and a degradation term with rate μP. The other quantity that varies in time is the RNA, dmR/dt, since it can change with activation and repression. The RNA depends on a basal production, called here γ, the inducer, Ι, the repressor, Χ, and a degradation term with rate μmR. We can then draw a visual representation of this system:

With this representation it is easier to see the relations between the classes. The spheres represent the equations; an arrow pointing into a sphere is an input, and an arrow pointing out is an output (a blocked arrow, like the one going out of mR, means that the quantity is not consumed by that process).

Now, knowing these postulates, we can write our equations:
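The equations themselves are not reproduced in this text. A reconstruction from the description above is sketched here, assuming that the inducer multiplies and the repressor divides the basal production, as described for the mRNA equation later in this text:

```latex
\begin{aligned}
\frac{dP}{dt} &= \theta \,\mathrm{mR} - \mu_{P}\, P \\
\frac{d\,\mathrm{mR}}{dt} &= \gamma \,\frac{[I]}{1 + [X]} - \mu_{mR}\,\mathrm{mR}
\end{aligned}
```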

*Note: As said before, differential equations are continuous. Sometimes, however, it is useful to represent these variations in discrete time. In that case you can write difference equations, i.e., equations that give the next state, t+1, as a direct function of the previous state, t. A good example is the Nicholson–Bailey model, which describes the populations of a parasitoid and its host at time t+1 as a function of the populations at time t.

Our own models are based on differential equations, so they can be checked as a worked example of this approach.
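To make the idea of a difference equation concrete, here is a minimal Python sketch of the Nicholson–Bailey model in its standard textbook form. The parameter names (k for host growth, a for searching efficiency, c for parasitoids produced per parasitized host) and all numeric values are assumptions chosen for illustration, not values from our models.

```python
import math


def nicholson_bailey(h0, p0, k, a, c, steps):
    """Iterate the Nicholson-Bailey host-parasitoid difference equations.

    H[t+1] = k * H[t] * exp(-a * P[t])
    P[t+1] = c * H[t] * (1 - exp(-a * P[t]))
    """
    hosts, parasitoids = [h0], [p0]
    for _ in range(steps):
        h, p = hosts[-1], parasitoids[-1]
        escape = math.exp(-a * p)  # fraction of hosts that escape parasitism
        hosts.append(k * h * escape)
        parasitoids.append(c * h * (1.0 - escape))
    return hosts, parasitoids


# Hypothetical starting populations and parameters, for illustration only.
hosts, parasitoids = nicholson_bailey(h0=25.0, p0=10.0, k=2.0, a=0.05, c=1.0, steps=20)
for t in range(0, 21, 5):
    print(t, round(hosts[t], 2), round(parasitoids[t], 2))
```

Each new state is computed only from the previous one, which is exactly what distinguishes a difference equation from a differential equation.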

Boolean Models

Boolean equations are also interesting for synthetic biology. Boolean models are mathematical representations of a system in which variables only have states, usually represented by 0 or 1, meaning ON or OFF, expressed or not expressed. This representation works in discrete time and is deterministic. Boolean models are very useful when you have a very complex expression system, with many genes controlling each other, because you can easily assign a state to each part of the system. This kind of model is a huge simplification on purpose: a lot of information, and therefore accuracy, is lost, but a lot is gained in the analysis, especially with the aid of computers.

To create this kind of model you need, as always, to understand your system, but in this case the regulation is simpler. As with a differential equation, a visual representation can be very helpful.

The actual model has basically 3 aspects: the variables, their states and their effects on each other. The variables are your circuits or activators/repressors, each with a state of 0 or 1. Each one also acts on the others as an activator or a repressor.

Let’s again take a simple example. Suppose a system with 4 circuits, A, B, C and D. A is an activator of D and a repressor of B. B is a repressor of C. C is a repressor of D. D is a repressor of A and an activator of B. We also have external repressor and activator molecules in the system: A has a repressor, B has an activator, and C has both, with the repressor stronger than the activator, so that in the presence of both C is repressed. D doesn’t have an outside input.

So what are the states of the system? What if the repressor of A is present? And if all inducers and repressors are present? If we wanted to represent this with differential equations, there would be so many parameters, variables and equations that analysing or simulating the system would be true hell, even if it is possible. A boolean model is easier to represent and can even give you a way to simplify your system.
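The four-circuit example can be sketched as a boolean network in a few lines of Python. The text does not fix the exact update rules, so the ones below are assumptions: updates are synchronous, repression always wins over activation, a circuit with no activator is ON unless repressed, and C needs its external activator to be ON. The names `step`, `fixed_points` and `trajectory` are hypothetical helpers.

```python
from itertools import product


def step(state, inputs):
    """One synchronous update of the network.

    state  = (A, B, C, D), each 0 or 1
    inputs = (repressor of A, activator of B, activator of C, repressor of C)
    """
    a, b, c, d = state
    rep_a, act_b, act_c, rep_c = inputs
    new_a = (not d) and (not rep_a)            # D and the external repressor switch A off
    new_b = (d or act_b) and (not a)           # activated by D or its inducer, repressed by A
    new_c = act_c and (not rep_c) and (not b)  # assumed to need its activator; repressors dominate
    new_d = a and (not c)                      # activated by A, repressed by C
    return tuple(int(bool(x)) for x in (new_a, new_b, new_c, new_d))


def fixed_points(inputs):
    """All states that map onto themselves for a given set of external inputs."""
    return [s for s in product((0, 1), repeat=4) if step(s, inputs) == s]


def trajectory(state, inputs, max_steps=20):
    """Follow the network from one state until a state repeats."""
    seen = [state]
    for _ in range(max_steps):
        state = step(state, inputs)
        seen.append(state)
        if state in seen[:-1]:
            break
    return seen


# Repressor of A and activator of C present, everything else absent.
print(fixed_points((1, 0, 1, 0)))
# No external molecules at all: follow the dynamics from the all-OFF state.
print(trajectory((0, 0, 0, 0), (0, 0, 0, 0)))
```

Under these assumed rules, the network without external molecules has no steady state and cycles instead, while adding the repressor of A locks it into a single state. Exhaustive checks like this are cheap because four circuits give only 16 possible states.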

Stochastic Models

Stochastic models are a little different from deterministic models. The systems here are described by probability functions, so their variables can take a wide range of values for the same input. A good example of this kind of model is the one from the 2016 ETH Zurich team. For this kind of model, we will briefly address two approaches: thermodynamic and kinetic.

In the thermodynamic approach, the focus is not the variation of one variable in terms of others, but the description of the relations between the activators and repressors of the circuit. This gives a biophysical description of the system, in which you use a stochastic model to determine the expression in each state (with the repressors/activators bound or not). To build this model, the first step, as always, is to design a good representation of your system, showing every state of the circuit with its inducers/repressors. Each state has its own probability and level of expression: the probabilities sum to 1, and the sum of the products of probability and expression is the total expression of the system. This gives you the expression of your whole system under a quasi-equilibrium assumption.
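The bookkeeping described above (one probability and one expression level per state, probabilities summing to 1) can be sketched in Python. The promoter layout is an assumption made for illustration: one activator site and one repressor site that bind independently, with statistical weights given by concentration divided by a dissociation constant. The function name and all default values are hypothetical.

```python
def thermodynamic_expression(activator, repressor, k_a=1.0, k_x=1.0,
                             basal=0.1, induced=1.0):
    """Mean expression of a promoter with one activator and one repressor site."""
    w_a = activator / k_a  # statistical weight of the activator-bound state
    w_x = repressor / k_x  # statistical weight of the repressor-bound state
    # Each state: (statistical weight, expression level in that state).
    states = {
        "empty": (1.0, basal),
        "activator bound": (w_a, induced),
        "repressor bound": (w_x, 0.0),
        "both bound": (w_a * w_x, 0.0),  # the repressor is assumed to dominate
    }
    z = sum(weight for weight, _ in states.values())  # normalisation constant
    probs = {name: weight / z for name, (weight, _) in states.items()}
    expression = sum(probs[name] * level for name, (_, level) in states.items())
    return probs, expression


probs, expression = thermodynamic_expression(activator=1.0, repressor=0.0)
print(probs)
print(expression)
```

Dividing every weight by their sum is what guarantees that the state probabilities add up to 1, and the final sum is the probability-weighted expression of the whole system.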

The other approach uses kinetic models, in which you describe your system in terms of reactions and use them to form mass balance equations that you can simulate. Unlike thermodynamic models, kinetic models can be written as functions of variables such as time.

Stochastic modeling can help you better understand the behaviour of your system, since biological systems behave stochastically most of the time. However, it is harder to apply to a complex problem with tons of variables. It is also harder to scale up to the ecological level and relate the two, since ecological stochastic modeling can be a lot harder than usual.
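One common way to simulate a reaction-based description stochastically is Gillespie's stochastic simulation algorithm. The text does not name it, so it is used here only as an illustration. The sketch below assumes just two reactions, mRNA production at a constant rate and first-order mRNA degradation; the function name and rate values are hypothetical.

```python
import random


def gillespie_mrna(k_tx, k_deg, t_end, m0=0, seed=0):
    """Stochastic simulation of mRNA production (k_tx) and degradation (k_deg * m)."""
    rng = random.Random(seed)
    t, m = 0.0, m0
    times, counts = [t], [m]
    while True:
        rate_production = k_tx
        rate_degradation = k_deg * m
        total = rate_production + rate_degradation
        if total == 0:
            break  # nothing can happen any more
        t += rng.expovariate(total)  # waiting time until the next reaction
        if t > t_end:
            break
        # Pick which reaction fires, in proportion to its rate.
        if rng.random() * total < rate_production:
            m += 1
        else:
            m -= 1
        times.append(t)
        counts.append(m)
    return times, counts


times, counts = gillespie_mrna(k_tx=10.0, k_deg=1.0, t_end=200.0, seed=1)
print(len(times), counts[-1])
```

The matching mass balance equation would settle at k_tx / k_deg molecules, while the stochastic trajectory fluctuates around that value, which is the behaviour a deterministic model cannot show.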

Basic Differential Equation Models

After this brief explanation of modeling and of a few types of models you can create, and given this year's focus on paratransgenesis, here are some basic differential equations that are useful for modeling paratransgenesis:

Protein production in time:

The first equation that can be used in our proposal describes the expression of the anti-pathogen protein.

Since in paratransgenesis this protein acts once it is already outside the bacteria, its expression is directly related to the bacterial population, so the first term includes the population of the chassis, Na.

The parameters θS and θT are the secretion and translation rates. In our model we treated them as one single term, since they are constants. The variable mR represents the expressed mRNA, whose dynamics are described by the next equation. There is also degradation of the protein, given by the product of the degradation constant, μP, and the quantity of protein itself.

There are also other possibilities to explore in this equation. Some post-translational modification can be taken into account, depending on the protein to be expressed. If the protein is a monomer and acts only after forming a complex, it might be interesting to add another equation describing this formation. In this case, the formation of the complex appears as a negative term in the protein equation, and a positive term represents the release of the complex.

In our model, we considered a negative term that represents the encounter of the protein and the pathogen, N.
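The equation itself is not reproduced in this text. One reading consistent with the terms described above is sketched below. The encounter coefficient δ is a hypothetical symbol introduced here, since the text does not name it, and the mass-action form N·P of the encounter term is likewise an assumption.

```latex
\frac{dP}{dt} = \theta_{S}\,\theta_{T}\,\mathrm{mR}\,N_{a} \;-\; \mu_{P}\,P \;-\; \delta\,N\,P
```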

RNA expression in time:

The second equation describes the expression of the protein’s mRNA. It contains a basal production of the mRNA, called γmR, which is basically the transcription of the DNA without any external regulation. It also represents how strong the promoter is, since a stronger promoter leads to higher basal transcription. This term also includes the inducer, [Ι], and repressor, [Χ], variables. The inducer increases expression, so it enters as a product. The repressor does the opposite, so it enters as a division; it is written as 1+[Χ] so that the denominator goes to 1 when the repressor is 0. As for the protein, there is also a degradation term, the product of the degradation parameter, μmR, and the amount of mRNA itself.

As with the protein, new terms can be added depending on your biological system, such as additional repressors and inducers, or a new parameter representing some post-transcriptional modification, like splicing.

The same model can be used to describe the behaviour of circuits that act as repressors or inducers, as in our iron-lactate model. Our toehold-switch model, however, has something different: a complex of two RNAs that allows the mRNA to be translated.
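Written out from this description, the mRNA equation would take the following form. This is a reconstruction, since the equation itself is not reproduced in this text:

```latex
\frac{d\,\mathrm{mR}}{dt} = \gamma_{mR}\,\frac{[I]}{1 + [X]} \;-\; \mu_{mR}\,\mathrm{mR}
```

With [Χ] = 0 the denominator is 1 and production reduces to γmR·[Ι], which is the behaviour the 1+[Χ] form is meant to give.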

Population in time:

The third basic equation describes the population dynamics of the pathogen. This is exactly the same equation we used in our models.

The population dynamics here are basically described by the classic logistic model, in which growth depends on the intrinsic growth rate, r (a balance of birth and death rates), the population itself, and the carrying capacity, K. This classical model has been in use since the 19th century.

Since we hope that a specific anti-pathogen protein could drive the population to extinction, we also add a mortality term, which is the product of the protein toxicity, λ, and the encounter of N and P.
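Combining the logistic growth term with this mortality term gives the sketch below. It is a reconstruction from the description, since the equation itself is not reproduced in this text, and it assumes that the encounter of N and P is their product.

```latex
\frac{dN}{dt} = r\,N\left(1 - \frac{N}{K}\right) \;-\; \lambda\,N\,P
```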

This model can also be useful if you don’t want to consider the population of your chassis a constant, like we suggested in the protein equation.
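To see how the three basic equations behave together, they can be integrated numerically. The following Python sketch uses a simple forward Euler scheme and treats the chassis population Na as a constant. The exact form of each equation is a reading of the descriptions in this section, the encounter coefficient `delta` is a hypothetical parameter, and every numeric value is made up for illustration rather than taken from our models.

```python
def derivatives(m, p, n, q):
    """Right-hand sides of the mRNA, protein and pathogen equations.

    dmR/dt = gamma * I / (1 + X) - mu_m * mR
    dP/dt  = theta * mR * Na - mu_p * P - delta * N * P
    dN/dt  = r * N * (1 - N / K) - lam * N * P
    """
    dm = q["gamma"] * q["I"] / (1.0 + q["X"]) - q["mu_m"] * m
    dp = q["theta"] * m * q["Na"] - q["mu_p"] * p - q["delta"] * n * p
    dn = q["r"] * n * (1.0 - n / q["K"]) - q["lam"] * n * p
    return dm, dp, dn


def simulate(q, m=0.0, p=0.0, n=100.0, t_end=50.0, dt=0.001):
    """Forward Euler integration; returns (t, mR, P, N) once per time unit."""
    steps = int(round(t_end / dt))
    record_every = int(round(1.0 / dt))
    history = [(0.0, m, p, n)]
    for i in range(1, steps + 1):
        dm, dp, dn = derivatives(m, p, n, q)
        m = m + dt * dm
        p = p + dt * dp
        n = max(n + dt * dn, 0.0)  # a population cannot become negative
        if i % record_every == 0:
            history.append((i * dt, m, p, n))
    return history


# Hypothetical parameter values, for illustration only.
example_params = dict(gamma=2.0, I=1.0, X=1.0, mu_m=0.5,
                      theta=1.0, Na=1.0, mu_p=1.0, delta=0.001,
                      r=1.0, K=1000.0, lam=1.0)

for t, m, p, n in simulate(example_params)[::10]:
    print(f"t={t:5.1f}  mR={m:.3f}  P={p:.3f}  N={n:.3e}")
```

Setting `lam` and `delta` to 0 recovers plain logistic growth towards K, which is a quick sanity check before exploring how much toxicity is needed to push the pathogen towards extinction.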

Now, with these basic modeling tools, we hope it will be easier to develop models that integrate Paratransgenesis, SynBio and mathematical modeling to eradicate vector-borne diseases.

