MODELING
Overview
About modeling and why iGEM Nottingham chose to do it
Download our models and source code from our GitHub page
A major problem the project faced was that the fluorescent proteins could not be compared experimentally in every combination, as this would take too long.
To address this problem, the team will attempt to model the fluorescence spectra expressed by the proteins over time under different conditions. First, the type of gene expression needs to be identified; the model is then modified to account for the effects of inhibition and, finally, applied over time to see how much expression occurs in a given time period. The team will use mathematical modeling with Ordinary Differential Equations because they are easy to convert into code, which lets us build components for the simulation.
As a side project, the team will also investigate whether our method is random and unique by working out how many combinations we can make and whether we can accurately predict which combination will occur.
Constitutive Gene Expression For Protein and mRNA Expression over Time
The general gene expression equation showing the process of protein synthesis
Biological insight told us that we needed a model with constant gene expression. We investigated models from the literature 1 to see which would satisfy this condition, and found that the constitutive gene expression model was suitable to guide our model.
The first step was to take the general model from the literature and apply it to our scenario using the proteins (GFP, ECHP, RFP).
Figure 1 $$ \color{white}{ sfGFP \underset{Transcription}{\rightarrow} mRNA \underset{Translation}{\rightarrow} sfGFP } $$
The equation above describes the process by which the gene undergoes transcription to produce mRNA. The mRNA carries the genetic information copied from the DNA, which codes for the protein. The expression of the protein can therefore be measured by its fluorescence, which is the desired output of the system.
Figure 2 $$ \color{white}{ mRNA \underset{Degradation}{\rightarrow} \oslash } $$ $$ \color{white}{ sfGFP \underset{Degradation}{\rightarrow} \oslash } $$
The two equations above state that, at the same time, the protein and mRNA undergo degradation, which lowers their concentrations. However, since protein and mRNA are continuously being created, creation and degradation eventually balance and the concentrations settle at a constant level. 2
We can apply the Law of Mass Action to combine both equations into a model for the concentration of protein and mRNA over time. This model can be described as:
Figure 3 $$ \color{white}{ \frac{d\,mRNA}{dt} = k_{1} - d_{1} \cdot mRNA } $$ $$ \color{white}{ \frac{d\,Protein}{dt} = k_{2} \cdot mRNA - d_{2} \cdot Protein } $$
Where...
- mRNA is the concentration of mRNA
- Protein is the concentration of protein
- k1 is the constitutive transcription rate. This represents the number of mRNA molecules produced per gene, per unit of time.
- d1 is the mRNA degradation rate.
- k2 is the translation rate. This represents the number of protein molecules produced per mRNA molecule, per unit of time.
- d2 is the protein degradation rate.
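As a rough illustration of how the two equations in Figure 3 behave, the following minimal sketch integrates them with a forward-Euler step in Python and compares the result with the steady state obtained by setting both rates of change to zero. The function names and every parameter value are placeholders chosen for illustration (assumptions, not the constants used by the team), and forward Euler is only one simple way to solve these equations.

```python
def simulate_constitutive(k1, d1, k2, d2, t_end, dt=0.01, mrna0=0.0, protein0=0.0):
    """Forward-Euler integration of
         d(mRNA)/dt    = k1 - d1 * mRNA
         d(Protein)/dt = k2 * mRNA - d2 * Protein
    Returns a list of (time, mRNA, Protein) tuples."""
    steps = int(round(t_end / dt))
    mrna, protein = mrna0, protein0
    trajectory = [(0.0, mrna, protein)]
    for step in range(1, steps + 1):
        # Evaluate both rates from the current state before updating it.
        d_mrna = k1 - d1 * mrna
        d_protein = k2 * mrna - d2 * protein
        mrna += dt * d_mrna
        protein += dt * d_protein
        trajectory.append((step * dt, mrna, protein))
    return trajectory


def steady_state(k1, d1, k2, d2):
    """Concentrations at which creation and degradation balance."""
    mrna_ss = k1 / d1                # from 0 = k1 - d1 * mRNA
    protein_ss = k2 * mrna_ss / d2   # from 0 = k2 * mRNA - d2 * Protein
    return mrna_ss, protein_ss


# Placeholder parameters (hypothetical, arbitrary units).
PARAMS = dict(k1=2.0, d1=0.2, k2=5.0, d2=0.1)

trajectory = simulate_constitutive(t_end=200.0, **PARAMS)
t_final, mrna_final, protein_final = trajectory[-1]
mrna_ss, protein_ss = steady_state(**PARAMS)

print("simulated:", mrna_final, protein_final)
print("steady state:", mrna_ss, protein_ss)
```

With these placeholder values the simulated concentrations approach the steady state, which matches the statement that creation and degradation keep the concentration constant over time.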
This is important because we can use this model to calculate the protein concentration we can expect over time. From this we can calculate the total light spectrum emitted during the time period, which is what we are looking for in our system. However, the constants are specific to each protein, which means parameters need to be found for each protein. These constants were found from literature 3 (for GFP) and from lab results (for the rest).
1 GB Stan, 20137. Modeling in Biology. London, United Kingdom: Imperial College London, pp. 59-65.
2 See Non-Inhibited conditions from Figure 5 Gene Transcription Regulation by Repressors (CRISPRi) - Concentration over Time
3 See Relationship between Max Fluorescence and Protein Concentration for more details
Gene Transcription Regulation by Repressors (CRISPRi) - Concentration over Time
Calculating how much protein is produced over time when a gene is inhibited
The next step in developing our simulation was to calculate our protein concentration at any given time when using CRISPRi. Discussion with wet-lab revealed our method would be using CRISPRi as a repressor, which works by inhibiting the expression of one or more genes by binding to the promoter region 1 . The expanded mRNA and Protein concentration models from the Constitutive Gene Expression Model 2 were modified to include the element of repression from the CRISPRi inhibition.
$$ \color{white}{ \frac{d\,gRNA_{i}}{dt} = k_{g,i} - \delta_{dg} \cdot gRNA_{i} - k_{f} \cdot Cas9 \cdot gRNA_{i} } $$
The above equation describes the change in gRNA concentration for each guide, indexed by i; the index accounts for the possibility of multiple gRNAs competing with one another. At any given time, the concentration of gRNA i is increased by its production (rate kg,i), decreased by its association with Cas9 (rate kf, proportional to its concentration), and decreased by degradation and diffusion (rate δdg). 3
$$ \color{white}{ \frac{d\,Cas9}{dt} = k_{c} - \delta_{dc} \cdot Cas9 - k_{f} \cdot Cas9 \cdot \sum_{i} gRNA_{i} } $$
This equation describes the change in Cas9 protein concentration: Cas9 is produced at rate kc, degrades at rate δdc, and is consumed as it binds each gRNA at rate kf. 3
This change can be applied to the Law of Mass Action 3 :
$$ \color{white}{ \frac{d\,mRNA_{i}}{dt} = k_{0} \cdot \frac{1}{1 + k_{m} \cdot Cas9{:}gRNA_{i}} - \delta_{dm} \cdot mRNA_{i} } $$
Where...
- m is the mRNA concentration
- p is the protein concentration
- R is the repressor
- k1 is the maximum transcription rate
- k is the repression coefficient
- n is the number of repressors that need to cooperatively bind the promoter to trigger the inhibition of gene expression (the Hill coefficient)
- d1 is the mRNA degradation rate
- d2 is the protein degradation rate
The values for these constants and variables were initially taken from the literature and from calculation 4 , and were later adjusted to match the lab results.
Figure 6
Figure 6 shows that structures which underwent CRISPRi inhibition are expected to produce a lower concentration of the protein whose expression we are inhibiting. This is important as it means the team can calculate the concentration of inhibited proteins and compare them to the control conditions, as well as supply the correct concentrations to the simulation.
Furthermore, by having a model which can calculate the protein concentration at any given time, we can deduce how much fluorescence is being emitted by the bacteria at that time.
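The gRNA, Cas9 and mRNA equations of this section can be explored numerically with a sketch like the one below. It is a minimal illustration under stated assumptions, not the team's simulation: the text gives no rate equation for the Cas9:gRNA complex, so the code assumes the complex forms at the binding rate kf · Cas9 · gRNA,i and is lost at a hypothetical rate `delta_cx`. All parameter values are arbitrary placeholders, and forward Euler is used only for simplicity.

```python
def simulate_crispri(k_g, k_c, k_f, k_0, k_m,
                     delta_dg, delta_dc, delta_dm, delta_cx,
                     t_end, dt=0.01):
    """Forward-Euler integration of the CRISPRi equations for n guides.

    k_g is a list with one gRNA production rate per target gene i.
    ASSUMPTION: the complex obeys
        d(complex_i)/dt = k_f * Cas9 * gRNA_i - delta_cx * complex_i
    which is not stated in the text; delta_cx is a hypothetical parameter."""
    n = len(k_g)
    grna = [0.0] * n
    cplx = [0.0] * n
    mrna = [0.0] * n
    cas9 = 0.0
    for _ in range(int(round(t_end / dt))):
        # Binding flux of Cas9 with each gRNA, from the current state.
        binding = [k_f * cas9 * g for g in grna]
        d_cas9 = k_c - delta_dc * cas9 - sum(binding)
        for i in range(n):
            d_grna = k_g[i] - delta_dg * grna[i] - binding[i]
            d_cplx = binding[i] - delta_cx * cplx[i]
            d_mrna = k_0 / (1.0 + k_m * cplx[i]) - delta_dm * mrna[i]
            grna[i] += dt * d_grna
            cplx[i] += dt * d_cplx
            mrna[i] += dt * d_mrna
        cas9 += dt * d_cas9
    return {"gRNA": grna, "Cas9": cas9, "complex": cplx, "mRNA": mrna}


# Placeholder parameters (hypothetical). Gene 0 has a guide, gene 1 has none.
result = simulate_crispri(
    k_g=[1.0, 0.0], k_c=1.0, k_f=1.0, k_0=2.0, k_m=10.0,
    delta_dg=0.1, delta_dc=0.1, delta_dm=0.2, delta_cx=0.1,
    t_end=200.0,
)
print("mRNA of inhibited gene:    ", result["mRNA"][0])
print("mRNA of non-inhibited gene:", result["mRNA"][1])
```

In this sketch the gene without a guide settles at k0 / δdm, while the targeted gene settles at a lower mRNA level, which is the qualitative behaviour described for the inhibited structures.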
4 See Relationship between Max Fluorescence and Protein Concentration
Relationship between Max Fluorescence and Protein Concentration
Using our models to estimate the amount of fluorescence expected from a certain concentration of protein synthesized
A problem the team faced was identifying the level of fluorescence at any given time, since the proteins are expected to be expressed. That expression can be confirmed by looking at the bacteria after construction and observing that they give off light.
To solve this issue, the team required an equation which could estimate the intensity of fluorescence at any given time. This consisted of calculating the protein concentration at a given time and mapping the fluorescence intensity to that protein concentration using real-world data.
When the fluorescence data was received from the wet lab, a model was constructed from the data gained. Originally, the data from the lab was the Fluorescence against Time but by using the Gene Transcription Regulation by Repressors model developed earlier 1 , the team was able to estimate the protein concentration at that time.
Figure 7
These graphs show the relationship between protein concentration and fluorescence intensity: as the concentration increases, the intensity increases greatly. The only exception is CFP; however, the wet lab identified an error in the CFP readings. Due to time constraints, rather than implementing the relationship directly from lab data, the data was fitted with a third-order polynomial in Excel, and an equation was obtained for each protein. These equations were plugged directly into the simulation. However, this is inaccurate, as the R squared value was ... , suggesting that the fit doesn't fully capture the trend in the data.
These relationships were implemented into the simulation to give the expected spectra produced by each protein. This highlights another use: by adding or subtracting values from our fit, we can create a threshold for our Keys. This was essential when developing the Raw Data Simulator. 2
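The fitting step described above was done in Excel; the sketch below shows one way the same idea could be reproduced in plain Python, as an illustration only. It fits a third-order polynomial by least squares (normal equations solved with Gaussian elimination), reports R squared, and checks whether a reading falls within a band around the fit. The data points, the function names and the `tolerance` value are all hypothetical, and the band check is just one possible reading of "adding or subtracting values from our fit".

```python
def fit_polynomial(xs, ys, order=3):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*x + ..."""
    size = order + 1
    # Normal equations A c = b, with A[j][k] = sum(x^(j+k)), b[j] = sum(y * x^j).
    a = [[sum(x ** (j + k) for x in xs) for k in range(size)] for j in range(size)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def evaluate(coeffs, x):
    """Value of the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fit on the given data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def within_band(coeffs, concentration, measured, tolerance):
    """True if a measured intensity lies within +/- tolerance of the fit."""
    return abs(measured - evaluate(coeffs, concentration)) <= tolerance


# Hypothetical (concentration, fluorescence) pairs, NOT wet-lab data.
concentration = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
fluorescence = [0.0, 1.2, 4.1, 10.3, 22.0, 39.5]

coeffs = fit_polynomial(concentration, fluorescence, order=3)
r2 = r_squared(concentration, fluorescence, coeffs)
print("coefficients:", coeffs)
print("R squared:", r2)
print("reading inside band:", within_band(coeffs, 2.5, 6.5, tolerance=2.0))
```

An R squared close to 1 would indicate that the cubic captures the trend well; a lower value is the kind of warning sign mentioned above.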
1 See Gene Transcription Regulation by Repressors (CRISPRi) - Concentration over Time
2 See Software
Are Our Constructions Random?
Showing that our constructions are random and why they are random
When constructing our proteins with our current method, there were 3 vectors we could order from.
However, in this proof of concept, order is irrelevant as each gene is either inhibited (1) or not (0). Using
$$ \color{white}{ n ^ r } $$
where n = 2 and r = 3, this gives us a total of 2^3 = 8 combinations: {1,1,1} {1,1,0} {1,0,1} {1,0,0} {0,1,1} {0,1,0} {0,0,1} {0,0,0}
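The count above can be checked by enumerating the states directly. The following short Python sketch lists every inhibited (1) / not inhibited (0) pattern for three genes; the variable names are illustrative only.

```python
from itertools import product

N_STATES = 2  # n: each gene is either inhibited (1) or not (0)
N_GENES = 3   # r: number of genes in the construct

# Every ordered assignment of a state to each gene.
combinations = list(product((1, 0), repeat=N_GENES))

for combo in combinations:
    print(combo)
print(len(combinations))  # prints 8, i.e. N_STATES ** N_GENES
```

Changing `N_GENES` shows how quickly the number of possible constructions grows as more genes are added.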
Randomness comes from the fact that the system relies on Brownian Motion 1 , a random process, to create these combinations.
However, for a movement to count as Brownian Motion, the process must have continuous paths. This is not true here, as once the structures begin to form, the paths stop (the particles do not collide elastically, but rather combine). Furthermore, the bacterium might become biased towards options that put less metabolic stress on it, which results in selection. Alternatively, using metabolites to undergo transposition can improve randomness. 2
To help the wet lab see which combinations they can expect, the team developed an Excel spreadsheet in which a user can simply input details of the construction and see what the construction would look like.
Members of the public are encouraged to try it out and can use it to identify how their spectra would look if they used the same proteins as the project.
Excel Spreadsheet
1 Refer to https://statistics.stanford.edu/sites/default/files/EFS%20NSF%20149.pdf
2 Refer to https://link.springer.com/book/10.1007%2F978-1-4612-0459-6 for more information about Brownian Motion and Random Walk.