MODELLING
Overview
About modelling and why iGEM Nottingham chose to do it
A major problem the project faced was that the fluorescent proteins could not be compared experimentally across all combinations, as this would take too long.
To address this, the team attempted to model the fluorescence spectra expressed by the proteins over time under different conditions. First, the type of gene expression was identified; then the model was modified to account for the effects of inhibition; finally, it was applied over time to see how much expression would occur in a given time period. The team used mathematical models based on Ordinary Differential Equations because they were easy to convert into code when building components for the simulation.
As a side project, the team investigated whether our method is random and unique by working out how many combinations could be made and whether we could accurately predict which combination would occur.
Constitutive Gene Expression For Protein and mRNA Expression over Time
The general gene expression equation showing the process of protein synthesis
Biological insight told us we needed a model with constant gene expression. Models from the literature 1 were investigated to see which would satisfy this condition, and the constitutive gene expression model was found to be suitable to guide our model.
The first step was to take the general model from the literature and apply it to our scenario using the proteins (GFP, ECFP, RFP).
Figure 1
$$ \color{white}{ p \underset{t_{1} }{\rightarrow} m \underset{t_{2}}{\rightarrow} p } $$
The equation above describes the process by which the gene undergoes transcription to produce mRNA. The mRNA carries the genetic information copied from the DNA, which codes for protein. The expression of protein leads to fluorescence, which is the desired output of the system.
Figure 2
$$ \color{white}{ m \underset{Degradation}{\rightarrow} \oslash } $$
$$ \color{white}{ p \underset{Degradation}{\rightarrow} \oslash } $$
The two equations above state that, at the same time, the protein and mRNA undergo degradation, which lowers their concentrations. However, since protein and mRNA are always being created, over time creation and degradation balance each other and the concentrations stay constant. 2
The team applied the Law of Mass Action, combining both equations to describe the concentrations of protein and mRNA over time. This model can be described as:
Figure 3
$$ \color{white}{ \frac{dm}{dt} = k_{1} - d_{1} \cdot m } $$
$$ \color{white}{ \frac{dp}{dt} = k_{2} \cdot m - d_{2} \cdot p } $$
Where:
- m is the concentration of mRNA.
- p is the concentration of protein.
- k1 is the constitutive transcription rate. This represents the number of mRNA molecules produced per gene, per unit of time.
- d1 is the mRNA degradation rate.
- k2 is the translation rate. This represents the number of protein molecules produced per mRNA molecule, per unit of time.
- d2 is the protein degradation rate.
- t1 is the process of transcription.
- t2 is the process of translation.
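The two rate equations above can be integrated numerically to see how the concentrations approach a constant level. The sketch below is only an illustration, not the team's simulator: the function name `simulate_constitutive`, the fixed-step Runge-Kutta integrator and all parameter values are assumptions chosen for the example (arbitrary units). Setting both derivatives to zero gives the steady state m = k1/d1 and p = k1·k2/(d1·d2), which the simulated values should approach.

```python
def simulate_constitutive(k1, d1, k2, d2, t_end, dt=0.01, m0=0.0, p0=0.0):
    """Integrate dm/dt = k1 - d1*m and dp/dt = k2*m - d2*p with classical RK4."""
    def rhs(m, p):
        return k1 - d1 * m, k2 * m - d2 * p

    steps = int(round(t_end / dt))
    m, p = m0, p0
    history = [(0.0, m, p)]
    for n in range(1, steps + 1):
        a_m, a_p = rhs(m, p)
        b_m, b_p = rhs(m + 0.5 * dt * a_m, p + 0.5 * dt * a_p)
        c_m, c_p = rhs(m + 0.5 * dt * b_m, p + 0.5 * dt * b_p)
        e_m, e_p = rhs(m + dt * c_m, p + dt * c_p)
        m += dt * (a_m + 2 * b_m + 2 * c_m + e_m) / 6.0
        p += dt * (a_p + 2 * b_p + 2 * c_p + e_p) / 6.0
        history.append((n * dt, m, p))
    return history


# Placeholder parameters in arbitrary units (assumed, not measured values).
K1, D1, K2, D2 = 2.0, 0.5, 4.0, 0.1

trajectory = simulate_constitutive(K1, D1, K2, D2, t_end=200.0)
t_final, m_final, p_final = trajectory[-1]

# Analytical steady state obtained by setting both derivatives to zero.
m_steady = K1 / D1
p_steady = K1 * K2 / (D1 * D2)

print("mRNA:    simulated", m_final, "steady state", m_steady)
print("protein: simulated", p_final, "steady state", p_steady)
```

With real parameters for each protein, the same loop would give the expected protein concentration at any time point in `trajectory`.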
This is important because the model can then calculate the concentration of each protein expected over time. The team used this information to calculate the total emitted light spectra during the time period, which is what the team looked for within the system. However, the constants are specific to each protein, which means the parameters for each protein had to be found. These constants were found using literature 3 (for GFP) and lab results (for the rest).
1 GB Stan, 20137. Modeling in Biology. London, United Kingdom: Imperial College London, pp.59-65.
2 See the non-inhibited conditions from Figure 5 in Gene Transcription Regulation by Repressors (CRISPRi) - Concentration over Time
3 See Relationship between Max Fluorescence and Protein Concentration for more details
Gene Transcription Regulation by Repressors (CRISPRi) - Concentration over Time
Calculating how much protein is produced over time when a gene is inhibited
The next step in developing our simulation was to calculate the protein concentration at any given time when using CRISPRi. Discussion with the wet lab revealed that our method would use CRISPRi as a repressor, which inhibits the expression of one or more genes by binding to the promoter region 1 . The mRNA and protein concentration equations from the Constitutive Gene Expression Model 2 were expanded to include the repression caused by CRISPRi inhibition.
In this system, gRNA(i), Cas9 and mRNA are produced constitutively with their associated rates of production kg, kc and k0i respectively. Cas9 and gRNA(i) undergo an irreversible association to form Cas9:gRNA(i) at rate kf, which in turn inhibits the production of mRNA and so reduces the production of fluorescent protein (k1). All molecules spontaneously degrade and diffuse away at their own associated rates. The index i accounts for there being multiple gRNAs and just as many fluorescent proteins, e.g. three values of i for three fluorescent proteins and the corresponding set of three gRNAs. It is assumed that all gRNAs have the same binding affinity and the same production rate. The varying promoter strengths for mRNA (k0i) are assigned to the corresponding gRNA in the set.
The system can be described by the following five ordinary differential equations, which use mass action kinetics to define how the concentration of each variable changes over time. Equations 1, 2 and 3 are derived from Farasat and Salis (2016), who comprehensively investigated the rates at which CRISPR-Cas9 can cleave DNA targets.
$$ \color{white}{(1) \frac{d\,gRNA_{i}}{dt} = k_{g,i} - \delta_{dg} \cdot gRNA_{i} - k_{f} \cdot Cas9 \cdot gRNA_{i}} $$
The above equation details the change in gRNA concentration per unit time for each index i. At any given time, the concentration of gRNA(i) is increased by its production (kgi) and decreased by its association with Cas9 at rate kf, in proportion to its concentration; it also degrades and diffuses away at rate δdg.
$$ \color{white}{(2) \frac{d\,Cas9}{dt} = k_{c} - \delta_{dc} \cdot Cas9 - k_{f} \cdot Cas9 \cdot \sum_{i} gRNA_{i}} $$
This equation details the change in Cas9 protein per unit time. It is increased by its production (kc) and reduced by its degradation (δdc) and by its association with the gRNAs. The association term is proportional to the sum of all the gRNAs along i, accounting for the competition for Cas9.
$$ \color{white}{(3) \frac{d(Cas9{:}gRNA_{i})}{dt} = k_{f} \cdot Cas9 \cdot gRNA_{i} - \delta_{dcg} \cdot Cas9{:}gRNA_{i}} $$
This equation details the change in concentration of Cas9 associated with gRNA(i). It is simply the rate of formation, i.e. the association term for gRNA(i) that appears in Equations 1 and 2, minus the degradation of the complex at rate δdcg.
$$ \color{white}{(4) \frac{d\,mRNA_{i}}{dt} = k_{0i} \cdot \frac{1}{1 + k_{m} \cdot Cas9{:}gRNA_{i}} - \delta_{dm} \cdot mRNA_{i}} $$
This equation details the change in mRNA(i) and is very similar to the transcription equation seen earlier. mRNA(i) is produced at rate k0i, but it is also inhibited by Cas9:gRNA(i), so a standard inhibition function reduces the production rate as Cas9:gRNA(i) increases. It is also reduced by its degradation and diffusion at rate δdm.
$$ \color{white}{(5) \frac{d\,FP_{i}}{dt} = k_{1} \cdot mRNA_{i} - \delta_{dp} \cdot FP_{i}} $$
This equation details the rate of translation and is similar to Equation 4, except that protein production increases in proportion to mRNA(i); the protein is reduced by its degradation and diffusion at rate δdp. 3
The values for these constants and variables were taken from literature or calculated 4 , and were later adjusted to fit the lab results.
Figure 6
These simulations illustrate the relationship between the variables and how we can predict how much of each will be present at any one time. They also show how changes in parameters can affect the outcomes. The difference between the two simulations is that gRNA production (kg) is very low in the non-inhibited state, compared to its normal value in the inhibited state. Inhibition has also significantly reduced the amount of GFP produced.
Furthermore, since the model could calculate the protein concentration at any given time, the team was able to deduce how much fluorescence is being emitted by the bacteria at that time.
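The comparison between the inhibited and non-inhibited states can be reproduced with a small numerical sketch of the five-equation system. This is an illustration under stated assumptions rather than the team's implementation: the complex Cas9:gRNA(i) is assumed to form at rate kf·Cas9·gRNA(i) and to degrade in proportion to its own concentration, the helper names (`crispri_rhs`, `integrate`, `final_fp`) are hypothetical, and every parameter value is an arbitrary placeholder rather than a fitted constant.

```python
def crispri_rhs(state, n, p):
    """Rates of change for Cas9 plus n gRNA / complex / mRNA / FP sets.

    State layout: [Cas9, gRNA_1..n, Cas9:gRNA_1..n, mRNA_1..n, FP_1..n].
    """
    cas9 = state[0]
    grna = state[1:1 + n]
    cplx = state[1 + n:1 + 2 * n]
    mrna = state[1 + 2 * n:1 + 3 * n]
    fp = state[1 + 3 * n:1 + 4 * n]

    # Association of Cas9 with each gRNA (shared by Equations 1, 2 and 3).
    bind = [p["kf"] * cas9 * g for g in grna]
    d_grna = [p["kg"][i] - p["d_g"] * grna[i] - bind[i] for i in range(n)]  # Eq. 1
    d_cas9 = p["kc"] - p["d_c"] * cas9 - sum(bind)                          # Eq. 2
    d_cplx = [bind[i] - p["d_cg"] * cplx[i] for i in range(n)]              # Eq. 3 (assumed form)
    d_mrna = [p["k0"][i] / (1.0 + p["km"] * cplx[i]) - p["d_m"] * mrna[i]
              for i in range(n)]                                            # Eq. 4
    d_fp = [p["k1"] * mrna[i] - p["d_p"] * fp[i] for i in range(n)]         # Eq. 5
    return [d_cas9] + d_grna + d_cplx + d_mrna + d_fp


def integrate(rhs, state, t_end, dt):
    """Classical fourth-order Runge-Kutta with a fixed step."""
    for _ in range(int(round(t_end / dt))):
        a = rhs(state)
        b = rhs([s + 0.5 * dt * x for s, x in zip(state, a)])
        c = rhs([s + 0.5 * dt * x for s, x in zip(state, b)])
        d = rhs([s + dt * x for s, x in zip(state, c)])
        state = [s + dt * (w + 2 * x + 2 * y + z) / 6.0
                 for s, w, x, y, z in zip(state, a, b, c, d)]
    return state


def final_fp(kg, t_end=100.0, dt=0.02):
    """Fluorescent protein levels at t_end for given gRNA production rates."""
    n = len(kg)
    # Placeholder parameters in arbitrary units (assumed for illustration).
    params = {"kc": 1.0, "kf": 1.0, "km": 1.0, "k1": 1.0,
              "kg": kg, "k0": [1.0] * n,
              "d_c": 0.1, "d_g": 0.1, "d_cg": 0.1, "d_m": 0.5, "d_p": 0.1}
    state0 = [0.0] * (1 + 4 * n)
    final = integrate(lambda s: crispri_rhs(s, n, params), state0, t_end, dt)
    return final[1 + 3 * n:]


# Non-inhibited: gRNA production is very low for every gene.
uninhibited = final_fp([0.001, 0.001, 0.001])
# Inhibited: normal gRNA production for the first gene only.
inhibited = final_fp([1.0, 0.001, 0.001])

print("FP levels, non-inhibited:", uninhibited)
print("FP levels, first gene inhibited:", inhibited)
```

In this sketch only the first fluorescent protein is targeted, so its final level drops while the other two stay close to their non-inhibited values.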
4 See Relationship between Max Fluorescence and Protein Concentration
Farasat, I. and Salis, H.M., 2016. A biophysical model of CRISPR/Cas9 activity for rational design of genome editing and gene regulation. PLoS computational biology, 12(1), p.e1004724.
Relationship between Max Fluorescence and Protein Concentration
Using our models to estimate the amount of fluorescence expected from a certain concentration of protein synthesized
Another issue the team faced concerned fluorescence intensity. At any given time, the proteins were expected to be expressed, so the bacteria would fluoresce. This can be confirmed by looking at the bacteria after construction and observing that they give off light. However, the intensity of this fluorescence was unknown.
To solve this issue, an equation was developed to find the intensity of fluorescence at a given time. This consisted of calculating the protein concentration at that time and, using real lab data of the fluorescence at the same time, mapping that intensity to the protein concentration.
A model was constructed by graphing the fluorescence data received from the wet lab. Originally, the lab data was Fluorescence against Time, but by using the Gene Transcription Regulation by Repressors model developed earlier 1 , the team was able to estimate the protein concentration at each time point.
Figure 7
These graphs show the relationship between the concentration of each type of protein and the fluorescence intensity that can be expected from it. Integrating real lab data into the models gives a more accurate representation of how the bacteria would behave in real life. For our lab data set, comparing the modelled data to the real data suggests a strong fit. However, this is not necessarily true for all cases: we only had data for the conditions we were using, so more data would be required for the models to be truly representative of real-world behaviour.
On evaluation, the fit for CFP appears quite strange: unlike GFP and RFP, its trend line does not look similar. Insight from the wet lab suggested that mistakes were made when reading from the fluorescence reader, which could explain this behaviour. One way to fix this is to check the reader settings and choose a wavelength that exclusively causes CFP to fluoresce; the Absorption and Emission Wavelengths model suggests that a wavelength of 375nm might keep interference from GFP to a minimum.
Furthermore, due to time constraints, rather than implementing the relationship directly from lab data, the data was fitted with a polynomial of order 3 in Excel, and the resulting equations were plugged directly into the simulation. However, this is inaccurate, as the R squared value was ... , suggesting that the fit doesn't fully capture the trend in the data. To improve this, if more data were available for different scenarios, such as different wavelengths and protein concentrations, the model could be validated against more data and refined. Once done, this could replace the polynomial fit.
Lastly, to improve the data, rather than using another model to estimate the protein concentration, the team could measure protein concentration during the fluorescence readings. This would provide a separate data set against which to validate the model and check whether our protein calculations were correct.
These relationships were implemented into the simulation to give the expected spectra produced by each protein. This highlights another use: by adding or subtracting values from our fit, we can create a threshold for our Keys. This was essential when developing the Raw Data Simulator. 2
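A third-order polynomial fit with an R squared check and a threshold band can be sketched in a few lines with NumPy. The team's fit was done in Excel, so this is only an equivalent illustration: the data below is synthetic placeholder data, not the lab measurements, and the helper `fluorescence_band` and the `margin` value are hypothetical names and numbers introduced for the example.

```python
import numpy as np

# Synthetic placeholder data (NOT the team's measurements): a smooth
# cubic trend with added noise, in arbitrary units.
rng = np.random.default_rng(0)
protein = np.linspace(0.0, 10.0, 21)
fluorescence = 0.5 * protein**3 - 4.0 * protein**2 + 30.0 * protein + 10.0
fluorescence = fluorescence + rng.normal(0.0, 5.0, size=protein.size)

# Least-squares polynomial fit of order 3 (highest power first).
coeffs = np.polyfit(protein, fluorescence, deg=3)
fitted = np.polyval(coeffs, protein)

# Coefficient of determination: 1 - residual sum of squares / total sum of squares.
ss_res = np.sum((fluorescence - fitted) ** 2)
ss_tot = np.sum((fluorescence - fluorescence.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot


def fluorescence_band(concentration, margin):
    """Lower and upper thresholds obtained by shifting the fit by +/- margin."""
    centre = np.polyval(coeffs, concentration)
    return centre - margin, centre + margin


lower, upper = fluorescence_band(5.0, margin=15.0)
print("R squared:", r_squared)
print("accepted fluorescence range at concentration 5.0:", lower, "to", upper)
```

A measured intensity would then count as matching the expected spectrum if it falls between `lower` and `upper`; how wide to make the margin is a design choice that depends on the noise in the readings.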
1 See Gene Transcription Regulation by Repressors (CRISPRi) - Concentration over Time
2 See Software
Absorption and Emission Wavelengths From Given Concentrations of sfGFP, mRFP & ECFP
Working out which wavelengths are required to produce a fluorescence spectrum.
After settling on the general scheme, the team evaluated the selection of proteins. The selected proteins are fluorescent, meaning they absorb light at one wavelength and re-emit it at a different wavelength. This had to be considered because it tells the wet lab which wavelengths are required to produce a spectrum. It also highlights the importance of side effects, such as emitted light being reabsorbed and re-emitted at a different wavelength (colour), which would make the spectra similar to each other rather than unique.
To save time when programming a model, the team used Semrock's online fluorescence graph maker 1 , a Web App on the Semrock website that plots the absorption and emission spectra expected for the sfGFP (green), mRFP (red) and ECFP (blue) proteins. Furthermore, the site provides the raw data in a text file format, which was useful as it allowed the team to read the data into a standalone program.
Figure 4
This graph tells us that the emitted light is expected to be at a longer wavelength than the absorbed light. This must be considered in the model because the emitted and absorbed wavelengths overlap, implying that emitted light may be absorbed and re-emitted at a longer wavelength.
Fortunately, the data points used to plot the spectra are available on the website as a raw data text file, which meant we could read the data directly into our simulator when it was being implemented.
This model is important because it guides our choice of wavelengths as parameters: it shows which wavelengths to use, especially when trying to create a specific colour, and which wavelengths to look out for because they might cause overlap. This was very useful to the wet lab, as it informed them which wavelengths and wavelength ranges to use to produce different fluorescence spectra.
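The amount of overlap between one protein's emission and another's absorption can be estimated numerically once the spectra are sampled on a common wavelength grid. The sketch below is an assumption-laden illustration: instead of the raw data file, it uses bell-shaped placeholder curves whose peak positions are only rough approximations and whose widths are arbitrary, and the helper names (`gaussian_band`, `area`, `overlap_fraction`) are hypothetical.

```python
import numpy as np


def gaussian_band(wavelengths, peak_nm, width_nm):
    """Placeholder bell-shaped spectrum normalised to a peak of 1."""
    return np.exp(-0.5 * ((wavelengths - peak_nm) / width_nm) ** 2)


def area(wavelengths, values):
    """Trapezoidal area under a sampled curve."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(wavelengths)))


def overlap_fraction(wavelengths, emission, absorption):
    """Fraction of the emission area that also lies under the absorption curve."""
    shared = np.minimum(emission, absorption)
    return area(wavelengths, shared) / area(wavelengths, emission)


wavelengths = np.arange(350.0, 701.0, 1.0)  # nm

# Rough, assumed peak positions and an arbitrary 20 nm width; the real
# curves would be read from the raw data text file instead.
ecfp_emission = gaussian_band(wavelengths, 476.0, 20.0)
sfgfp_absorption = gaussian_band(wavelengths, 485.0, 20.0)
mrfp_absorption = gaussian_band(wavelengths, 584.0, 20.0)

cfp_to_gfp = overlap_fraction(wavelengths, ecfp_emission, sfgfp_absorption)
cfp_to_rfp = overlap_fraction(wavelengths, ecfp_emission, mrfp_absorption)

print("ECFP emission overlapping sfGFP absorption:", cfp_to_gfp)
print("ECFP emission overlapping mRFP absorption:", cfp_to_rfp)
```

A large fraction flags a pair of proteins where emitted light is likely to be reabsorbed, which is the side effect described above.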
Are Our Constructions Random?
Showing that our constructions are random and why they are random
When constructing our proteins with our current method, there were 3 vectors we could order from.
However, in this proof of concept, order is irrelevant as each gene is either inhibited (1) or not (0). Using
$$ \color{white}{ n ^ r } $$
where n = 2 and r = 3, this gives a total of 2^3 = 8 combinations: {1,1,1} {1,1,0} {1,0,1} {1,0,0} {0,1,1} {0,1,0} {0,0,1} {0,0,0}
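The same enumeration can be generated programmatically, which is convenient if more genes are added later. This is a minimal sketch; the order of the names in `PROTEINS` is an assumption made purely for labelling.

```python
from itertools import product

# Assumed gene order, used only to label which proteins are inhibited.
PROTEINS = ("GFP", "ECFP", "RFP")

# Each gene is either inhibited (1) or not (0): n = 2 states, r = 3 genes.
combinations = list(product((1, 0), repeat=len(PROTEINS)))

for states in combinations:
    inhibited = [name for name, s in zip(PROTEINS, states) if s == 1]
    print(states, "inhibited:", inhibited or "none")

print("total combinations:", len(combinations))
```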
Randomness comes from the fact that the system relies on Brownian Motion 1 , a random process, to create these combinations.
However, for a movement to count as Brownian Motion, the process must have continuous paths. This does not hold here: once the structures begin to form, the paths stop (the molecules do not collide elastically, but rather combine). Furthermore, the bacterium might become biased towards options that put less metabolic stress on it, which results in selection. Alternatively, using metabolites to undergo transposition can improve randomness. 2
To help the wet lab see which combinations they can expect, the team developed an Excel Spreadsheet where a user can simply input details of the construction and it shows what the construction would look like.
Members of the public are encouraged to try it out and can use it to identify how their spectra would look if they used the same proteins as the project.
Excel Spreadsheet
1 Diaconis, P. and Shahshahani, M., 1981. Generating a random permutation with random transpositions. Probability Theory and Related Fields, 57(2), pp.159-179.
2 Motion, D.B. and Walk, R.R., 1991. Random walks, Brownian motion, and interacting particle systems.
Conclusion
What iGEM Nottingham 2017 learnt from modelling and how modelling impacted the project.
The main objectives of modelling were met: the simulation for calculating the fluorescence spectra was completed. It was used extensively in the lab to generate spectra for different protein concentrations, and it also produced dummy data for the comparison software, which was used in a demo when industry contacts visited the labs. Furthermore, the models allowed us to explore parameters we couldn't test in the lab, for example what the spectra would look like if one protein was inhibited but the others weren't.
The main reason the team undertook a rigorous approach to modelling was that it wouldn't have been accurate to construct a single model of how the fluorescence spectra vary with protein concentration without taking into account elements such as protein degradation, the impact of CRISPRi and whether wavelength affects how strong the intensity is. The simulation allowed the team to combine all the models into a program that gives the desired output, so the model could be used by anyone, even without a maths or programming background.
Overall, the models showed that given a specific wavelength and a certain concentration of protein (ug/mol), a spectrum will be produced. Furthermore, beyond helping to validate real-world data, modelling helped to solve practical issues in the wet lab. The biggest of these was that the wet lab weren't able to produce any CFP fluorescence. The models showed that the CFP proteins wouldn't fluoresce above 500nm, which suggested that the solution would be to use a lower wavelength, such as 490nm (whereas the team had used 584nm). Unfortunately, due to time constraints, this fix couldn't be implemented, but modelling nevertheless helped to reveal the complication. Issues like these were real-world problems that modelling helped to solve.