## MODELLING

#### Constitutive Gene Expression

##### The general gene expression equation showing the process of protein synthesis

Biological insight told us that we needed a model with constant gene expression. We investigated models from the literature ^{ 1 } to see which would satisfy this condition, and found that the constitutive gene expression model was suitable to guide our own.

The first step was to take the general model from the literature and apply it to our scenario, using our proteins (GFP, ECFP and RFP).

$$ \color{white}{ p \underset{t_{1} }{\rightarrow} m \underset{t_{2}}{\rightarrow} p } $$

The equation above describes the process by which the gene undergoes transcription to produce mRNA, which is then translated into protein. The mRNA carries the genetic information copied from the DNA, which codes for the protein. Expression of the protein leads to fluorescence, which is the desired output of the system.

$$ \color{white}{ m \underset{Degradation}{\rightarrow} \oslash } $$

$$ \color{white}{ p \underset{Degradation}{\rightarrow} \oslash } $$

The two equations above state that, at the same time, the protein and mRNA undergo degradation, which causes their concentrations to drop. However, since protein and mRNA are always being created, creation and degradation eventually balance and keep the concentrations constant over time. ^{ 2 }

The team applied the Law of Mass Action to combine these processes into equations for the concentrations of mRNA and protein over time. This model can be described as:

$$ \color{white}{ \frac{dm}{dt} = k_{1} - d_{1} \cdot m } $$

$$ \color{white}{ \frac{dp}{dt} = k_{2} \cdot m - d_{2} \cdot p } $$

Where...

- m is the concentration of mRNA.
- p is the concentration of Protein.
- k_{1} is the constitutive transcription rate. This represents the number of mRNA molecules produced per gene, per unit of time.
- d_{1} is the mRNA degradation rate.
- k_{2} is the translation rate. This represents the number of protein molecules produced per mRNA molecule, per unit of time.
- d_{2} is the protein degradation rate.
- t_{1} is the process of Transcription.
- t_{2} is the process of Translation.
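As a rough illustration of how the two equations behave, the sketch below integrates them with a simple forward Euler step and compares the result with the steady state obtained by setting both derivatives to zero (m* = k1/d1, p* = k2·m*/d2). The rate constants, the function names and the choice of Euler integration are assumptions made for this example; they are not the team's fitted parameters or implementation.

```python
def simulate_constitutive(k1, d1, k2, d2, t_end=600.0, dt=0.01, m0=0.0, p0=0.0):
    """Integrate dm/dt = k1 - d1*m and dp/dt = k2*m - d2*p with forward Euler."""
    m, p = m0, p0
    for _ in range(int(round(t_end / dt))):
        dm = k1 - d1 * m        # transcription minus mRNA degradation
        dp = k2 * m - d2 * p    # translation minus protein degradation
        m += dt * dm
        p += dt * dp
    return m, p


def steady_state(k1, d1, k2, d2):
    """Concentrations at which both derivatives are zero."""
    m_star = k1 / d1
    p_star = k2 * m_star / d2
    return m_star, p_star


# Placeholder rate constants (assumed for illustration only)
params = dict(k1=2.0, d1=0.2, k2=5.0, d2=0.05)
m_end, p_end = simulate_constitutive(**params)
m_star, p_star = steady_state(**params)
print("mRNA:    simulated", m_end, "steady state", m_star)
print("protein: simulated", p_end, "steady state", p_star)
```

After a long enough run the simulated concentrations settle at the steady-state values, which is the "creation balances degradation" behaviour described above.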

This is important because the model could then calculate the concentration of each protein expected over time. The team used this information to calculate the total emitted light spectra during the time period, which is the output the team looked for within the system. However, the constants and variables are specific to each protein, which means parameters had to be found for each one. These constants were found using literature ^{ 3 } (for GFP) and lab results (for the rest).

^{ 1 } GB Stan, 20137. Modeling in Biology. London, United Kingdom: Imperial College London, pp. 59-65.

^{ 2 } See Non-Inhibited conditions from Figure 5 Gene Transcription Regulation by Repressors (CRISPRi) - Concentration over Time

^{ 3 } See Relationship between Max Fluorescence and Protein Concentration for more details

#### Gene Transcription Regulation by Repressors (CRISPRi)

##### Calculating how much protein is produced over time when a gene is inhibited

The next step in developing our simulation was to calculate the protein concentration at any given time when using CRISPRi. Discussion with the wet lab revealed that our method would use CRISPRi as a repressor, which inhibits the expression of one or more genes by binding to the promoter region. The mRNA and protein concentration models from the Constitutive Gene Expression model were expanded to include repression by CRISPRi.

The system can be described as follows. gRNA(i), dCas9 and mRNA are produced constitutively, with associated rates of production kg, kc and k0i respectively. The dCas9 and gRNA(i) undergo an irreversible association to form dCas9:gRNA(i) at rate kf, which in turn inhibits the production of mRNA and so reduces the production of fluorescent protein (k1). All molecules spontaneously degrade and diffuse away at their own associated rates. The index i accounts for multiple gRNAs and just as many fluorescent proteins; for example, with three fluorescent proteins and a corresponding set of three gRNAs, i runs from 1 to 3. It is assumed that all gRNAs have the same binding affinity and the same production rate. The varying strengths of the promoters for mRNA (k0i) are assigned to each corresponding gRNA in the set.

The system can be described by the following 5 ordinary differential equations, which use mass action kinetics to define how the concentration of each variable changes over time. Equations 1, 2 and 3 are derived from Farasat *et al.* (2016), which comprehensively investigated the rates at which CRISPR/Cas9 binds and cleaves DNA targets.

$$ \color{white}{(1) \quad \frac{d\,gRNA_{i}}{dt} = k_{g,i} - \delta_{dg} \cdot gRNA_{i} - k_{f} \cdot dCas9 \cdot gRNA_{i}} $$

The above equation details the change in gRNA concentration per unit time, for each index i. At any given time, the concentration of gRNA(i) is increased by its production (kgi) and decreased by its association with dCas9 at rate kf, in proportion to its concentration. It also degrades and diffuses away at rate δdg.

$$ \color{white}{(2) \quad \frac{d\,dCas9}{dt} = k_{c} - \delta_{dc} \cdot dCas9 - k_{f} \cdot dCas9 \cdot \sum_{i} gRNA_{i}} $$

This equation details the change in dCas9 protein per unit time. It is increased by its production (kc) and reduced by its degradation (δdc) and, again, by its association with the gRNAs. The association term is proportional to the sum of all the gRNAs over i, which accounts for their competition for dCas9.

$$ \color{white}{(3) \quad \frac{d\,(dCas9{:}gRNA_{i})}{dt} = k_{f} \cdot dCas9 \cdot gRNA_{i} - \delta_{dcg} \cdot dCas9{:}gRNA_{i}} $$

This equation details the change in concentration of the dCas9 associated with gRNA(i). This is simply the rate of formation from Equation 2, minus its degradation.

$$ \color{white}{(4) \quad \frac{d\,mRNA_{i}}{dt} = k_{0i} \cdot \frac{1}{1 + k_{m} \cdot dCas9{:}gRNA_{i}} - \delta_{dm} \cdot mRNA_{i}} $$

This equation details the change in mRNA(i), and is very similar to the equation seen earlier when describing transcription. mRNA(i) is produced at rate k0i, but production is inhibited by dCas9:gRNA(i), so a standard inhibition function reduces the effective rate as dCas9:gRNA(i) increases. mRNA(i) is also reduced by its degradation and diffusion at rate δdm.

$$ \color{white}{(5) \quad \frac{d\,FP_{i}}{dt} = k_{1} \cdot mRNA_{i} - \delta_{dp} \cdot FP_{i}} $$

This equation details the change in fluorescent protein FP(i) through translation. It has the same form as Equation 4, except that production is not inhibited: the protein is produced in proportion to mRNA(i) at rate k1, and is reduced by its degradation and diffusion at rate δdp. ^{ 3 }

The values for these constants and variables were taken from literature or calculated ^{ 4 }, and were later adjusted to the lab results.

Figure 6

These simulations illustrate the relationship between the variables and let us predict how much of each will be present at any one time. They also show how changes in parameters can affect the outcomes. The difference between the two simulations here is that gRNA production (kg) is very low in the non-inhibited state and at its normal level in the inhibited state. In the inhibited state, the amount of GFP produced is also significantly reduced.
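The comparison between the non-inhibited and inhibited states can be sketched in a few lines of Python. This is a minimal illustration, not the team's simulation: every parameter value is a placeholder, the dictionary keys and function names are invented for the example, forward Euler is used for simplicity, and Equation 3 is read as the text describes it (formation at rate kf · dCas9 · gRNA(i), minus first-order degradation of the complex).

```python
def derivatives(state, p):
    """Right-hand sides of Equations 1 to 5 for n gRNA/reporter pairs."""
    g, cas, cplx, m, fp = state
    n = len(g)
    bind = [p["kf"] * cas * g[i] for i in range(n)]  # dCas9:gRNA(i) formation
    dg = [p["kg"][i] - p["d_dg"] * g[i] - bind[i] for i in range(n)]
    dcas = p["kc"] - p["d_dc"] * cas - sum(bind)
    dcplx = [bind[i] - p["d_dcg"] * cplx[i] for i in range(n)]
    dm = [p["k0"][i] / (1.0 + p["km"] * cplx[i]) - p["d_dm"] * m[i]
          for i in range(n)]
    dfp = [p["k1"] * m[i] - p["d_dp"] * fp[i] for i in range(n)]
    return dg, dcas, dcplx, dm, dfp


def simulate(p, t_end=200.0, dt=0.01):
    """Forward Euler from zero concentrations; returns final FP(i) values."""
    n = len(p["kg"])
    g, cplx, m, fp = ([0.0] * n for _ in range(4))
    cas = 0.0
    for _ in range(int(round(t_end / dt))):
        dg, dcas, dcplx, dm, dfp = derivatives((g, cas, cplx, m, fp), p)
        g = [x + dt * dx for x, dx in zip(g, dg)]
        cas += dt * dcas
        cplx = [x + dt * dx for x, dx in zip(cplx, dcplx)]
        m = [x + dt * dx for x, dx in zip(m, dm)]
        fp = [x + dt * dx for x, dx in zip(fp, dfp)]
    return fp


# Placeholder parameters (assumed values, arbitrary units)
base = {"kc": 1.0, "kf": 0.1, "km": 1.0, "k1": 1.0, "k0": [1.0, 1.0, 1.0],
        "d_dg": 0.1, "d_dc": 0.1, "d_dcg": 0.1, "d_dm": 0.1, "d_dp": 0.1}

fp_free = simulate({**base, "kg": [0.001, 0.001, 0.001]})    # gRNA barely made
fp_inhibited = simulate({**base, "kg": [1.0, 0.001, 0.001]})  # gRNA(1) made

print("FP(1) non-inhibited:", fp_free[0])
print("FP(1) inhibited:    ", fp_inhibited[0])
```

With these placeholder values, raising kg for the first gRNA lowers the final level of its matching fluorescent protein while leaving the other two reporters largely unaffected.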

Furthermore, because the model could calculate the protein concentration at any given time, the team was able to deduce how much fluorescence the bacteria emit at that time.

^{ 1 } Farasat, I. and Salis, H.M., 2016. A biophysical model of CRISPR/Cas9 activity for rational design of genome editing and gene regulation. PLoS computational biology, 12(1), p.e1004724.

^{ 2 } See Relationship between Max Fluorescence and Protein Concentration

#### Relationship between Fluorescence and Protein Concentration

##### Using our models to estimate the amount of fluorescence expected from a certain concentration of protein synthesized

Another issue the team faced was that, although the proteins were expected to be expressed at any given time so that the bacteria would fluoresce, the intensity of that fluorescence was unknown. Expression itself can be confirmed by looking at the engineered bacteria and observing that they give off light.

To solve this issue, an equation was developed to find the intensity of fluorescence at a given time. This consisted of calculating the protein concentration at that time and, using fluorescence data from real lab experiments for the same time period, mapping the measured intensity to the calculated protein concentration.

The fluorescence data received from the wet lab were graphed and a model was constructed from them. Originally, the lab data was Fluorescence against Time, but by using the Gene Transcription Regulation by Repressors model developed earlier ^{ 1 }, the team was able to estimate the protein concentration at the given time points.

Figure 7

These graphs show the relationship between the concentration of a given protein and the fluorescence intensity that can be expected from it. By integrating real-life data into the models, we can obtain a more accurate representation of how the bacteria would behave in real life. For the conditions we tested, the modelled data fit the real-life data well. However, this is not necessarily true for all cases: we only had data for the conditions we were using, so more data would be required for the models to be truly representative of real-world behaviour.

On evaluation, the fit for CFP appears quite strange. Insight from the wet lab suggested that mistakes were made when reading from the fluorescence reader, which may explain this behaviour. One way to fix this is to set the spectrophotometer to a more restrictive wavelength that would minimise the cross-interference from GFP, such as 375 nm, as suggested by the Absorption and Emission Wavelength models developed earlier.

Furthermore, due to time constraints, rather than implementing the relationship directly from lab data, the data was fitted with a polynomial of order 3 in Excel, and the resulting equations were plugged directly into the simulation. However, this is inaccurate: the R squared values were 0.9148 for RFP, 0.9922 for CFP and 0.9478 for GFP, suggesting that the fits do not fully capture the trend in the data. On the plots themselves, the fitted curves also follow the trend poorly.

To improve this situation, if more data were available for different scenarios, such as different wavelengths and protein concentrations, the model could be validated against more data and refined. Once done, this could replace the polynomial fit. Lastly, rather than using another model to estimate the protein concentration, the team could measure protein concentration alongside the fluorescence readings. This would provide a separate data set against which to validate the model and check whether our protein calculations were correct.
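For readers who want to reproduce this kind of fit outside Excel, the sketch below fits an order-3 polynomial by least squares and reports the R squared value. It is an illustration only: the data points are made up, the function names are hypothetical, and the normal-equations solver is a simple stand-in for whatever routine Excel uses internally.

```python
def polyfit(xs, ys, order=3):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    n = order + 1
    # A[j][k] = sum of x^(j+k); b[j] = sum of y * x^j
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - polyval(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up example data: estimated protein concentration vs. fluorescence
protein = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fluorescence = [0.0, 1.2, 3.9, 9.1, 15.8, 26.0, 35.5]

coeffs = polyfit(protein, fluorescence, order=3)
r2 = r_squared(protein, fluorescence, coeffs)
print("coefficients (constant term first):", coeffs)
print("R squared:", r2)
```

A high R squared on its own does not guarantee that the curve follows the shape of the data, so plotting the fitted polynomial against the measurements remains a useful check.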

These relationships were implemented into the simulation to give the expected spectra produced by each protein. This highlights another use: by adding or subtracting values from our fit, we can create a threshold for our Keys. This was essential when developing the Raw Data Simulator. ^{ 2 }
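One possible reading of the threshold idea is sketched below: a tolerance is subtracted from and added to each expected intensity to form a band, and a measurement matches only if every protein falls inside its band. The expected intensities, the tolerance and the function names are all hypothetical; the actual rules used by the Raw Data Simulator are not described here.

```python
def threshold_band(expected, tolerance):
    """Return (low, high) limits by subtracting and adding a tolerance."""
    return expected - tolerance, expected + tolerance


def matches_key(measured, expected_by_protein, tolerance):
    """True if every measured intensity lies inside its threshold band."""
    for name, expected in expected_by_protein.items():
        low, high = threshold_band(expected, tolerance)
        if not low <= measured[name] <= high:
            return False
    return True


# Hypothetical expected intensities, e.g. outputs of the fitted relationships
expected = {"GFP": 120.0, "RFP": 80.0, "CFP": 40.0}

close_reading = {"GFP": 118.0, "RFP": 83.5, "CFP": 41.0}
far_reading = {"GFP": 100.0, "RFP": 83.5, "CFP": 41.0}

print(matches_key(close_reading, expected, tolerance=5.0))
print(matches_key(far_reading, expected, tolerance=5.0))
```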

^{ 1 } See Gene Transcription Regulation by Repressors (CRISPRi) - Concentration over Time

^{ 2 } See Software

#### Absorption and Emission Wavelengths of sfGFP, mRFP & ECFP

##### Working out which wavelengths are required to produce a fluorescence spectrum

After settling on the general scheme we would be using, the team evaluated the selection of proteins. The proteins selected for the system are fluorescent, meaning they absorb light at a certain wavelength and re-emit it at a different wavelength. This had to be considered because it tells the wet lab which wavelengths are required to produce a spectrum. It also highlights the importance of considering side effects, such as emitted light being reabsorbed and re-emitted at a different wavelength, which would make the spectra similar to each other rather than unique.

To save time when programming this model, the team used Semrock's online fluorescence graph maker ^{ 1 }, a web app that plots the expected absorption and emission wavelengths of the sfGFP (green), mRFP (red) and ECFP (blue) proteins. Semrock also provides the raw data in a text file format, which was useful as it allowed the team to read the data into a stand-alone program.

The absorption and emission spectra of RFP, GFP and ECFP. The dotted lines show absorption wavelengths, and the solid lines show emission wavelengths.

This graph tells us that the emitted light is expected to be at a longer wavelength than the absorbed light. This must be considered in the model because there is overlap between emitted and absorbed wavelengths, implying that light emitted by one fluorescent protein may be absorbed and re-emitted at a longer wavelength by a different one, which might dramatically alter the reading.
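The overlap argument can be made concrete with a rough calculation. The sketch below approximates each spectrum as a Gaussian around an approximate peak wavelength and scores how strongly one protein's emission overlaps another's absorption. The peak values, the 20 nm width and the Gaussian shape are all assumptions for illustration; the real spectra are broader and asymmetric, and the Semrock data should be used for any quantitative work.

```python
import math

# Approximate peak wavelengths in nm (assumed values, not the Semrock data)
PEAKS = {
    "ECFP": {"absorption": 434.0, "emission": 477.0},
    "sfGFP": {"absorption": 485.0, "emission": 510.0},
    "mRFP": {"absorption": 584.0, "emission": 607.0},
}
WIDTH_NM = 20.0  # assumed Gaussian standard deviation


def gaussian(wavelength, peak, width=WIDTH_NM):
    """Unit-height Gaussian used as a crude stand-in for a spectrum."""
    return math.exp(-0.5 * ((wavelength - peak) / width) ** 2)


def overlap(emitter, absorber, start=350, stop=750):
    """Normalised overlap (0 to 1) of emitter emission with absorber absorption."""
    num = norm_e = norm_a = 0.0
    for wl in range(start, stop + 1):
        e = gaussian(wl, PEAKS[emitter]["emission"])
        a = gaussian(wl, PEAKS[absorber]["absorption"])
        num += e * a
        norm_e += e * e
        norm_a += a * a
    return num / math.sqrt(norm_e * norm_a)


for emitter, absorber in [("ECFP", "sfGFP"), ("sfGFP", "mRFP"), ("ECFP", "mRFP")]:
    print(emitter, "emission vs", absorber, "absorption:",
          round(overlap(emitter, absorber), 3))
```

Under these assumptions the ECFP emission overlaps the sfGFP absorption far more than either overlaps the mRFP absorption, which is the kind of pairing where re-absorption is most likely to distort a reading.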

This model was important as it guided the spectrophotometer setup by determining which wavelength ranges produce the different fluorescence spectra. It was especially crucial for selecting wavelengths at which only one type of protein would fluoresce, which was useful when working with the random constructions.

#### Are Our Constructions Random?

##### Showing that our constructions are random and why they are random

When constructing our proteins with our current method, there were 3 vectors we could order from.

However, in this proof of concept, order is irrelevant as the gene is either inhibited (1) or not (0). Using

$$ \color{white}{ n ^ r } $$

where n = 2 and r = 3, this gives us a total of 2^{ 3 } = 8 combinations:

{1,1,1} {1,1,0} {1,0,1} {1,0,0} {0,1,1} {0,1,0} {0,0,1} {0,0,0}
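The same enumeration can be generated programmatically, which becomes convenient if the number of genes grows. This is a small sketch using Python's standard `itertools.product`; the variable names are chosen for this example.

```python
from itertools import product

n_states = 2   # each gene is either inhibited (1) or not (0)
n_genes = 3    # r in the n^r formula

combinations = list(product((1, 0), repeat=n_genes))

print(len(combinations), "combinations")
for combo in combinations:
    print(combo)
```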

Randomness comes from the fact that the system relies on Brownian motion ^{ 1 }, a random process, to create these combinations.

However, for a movement to count as Brownian motion, the process must have continuous paths. This condition does not hold here: once the structures begin to form, the paths stop, because the molecules do not collide elastically but combine instead. Furthermore, the bacterium might become biased towards options that put it under less metabolic stress, which results in selection. Alternatively, using metabolites to undergo transposition can improve randomness. ^{ 2 }

To help the wet lab see which combinations they can expect, the team developed an Excel spreadsheet in which a user can simply input the details of a construction and see what that construction would look like.

Members of the public are encouraged to try it out, and can use it to see how their spectra would look if they used the same proteins as the project.

^{ 1 } Motion, D.B. and Walk, R.R., 1991. Random walks, Brownian motion, and interacting particle systems.

^{ 2 } Diaconis, P. and Shahshahani, M., 1981. Generating a random permutation with random transpositions. Probability Theory and Related Fields, 57(2), pp.159-179.

#### Conclusion

##### What iGEM Nottingham 2017 learnt from modelling and how modelling impacted the project.

The main objectives of modelling were met: the **simulation for calculating the fluorescence spectra was completed**. It was not only used extensively in the lab to generate spectra for different protein concentrations, but was also used to produce dummy data for the **comparison software**, for a demo given when industry contacts came to visit the labs. Furthermore, the models allowed us to explore conditions we couldn't test in the lab, for example what the spectra would look like if one protein was inhibited but the others weren't.

The main reason the team took a rigorous approach to modelling was that it wouldn't have been accurate to construct a single model of how the fluorescence spectra vary with protein concentration without taking into account elements such as protein degradation, the impact of CRISPRi and whether the wavelength affects how strong the intensity is. The simulation allowed the team to combine all the models into a program that gives the desired output, so the model could be used by anyone **without a maths and programming background.**

Overall, the models showed that given a specific wavelength and a certain concentration of protein (ug/mol), a **spectrum** will be produced. Beyond helping to validate real-world data, modelling also helped to solve practical issues in the wet lab. The biggest of these was that the wet lab weren't able to produce any CFP fluorescence. The models showed that above 375 nm, the GFP proteins would fluoresce alongside the CFP proteins, which suggested that the solution would be to use a lower wavelength, such as 375 nm, so that only the CFP proteins would fluoresce without interference from the GFP. Unfortunately, due to time constraints, this fix couldn't be implemented, but modelling nevertheless helped to reveal the complication. Issues like these were real-world problems that modelling helped to solve.