# Team:Sydney Australia/Model

Modelling is an essential component of synthetic biology [1]. It is integral to the design and integration of the different components of any synthetic biology project. Our aim in modelling was to simulate the behaviour of our project to gain insight into how to best improve it. For our project, we saw three levels at which modelling could aid in the pursuit of its central aims.

As has been explored on our integrated human practices and applied design pages, the problem of insulin accessibility is complex and multi-faceted. As such, we decided it was not enough to consider our project as a problem whose solution could be found solely in a test tube. Distilled down, our project can be viewed as three sequential aims which we believe together can be used to address insulin accessibility.

Our modelling efforts were split into three branches, which reflect these major aspects of our project:

Difficulties in optimising the production of recombinant insulin are a key issue in the state of its accessibility.

In silico experiments simulating how best to optimise expression led to theoretical insights which informed the direction of our efforts.

It is imperative to test the feasibility of our recombinant insulin as a therapeutic for diabetics.

Modelling the effects of changes to insulin’s biochemical makeup on its therapeutic effects supplemented our wet-lab efforts to characterise our molecule.

In addition, the project would be moot without a consideration of the insulin market as a whole.

Modelling helped us to gain insight into the global insulin market, which informed our approach towards entrepreneurship.

#### Economic

We began our experimental modelling with a mechanistic model of an E. coli cell developed in [1]. We integrated models of our expression systems into this whole cell model so that our predictions would more accurately reflect reality. Recombinant protein expression occurs within a complex cellular environment with finite resources. A model which ignores the activities of the host cell would miss important host-circuit interactions, and ignoring the cell's finite resources may skew the predicted yield of our expression systems. See below for details on the whole cell model we used.

#### Whole Cell Model

A model of the E. coli cell including nutrient import, its conversion to cellular energy, and the transcription and translation of four categories of proteins was developed in [1]. The model accounts for the finite levels of cellular energy, ribosomes and cell mass.

Tables 1 and 2 detail the list of reactions that were considered in the model.

Table 1: List of reactions relating to the expression and degradation of the four protein species considered in the whole cell model developed in [1]

| Protein species (symbol) | Dilution of protein | Transcription | Dilution/degradation of mRNA | Ribosome binding | Dilution of ribosome-bound mRNA | Translation |
|---|---|---|---|---|---|---|
| Ribosomes $$\color{#3e3f3f}{(r)}$$ | $\color{#3e3f3f}{r\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{\varnothing\xrightarrow{\omega_r}m_r}$ | $\color{#3e3f3f}{m_r\xrightarrow{\lambda+d_m}\varnothing}$ | $\color{#3e3f3f}{r+m_r\xrightarrow{k_b, k_u}c_r}$ | $\color{#3e3f3f}{c_r\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{n_ra+c_r\xrightarrow{\upsilon_r}r+m_r+r}$ |
| Transporter enzyme $$\color{#3e3f3f}{(e_t)}$$ | $\color{#3e3f3f}{e_t\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{\varnothing\xrightarrow{\omega_t}m_t}$ | $\color{#3e3f3f}{m_t\xrightarrow{\lambda+d_m}\varnothing}$ | $\color{#3e3f3f}{r+m_t\xrightarrow{k_b, k_u}c_t}$ | $\color{#3e3f3f}{c_t\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{n_ta+c_t\xrightarrow{\upsilon_t}r+m_t+e_t}$ |
| Metabolic enzyme $$\color{#3e3f3f}{(e_m)}$$ | $\color{#3e3f3f}{e_m\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{\varnothing\xrightarrow{\omega_m}m_m}$ | $\color{#3e3f3f}{m_m\xrightarrow{\lambda+d_m}\varnothing}$ | $\color{#3e3f3f}{r+m_m\xrightarrow{k_b, k_u}c_m}$ | $\color{#3e3f3f}{c_m\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{n_ma+c_m\xrightarrow{\upsilon_m}r+m_m+e_m}$ |
| Growth-independent/housekeeping proteins $$\color{#3e3f3f}{(q)}$$ | $\color{#3e3f3f}{q\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{\varnothing\xrightarrow{\omega_q}m_q}$ | $\color{#3e3f3f}{m_q\xrightarrow{\lambda+d_m}\varnothing}$ | $\color{#3e3f3f}{r+m_q\xrightarrow{k_b, k_u}c_q}$ | $\color{#3e3f3f}{c_q\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{n_qa+c_q\xrightarrow{\upsilon_q}r+m_q+q}$ |

Table 2: Nutrient metabolism and cellular energy reactions in the whole cell model developed in [1]

| Species (symbol) | Dilution | Nutrient import | Metabolism |
|---|---|---|---|
| Internal nutrient $$\color{#3e3f3f}{(s_i)}$$ | $\color{#3e3f3f}{s_i\xrightarrow{\lambda}\varnothing}$ | $\color{#3e3f3f}{s\xrightarrow{\upsilon_{imp}}s_i}$ | $\color{#3e3f3f}{s_i\xrightarrow{\upsilon_{cat}}n_sa}$ |
| ATP $$\color{#3e3f3f}{(a)}$$ | $\color{#3e3f3f}{a\xrightarrow{\lambda}\varnothing}$ | - | - |

See Table 3 for notation relating to rates and parameters in [1].

Table 3: Notation for rates in [1]

| Symbol | Meaning |
|---|---|
| $\color{#3e3f3f}{\upsilon_{imp}}$ | Rate of nutrient import |
| $\color{#3e3f3f}{\upsilon_{cat}}$ | Rate of nutrient metabolism |
| $\color{#3e3f3f}{\lambda}$ | Growth rate |
| $\color{#3e3f3f}{n_x\textrm{ with } x\in\{r,t,m,q\}}$ | Lengths of the four protein species |
| $\color{#3e3f3f}{\upsilon_x \textrm{ with } x\in\{r,t,m,q\}}$ | Translation rates of the four protein species |
| $\color{#3e3f3f}{k_b}$ | mRNA ribosome binding rate |
| $\color{#3e3f3f}{k_u}$ | mRNA ribosome unbinding rate |
| $\color{#3e3f3f}{\omega_x \textrm{ with } x\in\{r,t,m,q\}}$ | Transcription rates of the four protein species |
| $\color{#3e3f3f}{d_m}$ | mRNA degradation rate |

A system of 14 differential equations was derived from these reactions:

$\color{#3e3f3f}{ \frac{d}{dt}s_i=\upsilon_{imp} (e_t,s)-\upsilon_{cat}(e_m,s_i)-\lambda s_i}$

$\color{#3e3f3f}{ \frac{d}{dt}a=n_s\cdot\upsilon_{cat}(e_m,s_i)-\sum_{x\in\{r,t,m,q\}}n_x\upsilon_x(c_x,a)-\lambda a}$

$\color{#3e3f3f}{ \frac{d}{dt}r=\upsilon_r(c_r,a)-\lambda r+\sum_{x\in\{r,t,m,q\}} (\upsilon_x(c_x,a)-k_brm_x+k_uc_x)}$

$\color{#3e3f3f}{ \frac{d}{dt}e_t=\upsilon_t(c_t,a)-\lambda e_t}$

$\color{#3e3f3f}{ \frac{d}{dt}e_m=\upsilon_m(c_m,a)-\lambda e_m}$

$\color{#3e3f3f}{ \frac{d}{dt}q=\upsilon_q(c_q,a)-\lambda q}$

$\color{#3e3f3f}{ \frac{d}{dt}m_x=\omega_x(a)-(\lambda+d_m)m_x+\upsilon_x(c_x,a)-k_brm_x+k_uc_x \qquad \textrm{for } x\in\{r,t,m,q\}}$

$\color{#3e3f3f}{\frac{d}{dt} c_x=-\lambda c_x+k_brm_x-k_uc_x-\upsilon_x(c_x,a) \qquad \textrm{for } x\in\{r,t,m,q\}}$

We derived models to reflect our three expression systems (as seen in the diagram below) and used these in conjunction with the model in [1] to investigate in silico how to optimise recombinant protein production.

We then developed a model to reflect the production of our recombinant insulin. First we modelled the production of recombinant insulin in the E. coli cytoplasm. We included transcription, translation, folding and aggregation into inclusion bodies, as well as dilution and degradation. See below for details on our model of recombinant insulin expression in the cytoplasm.

#### Cytoplasmic Expression Model

We modelled the rate of change of five biochemical species in the cell (Table 4).

Table 4. Cytoplasmic Expression Model Variables

| Symbol | Meaning |
|---|---|
| $\color{#3e3f3f}{m_p}$ | Free mRNA of recombinant protein |
| $\color{#3e3f3f}{c_p}$ | Ribosome-bound mRNA of recombinant protein |
| $\color{#3e3f3f}{p_u}$ | Unfolded recombinant protein |
| $\color{#3e3f3f}{p_f}$ | Folded recombinant protein |
| $\color{#3e3f3f}{p_a}$ | Recombinant protein aggregated in inclusion bodies |

A diagram showing the species we modelled and notation used is shown below:

Table 5 details the reactions considered in the model.

Table 5. List of reactions considered in cytoplasmic protein expression model

| Process | Reaction | Rate |
|---|---|---|
| Transcription | $\color{#3e3f3f}{\varnothing\rightarrow m_p}$ | $\color{#3e3f3f}{\omega_p(a)}$ |
| Dilution and degradation of mRNA | $\color{#3e3f3f}{m_p\rightarrow\varnothing}$ | $\color{#3e3f3f}{\lambda+d_m}$ |
| Ribosome binding | $\color{#3e3f3f}{r+m_p\rightleftharpoons c_p}$ | $\color{#3e3f3f}{\textrm{forward: } k_b \textrm{, reverse: } k_u}$ |
| Dilution of ribosome-bound mRNA | $\color{#3e3f3f}{c_p\rightarrow\varnothing}$ | $\color{#3e3f3f}{\lambda}$ |
| Translation | $\color{#3e3f3f}{n_pa+c_p\rightarrow m_p+p_u+r}$ | $\color{#3e3f3f}{\upsilon_p(c_p,a)}$ |
| Aggregation | $\color{#3e3f3f}{p_u\rightarrow p_a}$ | $\color{#3e3f3f}{k_a}$ |
| Folding | $\color{#3e3f3f}{p_u\rightarrow p_f}$ | $\color{#3e3f3f}{k_f}$ |
| Dilution and degradation of folded protein | $\color{#3e3f3f}{p_f\rightarrow \varnothing}$ | $\color{#3e3f3f}{\lambda+k_d}$ |

Here, $$\omega_p(a)$$, the rate of transcription, is an energy dependent process.

We used the transcription rate form used in [1] to denote the amount being transcribed ($$\omega_p(a)$$). That is,

$\color{#3e3f3f}{\omega_p(a)=w_p \frac{a}{\theta_p+a} }$

Where $$w_p$$ is the maximal rate of transcription, dependent on the speed of transcriptional elongation, as well as the gene length, induction and copy number. $$a$$ is the energy in the cell such as ATP (transcription is an energy dependent process), and $$\theta_p$$ is the transcriptional threshold of the recombinant protein.

In addition, we used the form in [1] for the translation rate term

$\color{#3e3f3f}{\upsilon_p(c_p,a)=c_p \frac{\gamma(a)}{n_p} }$

Where $$n_p$$ is the length of the recombinant protein, and $$\gamma(a)$$ is an expression for the rate of translational elongation:

$\color{#3e3f3f}{\gamma(a)=\frac{\gamma_{max} a}{K_{\gamma} + a} }$

Where $$\gamma_{max}$$ is the maximal rate of translation, $$K_{\gamma}$$ is the translational elongation threshold, and $$a$$ is the energy in the cell.
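The two saturating rate laws above are simple enough to write as small functions. The following is a minimal Python sketch rather than the MATLAB code used for the project: the function names, the energy level `a = 10` and the value `w_p = 100` are illustrative assumptions, while the default values of $$\theta_p$$, $$\gamma_{max}$$, $$K_{\gamma}$$ and $$n_p$$ are the ones listed in Table 6.

```python
def omega_p(a, w_p, theta_p=4.38):
    """Energy-dependent transcription rate: w_p * a / (theta_p + a)."""
    return w_p * a / (theta_p + a)


def gamma(a, gamma_max=1260.0, k_gamma=7.0):
    """Translational elongation rate: gamma_max * a / (K_gamma + a)."""
    return gamma_max * a / (k_gamma + a)


def upsilon_p(c_p, a, n_p=312):
    """Translation rate of the recombinant protein: c_p * gamma(a) / n_p."""
    return c_p * gamma(a) / n_p


a = 10.0     # hypothetical energy level (molecs/cell), not from the source
w_p = 100.0  # hypothetical maximal transcription rate, below the 10^3 bound
print(omega_p(a, w_p), gamma(a), upsilon_p(1.0, a))
```

Both rates are half-maximal when the energy level equals the respective threshold ($$a=\theta_p$$ or $$a=K_{\gamma}$$), which is what makes the expression system sensitive to the energy state of the host cell.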

For the model of inclusion body aggregation, we assumed first order deposition of monomers of unfolded protein, dependent on the concentration of unfolded protein, as in Hoffmann et al. (2001) [2].

Using the law of mass action kinetics we can derive a set of ordinary differential equations from these reactions.

# Summary of Cytoplasmic Expression Model

$\color{#3e3f3f}{\frac{d}{dt}{m}_p=\omega_p(a)+\upsilon_p(c_p,a)+k_uc_p-(\lambda +d_m)m_p-k_brm_p}$

$\color{#3e3f3f}{\frac{d}{dt}{c}_p=k_brm_p-\lambda c_p-k_uc_p-\upsilon_p(c_p,a)}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_u=\upsilon_p(c_p,a)-(k_f+k_a+\lambda)p_u}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_a=k_ap_u-\lambda p_a}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_f=k_fp_u-(k_d+\lambda) p_f}$

# Parametrising the model

Table 6 shows the parameters we needed to find for our model, and the values we used.

Table 6. Cytoplasmic Expression Model Parameters. * Set to 0 as degradation is dominated by the rate of dilution due to cell division for stable proteins [3]

| Symbol | Meaning | Default value | Units | Source |
|---|---|---|---|---|
| $\color{#3e3f3f}{w_p}$ | Maximal rate of transcription | < 10^3 | [mRNAs/min] | Proportional to induction level. Varied around realistic values as recommended by [1] |
| $\color{#3e3f3f}{\theta_p}$ | Transcriptional threshold of the recombinant protein | 4.38 | [molecs/cell] | [1] |
| $\color{#3e3f3f}{n_p}$ | Length of recombinant protein | 312/255 | [aa/molecs] | Length of cytoplasmic proinsulin/winsulin gblock |
| $\color{#3e3f3f}{\gamma_{max}}$ | Maximal rate of translation | 1260 | [aa/min molecs] | [1] |
| $\color{#3e3f3f}{K_{\gamma}}$ | Translational elongation threshold | 7 | [molecs/cell] | [1] |
| $\color{#3e3f3f}{k_u}$ | Rate of unbinding of mRNA and ribosomes | 1 | [/min] | [1] |
| $\color{#3e3f3f}{k_b}$ | Rate of binding of mRNA and ribosomes | 1 | [cell/min molecs] | [1] |
| $\color{#3e3f3f}{d_m}$ | Degradation rate of mRNA | 0.1 | [/min] | [1] |
| $\color{#3e3f3f}{k_f}$ | Rate of protein folding | 0.14 | [/min] | Adapted to fit units from [2] |
| $\color{#3e3f3f}{k_a}$ | Rate of protein aggregation | 0.21 | [/min] | Adapted to fit units from [2] |
| $\color{#3e3f3f}{k_d}$ | Rate of protein degradation | 0 | | * |
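To illustrate how the five cytoplasmic equations behave with the Table 6 values, here is a minimal Python sketch. It is not the whole cell simulation: the host state (energy `a`, free ribosomes `r`, growth rate `lam`) is frozen at hypothetical constants, `w_p = 100` is an arbitrary choice below the stated bound, and a plain explicit Euler step is used for brevity.

```python
# Parameters from Table 6 (k_d = 0); w_p = 100 is an illustrative choice.
PARAMS = dict(w_p=100.0, theta_p=4.38, n_p=312, gamma_max=1260.0, K_gamma=7.0,
              k_u=1.0, k_b=1.0, d_m=0.1, k_f=0.14, k_a=0.21, k_d=0.0)

# Hypothetical frozen host state; in the full model a, r and lam are dynamic.
HOST = dict(a=10.0, r=20.0, lam=0.02)


def rhs(state, p=PARAMS, h=HOST):
    """Right-hand side of the five cytoplasmic ODEs."""
    m_p, c_p, p_u, p_a, p_f = state
    a, r, lam = h["a"], h["r"], h["lam"]
    omega = p["w_p"] * a / (p["theta_p"] + a)          # transcription rate
    gamma = p["gamma_max"] * a / (p["K_gamma"] + a)    # elongation rate
    upsilon = c_p * gamma / p["n_p"]                   # translation rate
    binding = p["k_b"] * r * m_p - p["k_u"] * c_p      # net ribosome binding
    return (
        omega + upsilon - binding - (lam + p["d_m"]) * m_p,
        binding - lam * c_p - upsilon,
        upsilon - (p["k_f"] + p["k_a"] + lam) * p_u,
        p["k_a"] * p_u - lam * p_a,
        p["k_f"] * p_u - (p["k_d"] + lam) * p_f,
    )


def simulate(t_end=1000.0, dt=0.01):
    """Explicit Euler integration starting with no recombinant species."""
    state = (0.0,) * 5
    for _ in range(int(round(t_end / dt))):
        state = tuple(x + dt * dx for x, dx in zip(state, rhs(state)))
    return state


m_p, c_p, p_u, p_a, p_f = simulate()
print(f"folded: {p_f:.1f}  aggregated: {p_a:.1f}  p_a/p_f: {p_a / p_f:.3f}")
```

One property that does not depend on the assumed host state: with $$k_d=0$$, aggregated and folded protein are fed from the same unfolded pool and lost at the same rate $$\lambda$$, so their ratio is $$k_a/k_f = 0.21/0.14 = 1.5$$. Under these parameters the cytoplasmic model therefore places more protein in inclusion bodies than in the folded state.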

Next, we modelled the steps in our periplasmic expression system, including transcription, translation, translocation, and folding in the periplasm.

#### Periplasmic Expression Model

We looked at periplasmic expression of our recombinant protein in E. coli. We modelled the rate of change of 6 species (Table 7).

Table 7. Periplasmic Expression Model Variables

| Symbol | Meaning |
|---|---|
| $\color{#3e3f3f}{m_p}$ | Free mRNA of recombinant protein |
| $\color{#3e3f3f}{c_p}$ | Ribosome-bound mRNA of recombinant protein |
| $\color{#3e3f3f}{p_c}$ | Unfolded recombinant protein in the cytoplasm |
| $\color{#3e3f3f}{p_t}$ | Unfolded recombinant protein bound to transporter |
| $\color{#3e3f3f}{p_u}$ | Unfolded recombinant protein in the periplasm |
| $\color{#3e3f3f}{p_f}$ | Folded recombinant protein in the periplasm |

A diagram showing the species we modelled and notation used is shown below (Figure 2).

The reactions in Table 8 were considered.

Table 8. List of reactions considered in periplasmic protein expression model

| Process | Reaction | Rate |
|---|---|---|
| Transcription | $\color{#3e3f3f}{\varnothing\rightarrow m_p}$ | $\color{#3e3f3f}{\omega_p(a)}$ |
| Dilution and degradation of mRNA | $\color{#3e3f3f}{m_p\rightarrow\varnothing}$ | $\color{#3e3f3f}{\lambda+d_m}$ |
| Ribosome binding | $\color{#3e3f3f}{r+m_p\rightleftharpoons c_p}$ | $\color{#3e3f3f}{\textrm{forward: } k_b \textrm{, reverse: } k_u}$ |
| Dilution of ribosome-bound mRNA | $\color{#3e3f3f}{c_p\rightarrow\varnothing}$ | $\color{#3e3f3f}{\lambda}$ |
| Translation | $\color{#3e3f3f}{n_pa+c_p\rightarrow m_p+p_c+r}$ | $\color{#3e3f3f}{\upsilon_p(c_p,a)}$ |
| Translocator binding | $\color{#3e3f3f}{p_c+t\rightarrow p_t}$ where $$t$$ refers to the amount of translocons | $\color{#3e3f3f}{k_{bt}}$ |
| Translocation | $\color{#3e3f3f}{p_t\rightarrow p_u}$ | $\color{#3e3f3f}{\tau_p(p_t,a)}$ |
| Folding | $\color{#3e3f3f}{p_u\rightarrow p_f}$ | $\color{#3e3f3f}{k_f}$ |
| Dilution and degradation of folded protein | $\color{#3e3f3f}{p_f\rightarrow \varnothing}$ | $\color{#3e3f3f}{\lambda+k_d}$ |

Here, $$\omega_p(a)$$ and $$\upsilon_p(c_p,a)$$ are as in the cytoplasmic reactions. The rate of translocation is given by the term $$\tau_p(p_t,a)$$. Protein translocation to the periplasm occurs via an ATP-dependent motor protein, SecA [4]. Post-translational translocation uses ATP as a source of energy to drive the protein stepwise through the membrane, following the mechanism illustrated in Figure 3 [4].

Following the logic used to derive the translation rate in [1], we derive the net rate of translocating a protein $$p$$ by defining $$K_p:=\frac{k_1k_2}{k_{-1}+k_2}$$. This leads to

$\color{#3e3f3f}{\tau_p(p_t,a)=p_t\Big(\frac{n_p}{50}\Big(\frac{1}{K_pa}+\frac{1}{k_2} \Big)+\frac{1}{k_t}\Big)^{-1}}$

If we assume the final termination step is fast, so $$\frac{1}{k_t}\ll \frac{n_p}{50}\Big(\frac{1}{K_pa}+\frac{1}{k_2} \Big)$$, this is approximately equal to

$\color{#3e3f3f}{\tau_p(p_t,a)\approx 50p_t \frac{\epsilon(a)}{n_p}\qquad \epsilon(a):=\frac{\epsilon_{max}a}{K_{\epsilon}+a} }$

Where $$\color{#3e3f3f}{\epsilon_{max}}$$ is the maximal translocation rate, $$\color{#3e3f3f}{K_{\epsilon}}$$ is the translocation threshold, and $$\color{#3e3f3f}{n_p}$$ is the length of the protein in amino acids.
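For readers who want the intermediate algebra, the approximation can be worked through as follows. This is our own reading of the two formulas above (it is not taken from [1] or [5]), and it assumes only that the $$\frac{1}{k_t}$$ term is dropped:

```latex
\tau_p(p_t,a)
  \approx p_t\left[\frac{n_p}{50}\left(\frac{1}{K_p a}+\frac{1}{k_2}\right)\right]^{-1}
  = \frac{50\,p_t}{n_p}\cdot\frac{K_p k_2\,a}{k_2+K_p a}
  = \frac{50\,p_t}{n_p}\cdot\frac{k_2\,a}{k_2/K_p+a}
\quad\Longrightarrow\quad
\epsilon_{max}=k_2,\qquad
K_{\epsilon}=\frac{k_2}{K_p}=\frac{k_{-1}+k_2}{k_1}
```

Under this reading, $$K_{\epsilon}$$ has the form of a Michaelis constant for ATP use by the translocation motor, and $$\epsilon_{max}$$ is the rate of the ATP-consuming step.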

### Parametrising Translocation

To find the parameters for translocation, $$\color{#3e3f3f}{\epsilon_{max}}$$ and $$\color{#3e3f3f}{K_{\epsilon}}$$, we used kinetic parameters determined in [5]. They measured translocation of the 346 aa protein proOmpA and found that the apparent $$\color{#3e3f3f}{K_m}$$ of SecA was 50 nM and the maximal rate was 2.7 proOmpA/site/min. A concentration of $$\color{#3e3f3f}{1nM}$$ in E. coli is $$\color{#3e3f3f}{\approx}$$ 1 molecule/cell [6], so $$\color{#3e3f3f}{K_{\epsilon}=50}$$ molecs/cell. Using the length of proOmpA, the maximal rate converts to 2.7 proOmpA/site/min $$\cdot$$ 346 aa/proOmpA $$=$$ 934.2 aa/molec/min.
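The unit conversion described above is a two-line calculation. The sketch below simply restates it in Python; the constant names are our own, and the numbers are the ones quoted in the text from [5] and [6].

```python
AA_PER_PROOMPA = 346      # length of proOmpA in amino acids
RATE_PER_SITE = 2.7       # proOmpA/site/min, as quoted from [5]
KM_NANOMOLAR = 50.0       # apparent Km of SecA in nM, as quoted from [5]
MOLECULES_PER_NM = 1.0    # about 1 molecule/cell per nM in E. coli [6]

epsilon_max = RATE_PER_SITE * AA_PER_PROOMPA    # aa/min per translocation site
k_epsilon = KM_NANOMOLAR * MOLECULES_PER_NM     # molecs/cell

print(f"epsilon_max = {epsilon_max:.1f} aa/min/molec")
print(f"K_epsilon   = {k_epsilon:.0f} molecs/cell")
```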

Using the law of mass action kinetics we can derive a set of ordinary differential equations from these reactions.

# Summary of Periplasmic Expression Model

$\color{#3e3f3f}{\frac{d}{dt}{m}_p=\omega_p(a)+\upsilon_p(c_p,a)+k_uc_p-(\lambda +d_m)m_p-k_brm_p}$

$\color{#3e3f3f}{\frac{d}{dt}{c}_p=k_brm_p-\lambda c_p-k_uc_p-\upsilon_p(c_p,a)}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_c=\upsilon_p(c_p,a)-(k_{bt}t+\lambda)p_c}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_t=k_{bt}tp_c-\tau_p(p_t,a)-\lambda p_t}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_u=\tau_p(p_t,a)-(k_f+\lambda) p_u}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_f=k_fp_u-(k_d+\lambda)p_f}$

# Parametrising the model

Table 9 shows the parameters we needed to find for our model, and the values we used.

Table 9. Periplasmic Expression Model Parameters. † Doubled relative to cytoplasmic folding rate to reflect the effect of an oxidising environment on disulfide bond formation. * Set to 0 as degradation is dominated by the rate of dilution due to cell division for stable proteins [3]

| Symbol | Meaning | Default value | Units | Source |
|---|---|---|---|---|
| $\color{#3e3f3f}{w_p}$ | Maximal rate of transcription | < 10^3 | [mRNAs/min] | Proportional to induction level. Varied around realistic values as recommended by [1] |
| $\color{#3e3f3f}{\theta_p}$ | Transcriptional threshold of the recombinant protein | 4.38 | [molecs/cell] | [1] |
| $\color{#3e3f3f}{n_p}$ | Length of recombinant protein | 312/255 | [aa/molecs] | Length of cytoplasmic proinsulin/winsulin gblock |
| $\color{#3e3f3f}{\gamma_{max}}$ | Maximal rate of translation | 1260 | [aa/min molecs] | [1] |
| $\color{#3e3f3f}{K_{\gamma}}$ | Translational elongation threshold | 7 | [molecs/cell] | [1] |
| $\color{#3e3f3f}{k_u}$ | Rate of unbinding of mRNA and ribosomes | 1 | [/min] | [1] |
| $\color{#3e3f3f}{k_b}$ | Rate of binding of mRNA and ribosomes | 1 | [cell/min molecs] | [1] |
| $\color{#3e3f3f}{d_m}$ | Degradation rate of mRNA | 0.1 | [/min] | [1] |
| $\color{#3e3f3f}{t}$ | Number of translocons in a cell | 500 | [/cell] | [5] |
| $\color{#3e3f3f}{k_{bt}}$ | Rate of protein binding to translocon | 1 | [cell/min molecs] | [1] |
| $\color{#3e3f3f}{\epsilon_{max}}$ | Maximal translocation rate | 934.2 | [aa/min molecs] | [5] |
| $\color{#3e3f3f}{K_{\epsilon}}$ | Translocational threshold | 50 | [molecs/cell] | [5] |
| $\color{#3e3f3f}{k_f}$ | Rate of protein folding | 0.28 | [/min] | † |
| $\color{#3e3f3f}{k_d}$ | Rate of protein degradation | 0 | | * |
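The periplasmic equations can be explored in the same spirit. The sketch below is a simplified Python illustration, not the integrated whole cell simulation: the host state (`a`, `r`, `lam`) is held at hypothetical constant values, `w_p = 100` is an arbitrary choice, and a small explicit Euler step is used because translocon binding ($$k_{bt}t = 500$$ /min) is fast. The remaining numbers are the Table 9 values.

```python
def tau_p(p_t, a, n_p=312, eps_max=934.2, k_eps=50.0):
    """Approximate translocation rate: 50 * p_t * epsilon(a) / n_p."""
    epsilon = eps_max * a / (k_eps + a)
    return 50.0 * p_t * epsilon / n_p


def rhs(state, a=10.0, r=20.0, lam=0.02, w_p=100.0):
    """Six periplasmic ODEs; a, r, lam and w_p are hypothetical constants."""
    m_p, c_p, p_c, p_t, p_u, p_f = state
    theta_p, gamma_max, k_gamma, n_p = 4.38, 1260.0, 7.0, 312
    k_u, k_b, d_m = 1.0, 1.0, 0.1
    t, k_bt, k_f, k_d = 500.0, 1.0, 0.28, 0.0
    omega = w_p * a / (theta_p + a)                          # transcription
    upsilon = c_p * (gamma_max * a / (k_gamma + a)) / n_p    # translation
    binding = k_b * r * m_p - k_u * c_p                      # net ribosome binding
    tau = tau_p(p_t, a, n_p)                                 # translocation
    return (
        omega + upsilon - binding - (lam + d_m) * m_p,
        binding - lam * c_p - upsilon,
        upsilon - (k_bt * t + lam) * p_c,
        k_bt * t * p_c - tau - lam * p_t,
        tau - (k_f + lam) * p_u,
        k_f * p_u - (k_d + lam) * p_f,
    )


def simulate(t_end=600.0, dt=0.002):
    """Explicit Euler integration starting with no recombinant species."""
    state = (0.0,) * 6
    for _ in range(int(round(t_end / dt))):
        state = tuple(x + dt * dx for x, dx in zip(state, rhs(state)))
    return state


m_p, c_p, p_c, p_t, p_u, p_f = simulate()
print(f"unfolded/folded in periplasm: {p_u / p_f:.4f}")
```

At steady state with $$k_d=0$$ the last equation gives $$p_u/p_f=\lambda/k_f$$, so the unfolded pool stays small whenever folding is fast relative to growth. The numerical value printed here depends on the assumed growth rate.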

We also modelled our third expression system in Bacillus, including transcription, translation, secretion, and extracellular folding.

#### Bacillus Secretory Expression Model

We also developed a model of our secretory protein expression system in Bacillus subtilis. The model included 6 species (Table 10).

Table 10. Bacillus Secretory Expression Model Variables

| Symbol | Meaning |
|---|---|
| $\color{#3e3f3f}{m_p}$ | Free mRNA of recombinant protein |
| $\color{#3e3f3f}{c_p}$ | Ribosome-bound mRNA of recombinant protein |
| $\color{#3e3f3f}{p_c}$ | Unfolded recombinant protein in the cytoplasm |
| $\color{#3e3f3f}{p_t}$ | Unfolded recombinant protein bound to transporter |
| $\color{#3e3f3f}{p_u}$ | Unfolded recombinant protein in the medium |
| $\color{#3e3f3f}{p_f}$ | Folded recombinant protein in the medium |

A diagram showing the species we modelled and notation used is shown below (Figure 4).

Structurally, this is the same process as the periplasmic expression system, so the equations have the same structure. However, the parameters are different, reflecting the different environment of Bacillus and the medium, and its effect on expression of recombinant protein.

# Summary of Bacillus Secretory Expression Model

$\color{#3e3f3f}{\frac{d}{dt}{m}_p=\omega_p(a)+\upsilon_p(c_p,a)+k_uc_p-(\lambda +d_m)m_p-k_brm_p}$

$\color{#3e3f3f}{\frac{d}{dt}{c}_p=k_brm_p-\lambda c_p-k_uc_p-\upsilon_p(c_p,a)}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_c=\upsilon_p(c_p,a)-(k_{bt}t+\lambda)p_c}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_t=k_{bt}tp_c-\tau_p(p_t,a)-\lambda p_t}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_u=\tau_p(p_t,a)-(k_f+\lambda) p_u}$

$\color{#3e3f3f}{\frac{d}{dt}{p}_f=k_fp_u-(k_d+\lambda)p_f}$

We did not parametrise the Bacillus model, instead choosing to focus on comparing cytoplasmic and periplasmic E. coli expression for our in silico experiments.

Once we had developed models to reflect our different expression systems, we integrated them into the whole cell model from [1], and performed in silico experiments, comparing predicted yield and parameter scanning to discover any insights into how to optimise recombinant protein production.

#### In Silico Experiments

Once we had modelled our different expression systems for recombinant insulin, we integrated them into the whole cell model developed in [1].

We then analysed these models using MATLAB for insights into how to optimise the expression of insulin.

# Comparing Cytoplasmic and Periplasmic Expression


First, we looked at the dynamics of the two models in the first 25 minutes of recombinant protein expression.

Cytoplasmic and periplasmic expression showed very different behaviour. The cytoplasmic model predicted a quick peak in unfolded protein in the cytoplasm which is then depleted, and a large amount of protein aggregating in inclusion bodies (Figure 5).

The periplasmic model predicted that unfolded protein in the cytoplasm would be translocated very quickly, which corresponds well to the fact that translocation is a fast event in E. coli [5]. The higher protein folding rate for insulin in the periplasm results in the unfolded protein depleting quickly, so the periplasmic model predicts a much higher yield of folded protein than the cytoplasmic model.


After the initial dynamics, the model reaches a steady state for both cytoplasmic and periplasmic expression (Figure 6).

The cytoplasmic model predicts that unfolded proteins will continue to aggregate in the cytoplasm to a larger degree than they fold, while the periplasmic model predicts that the amount of unfolded protein will become negligible. In addition, the yield of folded protein in the cytoplasm plateaus at $$\color{#3e3f3f}{7.7014\times10^4}$$ while the yield of folded periplasmic protein plateaus at $$\color{#3e3f3f}{19.128\times10^4}$$. Therefore the model predicts that periplasmic expression will yield roughly 2.5-fold more recombinant insulin than cytoplasmic expression.
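The fold difference follows directly from the two plateau values quoted above. As a quick check, in Python (the variable names are our own):

```python
cytoplasmic_plateau = 7.7014e4   # folded protein, cytoplasmic model
periplasmic_plateau = 19.128e4   # folded protein, periplasmic model

fold_change = periplasmic_plateau / cytoplasmic_plateau
print(f"periplasmic / cytoplasmic folded yield = {fold_change:.2f}")
```

This gives a ratio of about 2.48.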

# Parameter Scanning to Optimise Expression

We then wanted to scan parameter values to see how we could optimise folded protein yield.

$$\color{#3e3f3f}{w_p}$$, the maximal rate of transcription, is proportional to induction level. Varying it is equivalent to varying the concentration of IPTG used to induce expression. We therefore varied the parameter to see how it affected predicted protein expression in the model. We explored $$\color{#3e3f3f}{w_p\in [1,10^4]}$$ as these are around the bounds of realistic values [1].


We found that the yield of folded protein followed a logarithmic increase in relation to $$\color{#3e3f3f}{w_p}$$ (Figure 7). The model predicts that at a low degree of induction ($$\color{#3e3f3f}{w_p<200}$$) the yields of folded protein from the two systems are comparable, whereas at higher values the cytoplasmic yield is much lower. This agrees with the well-known observation that inclusion body formation increases with induction rate and thereby decreases the yield of recombinant protein [7].

We next asked whether there was any parameter in the cytoplasmic expression model, corresponding to some experimental step we could take, that we could change so that expression levels in the cytoplasm matched those in the periplasm.

$$\color{#3e3f3f}{k_f}$$, the rate of protein folding in the cytoplasm, greatly affects protein yield, as our model supposes that insoluble aggregates of recombinant protein are caused by the association of protein that has not yet folded properly (matching experimental knowledge [8]). Since aggregated protein cannot fold in our model, aggregation sequesters protein and decreases the folded protein yield. We wanted to know if we could increase $$\color{#3e3f3f}{k_f}$$ in the cytoplasmic model to such a degree that cytoplasmic yield matched periplasmic yield.

We found that periplasmic protein yield could not be matched within realistic values of $$\color{#3e3f3f}{k_f}$$; however, the protein yield did increase with $$\color{#3e3f3f}{k_f}$$ (Figure 8). Thus, in order to improve protein yield in the cytoplasm, we used a SHuffle strain of E. coli, which promotes disulfide bond formation in the cytoplasm and should therefore raise the effective folding rate, as our modelling predicted this would improve yield.
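The dependence on $$k_f$$ can be seen from the steady state of the unfolded-protein equation alone: of the protein leaving the unfolded pool, a fraction $$k_f/(k_f+k_a+\lambda)$$ folds. The Python sketch below scans this fraction. It is a simplification that holds the translation flux fixed and so ignores host-circuit coupling; the growth rate `lam = 0.02` is a hypothetical value, and the scan range is chosen only for illustration.

```python
def folded_fraction(k_f, k_a=0.21, lam=0.02):
    """Share of unfolded protein that folds instead of aggregating or diluting."""
    return k_f / (k_f + k_a + lam)


# Doubling k_f repeatedly, starting from the Table 6 value of 0.14 /min
k_f_values = [0.14 * 2 ** i for i in range(6)]
fractions = [folded_fraction(k_f) for k_f in k_f_values]

for k_f, frac in zip(k_f_values, fractions):
    print(f"k_f = {k_f:5.2f} /min -> folded fraction = {frac:.3f}")
```

The fraction rises with $$k_f$$ but saturates below 1, which is consistent with yield improving as folding speeds up without aggregation ever being eliminated.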

# Experimental Modelling References

1. Weisse, A.Y., Oyarzún, D.A., Danos, V., Swain, P.S. (2015). Mechanistic links between cellular trade-offs, gene expression, and growth. Proc Natl Acad Sci U S A. 112(9):E1038-47
2. Hoffmann, F., Posten, C., Rinas, U. (2001). Kinetic model of in vivo folding and inclusion body formation in recombinant Escherichia coli. Biotechnol Bioeng. 72(3):315-22
3. Taniguchi, Y., Choi, P.J., Li, G.W., Chen, H., Babu, M., Hearn, H., Emili, A., Xie, S. (2010). Quantifying E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells. Science. 329(5991):533-538
4. Natale, P., Bruser, T., Driessen, A.J.M. (2008). Sec- and Tat-mediated protein secretion across the bacterial cytoplasmic membrane - Distinct translocases and mechanisms. Biochimica et Biophysica Acta - Biomembranes. 1778(9):1735-1756
5. Keyzer, J., Does, C., Driessen, A. (2002). Kinetic Analysis of the Translocation of Fluorescent Precursor Proteins into Escherichia coli Membrane Vesicles. The Journal of Biological Chemistry. 277:46059-46065
6. BioNumbers. Key Numbers for Cell Biologists. [online] Available at: http://bionumbers.hms.harvard.edu/Includes/KeyNumbersLinks.pdf
7. Thomas, J.G., Baneyx, F. (1996). Protein Misfolding and Inclusion Body Formation in Recombinant Escherichia coli Cells Overexpressing Heat-shock Proteins. The Journal of Biological Chemistry. 271:11141-11147
8. Upadhyay, A.K., Murmu, A., Singh, A., Panda, A.K. (2012). Kinetics of Inclusion Body Formation and its Correlation with the Characteristics of Protein Aggregates in Escherichia coli. PLoS One. 7(3):e33951

For our physiological modelling, we used a model of subcutaneous insulin absorption developed in [1] and used it to relate the free energy of insulin hexamer formation and insulin dynamics. We then used thermodynamic modelling to make an estimate of the relative time of peak of action, and the duration of action of our novel insulin analogue (Winsulin).

The most widely used therapy for type 1 diabetes is subcutaneously injected insulin. Insulin has a propensity to self-associate into hexamers. When injected, these insulin hexamers dissociate into dimers, which can then be absorbed into the bloodstream (see the diagram below).

The authors of [1] developed a system of partial differential equations to describe the insulin infusion process. They modelled the change in three species:

Table 1. Variables in model of insulin infusion

| Symbol | Meaning |
|---|---|
| $\color{#3e3f3f}{c_d}$ | Insulin in dimeric form |
| $\color{#3e3f3f}{c_h}$ | Insulin in hexameric form |
| $\color{#3e3f3f}{c_b}$ | Insulin in bound form |

They modelled the conversion between hexameric and dimeric insulin as follows:

InsulinHexamer $$\color{#3e3f3f}{\rightleftharpoons}$$ InsulinDimer

The forward rate is denoted $$\color{#3e3f3f}{P}$$ and the reverse rate $$\color{#3e3f3f}{PQ}$$, so we can interpret $$\color{#3e3f3f}{P}$$ as the production rate and $$\color{#3e3f3f}{Q}$$ as the equilibrium constant.

The final model was as follows

$\color{#3e3f3f}{\begin{aligned}\frac{\partial c_{d}(t,r)}{\partial t}&=P\left(c_{h}(t,r)-Qc_{d}(t,r)^{3}\right)-B_{d}c_{d}(t,r)+D\nabla^{2}c_{d}(t,r)\\ \frac{\partial c_{h}(t,r)}{\partial t}&=-P\left(c_{h}(t,r)-Qc_{d}(t,r)^{3}\right)+D\nabla^{2}c_{h}(t,r)\end{aligned}}$

Where $$\color{#3e3f3f}{P, Q, B_d, D}$$ are parameters, and exogenous insulin flow is obtained by integrating the expression denoting the amount of insulin dimer entering the bloodstream: $\color{#3e3f3f}{I_{ex}(t)=B_{d}\int\limits_{V_{sc}}c_{d}(t,r)dV.}$

The parameters found for the different insulin analogues and their resultant insulin dynamics predicted by the model are shown in Table 2:

Table 2. Parameter values and resultant dynamics for different insulin analogues. Values of parameters from [1] Table IV. Insulin dynamics taken from [1] Fig. 6

| Insulin analogue | $$\color{#3e3f3f}{Q}$$ | $$\color{#3e3f3f}{D}$$ | $$\color{#3e3f3f}{B_d}$$ | Time of peak insulin action (hours) | Duration of insulin action (hours) |
|---|---|---|---|---|---|
| Lispro, Humalog, NovoRapid | $\color{#3e3f3f}{4.75\cdot 10^{-4}}$ | $\color{#3e3f3f}{3.36\cdot 10^{-4}}$ | $\color{#3e3f3f}{2.36\cdot 10^{-2}}$ | $\color{#3e3f3f}{0.25}$ | $\color{#3e3f3f}{4}$ |
| Actrapid | $\color{#3e3f3f}{1.9\cdot 10^{-3}}$ | $\color{#3e3f3f}{8.4\cdot 10^{-5}}$ | $\color{#3e3f3f}{1.18\cdot 10^{-2}}$ | $\color{#3e3f3f}{0.75}$ | $\color{#3e3f3f}{8}$ |
| Semilente | $\color{#3e3f3f}{7.6\cdot 10^{-2}}$ | $\color{#3e3f3f}{8.4\cdot 10^{-5}}$ | $\color{#3e3f3f}{1.18\cdot 10^{-2}}$ | $\color{#3e3f3f}{1.3}$ | $\color{#3e3f3f}{11}$ |
| NPH | $\color{#3e3f3f}{3.04}$ | $\color{#3e3f3f}{8.4\cdot 10^{-5}}$ | $\color{#3e3f3f}{1.18\cdot 10^{-2}}$ | $\color{#3e3f3f}{4.5}$ | $\color{#3e3f3f}{16}$ |

Since the parameter $$\color{#3e3f3f}{Q}$$ seemed to have the most impact on insulin dynamics, we tested if there was a relationship between the two (figure 1):

Now, since $$\color{#3e3f3f}{Q}$$ in the model formed in [1] is the equilibrium constant of the reaction InsulinHexamer $$\color{#3e3f3f}{\rightleftharpoons}$$ InsulinDimer, it is related to the Gibbs free energy of the reaction by the expression $$\color{#3e3f3f}{\Delta G^{o}=-RT\ln{Q}}$$, where $$\color{#3e3f3f}{R=8.314472 J K^{-1} mol^{-1}}$$ is the gas constant and $$T$$ is the temperature in kelvins.

Therefore if we know the Gibbs free energy of insulin hexamer formation, we can use this to find some qualitative information on the dynamics of insulin absorption using the model from [1], and thus estimate the time of peak insulin action and the duration of insulin action from thermodynamic information.
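As an illustration of this relationship, the sketch below applies $$\Delta G^{o}=-RT\ln Q$$ to the $$Q$$ values in Table 2 and lists them next to the reported time of peak action. It is a rough Python illustration only: the temperature of 310.15 K (37 °C) is our assumption, since the source does not state one, and the dictionary keys are just labels.

```python
import math

R = 8.314472   # gas constant in J K^-1 mol^-1, as quoted in the text
T = 310.15     # ASSUMED body temperature in kelvin; not given in the source

# (Q, time of peak action in hours) for each analogue, from Table 2
analogues = {
    "Lispro/Humalog/NovoRapid": (4.75e-4, 0.25),
    "Actrapid": (1.9e-3, 0.75),
    "Semilente": (7.6e-2, 1.3),
    "NPH": (3.04, 4.5),
}


def delta_g(q, temperature=T):
    """Standard free energy change in J/mol from Delta G = -R T ln Q."""
    return -R * temperature * math.log(q)


for name, (q, peak_h) in analogues.items():
    dg_kj = delta_g(q) / 1000.0
    print(f"{name:26s} Q = {q:9.3e}  dG = {dg_kj:7.2f} kJ/mol  peak = {peak_h} h")
```

Across the four rows of Table 2, a larger $$Q$$ (a lower $$\Delta G^{o}$$ under this expression) goes with a later peak, which is the qualitative trend the estimate relies on.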

The MutaBind tool [2] computationally predicts the $$\color{#3e3f3f}{\Delta\Delta G}$$ of point mutations relative to a known structure, where $$\color{#3e3f3f}{\Delta\Delta G=\Delta G^{mut}-\Delta G^{wt}}$$ is the change in free energy of binding upon mutation. We used the server to predict the effects of our variations to proinsulin's sequence on protein-protein interactions within the insulin hexamer, and thus their effects on the $$\color{#3e3f3f}{\Delta G}$$ of hexamer formation. We used PDB file 3AIY and entered the sequence variants with which we had designed our Winsulin. Since the B chains are buried at the centre of the insulin hexamer (see 3AIY), changes in these residues would have an effect on the $$\color{#3e3f3f}{\Delta\Delta G}$$. Our insulin analogue had no changes to the B chain, so MutaBind predicts no change in the $$\color{#3e3f3f}{\Delta G}$$ of hexamer formation (figure 2):

However, the asparagine to glycine mutation in the A chain increases the predicted pI of Winsulin relative to native insulin. An upward shift in pI towards physiological pH has been found to cause subcutaneously injected insulin preparations to form precipitates [3].

This corresponds to an increase in $$\color{#3e3f3f}{Q}$$, so we predict that Winsulin will be a slow-acting analogue relative to regular human insulin, whose activity peaks in 2-4 hours and lasts for 6-8 hours [4].

Although this is a crude estimate, it is useful as it gives us some qualitative information on the action profile of our novel Winsulin.

# Physiological Modelling References

1. Tarin, C., Teufel, E., Pico, J., Bondia, J., Pfleiderer, H.J. (2005). Comprehensive pharmacokinetic model of insulin Glargine and other insulin formulations. IEEE Transactions on Biomedical Engineering, vol. 52, no. 12, pp. 1994-2005
2. Li, M., Simonetti, F. L., Goncearenco, A., & Panchenko, A. R. (2016). MutaBind estimates and interprets the effects of sequence variants on protein–protein interactions. Nucleic Acids Research, 44(Web Server issue), W494–W501. http://doi.org/10.1093/nar/gkw374
3. Kohn, W.D., Micanovic, R., Myers, S.L., Vick, A.M., Kahl, S.D., Zhang, L., Strifler, B.A., Li, S., Shang, J., Beals, J.M., Mayer, J.P., DiMarchi, R.D. (2007). pI-shifted insulin analogs with extended in vivo time action and favorable receptor selectivity. Peptides. 28(4):935-948. https://doi.org/10.1016/j.peptides.2007.01.012
4. Diabetes Education Online. 2017. Types of Insulin. [ONLINE] Available at: https://dtc.ucsf.edu/types-of-diabetes/type2/treatment-of-type-2-diabetes/medications-and-therapies/type-2-insulin-rx/types-of-insulin/.

# Why Economic Modelling?

When a project aims to sell a product, it’s important to consider the current market and the prices that product already commands. As you will probably have read elsewhere on our wiki, insulin prices are considered unaffordable in many regions of the world. However, quantifying what is affordable and what is not is generally quite difficult, as every individual has their own ideal standard of living (Hancock 1993). High prices of therapies have also been correlated with non-compliant use (Ohene Buabeng et al. 2004). For these reasons, we modelled the price of insulin across countries and regions over time, to determine whether we can predict how insulin prices will trend in each region.

# Data

For statistical models, data collection is one of the main difficulties in obtaining useful results. As we were by no means able, within 6 months, to measure incomes across the world, find local insulin prices worldwide and calculate diabetes prevalence, we had to rely on open-source information. All insulin prices were taken from a single source to reduce variability between sources (Health Action International 2010). Similarly, the median household incomes were collected from a single source (Phelps & Crabtree 2013). Household incomes were collected by Gallup, a large polling organisation that performed face-to-face and telephone interviews across 50 countries over 6 years; its income data were aggregated from 2006-2012. US inflation from 2010 to 2012 was calculated to be 5.3%, and this figure was used to put median incomes and insulin prices on a common basis.
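As a minimal sketch of the inflation step, assuming the 5.3% figure is applied as a single cumulative factor that brings 2010 US-dollar prices to 2012 dollars: the direction of the adjustment, the function name `to_2012_dollars` and the example price are all assumptions made for illustration, not values from the data set.

```python
# Cumulative US inflation from 2010 to 2012, as stated in the text.
US_INFLATION_2010_TO_2012 = 0.053


def to_2012_dollars(price_2010: float,
                    inflation: float = US_INFLATION_2010_TO_2012) -> float:
    """Scale a 2010 US-dollar amount to 2012 dollars.

    Assumption: prices (2010) are inflated forward to match incomes,
    rather than incomes being deflated back to 2010.
    """
    return price_2010 * (1.0 + inflation)


if __name__ == "__main__":
    # Hypothetical 2010 price of 100 USD, for illustration only.
    print(round(to_2012_dollars(100.0), 2))
```

If the adjustment was instead applied to incomes, the same factor would be used as a divisor.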

# Data Visualisation

An important part of any statistical analysis is data visualisation. To understand what our later models show, we first have to find interesting trends and relationships within the data to focus the analysis. Since we are looking at affordability, we had to determine what we consider an ‘affordable’ price per year. This turned out to be quite difficult, as demand for insulin has very little price elasticity: because insulin is required simply to survive, people are forced to pay for it whatever the price. Affordability is therefore not really a consideration when purchasing the drug. If the options are to blow your monthly salary on a single vial or to die in 5 days, we’ve assumed you’re going to buy the insulin. Therefore, affordability was reclassified as appropriate expense.
We arrived at a 2% threshold as an appropriate proportion of income to spend on insulin alone. This choice was heavily influenced by the Australian Medicare system, which takes approximately 3.5% p.a. of an income to cover all medical-related incidents (including doctors’ visits, ambulance services and discounted medication rebates). A figure of 2%, though not perfect, seemed to balance an individual’s need to purchase insulin against maintaining a certain standard of living.
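The threshold test can be sketched as follows. This is an illustration only: the function names are hypothetical, the example income and costs are made up, and treating a share of exactly 2% as acceptable is an assumption.

```python
# Share of income judged an appropriate expense on insulin (2%, from the text).
APPROPRIATE_EXPENSE_THRESHOLD = 0.02


def income_share(yearly_insulin_cost: float, median_household_income: float) -> float:
    """Fraction of yearly median household income spent on insulin."""
    if median_household_income <= 0:
        raise ValueError("median_household_income must be positive")
    return yearly_insulin_cost / median_household_income


def is_appropriate_expense(yearly_insulin_cost: float,
                           median_household_income: float,
                           threshold: float = APPROPRIATE_EXPENSE_THRESHOLD) -> bool:
    """True when the insulin share of income is at or below the threshold.

    Assumption: a share of exactly 2% counts as appropriate.
    """
    return income_share(yearly_insulin_cost, median_household_income) <= threshold


if __name__ == "__main__":
    # Hypothetical household with a 10,000 USD yearly income.
    for cost in (150.0, 500.0):
        share = income_share(cost, 10_000.0)
        print(f"cost={cost:.0f} share={share:.1%} ok={is_appropriate_expense(cost, 10_000.0)}")
```

Both inputs should be expressed in the same currency and year (for example PPP-adjusted US dollars) before the ratio is taken.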

Figure One: Median Household Yearly Expense, in PPP-Adjusted USD, of Insulin per Country. 28% of the countries included were deemed to have yearly insulin expenses under the 2% threshold of affordability at the median household income level. Generally, European regions have costs well above this threshold, while those in Australasia (minus Indonesia) can largely afford it.

Figure Two: Comparison of Prices of Actrapid and Humulin in Different Countries. Correlation matrix analysis shows an 82% correlation between Actrapid and Humulin prices in different countries. An R² of 0.761 is considered to demonstrate a strong linear relationship between the prices of the two insulin products.

Figure Three: Diabetes prevalence has no impact on yearly expense of Insulin in that country. No strong relation can be drawn between the proportion of society with diabetes and insulin price; therefore supply pressures are not causing prices to increase in areas with high prevalence.

# Linear Models

## Testing Dependence Between Insulin Prices

A chi-squared test with a simulated p-value showed no dependence between Actrapid and Humulin price ranges (p = 0.6762). This enables us to perform linear regression on these prices, as linear models require independent results. Normality will be assumed to hold, but will be verified through QQ-plots in Model One.

## Linear Models

In order to inform our project, we wanted to analyse whether we could use previous insulin prices to predict the yearly expense today for a general region of the world. Though the linear model captures only part of what ultimately leads to the prices we see today, strong relations between 2010 insulin prices and the resulting percentage of income spent on insulin in 2012 were seen in certain regions, specifically Australasia, Europe and the Middle East.
#### MODEL ONE: Predicting the Price of Humulin Given Actrapid Price

$\color{#3e3f3f}{\text{Humulin Price} = -0.5466 + 1.1375\cdot \text{Actrapid Price}}$

This model was revised following the observation that Indonesia’s prices were significantly affecting the line of best fit. Indonesia was identified as an outlier and as a high-leverage point by Cook’s distance, and was therefore removed from the data set. This model tells us that Actrapid is generally more expensive than Humulin in any given region, which could be seen in Figure Two.

#### MODEL TWO: Predicting Percentage of Income Spent on Humulin in 2012 Given 2010 Humulin Prices

$\color{#3e3f3f}{\text{YearlyPercentOfExpenseOnHumulin} = 2.7903 - 9.9708\cdot I(\text{Australasia}==1) - 6.3756\cdot I(\text{European}==1) - 4.9750\cdot I(\text{MiddleEast}==1) + 0.2880\cdot \text{ActrapidCost}}$

This model was run through a step-wise selection process to determine the significant predictors of the percentage of income that someone in a given region would spend on insulin in 2012. Obviously, 2012 insulin prices were not included in the model, as this would remove the predictive value of the 2010 insulin prices. Interestingly, Actrapid 2010 prices were deemed more significant for predicting Humulin expense in 2012 than Humulin 2010 prices. Moreover, living in the Australasian region lowers the expected yearly expense on insulin more than living in the European or Middle Eastern regions. This is as expected, though, following the removal of Indonesia from the models. Including diabetes population percentages, or whether manufacturing plants are located in countries in those regions, did not yield any significant terms.
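To make the fitted equations of Model One and Model Two concrete, the sketch below simply evaluates them with the coefficients reported above. The function names, the region labels used as dictionary keys and the example input are assumptions for illustration; the units of the price input are those of the original data set and are not restated here.

```python
from typing import Optional

# Regional indicator coefficients reported for Model Two.
MODEL_TWO_REGION_EFFECTS = {
    "Australasia": -9.9708,
    "European": -6.3756,
    "MiddleEast": -4.9750,
}


def predict_humulin_price(actrapid_price: float) -> float:
    """Model One: predicted Humulin price from the Actrapid price."""
    return -0.5466 + 1.1375 * actrapid_price


def predict_humulin_income_percent(actrapid_cost_2010: float,
                                   region: Optional[str] = None) -> float:
    """Model Two: predicted percentage of income spent on Humulin in 2012.

    Any region not listed in MODEL_TWO_REGION_EFFECTS is treated as the
    baseline, i.e. all three indicator terms are zero.
    """
    region_term = MODEL_TWO_REGION_EFFECTS.get(region, 0.0)
    return 2.7903 + region_term + 0.2880 * actrapid_cost_2010


if __name__ == "__main__":
    example_cost = 30.0  # hypothetical 2010 Actrapid cost, illustration only
    print(round(predict_humulin_price(example_cost), 4))
    for region in (None, "European", "Australasia"):
        print(region, round(predict_humulin_income_percent(example_cost, region), 4))
```

Because the model is linear, a low Actrapid cost combined with a large negative regional term can give a negative predicted percentage, so predictions are only meaningful within the range of the data used for fitting.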
#### MODEL THREE: Predicting Percentage of Income Spent on Actrapid in 2012 Given 2010 Actrapid Prices

$\color{#3e3f3f}{\text{YearlyExpenseOnActrapid} = 2.05 - 10.3547\cdot I(\text{Australasia}==1) - 6.0511\cdot I(\text{Europe}==1) - 4.8\cdot I(\text{MiddleEast}==1) + 0.3285\cdot \text{Actrapid}}$

Similar trends can be seen with Actrapid as with Humulin prices. The same three regions have a significant influence on the expense paid each year on insulin. However, Actrapid 2010 prices are a better predictor than Humulin 2010 prices, despite the reversal seen in Model Two. This is potentially because Actrapid prices are generally higher than Humulin prices. Actrapid’s higher expense can also be seen in the higher price coefficient in Model Three than in Model Two (0.3285 against 0.2880), although the regional coefficients are similar in the two models. This tells us that, generally, yearly expenses on Actrapid will be higher in a region than those on Humulin, which has been consistent throughout all prior results.

# What Did We Learn?

Overall, this confirmed to us how lucky we are here in Australia to have the kind of system we have. It further demonstrated the complexity of the insulin market, as individual markets obviously hold significant sway over the resulting prices. Unfortunately, not all of those individual market factors could be explored in this context due to data limitations. These models also suggested that demand for insulin was not influencing the resulting price, as no linear trend could be found between diabetes prevalence in a country and insulin prices. Nor did diabetes prevalence appear as a term in any of the resulting models.

# References

1. HANCOCK, K. E. 1993. 'Can Pay? Won't Pay?' or Economic Principles of 'Affordability'. Urban Studies, 30, 127-145.
2. OHENE BUABENG, K., MATOWE, L. & PLANGE-RHULE, J. 2004. Unaffordable drug prices: the major cause of non-compliance with hypertension medication in Ghana. J Pharm Pharm Sci, 7, 350-2.
3. PHELPS, G. & CRABTREE, S. 2013. Worldwide, Median Household Income About $10,000. Gallup News.
4. HEALTH ACTION INTERNATIONAL 2010. Medicine Prices, Availability, Affordability & Price Components. Health Action International.

# General References

1. Chandran D, Copeland WB, Sleight SC, Sauro HM. Mathematical modeling and synthetic biology. Drug discovery today Disease models. 2008;5(4):299-309