# Model

### Overview

As part of our overarching project goal to optimize the production of curli fibers, we took a closer look at the cellular pathway involved in this process. As protein expression level can be tuned relatively easily with ribosome binding sites, we chose to focus on optimizing the stoichiometric ratios of protein expression for the proteins involved in the production pathway. To this end, we developed a mechanistic model of the general curli pathway that can be applied to any system involving the curli proteins.

To constrain our model, we defined the following objectives:

1. Maximize the concentration of extracellular fibrils
2. Minimize the concentration of intracellular fibrils
3. Maximize cellular resource usage efficiency

The first objective focuses on maximizing the production of our desired product: the secreted and polymerized csgA. The second addresses the toxicity induced by accumulation of intracellular protein aggregates. And lastly, the third aims to minimize the inputs needed to make the production process as cost and energy efficient as possible.

The pathway itself is been broken down into the following five modules: Transcription, Translation, Periplasmic Export, Extracellular Secretion, and Aggregation and Polymerization.

### Transcription

The naturally occurring genes corresponding to curli are organized into csgBAC and csgDEFG operons. csgD is a regulatory protein, and thus is not applicable for the expression of foreign plasmids. To increase transformation efficiencies, the plasmid that we used placed all the relevant genes on a single plasmid, csgBACEFG.

The rate of transcription is primarily governed by the plasmid copy number and promoter strength. Although RNA polymerase and a number of other transcription factors are also involved in the transcription of mRNA, transcription factor binding achieves equilibrium much faster than transcription, translation, and protein accumulation, so it can be at considered to be at steady state on the time scale of proteins. Thus only the concentration of the plasmids are assumed to have a large impact on the rate of transcription.

$$g + RNA_{pol} \overset{\alpha}{\rightarrow} g + RNA_{pol} + mRNA$$

The rate of transcript degradation is dependent upon transcript stability, which is in turn affected by a number of factors. Although we have introduced these genes on foreign plasmids, the mRNA half-life here is assumed to be that of the mRNA transcripts of the corresponding genes naturally present in E. coli.

The degradation rate, $$\zeta$$, can be described as an exponential decay function of time mRNA half-life, $$h$$: $$\zeta = e^{ht}$$ $$mRNA \overset{\zeta}{\rightarrow} \varnothing$$

### Translation

Each of the coding sequences in the transcript described above is preceded by an RBS sequence that determines the rate of translation of each protein. The relative RBS strengths determine the stoichiometry between the proteins involved within the cell. Translation also involves a number of players, including the ribosome and tRNAs, but as the kinetic parameters regarding the rate of translation available in literature do not distinguish between interactions between specific molecules, instead examining the overall rate of polypeptide production, our model also only considers the overall transformation. In addition, since we aim to control the expression level of each individual protein, we assign a unique translation rate for each gene $$mRNA \overset{\beta_{1}}{\rightarrow} mRNA + csgA_{cyt}$$ $$mRNA \overset{\beta_{2}}{\rightarrow} mRNA + csgB_{cyt}$$ $$mRNA \overset{\beta_{3}}{\rightarrow} mRNA + csgC_{cyt}$$ $$mRNA \overset{\beta_{4}}{\rightarrow} mRNA + csgE_{cyt}$$ $$mRNA \overset{\beta_{5}}{\rightarrow} mRNA + csgF_{cyt}$$ $$mRNA \overset{\beta_{6}}{\rightarrow} mRNA + csgG_{cyt}$$

### Periplasmic export

Although the ultimate destination of the csgA monomer is the extracellular space, not all of the other proteins involved in the pathway share the same fate. csgB and csgF do operate in the extracellular space, but csgC and csgE are chaperone proteins that remain in the periplasm, whereas csgG forms a channel in the outer membrane.

Regardless of their ultimate destination, all of the proteins must first be exported into the periplasm, since none remain in the cytoplasm. The mechanism by which this occurs is the Sec secretion pathway. The main actors in this secretion pathway are SecYEG, which dimerizes to form a protein conducting channel (PCC), SecA, which acts as an ATPase driving the translocation, and SecB, a chaperone protein that keeps proteins in an unfolded state (Driessen et al., 2007). As a protein emerges from the ribosome, SecB, a homotetramer, binds and stabilizes it in its unfolded conformation. This protein complex then binds to SecA, a homodimer, which also recruits SecYEG to assemble the dimeric PCC.

In our model, we simplify the interactions into three steps: the binding of the SecB homotetramer to the csg protein to form a $$SecB:csgX$$ complex, the subsequent binding between the $$SecB:csgX$$ complex with SecA, and finally the translocation of the protein from the cytoplasm to the periplasm.
$$csgX_{cyt} + 4 SecB \underset{\gamma_{-1}}{\overset{\gamma_{1}}{\rightleftharpoons}} SecB:csgX$$ $$SecB:csgX + 2 SecA + 2 SecYEG \overset{\gamma_{2}}{\rightarrow} Sec:csgX$$ $$Sec:csgX \overset{\gamma_{3}}{\rightarrow} 4 SecB + 2 SecA + SecYEG + csgX_{per}$$
The rate of binding and secretion is assumed to be conserved across the different proteins due to the similarity in mechanism and homology in the Sec signal sequences. Thus, instead of listing the same interactions for every protein, the generic $$csgX_{cyt}$$ and $$csgX_{per}$$ are used to denote the csg proteins in the cytoplasm and periplasm, respectively.

### Extracellular secretion

Analysis of the crystal structure of csgG and has revealed that it assembles into a double-nonameric form in D9 symmetry (Taylor and Matthews 2015). CsgE has also been shown to form a nonamer at the base of the csgG structure in the periplasm, providing selectivity for the substrates that are secreted (Goyal et al., 2014). CsgG and csgE participate in the translocation of csgF into the extracellular matrix, which then folds and binds csgG to the membrane. Meanwhile, csgC interacts with csgA and csgB monomers to prevent the formation of oligomers (Taylor and Matthews 2015). When the monomers interact with csgE, they become trapped in the periplasmic cavity and are transported across the outer membrane. CsgB then interacts with csgF to initiate the nucleation of csgA fibers.
$$9 csgE_{per} \underset{\delta_{-1}}{\overset{\delta_{1}}{\rightleftharpoons}} csgE_{9}$$ $$csgF_{per} \overset{\delta_{2}}{\rightarrow} csgF_{ECM}$$ $$9 csgG_{per} \underset{\delta_{-3}}{\overset{\delta_{3}}{\rightleftharpoons}} csgG_{9}$$ $$csgG_{9} + 2 csgF_{ECM} + csgE_{9} \underset{\delta_{-4}}{\overset{\delta_{4}}{\rightleftharpoons}} csgGEF$$ $$csgA_{per} + csgC_{per} \overset{\delta_{5}}{\rightarrow} csgC:csgA$$ $$csgB_{per} + csgC_{per} \overset{\delta_{5}}{\rightarrow} csgC:csgB$$ $$csgC:csgA + csgGEF \overset{\delta_{6}}{\rightarrow} csgGEF + csgC_{per} + csgA_{ECM}$$ $$csgC:csgB + csgGEF \overset{\delta_{7}}{\rightarrow} csgGEF + csgC_{per} + csgB_{ECM}$$

#### Diffusion

The rate of translocation of proteins from the periplasm to the extracellular matrix will be assumed to follow Fick's first law of diffusion: $$J = -D \nabla \phi$$ Here, $$J$$ represents the "diffusion flux", or diffusion per unit area per unit time, $$D$$ is the diffusion coefficient, which we will take from literature, and $$\nabla \phi$$ is the concentration gradient. However, to avoid partial derivatives, we will simplify the equation above into the following: $$J = -D \frac{[csgX_{per}] - [csgX_{ECM}]}{\omega}$$ where $$[csgX_{per}] - [csgX_{ECM}]$$ represents the difference in concentration of a protein between the periplasm and the ECM, and $$\omega$$ represents the width of the outer membrane. The equation now gives us the moles per unit area per unit time. To obtain the number of molecules per unit time, we multiply the equation above by the surface area of the cell, which we will represent as $$S$$, as well as avogadro's number, $$N_A$$: $$J = -D \frac{[csgX_{per}] - [csgX_{ECM}]}{\omega} S N_A$$ The equations above will be used to represent the rate of translocation of csgF, csgA, and csgB from the periplasm to the extracellular matrix, as their mode of transport is known to be selective diffusion.

### Aggregation and Polymerization

One of the most useful properties of curli fibers is their self-assembly and aggregation. However, this feature also poses a problem when curli is used as a biomanufacturing platform – production of csgA that outpaces the E. coli maximum efficiency cell's natural secretion machinery can lead to accumulation and aggregation within the cytoplasm, causing the cell lysis. Thus, it is crucial to examine both the buildup of intracellular fibers as it pertains to cell health, as well the effect of extracellular aggregation in driving the diffusion equilibrium forward.

Fibril formation can be broken down into two main reactions: nucleation, describing the interaction between two csgA monomers to form a seed for further fiber growth, and elongation of fibrils, which refers to the addition of csgA monomers to an existing fiber. These two steps proceed until the species involved reach an equilibrium. $$2 csgA_m \underset{\epsilon_{-1}}{\overset{\epsilon_{1}}{\rightleftharpoons}} F$$ $$F + csgA_m \underset{\epsilon_{-2}}{\overset{\epsilon_{2}}{\rightleftharpoons}} F$$ $$csgA_m$$ and $$F$$ represent a free-floating csgA monomer and a fiber of arbitrary length, respectively. Nucleation is considered the rate-limiting step in polymerization (Chapman 2008), whereas subsequent elongation occurs at a relatively constant rate regardless of fibril length. Thus, all fibers are grouped as a single species. As polymerization can occur anywhere where the concentration of csgA is high enough, the reactions involved must be taken into consideration in all three compartments of the cell.

### Differential Equations

Check out the full set of differential equations used in our model!

### Results

In developing our model, we were unable to find a threshold toxicity level for the intracellular concentration of aggregated csgA monomers in literature. Thus, we chose to examine the ratio of concentrations of extracellular to intracellular fibers as a proxy for maximizing our desired product while minimizing the buildup of toxic protein aggregates inside the cell. Additionally, due to limitations on computing power, we were only able to investigate the effects of varying the RBS strength of one protein at a time. The results generated by our model are presented below. All graphs are in terms of arbitrary units, as we are only concerned with relative expression and concentration levels.

#### csgA

The results predicted by the model when we vary the expression level of csgA predicts that the ratio of extracellular to intracellular fibers is positively correlated with the expression level of csgA, which matches what we intuitively expect based on what we know about the role of csgA in the curli pathway. These results also match the experiences of members of the Joshi lab who work with the curli system.

#### csgB

The model predicts that when the expression level of csgB is varied, the ratio of extracellular to intracellular fibers is maximized for relatively low and high expression levels, and minimized with midlevel expression levels. This does not match our intuitive conjectures, as we expected that decreasing the expression level of csgB would increase curli production. Further experiments must be conducted to determine the accuracy with which the model predicts cellular behavior when the expression level of csgB is varied.

#### csgC

Since csgC is involved with the prevention of intracellular protein aggregation, we expected that increasing the expression level of csgC would decrease the concentration of intracellular fibers, and in turn, increase the ratio of extracellular to intracellular fibers. However, the model predicts that relatively low and high expression levels are optimal for curli production. Further experiments must be conducted to determine the accuracy with which the model predicts cellular behavior when the expression level of csgC is varied.

#### csgE

As the role of csgE in the curli pathway has not been fully elucidated, we were not able to make any intuitive predictions about the effect of varying csgE on the accumulation of fibers, the model predicts that the optimal expression level is slightly above midlevel expression levels. Further experiments must be conducted to determine the accuracy with which the model predicts cellular behavior when the expression level of csgE is varied.

#### csgF

Like with csgE, the role of csgF in the curli pathway has not been fully investigated, so we were not able to make any intuitive predictions about the effect of varying csgF on the accumulation of fibers. The model also does not predict any noticeable pattern in the ratio of extracellular to intracellular fibers when the expression level of csgF is modulated. This may be a result of the incomplete information that we incorporated into the model, so further experiments must be conducted to determine the true relationship between expression level of csgF and curli production.

#### csgG

The model again does not predict any noticeable pattern in the ratio of extracellular to intracellular fibers when the expression level of csgG is modified. Since csgG is a membrane protein responsible for exporting csgA monomers out of the cell, we intuitively expected that increasing the expression level of csgG would increase the concentration of extracellular fibers and decrease the concentration of intracellular fibers, resulting in a net increase in the ratio between the two. This is the protein that we chose to investigate further in the genetic engineering aspect of our project. Check out our results to learn more!

You can also explore the code we used to simulate our model here.

### Discussion

As with any model, it is important for us to reassess the assumptions upon which our model is founded, analyze how they could affect the results predicted by our model, and explore ways in which the predictive accuracy of the model could be improved. One such area is the kinetic parameters used in our model. Many of the parameters used to specify the rate constants of the various molecular interactions were drawn from analogous systems or interactions, as parameters specific to interactions between curli proteins are currently unavailable in literature. Obtaining experimental data to either update or validate these parameters is a fundamental first step to increasing predictive accuracy of our model.
Another area is the mathematical simplification that we used in describing the aggregation and polymerization kinetics. Far more descriptive models of protein aggregation and polymerization have been developed by Meisl et al., Wang et al., and Schreck et al.. The 2014 INSA Lyon Team also made significant efforts to model the polymerization kinetics of curli fibers. However, as the 2014 INSA-Lyon discovered, modelling protein aggregation and polymerization kinetics to this level of detail requires very high levels of computational power. Thus, in order to be able to run our model on student-owned computers, we chose to make simplifications in this aspect of the model, as described above. Developing a way to combine the specificity and intricacy of the models developed by those studying protein aggregation with the computational feasibility of the simplified version we used in our model is another step towards increasing the accuracy of our model.

### References

1. Agarwal, Anup. Computational and experimental approaches to enhance extracellular secretion of recombinant proteins in Escherichia coli. Diss. University of Delaware, 2010.
2. Lee, A. A., et al. "Dissecting the self-assembly kinetics of multimeric pore-forming toxins." Journal of The Royal Society Interface 13.114 (2016): 20150762.
3. Meisl, Georg, et al. "Molecular mechanisms of protein aggregation from global fitting of kinetic models." Nature protocols 11.2 (2016): 252-272.
4. Proshkin, Sergey, et al. "Cooperation between translating ribosomes and RNA polymerase in transcription elongation." Science 328.5977 (2010): 504-508.
5. Schreck, John S., and Jian-Min Yuan. "A kinetic study of amyloid formation: fibril growth and length distributions." The Journal of Physical Chemistry B 117.21 (2013): 6574-6583.
6. Stewart, Philip S. "Diffusion in biofilms." Journal of bacteriology 185.5 (2003): 1485-1491.
7. Taylor, Jonathan D., and Steve J. Matthews. "New insight into the molecular control of bacterial functional amyloids." Frontiers in cellular and infection microbiology 5 (2015).
8. Lardner, T. J. "The measurement of cell membrane diffusion coefficients." Journal of biomechanics 10.3 (1977): 167-170.
9. BioNumbers: The Database of Useful Biological Numbers, http://bionumbers.hms.harvard.edu/bionumber.aspx?id=100015&ver=0. Accessed 22 Oct. 2017.
10. Wang, Xuan, Neal D. Hammer, and Matthew R. Chapman. "The molecular basis of functional bacterial amyloid polymerization and nucleation." Journal of Biological Chemistry 283.31 (2008): 21530-21539.