If knowledge is power, then modelling is the key to a successful engineering endeavour. Our modelling of the biophysical characteristics of the classifier circuits that follows in this section is largely based upon the work of Z. Xie and Y. Benenson [1, 2]. The latter’s newest paper [2] faces the same questions: it analyzes the modelling process and optimizes the classifier’s logical expression. Standing on their shoulders, we expand upon their work by analyzing architectural variations of the classifier circuits, which add an additional optimization axis.
pANDORRA is based on a family of layered genetic circuits that aim to biophysically implement logical expressions involving miRNA molecules in 2 basic roles: upregulated or downregulated (on the target cells in respect to the control). The family is characterized by a basic structure emanating from these two roles. The target gene (the system’s output) is controlled by a promoter and RNA interference (RNAi) from the downregulated miRNAs and forms the lower layer of the system. The upper layer comprises target sites for the upregulated miRNAs and an inhibitor for the lower layer.
This basic architecture can be significantly improved with a double-inversion module for the the upper layer [1]. It takes the form of a middle layer with a promoter that is activated by the upper layer’s product and it in turn produces the inhibitor for the lower layer. This extra part significantly improves the system’s efficiency.
But how do we define the system’s efficiency and how do we optimize its architecture, out of the many options left open by the previous definitions? And after finding the optimal architecture, how do we choose the optimal miRNA targets to separate particular cell groups? These are the basic questions we set out to answer by modelling the pANDORRA system.
The main task of a classifier system, given a group of target and control cells, is to produce large amounts of output when inside a target cell and little output when inside a control cell. Ideally, it should function as a logical function: maximum output expression in all target cells and zero in all controls. We will therefore judge its efficiency by how close it is to the ideal. We define the fold change of a concrete classifier instance as the average output margin between the two groups.
The circuit’s layered design naturally guides the modelling effort in two paths: estimating each layer’s output and the effects this output has on the next layer. Basic elements of the model therefore would be relationships between the concentrations of the involved regulatory species and the outputs of the regulated elements.
Take the RNAi mechanism: let Pmax be the unregulated (maximum) output concentration and C* the miRNA concentration that results in a 50% reduction of the output. Then the relationship between the output [P] and the input [miRNA] is:
$$[P]= P_{max} \frac{1}{1+\frac{\mathit{[miRNA]}}{C*}}$$Next question: if P is regulated by 2 miRNAs together, how do they interact? Following the same logic as in [2], we assume that the behave additively. If each miRNA has a Ci* concentration of half repression and n is the number of miRNAs, the cumulative regulation is:
$$[P]= P_{max} \frac{1}{1+\sum_{i=1}^{n}{\frac{\mathit{[miRNA_i]}}{C_i^*}}}$$If there are multiple target sites for a particular miRNA we can assume that their effect on interference will also be additive, so if there are ri target repeats for miRNAi, the output will be:
$$[P]= P_{max} \frac{1}{1+\sum_{i=1}^{n}{r_i \frac{\mathit{[miRNA_i]}}{C_i^*}}}$$To simplify the following expressions, we can define the interference on a scale of 0 to 1 based on the above:
$$\mathit{RNAi}= \frac{1}{1+\sum_{i=1}^{n}{r_i \frac{\mathit{[miRNA_i]}}{C_i^*}}}$$These observations can provide the output of the classifier’s upper layer. A valid design is not limited to 1 OR gate, therefore the total output (let’s assume the output is rtTA) is the sum of each gate’s output:
$$[rtTA]= rtTA_{max} \sum_{g=1}^{m} {\mathit{RNAi}_g}$$rtTA is an inducer of the pTRE promoter and its induction can be modelled on a scale of 0 to 1 with the use of the reaction’s dissociation constant Kd as follows:
$$A= \frac{\mathit{rtTA}}{K_d \mathit{rtTA}}$$The second layer can be regulated in 2 ways: by rtTA through its promoter and by RNA interference, like the other layers. The first regulation is transcriptional, whereas the second post-transcriptional. Let’s assume the outputs of the second layer can be the protein LacI and the miRNA FF4. LacI will be affected by both regulatory mechanisms; in our particular implementation though FF4 will only be affected by the transcriptional regulation due to the FF4 being produced Their levels would be then:
$$\mathit{[LacI]}= \mathit{[LacI]}_{max} A(\mathit{rtTA}) \mathit{RNAi}(\text{up_miR})$$ $$\mathit{[LacI]}= \mathit{[LacI]}_{max} A(\mathit{rtTA})$$The final layer’s output applies the exact same logic:
$$ \mathit{[Output]} = \mathit{[Output]}_{max} \mathit{RNAi}(\text{FF4,down_miR}) A(\mathit{LacI})$$This biophysical model expresses an input-output relationship between the miR inputs and the classifier output. Given an miRNA expression dataset, we can calculate the expected output for each sample and evaluate the performance of a concrete classifier.
pANDORRA’s goal is to discriminate 2 groups of cells. If there is an miRNA expression dataset for these cells, we need to find the optimal circuit inputs that maximize output in cancer cells and minimize it in control cells. Thanks to the evaluation function derived above, if we have a way to generate candidate circuits, we can employ a search method to optimize the logical circuit.
The search space is huge and looking at every possible combination of circuit inputs is impractical; the search is guided instead by a genetic algorithm. The search begins at the simplest circuits: those with a single up- or downregulated miRNA. At each search step, the pool of candidate circuits is evaluated and only the top best-performing are kept. These are then used as a seed to propose new circuits by randomly recombining them together or adding to them single miRNAs. This process is repeated until the best-performing circuit isn’t dethroned for enough steps [2]. The parallels between this process and natural selection and genetic recombination are obvious, explaining the name “genetic algorithm”.
Optimizing the logical circuit therefore requires selecting an architecture first (to have a concrete evaluation function). However, if there are many candidate architectures it can’t generally be known a priori which one will perform best on a particular dataset. Different architectures can be sensitive to different levels of miRNA input - imagine using a particularly strong repressor in the middle layer: low concentrations would be sufficient to completely knock out the output. The only way this classifier would produce output is if the levels of the upregulated miRNAs are exceedingly high on an absolute scale. This architecture wouldn’t work well to recognise cancer cells whose upregulated miRNAs are expressed in lower levels. Therefore, deciding on an architecture is also an optimization problem that should be solved for a specific dataset.
Contrary to the huge search space in circuit optimization, the biologically realizable architectures are few and each can be examined closely. To optimize across the architectural axis then, one need only find the best classifier circuit given the current architecture’s evaluation function. In the following section, we examine the various architectures in more detail.
The first proposed architecture is straightforward: it accommodates upregulated miR on the top layer, downregulated miR directly on the output layer and couples the upregulated ones with 2 parallel mechanisms in the middle layer: transcriptional repression of the output layer by LacI and post-transcriptional inhibition by FF4.
The second architecture removes the parallel inhibition mechanism by FF4 and loosens the coupling of low levels of upregulated miR to the output. At the same time, by adding upregulated miR targets at the middle layer, it increases the coupling of high levels of upregulated miR to the output.
By removing upregulated miR targets on the upper layer, the 3rd architecture allows greater LacI leakage when the upregulated miR are high.
The 4th architecture is very similar to the 3rd, only simpler to implement.
The 5th architecture again combines the complementary couplings provided by LacI and FF4
By reestablishing the upper layer, we get architecture 6.
And by removing LacI we get architecture 9.
The basic variable element among these architectures is the method of double inversion. The two middle layer coupling mechanisms we explore, LacI and FF4, are used individually or combined. To assess the significance of this decision, we explore the input-output relationship for the corresponding architectures by scanning a pair of input upregulated miRNAs connected together in the same OR gate along the biologically plausible range they can cover.
There is stark difference in their behaviour. LacI maps low counts of both the miRNAs to relatively low output levels (OUTmax = 10000) and high counts of either on to high output levels, thus implementing the OR gate. Its transition occurs very rapidly at relatively low miR counts, while its on/off ratio is 20. FF4 on the other hand is even more effective at mapping low levels of both miR to low output levels and high levels of both to high output, but maps high levels of only one of them to middle output, therefore implementing an arithmetic + gate more than a logical OR gate. Its transition is much more gradual than LacI’s, leaving a large zone of uncertain output, while its overall on/off ratio is 15. It also appears the FF4’s presence masks the effect of LacI.
AND gates are implemented more successfully, at least for upregulated miR. Although the overall on/off ratio remains the same, the output is more consistent with a logical expression when one miR is high and the other is low.
Overall this analysis shows that LacI is more appropriate to implement logic gates. However it is not a silver bullet. Its transition occurs at a point that might be inappropriate for some datasets, leaving even the miRNAs that are “low” on the positive side of the output.
An interesting possibility is the adaptation of the number of target sites where each miRNA can attach to cause interference. It has been shown by [3] that there is little interaction between multiple miRNA and RISC complexes attaching next to each other, allowing us to model the overall dynamics additively. This assumption is challenged when there are low copy numbers of the participating molecules, but let’s follow it as a working hypothesis. Additive RNAi dynamics implies that, given an interference strength x by some miRNA for 1 repeat of the target site, if there are multiple repeats the interference becomes n*x.
To examine the effect of target site repeats, we’ll use an example dataset and circuit on the previous architectures.
Our Caco-2 and healthy dataset was obtained from the work by Cummins et al., 2006 [4]. The miRNA expression data regarding the HEK-293 and A549 cell lines were retrieved from the mammalian microRNA expression atlas [5]
We use the methods and insights derived so far to implement classifiers for our lab experiments. The experimental goal is to prove the ability of an optimized classifier to target cells from the Caco-2 cell line versus a healthy tissue control. Due to practical considerations, the controls on the actual experiments were the HEK-293 and A549 cell lines.
Running the circuit optimization algorithm on a dataset comprising Caco-2 and healthy samples [4] for the various architectures suggests the following circuits for each:
- where “cMargin” is the logarithm of the fold change |
Apparently the best architectures are indeed the ones that use LacI in the middle layer and don’t use FF4 (which overrules LacI’s influence).
The LacI architectures are optimal with a very small number of inputs. The others have more inputs, but due to the way the optimization works, many of those inputs might be of only marginal significance. It is then in our best interests from a design perspective to attempt to simplify the proposed circuits - a process called “pruning”. Taking cues from LacI, we attempt to prune the other architectures to the same small circuit, which is a subset of all the larger circuits.
Experiments are expensive and models are useful lies; acknowledging this statement, it is also useful to put the results of our model in a broader perspective provided by the literature and increase our chances of a successful experiment. Xie [2] suggests using miR-373, while miR-21 (which also comes up in one of our architectures) comes from a widespread and very significant oncogene. While examining miR-373, we notice it shares the same seed sequence with miR-372, which means that it will cause RNAi for the same targets as miR-372. The optimization algorithm takes this similarity into account at a preprocessing step, adding the expressions of miR-372 and miR-373 together and reporting only miR-372.
Putting all these elements together leads us to propose the following circuit for the wet lab experiments:
- Architecture: 2 (LacI + miR targets at middle layer)
- Circuit: hsa-mir-372 OR hsa-mir-373 AND hsa-mir-21 AND ~hsa-mir-143 AND ~hsa-mir-145
Although we’ve determined architecture 2 to be optimal, we reevaluate the proposed circuit across all the architectures. The resulting logarithm of the fold change is:
arch1 | arch2 | arch3 | arch4 | arch5 | arch6 | arch7 |
1.45 | 2.69 | 2.64 | 2.65 | 0.90 | 1.67 | 0.77 |
Architecture 2 retains the same high fold change with the proposed circuit.
We designed experiments to implement this final classifier circuit in the lab. The LacI promoter, CAGop, was ordered to be synthesized, but unfortunately synthesis proved very difficult due to the high GC content and the construct arrived at 31/10. Necessity forced us to fall back to an alternate architecture for which we already had the parts: architecture 7 (FF4). Thanks to the way we had structured the circuit elements, we also had some leeway in choosing which miRNA targets to actually include: miR-21, miR-372 OR miR-372, or both. We reevaluate these 3 options for architecture 7 and also compute the efficiency of the circuits on datasets with HEK and A549 as controls, apart from healthy tissue samples.
Expected fold change logarithm for healthy samples:
mir21 & ~mir143 & ~mir145 | 0.835 |
mir372|mir373 & ~mir143 & ~mir145 | 0.881 |
mir21 & mir372|mir373 & ~mir143 & ~mir145 | 0.766 |
Expected fold change logarithm for the HEK cell line
mir21 & ~mir143 & ~mir145 | 1.572 |
mir372|mir373 & ~mir143 & ~mir145 | 1.038 |
mir21 & mir372|mir373 & ~mir143 & ~mir145 | 1.054 |
Expected fold change logarithm for the A549 cell line
mir21 & ~mir143 & ~mir145 | 0.620 |
mir372|mir373 & ~mir143 & ~mir145 | 1.138 |
mir21 & mir372|mir373 & ~mir143 & ~mir145 | 1.068 |
[1] Xie, Z., Wroblewska, L., Prochazka, L., Weiss, R., & Benenson, Y. (2011). Multi-input RNAi-based logic circuit for identification of specific cancer cells. Science, 333(6047), 1307-1311.
[2] Mohammadi, P., Beerenwinkel, N., & Benenson, Y. (2017). Automated Design of Synthetic Cell Classifier Circuits Using a Two-Step Optimization Strategy. Cell Systems, 4(2), 207-218.
[3] Schmitz, U., Lai, X., Winter, F., Wolkenhauer, O., Vera, J., & Gupta, S. K. (2014). Cooperative gene regulation by microRNA pairs and their identification using a computational workflow. Nucleic acids research, 42(12), 7539-7552.
[4] Cummins, J. M., He, Y., Leary, R. J., Pagliarini, R., Diaz, L. A., Sjoblom, T., ... & Raymond, C. K. (2006). The colorectal microRNAome. Proceedings of the National Academy of Sciences of the United States of America, 103(10), 3687-3692.
[5] Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., ... & Lin, C. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell, 129(7), 1401-1414.