Difference between revisions of "Template:Greece/RNAi Classifier Design"

Line 32: Line 32:
 
<section class='sub_sections'>
 
<section class='sub_sections'>
 
<div style='text-align: justify;'>
 
<div style='text-align: justify;'>
<p>If knowledge is power, then modelling is the key to a successful engineering endeavour. Our modelling of the biophysical characteristics of the classifier circuits that follows in this section is largely based upon the work of Z. Xie and Y. Benenson [<a href='#ref1a'>1</a>, <a href='#ref2a'>2</a>]. The latter’s newest paper [<a href='#ref2a'>2</a>] faces the same questions: it analyzes the modelling process and optimizes the classifier’s logical expression. Standing on their shoulders, we expand upon their work by analyzing architectural variations of the classifier circuits, which add an additional optimization axis.</p>
+
<p>If knowledge is power, then modeling is the key to a successful engineering endeavour. Our modeling of the biophysical characteristics of the classifier circuits that follows in this section is largely based upon the exciting work of research teams led by Professors Z. Xie and Y. Benenson [<a href='#ref1a'>1</a>, <a href='#ref2a'>2</a>]. The latter’s newest paper [<a href='#ref2a'>2</a>] faces the same questions: it analyzes the modeling process and optimizes the classifier’s logical expression. Standing on their shoulders, we expand upon their work by analyzing architectural variations of the classifier circuits, which add an additional optimization axis.</p>
 
</div>
 
</div>
 
</section>
 
</section>
Line 44: Line 44:
 
<div style='text-align: justify;'>
 
<div style='text-align: justify;'>
 
<p>pANDORRA is based on a family of layered genetic circuits that aim to biophysically implement logical expressions involving miRNA molecules in 2 basic roles: upregulated or downregulated (on the target cells in respect to the control). The family is characterized by a basic structure emanating from these two roles. The target gene (the system’s output) is controlled by a promoter and RNA interference (RNAi) from the downregulated miRNAs and forms the lower layer of the system. The upper layer comprises target sites for the upregulated miRNAs and an inhibitor for the lower layer.</p>
 
<p>pANDORRA is based on a family of layered genetic circuits that aim to biophysically implement logical expressions involving miRNA molecules in 2 basic roles: upregulated or downregulated (on the target cells in respect to the control). The family is characterized by a basic structure emanating from these two roles. The target gene (the system’s output) is controlled by a promoter and RNA interference (RNAi) from the downregulated miRNAs and forms the lower layer of the system. The upper layer comprises target sites for the upregulated miRNAs and an inhibitor for the lower layer.</p>
<p>This basic architecture can be significantly improved with a double-inversion module for the the upper layer [<a href='#ref1a'>1</a>]. It takes the form of a middle layer with a promoter that is activated by the upper layer’s product and it in turn produces the inhibitor for the lower layer. This extra part significantly improves the system’s efficiency.</p>
+
<p>This basic architecture can be significantly improved with a double-inversion module for the the upper layer [<a href='#ref1a'>1</a>]. It takes the form of a middle layer with a promoter that is activated by the upper layer’s product and in turn produces the inhibitor for the lower layer. This extra part significantly improves the system’s efficiency.</p>
<p>But how do we define the system’s efficiency and how do we optimize its architecture, out of the many options left open by the previous definitions? And after finding the optimal architecture, how do we choose the optimal miRNA targets to separate particular cell groups? These are the basic questions we set out to answer by modelling the pANDORRA system.</p>
+
<p>But how do we define the system’s efficiency and how do we optimize its architecture, out of the many options left open by the previous definitions? And after finding the optimal architecture, how do we choose the optimal miRNA targets to separate particular cell groups? These are the basic questions we set out to answer by modeling the pANDORRA system.</p>
 
</div>
 
</div>
 
</section>
 
</section>
Line 57: Line 57:
 
<div style='text-align: justify;'>
 
<div style='text-align: justify;'>
 
<p>The main task of a classifier system, given a group of target and control cells, is to produce large amounts of output when inside a target cell and little output when inside a control cell. Ideally, it should function as a logical function: maximum output expression in all target cells and zero in all controls. We will therefore judge its efficiency by how close it is to the ideal. We define the fold change of a concrete classifier instance as the average output margin between the two groups.</p>
 
<p>The main task of a classifier system, given a group of target and control cells, is to produce large amounts of output when inside a target cell and little output when inside a control cell. Ideally, it should function as a logical function: maximum output expression in all target cells and zero in all controls. We will therefore judge its efficiency by how close it is to the ideal. We define the fold change of a concrete classifier instance as the average output margin between the two groups.</p>
<p>The circuit’s layered design naturally guides the modelling effort in two paths: estimating each layer’s output and the effects this output has on the next layer. Basic elements of the model therefore would be relationships between the concentrations of the involved regulatory species and the outputs of the regulated elements.</p>
+
<p>The circuit’s layered design naturally guides the modeling effort in two paths: estimating each layer’s output and the effects this output has on the next layer. Basic elements of the model therefore would be relationships between the concentrations of the involved regulatory species and the outputs of the regulated elements.</p>
 
<p>Take the RNAi mechanism: let P<sub>max</sub> be the unregulated (maximum) output concentration and C* the miRNA concentration that results in a 50% reduction of the output. Then the relationship between the output [P] and the input [miRNA] is:</p>
 
<p>Take the RNAi mechanism: let P<sub>max</sub> be the unregulated (maximum) output concentration and C* the miRNA concentration that results in a 50% reduction of the output. Then the relationship between the output [P] and the input [miRNA] is:</p>
 
<span class='equation'>
 
<span class='equation'>
Line 358: Line 358:
 
<!--$$A= \frac{\mathit{rtTA}}{K_d \mathit{rtTA}}$$-->
 
<!--$$A= \frac{\mathit{rtTA}}{K_d \mathit{rtTA}}$$-->
 
</span>
 
</span>
<p>The second layer can be regulated in 2 ways: by rtTA through its promoter and by RNA interference, like the other layers. The first regulation is transcriptional, whereas the second post-transcriptional. Let’s assume the outputs of the second layer can be the protein LacI and the miRNA FF4. LacI will be affected by both regulatory mechanisms; in our particular implementation though FF4 will only be affected by the transcriptional regulation due to the FF4 being produced Their levels would be then:</p>
+
<p>The second layer can be regulated in 2 ways: by rtTA through its promoter and by RNA interference, like the other layers. The first regulation is transcriptional, whereas the second post-transcriptional. Let’s assume the outputs of the second layer can be the protein LacI and the miRNA FF4. LacI will be affected by both regulatory mechanisms; in our particular implementation though FF4 will only be affected by the transcriptional regulation due to the FF4 being produced. Their levels would be then:</p>
 
<span class='equation'>
 
<span class='equation'>
  
Line 515: Line 515:
 
<div style='text-align: justify;'>
 
<div style='text-align: justify;'>
 
<p>pANDORRA’s goal is to discriminate 2 groups of cells. If there is an miRNA expression dataset for these cells, we need to find the optimal circuit inputs that maximize output in cancer cells and minimize it in control cells. Thanks to the evaluation function derived above, if we have a way to generate candidate circuits, we can employ a search method to optimize the logical circuit.</p>
 
<p>pANDORRA’s goal is to discriminate 2 groups of cells. If there is an miRNA expression dataset for these cells, we need to find the optimal circuit inputs that maximize output in cancer cells and minimize it in control cells. Thanks to the evaluation function derived above, if we have a way to generate candidate circuits, we can employ a search method to optimize the logical circuit.</p>
<p>The search space is huge and looking at every possible combination of circuit inputs is impractical; the search is guided instead by a genetic algorithm. The search begins at the simplest circuits: those with a single up- or downregulated miRNA. At each search step, the pool of candidate circuits is evaluated and only the top best-performing are kept. These are then used as a seed to propose new circuits by randomly recombining them together or adding to them single miRNAs. This process is repeated until the best-performing circuit isn’t dethroned for enough steps [<a href='#ref2a'>2</a>]. The parallels between this process and natural selection and genetic recombination are obvious, explaining the name “genetic algorithm”.</p>
+
<p>The search space is huge and looking at every possible combination of circuit inputs is impractical; the search is guided instead by a genetic algorithm. The search begins at the simplest circuits: those with a single up- or downregulated miRNA. At each search step, the pool of candidate circuits is evaluated and only the top best-performing are kept. These are then used as a seed to propose new circuits by randomly recombining them together or adding to them single miRNAs. This process is repeated until the best-performing circuit isn’t dethroned for enough steps [<a href='#ref2a'>2</a>]. The parallels between this process and natural selection and genetic recombination are obvious, justifying the name “genetic algorithm”.</p>
<p>Optimizing the logical circuit therefore requires selecting an architecture first (to have a concrete evaluation function). However, if there are many candidate architectures it can’t generally be known a priori which one will perform best on a particular dataset. Different architectures can be sensitive to different levels of miRNA input - imagine using a particularly strong repressor in the middle layer: low concentrations would be sufficient to completely knock out the output. The only way this classifier would produce output is if the levels of the upregulated miRNAs are exceedingly high on an absolute scale. This architecture wouldn’t work well to recognise cancer cells whose upregulated miRNAs are expressed in lower levels. Therefore, deciding on an architecture is also an optimization problem that should be solved for a specific dataset.</p>
+
<p>Optimizing the logical circuit therefore requires selecting an architecture first (to have a concrete evaluation function). However, if there are many candidate architectures it can’t generally be known a priori which one will perform best on a particular dataset. Different architectures can be sensitive to different levels of miRNA input - imagine using a particularly strong repressor in the middle layer: low concentrations would be sufficient to completely knock out the output. The only way this classifier would produce output is if the levels of the upregulated miRNAs are exceedingly high on an absolute scale. This architecture wouldn’t work well to recognize cancer cells whose upregulated miRNAs are expressed in lower levels. Therefore, deciding on an architecture is also an optimization problem that should be solved for a specific dataset.</p>
 
<p>Contrary to the huge search space in circuit optimization, the biologically realizable architectures are few and each can be examined closely. To optimize across the architectural axis then, one need only find the best classifier circuit given the current architecture’s evaluation function. In the following section, we examine the various architectures in more detail.</p>
 
<p>Contrary to the huge search space in circuit optimization, the biologically realizable architectures are few and each can be examined closely. To optimize across the architectural axis then, one need only find the best classifier circuit given the current architecture’s evaluation function. In the following section, we examine the various architectures in more detail.</p>
 
</div>
 
</div>
Line 533: Line 533:
 
<p>The second architecture removes the parallel inhibition mechanism by FF4 and loosens the coupling of low levels of upregulated miR to the output. At the same time, by adding upregulated miR targets at the middle layer, it increases the coupling of high levels of upregulated miR to the output.</p>
 
<p>The second architecture removes the parallel inhibition mechanism by FF4 and loosens the coupling of low levels of upregulated miR to the output. At the same time, by adding upregulated miR targets at the middle layer, it increases the coupling of high levels of upregulated miR to the output.</p>
 
<!--3.5--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/8/8f/Greekom_Design_LacITargets3png.png' /></div>
 
<!--3.5--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/8/8f/Greekom_Design_LacITargets3png.png' /></div>
<p>By removing upregulated miR targets on the upper layer, the 3rd architecture allows greater LacI leakage when the upregulated miR are high.</p>
+
<p>By removing upregulated miR targets on the upper layer, the third architecture allows greater LacI leakage when the upregulated miR are high.</p>
 
<!--4.7--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/7/76/Greekom_Design_LacITargets2.png' /></div>
 
<!--4.7--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/7/76/Greekom_Design_LacITargets2.png' /></div>
<p>The 4<sup>th</sup> architecture is very similar to the 3<sup>rd</sup>, only simpler to implement.</p>
+
<p>The fourth architecture is very similar to the third, only simpler to implement.</p>
 
<!--5.9--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/5/5e/Greekom_Design_LacIFF4Targets2.png' /></div>
 
<!--5.9--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/5/5e/Greekom_Design_LacIFF4Targets2.png' /></div>
<p>The 5<sup>th</sup> architecture again combines the complementary couplings provided by LacI and FF4</p>
+
<p>The fifth architecture again combines the complementary couplings provided by LacI and FF4.</p>
 
<!--6.10--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/a/ad/Greekom_Design_LacIFF4Targets3.png' /></div>
 
<!--6.10--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/a/ad/Greekom_Design_LacIFF4Targets3.png' /></div>
<p>By reestablishing the upper layer, we get architecture 6.</p>
+
<p>By reestablishing the upper layer, we get architecture six.</p>
 
<!--7.11--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/f/f2/Greekom_Design_FF4.png' /></div>
 
<!--7.11--><div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/f/f2/Greekom_Design_FF4.png' /></div>
<p>And by removing LacI we get architecture 9.</p>
+
<p>And by removing LacI we get architecture seven.</p>
 
</div>
 
</div>
 
</section>
 
</section>
Line 549: Line 549:
 
<section>
 
<section>
 
<article>
 
<article>
<header><strong class='sub_headers'>LacI & FF4</strong></header>
+
<header><strong class='sub_headers'>Transcriptional vs Post-transcriptional Inversion</strong></header>
 
<section class='sub_sections'>
 
<section class='sub_sections'>
 
<div style='text-align: justify;'>
 
<div style='text-align: justify;'>
 
<p>The basic variable element among these architectures is the method of double inversion. The two middle layer coupling mechanisms we explore, LacI and FF4, are used individually or combined. To assess the significance of this decision, we explore the input-output relationship for the corresponding architectures by scanning a pair of input upregulated miRNAs connected together in the same OR gate along the biologically plausible range they can cover.</p>
 
<p>The basic variable element among these architectures is the method of double inversion. The two middle layer coupling mechanisms we explore, LacI and FF4, are used individually or combined. To assess the significance of this decision, we explore the input-output relationship for the corresponding architectures by scanning a pair of input upregulated miRNAs connected together in the same OR gate along the biologically plausible range they can cover.</p>
 
<div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/b/ba/Greekom_pandorra_firstimage.png' /></div>
 
<div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/b/ba/Greekom_pandorra_firstimage.png' /></div>
<p>There is stark difference in their behaviour. LacI maps low counts of both the miRNAs to relatively low output levels (OUT<sub>max</sub> = 10000) and high counts of either on to high output levels, thus implementing the OR gate. Its transition occurs very rapidly at relatively low miR counts, while its on/off ratio is 20. FF4 on the other hand is even more effective at mapping low levels of both miR to low output levels and high levels of both to high output, but maps high levels of only one of them to middle output, therefore implementing an arithmetic + gate more than a logical OR gate. Its transition is much more gradual than LacI’s, leaving a large zone of uncertain output, while its overall on/off ratio is 15. It also appears the FF4’s presence masks the effect of LacI.</p>
+
<p>There is stark difference in their behaviour. LacI maps low counts of both the miRNAs to relatively low output levels (OUT<sub>max</sub> = 10000) and high counts of either on to high output levels, thus implementing the OR gate. Its transition occurs very rapidly at relatively low miRNA counts, while its on/off ratio is 20. FF4 on the other hand is even more effective at mapping low levels of both miRNA to low output levels and high levels of both to high output, but maps high levels of only one of them to middle output, therefore implementing an arithmetic AND gate more than a logical OR gate. Its transition is much more gradual than LacI’s, leaving a large zone of uncertain output, while its overall on/off ratio is 15. It also appears that the FF4’s presence masks the effect of LacI.</p>
 
<div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/3/3a/Greekom_pandorrasecond.png' /></div>
 
<div style='text-align: center'><img class='sub_images' src='https://static.igem.org/mediawiki/2017/3/3a/Greekom_pandorrasecond.png' /></div>
<p>AND gates are implemented more successfully, at least for upregulated miR. Although the overall on/off ratio remains the same, the output is more consistent with a logical expression when one miR is high and the other is low.</p>
+
<p>AND gates are implemented more successfully, at least for upregulated miRNAs. Although the overall on/off ratio remains the same, the output is more consistent with a logical expression when one miRNA is high and the other is low.</p>
<p>Overall this analysis shows that LacI is more appropriate to implement logic gates. However it is not a silver bullet. Its transition occurs at a point that might be inappropriate for some datasets, leaving even the miRNAs that are “low” on the positive side of the output. </p>
+
<p>Overall this analysis shows that LacI is more appropriate to implement logic gates. However, it is not a silver bullet. Its transition occurs at a point that might be inappropriate for some datasets, leaving even the miRNAs that are “low” on the positive side of the output. </p>
 
</div>
 
</div>
 
</section>
 
</section>
Line 565: Line 565:
 
<section>
 
<section>
 
<article>
 
<article>
<header><strong class='sub_headers'>Target Site Repeats</strong></header>
+
<header><strong class='sub_headers'>miRA Target Tandem Repeats</strong></header>
 
<section class='sub_sections'>
 
<section class='sub_sections'>
 
<div style='text-align: justify;'>
 
<div style='text-align: justify;'>
Line 592: Line 592:
 
</table></div>
 
</table></div>
 
<p>Apparently the best architectures are indeed the ones that use LacI in the middle layer and don’t use FF4 (which overrules LacI’s influence).</p>
 
<p>Apparently the best architectures are indeed the ones that use LacI in the middle layer and don’t use FF4 (which overrules LacI’s influence).</p>
<p>The LacI architectures are optimal with a very small number of inputs. The others have more inputs, but due to the way the optimization works, many of those inputs might be of only marginal significance. It is then in our best interests from a design perspective to attempt to simplify the proposed circuits - a process called “pruning”. Taking cues from LacI, we attempt to prune the other architectures to the same small circuit, which is a subset of all the larger circuits.</p>
+
<p>The LacI architectures are optimal with a very small number of inputs. The others have more inputs, but due to the way the optimization works, many of those inputs might be of only marginal significance. It is then in our best interest from a designer's perspective to attempt to simplify the proposed circuits - a process called “pruning”. Taking cues from LacI, we attempt to prune the other architectures to the same small circuit, which is a subset of all the larger circuits.</p>
<p><i>Experiments are expensive and models are useful lies</i>; acknowledging this statement, it is also useful to put the results of our model in a broader perspective provided by the literature and increase our chances of a successful experiment. Xie [<a href='#ref2a'>2</a>] suggests using miR-373, while miR-21 (which also comes up in one of our architectures) comes from a widespread and very significant oncogene. While examining miR-373, we notice it shares the same seed sequence with miR-372, which means that it will cause RNAi for the same targets as miR-372. The optimization algorithm takes this similarity into account at a preprocessing step, adding the expressions of miR-372 and miR-373 together and reporting only miR-372.</p>
+
<p><i>Experiments are expensive and models are useful lies</i>; acknowledging this statement, it is also useful to put the results of our model in a broader perspective provided by the literature and increase our chances of a successful experiment. miR-21 was included due to exceptionally high expression level in most cance cell lines but not in the majority of healty and immortalized tissues [<a href='#ref2a'>2</a>]. While examining miR-373, we notice it shares the same seed sequence with miR-372. The optimization algorithm takes this similarity into account at a preprocessing step, adding the expressions of miR-372 and miR-373 together and reporting only miR-372.</p>
 
<p>Putting all these elements together leads us to propose the following circuit for the wet lab experiments:</p>
 
<p>Putting all these elements together leads us to propose the following circuit for the wet lab experiments:</p>
 
<ul>
 
<ul>
Line 638: Line 638:
 
<section class='sub_sections'>
 
<section class='sub_sections'>
 
<div style='text-align: justify;'>
 
<div style='text-align: justify;'>
<p>We designed experiments to implement this final classifier circuit in the lab. The LacI promoter, CAGop, was ordered to be synthesized, but unfortunately synthesis proved very difficult due to the high GC content and the construct arrived at 31/10. Necessity forced us to fall back to an alternate architecture for which we already had the parts: architecture 7 (FF4). Thanks to the way we had structured the circuit elements, we also had some leeway in choosing which miRNA targets to actually include: miR-21, miR-372 OR miR-372, or both. We reevaluate these 3 options for architecture 7 and also compute the efficiency of the circuits on datasets with HEK and A549 as controls, apart from healthy tissue samples.</p>
+
<p>We designed experiments to implement this final classifier circuit in the lab. The promoter bearing LacO operator sites, CAGop, was ordered to be synthesized, but unfortunately synthesis proved very difficult due to the high GC content and the construct arrived at the 31st of October. Necessity forced us to fall back to an alternate architecture for which we already had the Parts: architecture 7 (FF4). Thanks to the way we had structured the circuit elements, we also had some leeway in choosing which miRNA targets to actually include: miR-21, miR-372 OR miR-372, or both. We reevaluate these 3 options for architecture 7 and also compute the efficiency of the circuits on datasets with HEK-293 and A549 as controls, apart from healthy tissue samples.</p>
 
<p>Expected fold change logarithm for healthy samples:</p>
 
<p>Expected fold change logarithm for healthy samples:</p>
 
<table style='position: relative; left: 50%; margin-right: -50%; transform: translate(-50%, 0%);'>
 
<table style='position: relative; left: 50%; margin-right: -50%; transform: translate(-50%, 0%);'>
Line 651: Line 651:
 
</tr>
 
</tr>
 
</table>
 
</table>
<p>Expected fold change logarithm for the HEK cell line</p>
+
<p>Expected fold change logarithm for the HEK-293 cell line:</p>
 
<table style='position: relative; left: 50%; margin-right: -50%; transform: translate(-50%, 0%);'>
 
<table style='position: relative; left: 50%; margin-right: -50%; transform: translate(-50%, 0%);'>
 
<tr>
 
<tr>
Line 663: Line 663:
 
</tr>
 
</tr>
 
</table>
 
</table>
<p>Expected fold change logarithm for the A549 cell line</p>
+
<p>Expected fold change logarithm for the A549 cell line:</p>
 
<table style='position: relative; left: 50%; margin-right: -50%; transform: translate(-50%, 0%);'>
 
<table style='position: relative; left: 50%; margin-right: -50%; transform: translate(-50%, 0%);'>
 
<tr>
 
<tr>

Revision as of 02:36, 16 December 2017

pANDORRA Design Algorithm
“If you know yourself and you know your enemy, you shouldn’t fear the result of a thousand battles”
-Sun Tzu

If knowledge is power, then modeling is the key to a successful engineering endeavour. Our modeling of the biophysical characteristics of the classifier circuits that follows in this section is largely based upon the exciting work of research teams led by Professors Z. Xie and Y. Benenson [1, 2]. The latter’s newest paper [2] faces the same questions: it analyzes the modeling process and optimizes the classifier’s logical expression. Standing on their shoulders, we expand upon their work by analyzing architectural variations of the classifier circuits, which add an additional optimization axis.

Architecture of a cell-type classifier

pANDORRA is based on a family of layered genetic circuits that aim to biophysically implement logical expressions involving miRNA molecules in 2 basic roles: upregulated or downregulated (on the target cells in respect to the control). The family is characterized by a basic structure emanating from these two roles. The target gene (the system’s output) is controlled by a promoter and RNA interference (RNAi) from the downregulated miRNAs and forms the lower layer of the system. The upper layer comprises target sites for the upregulated miRNAs and an inhibitor for the lower layer.

This basic architecture can be significantly improved with a double-inversion module for the the upper layer [1]. It takes the form of a middle layer with a promoter that is activated by the upper layer’s product and in turn produces the inhibitor for the lower layer. This extra part significantly improves the system’s efficiency.

But how do we define the system’s efficiency and how do we optimize its architecture, out of the many options left open by the previous definitions? And after finding the optimal architecture, how do we choose the optimal miRNA targets to separate particular cell groups? These are the basic questions we set out to answer by modeling the pANDORRA system.

Modeling classifiers

The main task of a classifier system, given a group of target and control cells, is to produce large amounts of output when inside a target cell and little output when inside a control cell. Ideally, it should function as a logical function: maximum output expression in all target cells and zero in all controls. We will therefore judge its efficiency by how close it is to the ideal. We define the fold change of a concrete classifier instance as the average output margin between the two groups.

The circuit’s layered design naturally guides the modeling effort in two paths: estimating each layer’s output and the effects this output has on the next layer. Basic elements of the model therefore would be relationships between the concentrations of the involved regulatory species and the outputs of the regulated elements.

Take the RNAi mechanism: let Pmax be the unregulated (maximum) output concentration and C* the miRNA concentration that results in a 50% reduction of the output. Then the relationship between the output [P] and the input [miRNA] is:

[ P ] = P m a x 1 1 + [ m i R N A ] C

Next question: if P is regulated by 2 miRNAs together, how do they interact? Following the same logic as in [2], we assume that the behave additively. If each miRNA has a Ci* concentration of half repression and n is the number of miRNAs, the cumulative regulation is:

[ P ] = P m a x 1 1 + i = 1 n [ m i R N A i ] C i

If there are multiple target sites for a particular miRNA we can assume that their effect on interference will also be additive, so if there are ri target repeats for miRNAi, the output will be:

[ P ] = P m a x 1 1 + i = 1 n r i [ m i R N A i ] C i

To simplify the following expressions, we can define the interference on a scale of 0 to 1 based on the above:

R N A i = 1 1 + i = 1 n r i [ m i R N A i ] C i

These observations can provide the output of the classifier’s upper layer. A valid design is not limited to 1 OR gate, therefore the total output (let’s assume the output is rtTA) is the sum of each gate’s output:

[ r t T A ] = r t T A m a x g = 1 m R N A i g

rtTA is an inducer of the pTRE promoter and its induction can be modelled on a scale of 0 to 1 with the use of the reaction’s dissociation constant Kd as follows:

A = r t T A K d r t T A

The second layer can be regulated in 2 ways: by rtTA through its promoter and by RNA interference, like the other layers. The first regulation is transcriptional, whereas the second post-transcriptional. Let’s assume the outputs of the second layer can be the protein LacI and the miRNA FF4. LacI will be affected by both regulatory mechanisms; in our particular implementation though FF4 will only be affected by the transcriptional regulation due to the FF4 being produced. Their levels would be then:

[ L a c I ] = [ L a c I ] m a x A ( r t T A ) R N A i ( up_miR ) [ L a c I ] = [ L a c I ] m a x A ( r t T A )

The final layer’s output applies the exact same logic:

[ O u t p u t ] = [ O u t p u t ] m a x R N A i ( FF4,down_miR ) A ( L a c I )

This biophysical model expresses an input-output relationship between the miR inputs and the classifier output. Given an miRNA expression dataset, we can calculate the expected output for each sample and evaluate the performance of a concrete classifier.

Classifier design process
Circuit optimization algorithm

pANDORRA’s goal is to discriminate 2 groups of cells. If there is an miRNA expression dataset for these cells, we need to find the optimal circuit inputs that maximize output in cancer cells and minimize it in control cells. Thanks to the evaluation function derived above, if we have a way to generate candidate circuits, we can employ a search method to optimize the logical circuit.

The search space is huge and looking at every possible combination of circuit inputs is impractical; the search is guided instead by a genetic algorithm. The search begins at the simplest circuits: those with a single up- or downregulated miRNA. At each search step, the pool of candidate circuits is evaluated and only the top best-performing are kept. These are then used as a seed to propose new circuits by randomly recombining them together or adding to them single miRNAs. This process is repeated until the best-performing circuit isn’t dethroned for enough steps [2]. The parallels between this process and natural selection and genetic recombination are obvious, justifying the name “genetic algorithm”.

Optimizing the logical circuit therefore requires selecting an architecture first (to have a concrete evaluation function). However, if there are many candidate architectures it can’t generally be known a priori which one will perform best on a particular dataset. Different architectures can be sensitive to different levels of miRNA input - imagine using a particularly strong repressor in the middle layer: low concentrations would be sufficient to completely knock out the output. The only way this classifier would produce output is if the levels of the upregulated miRNAs are exceedingly high on an absolute scale. This architecture wouldn’t work well to recognize cancer cells whose upregulated miRNAs are expressed in lower levels. Therefore, deciding on an architecture is also an optimization problem that should be solved for a specific dataset.

Contrary to the huge search space in circuit optimization, the biologically realizable architectures are few and each can be examined closely. To optimize across the architectural axis then, one need only find the best classifier circuit given the current architecture’s evaluation function. In the following section, we examine the various architectures in more detail.

Classifier Architectures

The first proposed architecture is straightforward: it accommodates upregulated miR on the top layer, downregulated miR directly on the output layer and couples the upregulated ones with 2 parallel mechanisms in the middle layer: transcriptional repression of the output layer by LacI and post-transcriptional inhibition by FF4.

The second architecture removes the parallel inhibition mechanism by FF4 and loosens the coupling of low levels of upregulated miR to the output. At the same time, by adding upregulated miR targets at the middle layer, it increases the coupling of high levels of upregulated miR to the output.

By removing upregulated miR targets on the upper layer, the third architecture allows greater LacI leakage when the upregulated miR are high.

The fourth architecture is very similar to the third, only simpler to implement.

The fifth architecture again combines the complementary couplings provided by LacI and FF4.

By reestablishing the upper layer, we get architecture six.

And by removing LacI we get architecture seven.

Transcriptional vs Post-transcriptional Inversion

The basic variable element among these architectures is the method of double inversion. The two middle layer coupling mechanisms we explore, LacI and FF4, are used individually or combined. To assess the significance of this decision, we explore the input-output relationship for the corresponding architectures by scanning a pair of input upregulated miRNAs connected together in the same OR gate along the biologically plausible range they can cover.

There is stark difference in their behaviour. LacI maps low counts of both the miRNAs to relatively low output levels (OUTmax = 10000) and high counts of either on to high output levels, thus implementing the OR gate. Its transition occurs very rapidly at relatively low miRNA counts, while its on/off ratio is 20. FF4 on the other hand is even more effective at mapping low levels of both miRNA to low output levels and high levels of both to high output, but maps high levels of only one of them to middle output, therefore implementing an arithmetic AND gate more than a logical OR gate. Its transition is much more gradual than LacI’s, leaving a large zone of uncertain output, while its overall on/off ratio is 15. It also appears that the FF4’s presence masks the effect of LacI.

AND gates are implemented more successfully, at least for upregulated miRNAs. Although the overall on/off ratio remains the same, the output is more consistent with a logical expression when one miRNA is high and the other is low.

Overall this analysis shows that LacI is more appropriate to implement logic gates. However, it is not a silver bullet. Its transition occurs at a point that might be inappropriate for some datasets, leaving even the miRNAs that are “low” on the positive side of the output.

miRA Target Tandem Repeats

An interesting possibility is the adaptation of the number of target sites where each miRNA can attach to cause interference. It has been shown by [3] that there is little interaction between multiple miRNA and RISC complexes attaching next to each other, allowing us to model the overall dynamics additively. This assumption is challenged when there are low copy numbers of the participating molecules, but let’s follow it as a working hypothesis. Additive RNAi dynamics implies that, given an interference strength x by some miRNA for 1 repeat of the target site, if there are multiple repeats the interference becomes n*x.

To examine the effect of target site repeats, we’ll use an example dataset and circuit on the previous architectures.

Our Caco-2 and healthy dataset was obtained from the work by Cummins et al., 2006 [4]. The miRNA expression data regarding the HEK-293 and A549 cell lines were retrieved from the mammalian microRNA expression atlas [5]

Architecture & Circuit Optimization

We use the methods and insights derived so far to implement classifiers for our lab experiments. The experimental goal is to prove the ability of an optimized classifier to target cells from the Caco-2 cell line versus a healthy tissue control. Due to practical considerations, the controls on the actual experiments were the HEK-293 and A549 cell lines.

Running the circuit optimization algorithm on a dataset comprising Caco-2 and healthy samples [4] for the various architectures suggests the following circuits for each:

- where “cMargin” is the logarithm of the fold change

Apparently the best architectures are indeed the ones that use LacI in the middle layer and don’t use FF4 (which overrules LacI’s influence).

The LacI architectures are optimal with a very small number of inputs. The others have more inputs, but due to the way the optimization works, many of those inputs might be of only marginal significance. It is then in our best interest from a designer's perspective to attempt to simplify the proposed circuits - a process called “pruning”. Taking cues from LacI, we attempt to prune the other architectures to the same small circuit, which is a subset of all the larger circuits.

Experiments are expensive and models are useful lies; acknowledging this statement, it is also useful to put the results of our model in a broader perspective provided by the literature and increase our chances of a successful experiment. miR-21 was included due to exceptionally high expression level in most cance cell lines but not in the majority of healty and immortalized tissues [2]. While examining miR-373, we notice it shares the same seed sequence with miR-372. The optimization algorithm takes this similarity into account at a preprocessing step, adding the expressions of miR-372 and miR-373 together and reporting only miR-372.

Putting all these elements together leads us to propose the following circuit for the wet lab experiments:

  • Architecture: 2 (LacI + miR targets at middle layer)
  • Circuit:
    hsa-mir-372 OR hsa-mir-373
    AND
    hsa-mir-21
    AND
    ~hsa-mir-143
    AND
    ~hsa-mir-145
Re-evaluation of the proposed circuit

Although we’ve determined architecture 2 to be optimal, we reevaluate the proposed circuit across all the architectures. The resulting logarithm of the fold change is:

arch1arch2arch3arch4arch5arch6arch7
1.452.692.642.650.901.670.77

Architecture 2 retains the same high fold change with the proposed circuit.

Implementation problems and fallback

We designed experiments to implement this final classifier circuit in the lab. The promoter bearing LacO operator sites, CAGop, was ordered to be synthesized, but unfortunately synthesis proved very difficult due to the high GC content and the construct arrived at the 31st of October. Necessity forced us to fall back to an alternate architecture for which we already had the Parts: architecture 7 (FF4). Thanks to the way we had structured the circuit elements, we also had some leeway in choosing which miRNA targets to actually include: miR-21, miR-372 OR miR-372, or both. We reevaluate these 3 options for architecture 7 and also compute the efficiency of the circuits on datasets with HEK-293 and A549 as controls, apart from healthy tissue samples.

Expected fold change logarithm for healthy samples:

mir21 & ~mir143 & ~mir1450.835
mir372|mir373 & ~mir143 & ~mir1450.881
mir21 & mir372|mir373 & ~mir143 & ~mir1450.766

Expected fold change logarithm for the HEK-293 cell line:

mir21 & ~mir143 & ~mir1451.572
mir372|mir373 & ~mir143 & ~mir1451.038
mir21 & mir372|mir373 & ~mir143 & ~mir1451.054

Expected fold change logarithm for the A549 cell line:

mir21 & ~mir143 & ~mir1450.620
mir372|mir373 & ~mir143 & ~mir1451.138
mir21 & mir372|mir373 & ~mir143 & ~mir1451.068
References

[1] Xie, Z., Wroblewska, L., Prochazka, L., Weiss, R., & Benenson, Y. (2011). Multi-input RNAi-based logic circuit for identification of specific cancer cells. Science, 333(6047), 1307-1311.

[2] Mohammadi, P., Beerenwinkel, N., & Benenson, Y. (2017). Automated Design of Synthetic Cell Classifier Circuits Using a Two-Step Optimization Strategy. Cell Systems, 4(2), 207-218.

[3] Schmitz, U., Lai, X., Winter, F., Wolkenhauer, O., Vera, J., & Gupta, S. K. (2014). Cooperative gene regulation by microRNA pairs and their identification using a computational workflow. Nucleic acids research, 42(12), 7539-7552.

[4] Cummins, J. M., He, Y., Leary, R. J., Pagliarini, R., Diaz, L. A., Sjoblom, T., ... & Raymond, C. K. (2006). The colorectal microRNAome. Proceedings of the National Academy of Sciences of the United States of America, 103(10), 3687-3692.

[5] Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., ... & Lin, C. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell, 129(7), 1401-1414.