Team:MIT/Model

SpliceMIT – Splice Modelling Intronic Technology

SpliceMIT is a tool to generate antisense oligonucleotides (ASO) for a given DNA/RNA sequence, and then analyze and output the most effective ASOs. This tool could be used for a variety of scenarios. For our test samples, SpliceMIT was used to generate and analyze the gRNA sequences (complementary to the region on the intron) in the CRISPR – dCas13a system to control alternative splicing by targeting mRNA sequences.

We used this program to generate some of our gRNA sequences.

This page briefly discusses the important concepts associated with our model. For more information, see our SpliceMIT Documentation


Check out the program on Github!


Overview

SpliceMIT is a tool to generate antisense oligonucleotides (ASO) for a given DNA/RNA sequence, and then analyze and output the most effective ASOs. This tool could be used for a variety of scenarios. For our test samples, SpliceMIT was used to generate and analyze the gRNA sequences (complementary to the region on the intron) in the CRISPR – dCas13a system to control alternative splicing by targeting mRNA sequences.

Mechanism

To compute and obtain the most effective antisense oligonucleotides, we considered four factors:

  • 1. GC content of the ASO
  • 2. Off-target binding
  • 3. RNA-binding proteins competition on the ASO binding site
  • 4. Secondary structure of the ASO

The general process is:

  • 1. Add additional features if needed (e.g. Add pre-gRNA secondary structure analysis part if the goal is to find gRNAs instead of ASOs)
  • 2. Ask user inputs, including DNA/RNA sequence, preferred ASO length, etc.
  • 3. Run the calculations that require online platform first.
  • 4. Store all data locally.
  • 5. Analyze all local data.
  • 6. Output highest ranked ASOs.

Anywhere in this process, the user could choose his or her preferred scoring or rank mechanism, or ideally all of them and compare to each other. (For our sample test trials, we have done data analysis comparing all three rank/scoring mechanisms. You could use the same file to analyze the results based on your own test samples.)

GC content:

Measure the amount of Guanine and Cytosine bases on the ASO, then calculate the proportion of GC bases. Based on previous research results, the effective GC content is between 35% and 70%[1] [2] [3]. Then the GC content value would be either calculated into an arbitrary score or used to apply cutoffs.

Off Target Binding:

Search for potential off-target binding with human cDNA.

RNA-binding proteins competition:

RNA-binding protein (RBP) competes with other types of proteins to bind at a region. The level of competition could affect the binding effectiveness of a certain ASO.

ASO Secondary Structure:

Here we calculate the probability of obtaining a secondary structure for each generated ASO. SpliceMIT calculates the probabilities of a base binding to another on Nupack web interface.

Ranking Methods

Arbitrary Score

An Arbitrary score is calculated respectively in each factor, and then the sum of those arbitrary score is used to rank the effectiveness and efficiency of the ASOs. This score is is based on the following factors:

  • GC content of the ASO
  • Off-target binding
  • RNA-binding proteins competition on the ASO binding site
  • Secondary structure of the ASO

Cutoff

The cut-off method preserves the basic scoring methods in the four sub-models.However, the weight and multiplication actors are removed. Instead, the model tests 10 sets of cut-off thresholds on 16 (currently completed) intron sequences.

Rank-Product

The Rank-Product method also preserves the basic scoring methods in the four sub-models except the removal of the weight and multiplication factors. The method derives from a biologically motivated test for the detection of differentially expressed genes in replicated microarray experiments23. It is a simple non-parametric statistical method based on ranks of fold changes.

Comparison of the Three Rank Methods

The three rank methods were also analyzed. For the cut-off method, the analysis used the 10th set of thresholds as shown in Table 2. For the arbitrary score and the rank product methods, only the top 20 ranked ASOs remained in the list. Then the program found the identical ASO sequences between two lists of outputs.