This model is created to select the relative optimal structure of Locker B, one of the four Lockers based on our previous design, and improve the binding efficiency of Locker B and microRNA.

Theory design

In order to further optimize the structure of the default Locker in the project, we assert that the binding probability of base complementary pairing should be reduced in the sequence of the functional module when the Locker formed the secondary structure, thus increasing the binding probability of base complementary pairing between the functional module and the corresponding microRNA.

Figure 1. the predicted structure of Locker B

The two pairs of overhangs (The four pale gray segments in Figure 1) that we design to be added between the complementary pairing supporting module and the functional module can open the port where the supporting module is connected to the functional module, but different sequences may have a potential impact on the secondary structure of the Locker. So we design to build an overhang candidate library in order to find all possible sequences and predict the secondary structure of each sequence, compare them and select the relative optimal structure as the object of our experiment.

Reducing the probability of base complementary pairing occurred in the sequence of functional module in the formation of the secondary structure of Locker will result in the reduction of the number of the paired bases of entire Lockers. Therefore, we believe that for different sequences, when the number of base pairs reaches the minimum, functional module unmatched base number is the largest, the probability of corresponding to microRNA through the base complementary pairing is maximum and the efficiency is highest.

Methods

1.Building an overhang alternative library

Each overhang has 4 bases, so the number of all possible structures is :

For an overhang A, A₂(which is paired with A₁), A₃( which is paired with A ) and the reverse sequence A₁ are all included in the possible structures.

Take AGGC for example：

A :AGGC

A₁:CGGA

A₂:GCCT

A₃:GCCT

Given the one-to-one relationship between them, we consider that the ratio of the number of A, A₁, A₂ A₃ is 1: 1: 1: 1. Accordingly, after removing all A₁ and A₃, the number of overhangs left in the library is:

For some of the A and A₂, they have the same sequence so we have to subtract the corresponding number.

This means that the sequence itself is inversely complementary and the first two base pairs correspond one-to-one with the latter two base pairs, so the number of possible scenarios is:

The number of the rest is:

we use MATLAB to screen the above two situations and keep the remaining overhangs in the overhang alternative library.

2. Stitching sequence

Since the number of overhangs in the overhang library is 112 and we have to select two of them, which is the same as simple arrangement combination. Thus the number of possible Locker is:

It takes a lot of time to accomplish the secondary structure prediction of so many sequences, so we consider completing the sequence splicing and secondary structure prediction using the computer.

3. Predicting the structure of Locker B

We mimic the method of calculating the RNA sequence and use the minimum free energy algorithm to calculate and predict the secondary structure of the single - chain DNA. The RNA-related parameters are replaced by DNA parameters (Matthews model, 2004). The specific algorithm is described as follows:

Because the formation of base pairing can reduce the energy of DNA molecules and the structure is more stable, the minimum free energy algorithm considers that at certain temperature, RNA molecules can achieve a certain thermodynamic balance through the conformational adjustment, the free energy can be reduced to the minimum and the most stable structure will be formed, then the secondary structure is considered to be the true secondary structure of DNA. Its computational object is not the number of base pairs, but a complex set of energy parameters.

Figure 2. Plain structure drawing of the single - chain DNA

Assume that a₁,a₂,a₃...a_n is a given DNA sequence. Throughout this section, we let E_i,j denotes the minimum free energy of a_i,a_i+1...a_j, which is computed and stored in arrays by a dynamic programming algorithm corresponding to the following recursions.

we use the notation α_ij to denote the free energy of a stacked base pair (a_i,a_j),β_k to denote the free energy of a bulge,γ_k+1 to denote the free energy of an internal loop,ε_k+1-j'-i' to denote the free energy of a multiloop containing, while the free energy δ_j-i for a hairpin.

Here is the formula and description of the energy function of each substructure:

α_ij This function is used to calculate the stacking energy between two pairs of paired bases (a_i,a_j) and (a_i+1,a_j-1), which mainly relate to the various combinations of paired bases that make up this stack. Because complementary base-pairing reactions reduce the energy of DNA molecules, α_ij is generally negative.

Figure3. the stacking energy between two pairs of paired bases

β_k This function is used to calculate the energy of a bulge whose number of bases in the single-chain is k.

γ_k+1 This function is used to calculate the energy of an internal loop whose numbers of bases in two single-chains are k and l.

δ_j-iThis function is used to calculate the energy of a hairpin between pair (a_i,a_j).

ε_k+1-j'-i'This function is used to calculate the energy of a multiloop containing that starts from paired bases (a_i,a_j) and also has inner paired bases(a_i1,a_j1),(a_i2,a_j2)…(a_ik,a_jk).

ε_k+1-j'-i'=eM(i,j,i₁,j₁,i₂,j₂,...i_k,j_k,)

Figure4. the flow diagram of algorithm

Once E_1,n is computed, then the minimum free energy structure can be computed by tracebacks.

4. Selecting the structure with the maximum/minimum paired bases

In order to facilitate the comparison of the number of bases in each sequence, we directly read the number of the points in dot-bracket notation, and the sequence with the maximum points means that the number of unpaired bases is the largest and the sequence pairing base number is the least in the case of the length of all the Lockers is equal, which means structure is relatively optimal.

Results

By calculating we can get structures of the selected Lockers with the maximum and minimum paired bases, corresponding to the relatively worst and optimal structure.

The sequence of the optimal secondary structure:

AAGGGTAAGGATAATCGCACATGTTCTGCGGACCACTAAATCCTTACCCTTCGTAATCCTTGGCTACTGCCTGTCTGTGCCTGCTGTTCGGAAGGATTACG

The optimal secondary structure in dot-bracket notation:

((((((((((((...((((.......))))............))))))))))))(((((((((((((....)))....................))))))))))

Plain structure drawing of the optimal secondary structure:

Figure 3. Plain structure drawing of the optimal secondary structure

The sequence of the worst secondary structure:

AAGGGTAAGGATAACGGCACATGTTCTGCGGCCCACCAGCAAATCCTTACCCTTCGTAATCCTTACGGACTGCCTGTCTGTGCCTGCTGTGGCAAAGGATTACG

The worst secondary structure in dot-bracket notation:

((((((((((((((((.....)))).(((((....)).))).))))))))))))((((((((((((((((.....))))))...(((....)))))))))))))

Plain structure drawing of the worst secondary structure:

Figure 4. Plain structure drawing of the worst secondary structure

References

1 Zuker, M. & Sankoff, D. RNA secondary structures and their prediction. Bulletin of Mathematical Biology 46, 591-621 (1984).

2 Zuker, M., Mathews, D. H. & Turner, D. H. Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide. (Springer Netherlands, 1999

Team:NUDT CHINA/Model

Model