Cyclase of Nature and Pangu Algorithm
Of the some 500 known amino acids, nature uses only 20 as the major building blocks of proteins. To analyze these 20 molecules, we wrote a Matlab program that measures the physical-chemical distance between them; these distances are then summed to score homologous proteins. We named the program Pangu, after a figure from Chinese legend, to reflect the common ancestor shared by all life on Earth. We hope programs like Pangu will supplement standard methods of analyzing phylogenetic relationships between proteins and species. Guided by the Pangu algorithm, we will synthesize an artificial squalene cyclase in cultured human cells to study the basic physiology of cholesterol and related analogs. We hope the results will help develop new methods to identify micronutrients important for human health. Last but not least, we will continue our 2016 iGEM project iGUT, an E. coli-expressed beta-glucosidase for notoginseng processing that improves bioavailability.
Pangu Algorithm
Phylogenetic analysis is one of the main methods for studying the evolutionary history of species on Earth. Here we approach it from another perspective: the physical-chemical properties of amino acids.
First, we build a scoring matrix. We choose 20 representative factors to measure the similarity between two different amino acids. To simplify the computation, we use factor analysis to reduce the dimensions from 20 to 5. We then use multiple linear regression to model the relationship between BLOSUM-62 and the 5 factors X1, X2, X3, X4, X5.
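The scoring-matrix construction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab program itself: the property table and the regression target are random placeholders standing in for the real physical-chemical data and the BLOSUM-62 entries, principal component analysis is swapped in for factor analysis, and the predictors for an amino-acid pair are assumed to be the absolute differences of their five factor scores. All function names are hypothetical.

```python
import numpy as np

def reduce_properties(props, n_factors=5):
    """Project standardized properties onto the top principal components.
    PCA is used here as a stand-in for factor analysis."""
    z = (props - props.mean(axis=0)) / props.std(axis=0)
    # Rows of vt are the component directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_factors].T

def pair_features(factors):
    """Assumed predictors: |X_k(a) - X_k(b)| for every unordered pair (a, b)."""
    n = factors.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    x = np.array([np.abs(factors[i] - factors[j]) for i, j in pairs])
    return pairs, x

def fit_scores(x, y):
    """Ordinary least squares for y ~ b0 + b1*X1 + ... + b5*X5."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def build_matrix(pairs, x, coef, n):
    """Fill a symmetric n-by-n scoring matrix with the fitted scores."""
    pred = np.column_stack([np.ones(len(x)), x]) @ coef
    matrix = np.zeros((n, n))
    for (i, j), s in zip(pairs, pred):
        matrix[i, j] = matrix[j, i] = s
    return matrix

rng = np.random.default_rng(0)
props = rng.normal(size=(20, 20))      # placeholder: 20 amino acids x 20 properties
factors = reduce_properties(props)     # 20 amino acids x 5 factors
pairs, x = pair_features(factors)
target = rng.normal(size=len(pairs))   # placeholder for the BLOSUM-62 entries
coef = fit_scores(x, target)
matrix = build_matrix(pairs, x, coef, 20)
```

One consequence of the difference-only predictors assumed here is that every identical pair receives the same score (the intercept), so a real implementation may need extra terms to reproduce the varying diagonal of BLOSUM-62.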
Next, we build phylogenetic trees from this scoring matrix, using a tree-building procedure we call the Pangu Building Up method. We define the distance between two sequences as
Here score12 is the score between sequence1 and sequence2 from the Needleman-Wunsch algorithm, score11 is the score between sequence1 and itself, and so on. We use the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). We choose the two sequences with the shortest distance between them, sequence1 and sequence2. We then create a new sequence3 that is equidistant from sequence1 and sequence2. We add sequence3 to the sequence set and delete sequence1 and sequence2. We repeat these operations until only two sequences remain in the set. Their parent sequence is the ancestor of the whole sequence set.
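The score terms used in the distance definition (score12, score11, score22) can be computed with a short Needleman-Wunsch routine. The sketch below is illustrative only: it assumes a linear gap penalty, which the text does not specify, and uses a hypothetical match/mismatch function `toy_score` in place of the Pangu scoring matrix. How the three scores are combined into a distance is left to the definition above.

```python
def nw_score(seq_a, seq_b, score, gap=-4):
    """Global alignment score of seq_a against seq_b (Needleman-Wunsch).

    score(a, b) returns the substitution score for residues a and b;
    gap is an assumed linear gap penalty.
    """
    # prev[j] holds the best score for aligning the processed prefix of
    # seq_a with the first j residues of seq_b.
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            curr.append(max(
                prev[j - 1] + score(a, b),  # align a with b
                prev[j] + gap,              # a aligned to a gap
                curr[j - 1] + gap,          # b aligned to a gap
            ))
        prev = curr
    return prev[-1]

def toy_score(a, b):
    # Hypothetical scores; a real run would look up the Pangu matrix.
    return 2 if a == b else -1

seq1, seq2 = "HEAGAWGHEE", "PAWHEAE"
score12 = nw_score(seq1, seq2, toy_score)
score11 = nw_score(seq1, seq1, toy_score)
score22 = nw_score(seq2, seq2, toy_score)
```

Keeping only two rows of the dynamic-programming table is enough here because the tree building needs the score, not the alignment itself.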
Pangu Phylogenetic Trees
▲ Choose the two sequences, 1 and 2, with the shortest distance between them.
Create a new sequence3 that is equidistant from sequences 1 and 2.
▲ Delete sequences 1 and 2 from the sequence set.
▲ Add sequence3 to the sequence set.
…
▲ Repeat these operations until only two sequences remain.
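The merge steps above can be sketched as a small UPGMA routine in Python. This is a minimal illustration, assuming the pairwise distances have already been computed; the four sequence names and the distance values are hypothetical, and the function returns only the tree topology as nested tuples, without branch lengths.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a tree by UPGMA; dist[i][j] is the distance between
    names[i] and names[j]. Returns the topology as nested tuples."""
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    d = {(i, j): dist[i][j] for i, j in combinations(trees, 2)}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(d, key=d.get)          # closest pair of clusters
        new = next_id
        next_id += 1
        for k in trees:
            if k in (i, j):
                continue
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # Distance to the merged cluster: size-weighted average.
            d[(k, new)] = (sizes[i] * dik + sizes[j] * djk) / (sizes[i] + sizes[j])
        # Drop every distance that involves the two merged clusters.
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
    return next(iter(trees.values()))

names = ["A", "B", "C", "D"]              # hypothetical sequences
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(names, dist)
```

Merging the last two clusters corresponds to the final step in the text, where the parent of the two remaining sequences is taken as the ancestor of the whole set.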
Pangu Algorithm Application
In contrast to traditional methods, which are based on statistics from existing databases, the Pangu algorithm uses the physical-chemical properties of amino acids to analyze the relatedness of different proteins and species, giving us another way to approach problems of evolution.
Below, we used the pangu100 and blosum100 matrices to draw phylogenetic trees of 8 closely related cyclases from different species.
Pangu100_full-length (left) & Blosum100_full-length (right)
Pangu100_1-160 (left) & Blosum100_1-160 (right)
Pangu100_161-720 (left) & Blosum100_161-720 (right)
Pangu100_721-844 (left) & Blosum100_721-844 (right)
For the full-length protein sequence, the two methods appear to give the same result. But when we split the sequence into three parts of 160, 560 and 124 residues and repeat the analysis, Pangu 100 remains remarkably robust while Blosum 100 fluctuates dramatically.
In this preliminary test, the Pangu algorithm appears to be a useful new method that supplements existing phylogenetic analysis for protein comparison and evolution studies. Furthermore, the algorithm could be used to guide new protein design and synthetic biology.
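The three-part split can be written down explicitly. The sketch below takes the ranges from the figure labels (1-160, 161-720, 721-844) and assumes they are 1-based, inclusive positions applied to each sequence; whether the original analysis cut the raw sequences or the columns of an alignment is not stated, and the helper name is hypothetical.

```python
# Ranges from the figure labels: 1-based, inclusive positions.
SEGMENTS = [(1, 160), (161, 720), (721, 844)]

def split_segments(seq, segments=SEGMENTS):
    """Cut one sequence into the listed segments."""
    return [seq[start - 1:end] for start, end in segments]

placeholder = "A" * 844                   # stand-in for a full-length cyclase
parts = split_segments(placeholder)
lengths = [len(p) for p in parts]         # 160, 560 and 124 residues
```

Each part can then be passed through the same scoring and tree-building steps as the full-length sequence, so the four pairs of trees are directly comparable.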
Sponsorship
USTB
University of Science & Technology Beijing
USTB-CBE
University of Science & Technology Beijing
School of Chemistry and Biological Engineering
BCU
Beijing City University
YUNNAN NOTOGINSENG
An enterprise sponsor