Francischlin
Latest revision as of 02:59, 2 November 2017
Overview
Our database, a new genesis for Artificial Intelligence, strengthens the power of large datasets through an antifungal peptide prediction system based on the Scoring Card Method (SCM) with additional optimization. Antifungal activity can be evaluated and interpreted from sequence analysis alone.
Furthermore, we integrated all the related data into a complete antifungal database that supports queries on hosts, pathogens, and their corresponding peptides. By combining the two, we created Parabase, a novel database that supports both the discovery of new antifungal peptide drugs and the repurposing of existing ones.
Antifungal Prediction System
To evaluate peptide function faster and more intelligently, we built our antifungal peptide prediction system on SCM. With this practical and interpretable tool, we can identify potential target peptides among large numbers of uncharacterized peptides, making the best use of vast amounts of data.
Content:
- Datasets
- The concept of the dipeptide and the weight
- Intelligent Genetic Algorithm
To predict antifungal peptides, we adapted the Scoring Card Method (SCM) into our antifungal peptide prediction system. The major advantages of the method are its simplicity, interpretability, and acceptable accuracy.
SCM, based on Support Vector Machine (SVM), is a method developed by our instructor Shinn-Ying Ho. To measure antifungal properties, we introduced SCM into our model to evaluate the antifungal function of peptides from a bioinformatics perspective.
- Datasets:
We obtained our positive data from antifungal databases such as cAMP and PhytAMP, and from papers found in PubMed. We collected our negative data from UniProt peptides that are not annotated as antifungal.
We created the training and test datasets by reducing the sequence identity within the positive and negative data and then dividing the data into two portions, so that each dataset contains equal amounts of positive and negative data.
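A minimal Python sketch of the balanced split described above. It assumes the sequence-identity reduction has already been done (for example with a clustering tool) and that the larger class is randomly downsampled to the size of the smaller one; `balanced_split` and its `seed` parameter are hypothetical, not our actual pipeline code.

```python
import random


def balanced_split(positives, negatives, seed=0):
    """Split peptides into training and test sets with equal class counts.

    Hypothetical helper: assumes identity reduction is already done and
    downsamples the larger class so both classes have the same size.
    Returns two lists of (sequence, label) pairs, label 1 = antifungal.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    # Random samples of equal size from each class (order is shuffled).
    pos = rng.sample(list(positives), n)
    neg = rng.sample(list(negatives), n)
    half = n // 2
    train = [(s, 1) for s in pos[:half]] + [(s, 0) for s in neg[:half]]
    test = [(s, 1) for s in pos[half:]] + [(s, 0) for s in neg[half:]]
    return train, test
```

Because each class is cut at the same index, both sets keep a 1:1 ratio of positive to negative peptides even when the class sizes start out unequal.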
- Dipeptide:
The premise of this method is the hypothesis that the function of a peptide corresponds to its sequence. We treated each pair of adjacent amino acids, called a dipeptide, as the smallest functional unit.
A peptide containing more potentially antifungal dipeptides is more likely to be an antifungal peptide, and vice versa. The propensities of all 400 possible dipeptides (20 × 20 amino acids) are obtained by statistically discriminating between the dipeptide compositions of antifungal and non-antifungal peptides.
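The dipeptide composition of one peptide can be sketched in Python as below. This is a minimal illustration that assumes overlapping adjacent pairs normalized by the number of valid pairs, with pairs containing non-standard residues skipped; `dipeptide_composition` is a hypothetical name.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the frequency of each of the 400 dipeptides in `sequence`.

    Frequencies are counts of overlapping adjacent pairs divided by the
    number of valid pairs, so they sum to 1 when any valid pair exists.
    Pairs with non-standard residues are skipped (an assumption).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2].upper()
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {d: 0.0 for d in DIPEPTIDES}
    return {d: c / total for d, c in counts.items()}
```

For example, "GCGC" contains the pairs GC, CG, GC, so GC has frequency 2/3 and CG has frequency 1/3.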
- Dipeptide Frequency & Score:
For each peptide, the frequency of each of the 400 dipeptide types is multiplied by that dipeptide's weight.
The peptide's score is the sum of these products over all 400 dipeptides.
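In formula form the score is S = Σ f(ij) × w(ij), summed over the 400 dipeptides ij, where f(ij) is the frequency of dipeptide ij in the peptide and w(ij) its weight. A minimal Python sketch, assuming the weights are stored in a dictionary keyed by dipeptide; `peptide_score` is a hypothetical name.

```python
from collections import Counter


def peptide_score(sequence, weights):
    """Score = sum over dipeptides of (frequency in sequence) * (weight).

    `weights` maps each of the 400 dipeptides (e.g. "GC") to its weight.
    Dipeptides absent from the sequence have frequency 0 and add nothing.
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p in weights]  # skip non-standard residues
    if not pairs:
        return 0.0
    counts = Counter(pairs)
    n = len(pairs)
    return sum((count / n) * weights[dp] for dp, count in counts.items())
```

A peptide would then be called antifungal when its score exceeds the chosen threshold (354 in our test results).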
- Weight:
The initial weight of each dipeptide ij is its frequency in the positive dataset minus its frequency in the negative dataset, P(ij) − N(ij). The weights are then further optimized by IGA (intelligent genetic algorithm).
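A minimal Python sketch of the initial weights P(ij) − N(ij). It assumes P(ij) and N(ij) are pooled dipeptide frequencies over all positive and all negative sequences respectively; averaging per-peptide frequencies would be an equally plausible reading. The function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]


def pooled_frequencies(sequences):
    """Frequency of each dipeptide, pooled over all pairs in all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    total = sum(counts[d] for d in DIPEPTIDES)  # ignore non-standard pairs
    if total == 0:
        return dict.fromkeys(DIPEPTIDES, 0.0)
    return {d: counts[d] / total for d in DIPEPTIDES}


def initial_weights(positives, negatives):
    """Initial weight of dipeptide ij: P(ij) - N(ij)."""
    p = pooled_frequencies(positives)
    n = pooled_frequencies(negatives)
    return {d: p[d] - n[d] for d in DIPEPTIDES}
```

Dipeptides enriched in antifungal peptides start with positive weights and those enriched in non-antifungal peptides start negative; IGA then adjusts these starting values.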
- Selection of Weights:
We picked two weight sets (scoring cards) from the population, each chosen either as the one with the highest fitness value or by the roulette method. These two scoring cards were then used for crossover.
R is the correlation coefficient (R-value) between the initial and the optimized propensity scores.
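As a minimal sketch, R can be computed as the Pearson correlation between the initial and optimized propensity scores, assuming both are lists over the 400 dipeptides in the same order. `pearson_r` is a hypothetical helper, and how R is combined into the fitness value is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```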
- AUC:
The area under the ROC curve (AUC) is used to evaluate the model. The closer the AUC is to 1, the better the prediction model distinguishes antifungal from non-antifungal peptides.
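A minimal Python sketch of AUC in its pairwise form: the probability that a randomly chosen positive peptide scores higher than a randomly chosen negative one, with ties counted as one half. This quadratic-time version is for illustration only; `auc` is a hypothetical helper.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 1 means every positive outscores every negative, while 0.5 corresponds to random ranking.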
- Roulette:
A selection method that preserves randomness: individuals with higher fitness are more likely to be selected, but individuals with lower fitness still have a chance.
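Roulette-wheel selection can be sketched in Python as follows: each individual occupies a slice of the wheel proportional to its fitness, and a uniform random point picks the slice. This is a minimal illustration assuming non-negative fitness values; `roulette_select` is a hypothetical name.

```python
import random


def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    if len(population) != len(fitnesses):
        raise ValueError("population and fitnesses must have the same length")
    if any(f < 0 for f in fitnesses) or sum(fitnesses) <= 0:
        raise ValueError("fitnesses must be non-negative with a positive sum")
    pick = rng.uniform(0, sum(fitnesses))
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick < cumulative:
            return individual
    # Floating-point round-off fallback: last individual with positive fitness.
    for individual, fit in zip(reversed(population), reversed(fitnesses)):
        if fit > 0:
            return individual
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible.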
- IGA (intelligent genetic algorithm):
Crossover: a pair of corresponding parameters in the two weight sets is randomly chosen and exchanged.
Optimization (developed by Shinn-Ying Ho): a method for optimizing large numbers of parameters, whose selection function is designed to reduce the number of different parameter sets that must be evaluated.
(For details of the algorithm, please see Peptide Prediction Model.)
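The crossover step can be sketched as follows. It is a minimal illustration of a single random exchange between two scoring cards stored as dictionaries; it does not reproduce the IGA's own optimization step, and `swap_crossover` is a hypothetical name.

```python
import random


def swap_crossover(parent_a, parent_b, rng=random):
    """Exchange one randomly chosen dipeptide weight between two scoring cards.

    Returns two new dicts; the parents are not modified.
    """
    if parent_a.keys() != parent_b.keys():
        raise ValueError("scoring cards must cover the same dipeptides")
    child_a, child_b = dict(parent_a), dict(parent_b)
    key = rng.choice(sorted(parent_a))  # sorted for reproducibility with a seeded rng
    child_a[key], child_b[key] = parent_b[key], parent_a[key]
    return child_a, child_b
```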
Antifungal Database
To bring existing antifungal data to a high level of both quantity and quality, we aggregated related online databases and organized them into a complete and useful antifungal database, the largest one online.
Content:
- Connection of data: Hosts - Pathogens - Peptides
- Cross-match: Drug repurposing by the integration of databases
After finishing our prediction system, our next step was to integrate antifungal databases. Several online databases relate to fungal infection, yet they lack organization and completeness. This disorder makes it inconvenient to search for complete information and leaves users with only a narrow view of the available knowledge.
We therefore set out to aggregate and organize all the related data from different websites and databases into a complete antifungal database, enabling drug repurposing through cross-referencing.
1. Connection of data
Our focus is fungal disease in agriculture, which involves three related factors: hosts, pathogens, and antifungal peptides. The amounts of data we collected are:
(1) hosts - pathogens: 514 (Phytopath / PHIbase)
(2) pathogens - peptides: 1334 (cAMP / PhytAMP)
(3) pathogens - peptides: 57 (paper searching)
Through this processing, we updated almost 300 peptides and found almost 70 new antifungal peptides.
2. Cross-match
After we ordered and assembled the data, its quantity became even larger than the original amounts before aggregation, because cross-referencing links records from different sources. We call this the cross-match of data.
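A minimal Python sketch of the cross-match idea, assuming host-pathogen and pathogen-peptide records are joined on the shared pathogen name: a peptide recorded against a pathogen then becomes a candidate for every host that pathogen infects. The record names are hypothetical placeholders, and real data would first need consistent pathogen naming across sources.

```python
from collections import defaultdict


def cross_match(host_pathogen, pathogen_peptide):
    """Join (host, pathogen) and (pathogen, peptide) records on the pathogen.

    Returns a set of (host, pathogen, peptide) triples.
    """
    peptides_by_pathogen = defaultdict(set)
    for pathogen, peptide in pathogen_peptide:
        peptides_by_pathogen[pathogen].add(peptide)
    triples = set()
    for host, pathogen in host_pathogen:
        # .get avoids inserting empty entries for pathogens with no peptides
        for peptide in peptides_by_pathogen.get(pathogen, ()):
            triples.add((host, pathogen, peptide))
    return triples
```

Here three host-pathogen records and three pathogen-peptide records yield three host-pathogen-peptide links that neither source lists on its own.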
Finally, we set up the Parabase website, presenting the antifungal prediction system and the relationships among validated antifungal peptides and their related data. Please see the final presentation in Demonstration.
Results
- View the demonstration on the Parabase Website.
Here we show the results of the peptide prediction.
For the antifungal database: the amount of data we collected
For the antifungal scoring system:
- The ROC curve and the results of test data
- Visualized antifungal scoring card
- Discussion of the relationships of dipeptides and active sites
For the achievement: a summary of what we have contributed
1. Antifungal Database (related antifungal data)
(1) 514 interactions between hosts and pathogens
(2) 1334 experimentally validated antifungal peptides and their descriptions
2. Antifungal Peptide Prediction System:
(1) The final ROC curve and the results on the test dataset
Figure 1:
The test accuracy, the overall rate of classifying positive data as positive and negative data as negative, is 76%. The sensitivity, the rate of classifying positive data as positive, is 77%. The specificity, the rate of classifying negative data as negative, is 76%. The chosen threshold value is 354: peptides scoring higher than this value are considered antifungal peptides.
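The three figures above follow from a confusion matrix at the threshold. A minimal Python sketch, assuming a peptide is predicted antifungal only when its score is strictly greater than the threshold; `evaluate_at_threshold` and the example scores are hypothetical.

```python
def evaluate_at_threshold(pos_scores, neg_scores, threshold):
    """Accuracy, sensitivity and specificity when score > threshold means antifungal."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    tp = sum(1 for s in pos_scores if s > threshold)   # positives called positive
    fn = len(pos_scores) - tp                          # positives missed
    tn = sum(1 for s in neg_scores if s <= threshold)  # negatives called negative
    fp = len(neg_scores) - tn                          # negatives called positive
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


metrics = evaluate_at_threshold([400, 360, 300], [200, 354, 500], 354)
```

With equal numbers of positive and negative test peptides, accuracy equals the mean of sensitivity and specificity, which agrees, up to rounding, with the reported 77% and 76% giving an accuracy of about 76%.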
(2) The score distributions of the positive and negative datasets
Figure 2: The score distributions of the positive and negative datasets
(3) Final antifungal scoring card (dipeptide scores)
Figure 3: Final antifungal scoring card
3. Discussion
Figure 4: Single amino acid scores calculated from the dipeptide scores.
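One plausible way to derive a single amino acid score, shown here only as a hypothetical sketch, is to average the weights of all 39 dipeptides that contain that amino acid. The averaging rule and the `amino_acid_scores` name are assumptions, not necessarily the method behind Figure 4.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_scores(weights):
    """Hypothetical rule: mean weight of every dipeptide containing the residue.

    Assumes a full 400-entry scoring card. Each amino acid appears in 39
    dipeptides (20 with it first, 20 with it second, minus the doubled
    pair such as "CC", which is counted once).
    """
    scores = {}
    for aa in AMINO_ACIDS:
        related = [w for dp, w in weights.items() if aa in dp]
        scores[aa] = sum(related) / len(related)
    return scores
```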
- Single Amino Acid Score Analysis:
According to the scores, the top three amino acids are Cysteine (C), Glycine (G), and Lysine (K), and the five amino acids with the lowest scores are Aspartic acid (D), Glutamic acid (E), Serine (S), Threonine (T), and Valine (V).
We interpret these results as follows:
Many antifungal peptides from plants and mammals, such as thionins and plant defensins, contain many Cysteine residues. Likewise, many insect antifungal peptides are Glycine-rich.
Of the five amino acids with the lowest scores (D, E, S, T, V), four are hydrophilic, even though most hydrophilic amino acids have higher scores (average score: 362.73 > threshold: 350).
Additionally, among the five highest-scoring amino acids, Cysteine contains a thiol (sulfhydryl) group that can form disulfide bonds, and Lysine (K) and Arginine (R) readily form hydrogen bonds.
- 3D structure and active site:
To show the results of the scoring card, we visualized peptides by mapping dipeptide scores onto their 3D structures. Regions with higher dipeptide scores appear redder, and regions with lower dipeptide scores appear bluer.
In this way, we can locate the important regions of an antifungal peptide.
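A minimal sketch of how dipeptide scores could be turned into per-residue colors, assuming each residue takes the mean score of the one or two dipeptides it belongs to, and that scores are rescaled linearly from blue (lowest) to red (highest) as RGB values in [0, 1]. The averaging rule and function names are hypothetical; the resulting colors would still need to be applied in a molecular viewer.

```python
def residue_scores(sequence, weights):
    """Per-residue score: mean weight of the dipeptides covering each residue.

    Residues not covered by any scored dipeptide get None.
    """
    seq = sequence.upper()
    sums = [0.0] * len(seq)
    counts = [0] * len(seq)
    for i in range(len(seq) - 1):
        w = weights.get(seq[i:i + 2])
        if w is None:
            continue
        for j in (i, i + 1):
            sums[j] += w
            counts[j] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def blue_to_red(values):
    """Map scores linearly to RGB tuples: lowest -> blue, highest -> red."""
    known = [v for v in values if v is not None]
    if not known:
        return [None] * len(values)
    lo, hi = min(known), max(known)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    colors = []
    for v in values:
        if v is None:
            colors.append(None)
        else:
            t = (v - lo) / span
            colors.append((t, 0.0, 1.0 - t))
    return colors
```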
We took Rs-AFP2, an antifungal peptide from the plant defensin family, as an example.
Figure 5:
This 3D rotating GIF shows Rs-AFP2 with its scoring-card scores visualized on the peptide. The N-terminus of the peptide (at the top) and the β3 sheet are the reddest parts of the peptide. According to our SCM-based scoring system, these two regions are important in determining whether the whole peptide sequence is antifungal.
The N-terminus of the peptide and the β2 sheet appeared to be the reddest. According to our SCM-based antifungal peptide prediction system, these two regions are important in determining whether the full peptide sequence is antifungal.
Figure 6:
This 3D rotating GIF shows the Rs-AFP2 peptide with its active region, as reported in the paper[1], labeled in red. According to this paper, the major active site lies in the β2−β3 loop, from Ala32 to Phe49, and some activity was also found in the N-terminal part of the protein.
For comparison, the paper showed that the active site is the β2−β3 loop, from Ala31 to Phe49, and that some activity was found in the N-terminal part of the protein.
Comparing the scoring-card visualization with the real active site, we find that the β3 sheet and the N-terminus are also highlighted in the scoring-card picture.
In conclusion, SCM may be able to reveal antifungal active sites.
Achievement
We created a powerful database that helps iGEMers who aim to solve agricultural problems caused by fungi, and, through the same framework, even other diseases. Our database has a convenient search tool that can quickly find effective antifungal peptides by host species or fungal pathogen. It also enables users to find potential new antifungal peptides by applying the antifungal prediction system.
Reference
[1] W. M. M. Schaaper, "Synthetic peptides derived from the β2−β3 loop of Raphanus sativus antifungal protein 2 that mimic the active site," 2001. http://onlinelibrary.wiley.com/doi/10.1034/j.1399-3011.2001.00842.x/full