Difference between revisions of "Team:NCTU Formosa/Peptide Prediction"

Revision as of 06:39, 19 October 2017

Peptide Prediction

Our database pairs a large dataset with a peptide prediction system built on the Scoring Card Method (SCM) and further optimization. The system evaluates whether a peptide is antifungal from sequence analysis alone. Furthermore, we integrate all the related data into a complete antifungal database that can be queried by host, pathogen, or corresponding peptide. Combining the two yields a novel database that supports both new drug discovery and the repurposing of existing antifungal peptides.

(Placeholder: a short overview of the expandable section below will go here.)

To predict our peptides, we adopted the Scoring Card Method (SCM) and modified it into our antifungal scoring system. The major advantages of the method are its simplicity and acceptable accuracy.

SCM, based on Support Vector Machine (SVM), is a method originating from our instructor Shinn-Ying Ho. To measure antifungal activity, we introduce SCM into our model to evaluate peptides' antifungal functions from the perspective of biological information.

Dipeptide
The premise of this method is the hypothesis that a peptide's function corresponds to its sequence. We treat two adjacent amino acids as a group forming the smallest functional unit, which we define as a dipeptide. (Figure)

We can split a peptide into overlapping dipeptides and predict whether the peptide is antifungal by examining whether each dipeptide is antifungal. A peptide with more potentially antifungal dipeptides is more likely to be an antifungal peptide, and vice versa. The 400 individual dipeptide propensities are obtained by statistical discrimination between the dipeptide compositions of the antifungal and non-antifungal peptides.
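
The splitting step can be illustrated with a minimal Python sketch. It assumes the 20 standard amino acids in one-letter code, which gives the 400 dipeptide types; the names `AMINO_ACIDS`, `split_dipeptides`, and `dipeptide_composition` are hypothetical and are not taken from our actual implementation.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard residues in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All ordered pairs of residues: 20 * 20 = 400 dipeptide types.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def split_dipeptides(peptide):
    """Return the overlapping dipeptides of a peptide, left to right."""
    return [peptide[i:i + 2] for i in range(len(peptide) - 1)]


def dipeptide_composition(peptide):
    """Map each of the 400 dipeptides to its relative frequency in the peptide."""
    pairs = split_dipeptides(peptide)
    counts = Counter(pairs)
    total = len(pairs)
    return {dp: (counts[dp] / total if total else 0.0) for dp in DIPEPTIDES}
```

For example, `split_dipeptides("ACDA")` yields the three overlapping dipeptides `AC`, `CD`, and `DA`.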

-Dipeptide Frequency & Score
For each peptide, the frequency of each of the 400 dipeptide types is multiplied by the corresponding weight, and the products are summed to give the peptide's score. (Figure)
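
A small Python sketch of this scoring step follows. It assumes the score is the frequency-weighted sum over the dipeptides of one peptide, and that dipeptides missing from the weight table count as zero; the function name `score_peptide` and the example weights are hypothetical.

```python
def score_peptide(peptide, weights):
    """Score a peptide as the sum of (dipeptide frequency * dipeptide weight).

    `weights` maps a dipeptide such as "AC" to its weight. Dipeptides that
    are absent from the table are treated as weight 0.0 (an assumption).
    """
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    if not pairs:
        return 0.0  # a single residue contains no dipeptide
    # Adding each occurrence's weight and dividing by the number of
    # dipeptides equals summing frequency * weight over dipeptide types.
    return sum(weights.get(dp, 0.0) for dp in pairs) / len(pairs)
```

With the illustrative weights `{"AC": 1.0, "CD": -0.5}`, the peptide `ACD` has two dipeptides of frequency 0.5 each, so its score is 0.5 * 1.0 + 0.5 * (-0.5) = 0.25.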

-Weight
The weights change in every round of the computing loop. The initial weight for each dipeptide is the ratio at which the dipeptide appears in the positive dataset minus the ratio at which it appears in the negative dataset. (Figure) The other candidates for the IGA round are generated randomly.
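
The initial weight can be sketched in Python as below. This is only one reading of "ratio": here it is taken to be the share of a dipeptide among all dipeptides pooled over a dataset, which is an assumption, and the names `pooled_frequencies` and `initial_weights` are hypothetical.

```python
from collections import Counter


def pooled_frequencies(peptides):
    """Share of each dipeptide among all dipeptides pooled over the peptides."""
    counts = Counter()
    for peptide in peptides:
        counts.update(peptide[i:i + 2] for i in range(len(peptide) - 1))
    total = sum(counts.values())
    return {dp: n / total for dp, n in counts.items()} if total else {}


def initial_weights(positive, negative):
    """Initial weight = ratio in the positive set minus ratio in the negative set."""
    pos = pooled_frequencies(positive)
    neg = pooled_frequencies(negative)
    return {dp: pos.get(dp, 0.0) - neg.get(dp, 0.0) for dp in set(pos) | set(neg)}
```

A dipeptide that is enriched in the antifungal (positive) set gets a positive initial weight, and one enriched in the non-antifungal (negative) set gets a negative one.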

-Selection of Weight
We select two weights from all candidates: the one with the highest AUC and one chosen by the Roulette method.

--AUC: the area under the ROC curve, used to evaluate the model we built. The closer the value is to 1, the more accurate the prediction model.

--Roulette: a selection method that preserves randomness while making candidates with higher fitness more likely to be selected.
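
Both selection criteria can be sketched in Python. The AUC below uses the pairwise-ranking formulation (the fraction of positive/negative pairs ranked correctly, with ties counted as half), and the roulette wheel picks an index with probability proportional to its fitness. All function names are hypothetical, and pairing the best candidate with one roulette pick is our reading of the selection rule rather than a confirmed detail.

```python
import random


def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed by comparing every positive/negative pair."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


def roulette_select(fitnesses, rng=random):
    """Pick an index with probability proportional to its non-negative fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for index, fitness in enumerate(fitnesses):
        running += fitness
        if fitness > 0 and pick <= running:
            return index
    return len(fitnesses) - 1  # guard against floating-point round-off


def select_parents(fitnesses, rng=random):
    """Return (index with the highest fitness, index chosen by roulette)."""
    best = max(range(len(fitnesses)), key=fitnesses.__getitem__)
    return best, roulette_select(fitnesses, rng)
```

Here the fitness of a candidate weight set would be the AUC it achieves on the positive and negative datasets, so better candidates are favored without excluding the others.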

-IGA (intelligent genetic algorithm)
--Crossover Selection
A pair of parameters from the two weights is randomly chosen and exchanged.
--Optimization (developed by Shinn-Ying Ho)
A method for optimizing a large number of parameters, in which the selection function is designed to reduce the number of different parameter sets.
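
The exchange of one randomly chosen parameter can be illustrated with the Python sketch below. It shows only a plain single-parameter swap between two weight tables that share the same dipeptide keys; it is not the full IGA optimization, and the function name `swap_one_parameter` is hypothetical.

```python
import random


def swap_one_parameter(weights_a, weights_b, rng=random):
    """Exchange the weight of one randomly chosen dipeptide between two tables.

    Assumes both tables hold the same dipeptide keys. The inputs are left
    unchanged; two new tables and the swapped key are returned.
    """
    child_a, child_b = dict(weights_a), dict(weights_b)
    key = rng.choice(sorted(child_a))  # sorted for a reproducible choice
    child_a[key], child_b[key] = child_b[key], child_a[key]
    return child_a, child_b, key
```

Each child table differs from its parent in exactly one dipeptide weight, so the effect of that single exchange on the AUC can be measured directly.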

(Placeholder: a short overview of the expandable section below will go here.)

After finishing our prediction system, the next step was to integrate the antifungal databases. Several databases related to fungal infection exist on the internet, but they lack organization and completeness. Scattered data makes it inconvenient to find complete information and leads to a narrow view of the available knowledge.

We therefore planned to aggregate all the related data from different websites and databases into a complete antifungal database, enabling drug repurposing through cross-referencing.

1. Connection of data
To focus on the problem we are dealing with, fungal diseases in agriculture, we consider three related factors: hosts, pathogens, and antifungal peptides. Here are the quantities of data we collected:
(1) hosts - pathogens : 514 (Phytopath / PHIbase)
(2) pathogens - peptides : 1525 (cAMP / PhytAMP)
(3) pathogens - peptides : 110 (paper searching)
The relationships among hosts, pathogens, and antifungal peptides have been sorted out in the Parabase website we created. Users of the website can select any one of the three to search for the others.
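
The host - pathogen - peptide chain can be sketched in Python as a two-step lookup. The records below are made-up placeholders, not entries from our data, and the function name `peptides_for_host` is hypothetical; the sketch only shows how selecting a host leads to pathogens and then to peptides.

```python
from collections import defaultdict


def peptides_for_host(host, host_pathogen_pairs, pathogen_peptide_pairs):
    """Follow host -> pathogens -> peptides and return the peptides, sorted."""
    pathogens_by_host = defaultdict(set)
    for h, pathogen in host_pathogen_pairs:
        pathogens_by_host[h].add(pathogen)

    peptides_by_pathogen = defaultdict(set)
    for pathogen, peptide in pathogen_peptide_pairs:
        peptides_by_pathogen[pathogen].add(peptide)

    found = set()
    for pathogen in pathogens_by_host.get(host, ()):
        found |= peptides_by_pathogen.get(pathogen, set())
    return sorted(found)


# Placeholder records for illustration only.
HOST_PATHOGEN = [("Plant R", "Pathogen Q"), ("Plant T", "Pathogen S")]
PATHOGEN_PEPTIDE = [("Pathogen Q", "Peptide X"), ("Pathogen Q", "Peptide Y"),
                    ("Pathogen S", "Peptide Z")]
```

The same two tables can be traversed in the opposite direction to go from a peptide to its target pathogens and on to the affected hosts.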

2. Cross-match
After we ordered and assembled the data, the amount of usable data became larger than the original amount before merging, because of cross-referencing. We call this the cross-match of data.
[For instance, in Database A, only Pathogen Q can infect Plant R. In Database B, Pathogen S can infect Plant R. Plant R therefore has an additional pathogen to consider when it is infected by fungi, and the number of candidate antifungal peptides also grows because the databases are combined.] (to be shown in a schematic diagram)
Using this method, we have updated 150 entries in cAMP and Phytopath to find more target pathogens.
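
The Database A / Database B example can be written as a minimal Python sketch. It assumes each database is a mapping from a host to the set of pathogens that infect it; the function name `cross_match` is hypothetical and the records are the placeholders from the example, not real entries.

```python
def cross_match(*databases):
    """Merge host -> pathogens mappings by taking the union per host."""
    merged = {}
    for database in databases:
        for host, pathogens in database.items():
            merged.setdefault(host, set()).update(pathogens)
    return merged


# Placeholder records taken from the example in the text.
database_a = {"Plant R": {"Pathogen Q"}}
database_b = {"Plant R": {"Pathogen S"}}
merged = cross_match(database_a, database_b)
```

After merging, Plant R is linked to both Pathogen Q and Pathogen S, so every peptide recorded against either pathogen becomes a candidate for that host.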

In the end, we set up our Parabase website. Please check out the details in Demonstration.


@@ Line 207: / Line 207: @@
 <body>
-     <div class="des_head"> <img src="https://static.igem.org/mediawiki/2017/d/d4/Description_head.png" width="100%"> </div>
+     <div class="ptp_head"><img src="https://static.igem.org/mediawiki/2017/d/d7/Ptp_head.png" width="100%"> </div>
-     <div class="first_line"><img src="https://static.igem.org/mediawiki/2017/3/31/Hyperproject_line.png" width="24%"></div>
+     <div>
-    <div class="hyperlink">
+        <div class="first_line"><img src="https://static.igem.org/mediawiki/2017/3/31/Hyperproject_line.png" width="24%"></div>
-        <a href="https://2017.igem.org/Team:NCTU_Formosa/Description"><img src="https://static.igem.org/mediawiki/2017/a/a5/Hyperproject_des.png"   id="first"></a>
+        <div class="hyperlink">
+            <a href="https://2017.igem.org/Team:NCTU_Formosa/Description"><img src="https://static.igem.org/mediawiki/2017/a/a5/Hyperproject_des.png"   id="first"></a>
-        <a href="#"><img src="https://static.igem.org/mediawiki/2017/9/95/Hyperproject_design.png"  id="second"></a>
+            <a href="#"><img src="https://static.igem.org/mediawiki/2017/9/95/Hyperproject_design.png"  id="second"></a>
-        <a href="https://2017.igem.org/Team:NCTU_Formosa/Peptide_Prediction"><img src="https://static.igem.org/mediawiki/2017/1/1e/Hyperproject_ptpredic.png"  id="third"></a>
+            <a href="https://2017.igem.org/Team:NCTU_Formosa/Peptide_Prediction"><img src="https://static.igem.org/mediawiki/2017/1/1e/Hyperproject_ptpredic.png"  id="third"></a>
-        <a href="#"><img src="https://static.igem.org/mediawiki/2017/7/70/Hyperproject_sgpredic.png"  id="fourth"></a>
+            <a href="#"><img src="https://static.igem.org/mediawiki/2017/7/70/Hyperproject_sgpredic.png"  id="fourth"></a>
-        <a href="#"><img src="https://static.igem.org/mediawiki/2017/f/fc/Hyperproject_demo.png"  id="fifth"></a>
+            <a href="#"><img src="https://static.igem.org/mediawiki/2017/f/fc/Hyperproject_demo.png"  id="fifth"></a>
-        <a href="#"><img src="https://static.igem.org/mediawiki/2017/0/01/Hyperproject_contri.png"  id="sixth"></a>
+            <a href="#"><img src="https://static.igem.org/mediawiki/2017/0/01/Hyperproject_contri.png"  id="sixth"></a>
-        <a href="#"><img src="https://static.igem.org/mediawiki/2017/b/b5/Hyperproject_improve.png"  id="seventh"></a>
+            <a href="#"><img src="https://static.igem.org/mediawiki/2017/b/b5/Hyperproject_improve.png"  id="seventh"></a>
+        </div>
+        <div class="second_line"><img src="https://static.igem.org/mediawiki/2017/3/31/Hyperproject_line.png" width="24%"></div>
      </div>
-    <div class="second_line"><img src="https://static.igem.org/mediawiki/2017/3/31/Hyperproject_line.png" width="24%"></div>
      <div class="ov">
-         <div class="ov_head"> <img src="https://static.igem.org/mediawiki/2017/1/10/Description_ov_word.png" width="25%"> </div>
+         <div class="ov_head"> <img src="https://static.igem.org/mediawiki/2017/8/81/Ptp_ov.png" width="250em"> </div>
-         <div>
+         <p>Our database, a new genesis for Artificial Intelligence, strengthens the power of large datasets by the peptide prediction system on the basis of SCM with other optimization. The antifungal characteristic can be evaluated only by sequence analysis.
-            <p>We establish and optimize a powerful system that is simple while accurate to predict peptides’ function, which is based on the Scoring Card Method (SCM). Instead of heavy computation and complex operation, the SCM achieves to analyze the functions
+            Furthermore, we integrate all the relative data to form a complete antifungal database to achieve the query function of hosts, pathogens, and corresponding peptides. Combining two together, a novel database achieving both new drug discovery
-                of peptides with peptide sequences only.<br><br> Furthermore, we aggregate all the relative data to fulfill the integration of antifungal databases, which build the connection among the data of hosts, pathogens, and corresponding peptides.
+            and old drug repurposing for antifungal peptides is born. </p>
-                In addition, we are the first in iGEM history that not only constructed the system but also validated our prediction system with the wet web.<br><br> Moreover, IoT talk realized the application to gather weather information in farmland
+    </div>
-                 and predict the possibility of spore germination with cloud computing. A completion is done by NCTU_Formosa that carry out the solution of surviving in explosive information in the 21st century – Parabase, exact and fast!</p>
+    <div class="SCM"> <img id="SCM_pic" src="https://static.igem.org/mediawiki/2017/4/48/Ptp_SCM.png">
+        <div class="show">
+            <p> this paragraph are going to put some words that is the overview of the expanding context </p>
+            <div><img class="show_pic" src="https://static.igem.org/mediawiki/2017/4/4f/Ptp_hide.png" style="display:block; margin:auto;"></div>
+        </div>
+        <div class="hide">
+            <p>
+                For the prediction of our peptides, we integrated Scoring Card Method and modified to our antifungal scoring system. The major advantage of the method is its simplicity and acceptable accuracy.<br>
+                <br> SCM, based on Support Vector Machine (SVM), is a method originating from our instructor Shinn-Ying Ho. To measure the property of anti-fungus, we introduce SCM into our model to evaluate peptides’ antifungal functions with the perspective
+                of biological information.<br>
+                 <br> Dipeptide
+                <br>The premise of this method is to hypothesize the function of peptides correspond to their sequences. We viewed two amino acids as a group to form the smallest functional unit, defined dipeptides. (圖)<br>
+                <br> We’re able to split a peptide into overlapping dipeptides and predict whether the peptide is antifungal by examining each dipeptide to be antifungal or not. A peptide that has more potentially antifungal dipeptides will more likely
+                to be an antifungal peptide, vise versa. The total 400 individual dipeptide propensities are obtained by statistical discrimination between dipeptide composition of the antifungal peptides and non-antifungal peptides.<br>
+                <br> -Dipeptide Frequency & Score
+                <br>Each dipeptide frequency (400 types) of each peptide multiplies the weight to get a score. (圖)<br>
+                <br> -Weight
+                <br>The value of weight floats every round in the computing loop. The initial weight value for each dipeptide is the ratio of the dipeptide appearing in the positive datasets minus the ratio appearing in the negative datasets. (圖) Others
+                to be the candidates in the IGA round are picked randomly.<br>
+                <br> -Selection of Weight
+                <br>We will select two weights among all: the one that has the highest AUC or be selected by the Roulette method
+                <br>
+                <br> --AUC: the Area Under ROC curves which is viewed as a way to evaluate the model built. The closer to 1 of the value is, the higher accuracy of the prediction model has. -<br>
+                <br> --Roulette: a choosing method to ensure the randomness even the higher fitness probably will be selected.<br>
+                <br> -IGA (intelligent genetics algorithm)
+                <br> --Cross Over Selection
+                <br> A pair of parameters of the two weights are radomly choosed to exchange.
+                <br> --Optimization (developed by Shinn-Ying Ho)
+                <br>a creative method for large parameters optimization which the selection function has been designed to simplify the numbers of different parameter sets</p>
+            <div><img class="hide_pic" src="https://static.igem.org/mediawiki/2017/c/cb/Ptp_show.png" style="display:block; margin:auto;"></div>
          </div>
      </div>
-     <div class="line"><img src="https://static.igem.org/mediawiki/2017/6/6f/Description_line.png" width="90%"></div>
+     <div class="Database"> <img id="Database_pic" src="https://static.igem.org/mediawiki/2017/2/28/Ptp_Database.png">
+        <div class="show2">
-    <div class="mot">
+            <p> this paragraph are going to put some words that is the overview of the expanding context </p>
-        <div class="mot_head"> <img src="https://static.igem.org/mediawiki/2017/7/70/Description_mot_word.png" width="25%"> </div>
+            <div><img class="show2_pic" src="https://static.igem.org/mediawiki/2017/4/4f/Ptp_hide.png" style="display:block; margin:auto;"></div>
-         <div>
+         </div>
-             <p>Fungal diseases are crises in Taiwan which cause two-thirds of the economic loss of Taiwan’s agriculture. The method broadly used to eliminate fungal diseases is to apply chemical pesticides or to abandon entire farmlands.<br><br> Now, bio-pesticide
+        <div class="hide2">
-                might seem to be a good choice since it avoids all the drawbacks chemical one impacts that have caused great damage to the environment.<br><br> Yet, we are in the era of explosive information. How to find peptides with the right functions
+             <p> After we finished our prediction system, the next would be the integration of antifungal databases. There're several databases related to fungal infection in the internet yet lack of arrangement and integrity. The disorder of data would lead
-                 you are looking for from tons of unorganized data both effectively and accurately? It is like seeking a precious pearl in a vast ocean. Typically, a protein function analysis involves complicated calculation including template detection,
+                to the inconvenience for searching full information and end up to have the narrow- sighted absorbance of knowledge.<br>
-                 alignment, or 3D modeling.<br></p>
+                <br>As a result, we planned to aggregate all the relative data in different websites or databases to set up a complete anti-fungal database, reaching drug repurposing by cross-reference.<br>
-             <div class="mot_pic">
+                <br>1. Connection of data
-                <img src="https://static.igem.org/mediawiki/2017/b/b4/Description_mot_pic.png" width="60%">
+                <br>To focus on the problem we are dealing with, the fungal diseases in agriculture, there're some factors related to the issue: hosts, pathogens, and anti-peptides. Here's the data quantity we collect:
-            </div>
+                 <br>(1) hosts - pathogens : 514 (Phytopath / PHIbase)
-            <p>This year, we proudly announce that we are the first one to analyze by amino sequences only and consolidate our prediction with the wet lab to cure fungal diseases. </p>
+                <br>(2) pathogens - peptides : 1525 (cAMP / PhytAMP)
+                <br>(3) pathogens - peptides : 110 (paper searching) The relationships of them: hosts - pathogens - anti-fungal peptide have sorted out in the Parabase website we create. Users of the website can check either one option to search for others.<br>
+                <br>2. Cross-match
+                <br>After the data has been ordered and assembled by us, the quantity of data is even bigger than the original amounts of data before they gather because of cross-reference. We call it the cross-match of data.
+                <br>[ For instance, in Database A, only Pathogen Q can infect Plant R. In Database B, Pathogen S can infect Plant R. So the data to evaluate plant R when being infected by fungi have another choice now, and the solution anti-fungal peptides
+                 also become more because of the combination of databases.] (用示意圖表示)
+                <br>Using this method, we have updated 150 data in cAMP and Phytopath for finding more target pathogens.<br>
+                <br>In the end, we set up our Parabase website. Please check out the details in Demonstration.
+            </p>
+             <div><img class="hide2_pic" src="https://static.igem.org/mediawiki/2017/c/cb/Ptp_show.png" style="display:block; margin:auto;"></div>
          </div>
      </div>
-    <div class="footer"></div>
 </body>
 </html>
 {{NCTU_Formosa/Footer}}