Difference between revisions of "Team:NCTU Formosa/Peptide Prediction"

 
(47 intermediate revisions by 5 users not shown)
Line 10: Line 10:
 
     <script src="project_peptide_prediction.js" type="text/javascript"></script>
 
     <script src="project_peptide_prediction.js" type="text/javascript"></script>
 
     <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no, minimum-scale=1.0, maximum-scale=1.0" />
 
     <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no, minimum-scale=1.0, maximum-scale=1.0" />
 +
 
     <style>
 
     <style>
 
body {
 
body {
Line 41: Line 42:
 
     margin: 10px 0 0;
 
     margin: 10px 0 0;
 
}
 
}
 +
 +
h4{
 +
    text-align: center;
 +
    margin-top: 0px;
 +
    margin-bottom: 50px !important;
 +
}
 +
  
 
@font-face {
 
@font-face {
Line 65: Line 73:
 
     margin-bottom: 10px;
 
     margin-bottom: 10px;
 
     line-height: 1em;
 
     line-height: 1em;
 +
}
 +
 +
.latex{
 +
    font-size: 20px;
 
}
 
}
  
Line 71: Line 83:
 
     margin: 10vh 10vw 0vh;
 
     margin: 10vh 10vw 0vh;
 
}
 
}
 +
/*----------------------------------------------------------------------------*/
 +
/*----------------------------------------------------------------------------*/
  
 
.hyperlink{
 
.hyperlink{
Line 93: Line 107:
 
#first{
 
#first{
 
position: absolute;
 
position: absolute;
left: 6vw;
+
left: 0.1vw;
top: 3.1vmax;
+
top: 3.5vmax;
width: 20vw;
+
width: 23vw;
 
     -webkit-transition: all 0.5s ease;
 
     -webkit-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
Line 105: Line 119:
 
#second{
 
#second{
 
position:absolute;
 
position:absolute;
left: 17.1vw;
+
left: 12.6vw;
top: 7vmax;
+
top: 8vmax;
width: 20vw;
+
width: 23vw;
 
     -webkit-transition: all 0.5s ease;
 
     -webkit-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
Line 117: Line 131:
 
#third{
 
#third{
 
position: absolute;
 
position: absolute;
left: 27vw;
+
left: 24vw;
top: 1.7vmax;
+
top: 2.5vmax;
width: 20vw;
+
width: 23vw;
 
     -webkit-transition: all 0.5s ease;
 
     -webkit-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
Line 129: Line 143:
 
#fourth{
 
#fourth{
 
position: absolute;
 
position: absolute;
left: 40vw;
+
left: 38vw;
width: 20vw;
+
width: 23vw;
 
     -webkit-transition: all 0.5s ease;
 
     -webkit-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
Line 140: Line 154:
 
#fifth{
 
#fifth{
 
position: absolute;
 
position: absolute;
left: 53vw;
+
left: 52vw;
top: 7.1vmax;
+
top: 8.5vmax;
width: 20vw;
+
width: 23vw;
 
     -webkit-transition: all 0.5s ease;
 
     -webkit-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
Line 152: Line 166:
 
#sixth{
 
#sixth{
 
position: absolute;
 
position: absolute;
left: 64.7vw;
+
left: 65vw;
top: 2.9vmax;
+
top: 3.5vmax;
width: 20vw;
+
width: 23vw;
 
     -webkit-transition: all 0.5s ease;
 
     -webkit-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
Line 164: Line 178:
 
#seventh{
 
#seventh{
 
position:absolute;
 
position:absolute;
left: 72.7vw;
+
left: 74.2vw;
top: 6.8vmax;
+
top: 8.5vmax;
width: 20vw;
+
width: 23vw;
 
     -webkit-transition: all 0.5s ease;
 
     -webkit-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
 
     -moz-transition: all 0.5s ease;
Line 176: Line 190:
  
 
#first:hover{
 
#first:hover{
     width: 23vw;
+
     width: 25vw;
     left: 4.5vw;
+
     left: -0.9vw;
 
}
 
}
  
 
#second:hover{
 
#second:hover{
     width: 23vw;
+
     width: 25vw;
     left: 15.6vw;
+
     left: 11.6vw;
 
}
 
}
  
 
#third:hover{
 
#third:hover{
     width: 23vw;
+
     width: 25vw;
     left: 25.5vw;
+
     left: 23vw;
 
}
 
}
  
 
#fourth:hover{
 
#fourth:hover{
     width: 23vw;
+
     width: 25vw;
     left: 38.5vw;
+
     left: 37vw;
 
}
 
}
  
 
#fifth:hover{
 
#fifth:hover{
     width: 23vw;
+
     width: 25vw;
     left: 51.5vw;
+
     left: 52vw;
 
}
 
}
  
 
#sixth:hover{
 
#sixth:hover{
     width: 23vw;
+
     width: 25vw;
     left: 63.2vw;
+
     left: 65vw;
 
}
 
}
  
 
#seventh:hover{
 
#seventh:hover{
     width: 23vw;
+
     width: 25vw;
     left: 71.2vw;
+
     left: 73.2vw;
 
}
 
}
 
 
 
 
/*----------------------------------------------------------------------------*/
 
/*----------------------------------------------------------------------------*/
 
/*----------------------------------------------------------------------------*/
 
/*----------------------------------------------------------------------------*/
Line 272: Line 283:
 
.sublist{
 
.sublist{
 
     position: relative;
 
     position: relative;
     left: 20px;
+
     width: 70vw;
     margin-bottom: -10px;
+
     margin: 0 5vw;
 
}
 
}
  
Line 282: Line 293:
 
}
 
}
  
.sub_text>h2{
+
.sub_text>h2, .hide3>h2{
 
     margin-bottom: 0px;
 
     margin-bottom: 0px;
 
     margin-top: 50px;
 
     margin-top: 50px;
 
}
 
}
 +
 +
 +
 +
.sub_text>p{
 +
    width: 60vw;
 +
    position: relative;
 +
    margin: auto;
 +
}
 +
 
/*----------------------------------------------------------------------------*/
 
/*----------------------------------------------------------------------------*/
 
/*----------------------------------------------------------------------------*/
 
/*----------------------------------------------------------------------------*/
Line 366: Line 386:
 
});
 
});
  
 +
    </script>
 +
 +
    <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js" type="text/javascript">
 +
        MathJax.Hub.Config({
 +
            extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
 +
            jax: ["input/TeX", "output/HTML-CSS"],
 +
            tex2jax: {
 +
                inlineMath: [
 +
                    ['$', '$'],
 +
                    ["\\(", "\\)"]
 +
                ],
 +
                displayMath: [
 +
                    ['$$', '$$'],
 +
                    ["\\[", "\\]"]
 +
                ],
 +
            },
 +
            "HTML-CSS": {
 +
                availableFonts: ["TeX"]
 +
            }
 +
        });
 
     </script>
 
     </script>
 
      
 
      
Line 374: Line 414:
  
  
     <div class="first_line"><img src="https://static.igem.org/mediawiki/2017/3/31/Hyperproject_line.png" width="24%"></div>
+
     <div>
    <div class="hyperlink">
+
        <div class="first_line"><img src="https://static.igem.org/mediawiki/2017/3/31/Hyperproject_line.png" width="24%"></div>
        <a href="https://2017.igem.org/Team:NCTU_Formosa/Description"><img src="https://static.igem.org/mediawiki/2017/a/a5/Hyperproject_des.png"  id="first"></a>
+
        <div class="hyperlink">
 +
            <a href="https://2017.igem.org/Team:NCTU_Formosa/Description"><img src="https://static.igem.org/mediawiki/2017/a/a5/Hyperproject_des.png"  id="first"></a>
  
        <a href="https://2017.igem.org/Team:NCTU_Formosa/Applied_Design"><img src="https://static.igem.org/mediawiki/2017/9/95/Hyperproject_design.png"  id="second"></a>
+
            <a href="https://2017.igem.org/Team:NCTU_Formosa/Applied_Design"><img src="https://static.igem.org/mediawiki/2017/9/95/Hyperproject_design.png"  id="second"></a>
  
        <a href="https://2017.igem.org/Team:NCTU_Formosa/Peptide_Prediction"><img src="https://static.igem.org/mediawiki/2017/1/1e/Hyperproject_ptpredic.png"  id="third"></a>
+
            <a href="https://2017.igem.org/Team:NCTU_Formosa/Peptide_Prediction"><img src="https://static.igem.org/mediawiki/2017/1/1e/Hyperproject_ptpredic.png"  id="third"></a>
  
        <a href="https://2017.igem.org/Team:NCTU_Formosa/Disease_Occurrence_Prediction"><img src="https://static.igem.org/mediawiki/2017/7/70/Hyperproject_sgpredic.png"  id="fourth"></a>
+
            <a href="https://2017.igem.org/Team:NCTU_Formosa/Disease_Occurrence_Prediction"><img src="https://static.igem.org/mediawiki/2017/7/70/Hyperproject_sgpredic.png"  id="fourth"></a>
  
        <a href="https://2017.igem.org/Team:NCTU_Formosa/Demonstrate"><img src="https://static.igem.org/mediawiki/2017/f/fc/Hyperproject_demo.png"  id="fifth"></a>
+
            <a href="https://2017.igem.org/Team:NCTU_Formosa/Demonstrate"><img src="https://static.igem.org/mediawiki/2017/f/fc/Hyperproject_demo.png"  id="fifth"></a>
  
        <a href="https://2017.igem.org/Team:NCTU_Formosa/Contribution"><img src="https://static.igem.org/mediawiki/2017/0/01/Hyperproject_contri.png"  id="sixth"></a>
+
            <a href="https://2017.igem.org/Team:NCTU_Formosa/Contribution"><img src="https://static.igem.org/mediawiki/2017/0/01/Hyperproject_contri.png"  id="sixth"></a>
  
        <a href="https://2017.igem.org/Team:NCTU_Formosa/Improve"><img src="https://static.igem.org/mediawiki/2017/b/b5/Hyperproject_improve.png"  id="seventh"></a>
+
            <a href="https://2017.igem.org/Team:NCTU_Formosa/Improve"><img src="https://static.igem.org/mediawiki/2017/b/b5/Hyperproject_improve.png"  id="seventh"></a>
    </div>
+
        </div>
    <div class="second_line"><img src="https://static.igem.org/mediawiki/2017/3/31/Hyperproject_line.png" width="24%"></div>
+
        <div class="second_line"><img src="https://static.igem.org/mediawiki/2017/3/31/Hyperproject_line.png" width="24%"></div>
 
     </div>
 
     </div>
  
Line 404: Line 445:
 
                 </div>
 
                 </div>
 
                 <p>
 
                 <p>
                     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Our database, a new genesis for artificial intelligence, amplifies the value of large datasets through an antifungal peptide prediction system built on the Scoring Card Method (SCM) with further optimization. A peptide's antifungal character can be
+
                     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Our database, a new genesis for artificial intelligence, amplifies the value of large datasets through an antifungal peptide prediction system built on the Scoring Card Method (SCM) with further optimization. A peptide's antifungal character can be evaluated and interpreted from its sequence alone.
                    evaluated and interpreted from its sequence alone.
+
 
                 </p>
 
                 </p>
  
 
                 <p>
 
                 <p>
                     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Furthermore, we integrated all the related data into a complete antifungal database that supports queries over hosts, pathogens, and their corresponding peptides. Combining the two, we created Parabase, a novel database that supports
+
                     &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Furthermore, we integrated all the related data into a complete antifungal database that supports queries over hosts, pathogens, and their corresponding peptides. Combining the two, we created Parabase, a novel database that supports both new drug discovery and the repurposing of existing antifungal peptides.
                    both new drug discovery and the repurposing of existing antifungal peptides.
+
 
                 </p>
 
                 </p>
  
Line 423: Line 462:
  
 
                     <p>
 
                     <p>
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To evaluate peptide function faster and more intelligently, we built our antifungal peptide prediction system on SCM. This practical, interpretable tool lets us screen large numbers of uncharacterized peptides for potential target
+
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To evaluate peptide function faster and more intelligently, we built our antifungal peptide prediction system on SCM. This practical, interpretable tool lets us screen large numbers of uncharacterized peptides for potential target peptides, making the best use of large datasets.
                        peptides, making the best use of large datasets.
+
 
                     </p>
 
                     </p>
                     <p class="sublist">Content:</p>
+
                     <div class="sublist">
                    <ol>
+
                        <h2 style="position: relative; top: 10px;">Content:</h2>
                        <li>Datasets</li>
+
                        <ol style="font-size: 1.3em">
                        <li>The concept of the dipeptide and the weight</li>
+
                            <li>Datasets</li>
                        <li>IGA</li>
+
                            <li>The concept of the dipeptide and the weight</li>
                    </ol>
+
                            <li>Intelligent Genetic Algorithm</li>
 +
                        </ol>
 +
                    </div>
  
 
                     <div><img class="show_pic" src="https://static.igem.org/mediawiki/2017/4/4f/Ptp_hide.png" style="display:block; margin:auto;"></div>
 
                     <div><img class="show_pic" src="https://static.igem.org/mediawiki/2017/4/4f/Ptp_hide.png" style="display:block; margin:auto;"></div>
Line 437: Line 477:
 
                 <div class="hide">
 
                 <div class="hide">
  
                     <p>
+
                     <p style="margin-top: 50px;">
 
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To predict antifungal peptides, we adapted the Scoring Card Method (SCM) into our antifungal peptide prediction system. The main advantages of this method are its simplicity, interpretability, and acceptable accuracy.
 
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To predict antifungal peptides, we adapted the Scoring Card Method (SCM) into our antifungal peptide prediction system. The main advantages of this method are its simplicity, interpretability, and acceptable accuracy.
 
                     </p>
 
                     </p>
  
 
                     <p>
 
                     <p>
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SCM, based on Support Vector Machine (SVM), is a method developed by our instructor Shinn-Ying Ho. To measure antifungal activity, we introduced SCM into our model to evaluate peptides’ antifungal function
+
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SCM, based on Support Vector Machine (SVM), is a method developed by our instructor Shinn-Ying Ho. To measure antifungal activity, we introduced SCM into our model to evaluate peptides’ antifungal function from a bioinformatics perspective.
                        from a bioinformatics perspective.
+
 
                     </p>
 
                     </p>
  
 
                     <div class="sub_text">
 
                     <div class="sub_text">
                         <h2>Datasets</h2>
+
                         <h2>- Datasets:</h2>
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We obtained our positive data from antifungal databases such as CAMP and PhytAMP, and from papers found in PubMed. We collected our negative data from UniProt peptides that are not annotated as antifungal.
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We obtained our positive data from antifungal databases such as CAMP and PhytAMP, and from papers found in PubMed. We collected our negative data from UniProt peptides that are not annotated as antifungal.
 
                         </p>
 
                         </p>
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We created the training and test datasets by first reducing the sequence identity within the positive data and within the negative data, then dividing them into two portions so that each dataset contains equal amounts of positive and negative data.
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We created the training and test datasets by first reducing the sequence identity within the positive data and within the negative data, then dividing them into two portions so that each dataset contains equal amounts of positive and negative data.
 
                         </p>
 
                         </p>
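The balanced split described above can be sketched in JavaScript as follows. This is an illustration, not our actual pipeline: it assumes sequence-identity reduction has already been applied, and it down-samples the larger class so that both classes have the same size. The helper names (`mulberry32`, `shuffled`, `balancedSplit`) and the fixed seed are hypothetical choices made only for reproducibility.

```javascript
// Hypothetical sketch: split peptides into balanced training and test sets.
// Assumes sequence-identity reduction has already been applied to both inputs.

// mulberry32: a tiny seeded PRNG so the split is reproducible.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy, leaving the input untouched.
function shuffled(items, rand) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Down-sample to equal class sizes, then halve each class so that both the
// training and the test set contain equal numbers of positives and negatives.
function balancedSplit(positives, negatives, seed = 1) {
  const rand = mulberry32(seed);
  const n = Math.min(positives.length, negatives.length);
  const pos = shuffled(positives, rand).slice(0, n);
  const neg = shuffled(negatives, rand).slice(0, n);
  const half = Math.floor(n / 2);
  return {
    train: { pos: pos.slice(0, half), neg: neg.slice(0, half) },
    test: { pos: pos.slice(half), neg: neg.slice(half) },
  };
}
```

With 5 positives and 7 negatives, both sets stay balanced: 2 + 2 peptides for training and 3 + 3 for testing.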
  
                         <h2>Dipeptide</h2>
+
                         <h2>- Dipeptide:</h2>
  
 
                         <p>
 
                         <p>
Line 463: Line 502:
 
                         </p>
 
                         </p>
  
                         <p>Here is a figure provided by Feng-Chang</p>
+
                         <img src="https://static.igem.org/mediawiki/2017/b/b9/Ptp_dipt.png" width="60%" style="display: block; margin: auto;">
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A peptide that contains more potentially antifungal dipeptides is more likely to be an antifungal peptide, and vice versa. The propensities of all 400 possible dipeptides are obtained by statistically contrasting the dipeptide
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A peptide that contains more potentially antifungal dipeptides is more likely to be an antifungal peptide, and vice versa. The propensities of all 400 possible dipeptides are obtained by statistically contrasting the dipeptide composition of antifungal peptides with that of non-antifungal peptides.
                            composition of antifungal peptides with that of non-antifungal peptides.
+
 
                         </p>
 
                         </p>
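A minimal sketch of how the dipeptide composition of one peptide could be computed, assuming the 20 standard amino acids and overlapping (sliding-window) dipeptides; pairs containing non-standard residues are simply skipped. `dipeptideComposition` is a hypothetical helper name, not part of our published code.

```javascript
// Hypothetical sketch of dipeptide composition over the 20 standard amino acids.
const AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";

// All 20 x 20 = 400 ordered dipeptides: "AA", "AC", ..., "YY".
const DIPEPTIDES = [];
for (const a of AMINO_ACIDS) {
  for (const b of AMINO_ACIDS) DIPEPTIDES.push(a + b);
}

// Returns a Map from each of the 400 dipeptides to its frequency in `seq`,
// counting overlapping pairs and skipping pairs with non-standard residues.
function dipeptideComposition(seq) {
  const counts = new Map(DIPEPTIDES.map((d) => [d, 0]));
  const upper = seq.toUpperCase();
  let total = 0;
  for (let i = 0; i + 1 < upper.length; i++) {
    const d = upper.slice(i, i + 2);
    if (counts.has(d)) {
      counts.set(d, counts.get(d) + 1);
      total++;
    }
  }
  const composition = new Map();
  for (const [d, c] of counts) composition.set(d, total > 0 ? c / total : 0);
  return composition;
}
```

For example, "ACAC" contains the overlapping pairs AC, CA, AC, so AC has frequency 2/3, CA has 1/3, and the other 398 dipeptides have 0.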
  
                         <h2>Dipeptide Frequency &amp; Score</h2>
+
                         <h2>- Dipeptide Frequency &amp; Score:</h2>
  
 
                         <p>
 
                         <p>
Line 476: Line 514:
 
                         </p>
 
                         </p>
  
                         <p>Here is a figure provided by Feng-Chang</p>
+
                         <img src="https://static.igem.org/mediawiki/2017/1/12/Ptp_score.png" width="60%" style="display: block; margin: auto;">
  
                         <p>Here is a set of formulas provided by Feng-Chang</p>
+
                         <p style="margin-top: 20px;">
 
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A peptide's score is obtained by multiplying the frequency of each dipeptide in the peptide by that dipeptide's weight and summing the products.
                        <p>
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;where L<sub>p</sub> and L<sub>n</sub> denote the total numbers of dipeptides in the positive and negative datasets, respectively.
+
 
                         </p>
 
                         </p>
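Because a dipeptide's frequency is its count divided by the total number of dipeptides in the peptide, the scoring rule above reduces to averaging the weights of the dipeptides that occur in the sequence. In this sketch, `weights` is a hypothetical `Map` from dipeptide to weight, and a peptide would be called antifungal when its score exceeds a threshold chosen on training data; the threshold value is not given on this page, so it is left as a parameter.

```javascript
// Hypothetical sketch of the SCM-style score of one peptide.
const STANDARD_RESIDUES = new Set("ACDEFGHIKLMNPQRSTVWY");

// Sum over dipeptides of (frequency in seq) x (weight). Since frequency is
// count / total, this equals the mean weight over all dipeptide positions.
// `weights` maps a dipeptide (e.g. "AC") to its weight; missing entries count as 0.
function scmScore(seq, weights) {
  let sum = 0;
  let total = 0;
  for (let i = 0; i + 1 < seq.length; i++) {
    const a = seq[i];
    const b = seq[i + 1];
    if (!STANDARD_RESIDUES.has(a) || !STANDARD_RESIDUES.has(b)) continue;
    sum += weights.get(a + b) ?? 0;
    total++;
  }
  return total > 0 ? sum / total : 0;
}

// Classification against a threshold tuned on training data (value assumed, not given here).
function isPredictedAntifungal(seq, weights, threshold) {
  return scmScore(seq, weights) > threshold;
}
```

With weights AC = 1 and CA = -1, "ACAC" scores (1 - 1 + 1) / 3 = 1/3, while "CACA" scores -1/3.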
  
                         <p>Here is a set of formulas provided by Feng-Chang</p>
+
                         <h2>- Weight:</h2>
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A peptide's score is obtained by multiplying the frequency of each of the 400 dipeptide types in the peptide by that dipeptide's weight and summing the products.
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The initial weight of each dipeptide is its proportion in the positive dataset minus its proportion in the negative dataset. These weights are then further optimized by IGA.
 +
                           
 
                         </p>
 
                         </p>
  
                         <p>Here is a figure provided by Feng-Chang</p>
+
                         <div class="latex">$$ \text{Dipeptide Propensity Scores:} $$</div>
 +
                        <div class="latex">$$ \mathrm{DPS}(ij) = P(ij) - N(ij) $$</div>
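The initial propensity score DPS(ij) = P(ij) - N(ij) could be computed as sketched below, taking P(ij) and N(ij) as the count of dipeptide ij divided by L<sub>p</sub> and L<sub>n</sub>, the total dipeptide counts of the positive and negative datasets. The function names are hypothetical, and only pairs of standard amino acids are counted.

```javascript
// Hypothetical sketch: initial dipeptide propensity scores DPS(ij) = P(ij) - N(ij).
const ALPHABET = "ACDEFGHIKLMNPQRSTVWY";

// Count overlapping dipeptides over a whole dataset; `total` plays the role of Lp or Ln.
function countDipeptides(peptides) {
  const counts = new Map();
  let total = 0;
  for (const seq of peptides) {
    for (let i = 0; i + 1 < seq.length; i++) {
      if (!ALPHABET.includes(seq[i]) || !ALPHABET.includes(seq[i + 1])) continue;
      const d = seq[i] + seq[i + 1];
      counts.set(d, (counts.get(d) ?? 0) + 1);
      total++;
    }
  }
  return { counts, total };
}

// Returns a Map with one DPS value for each of the 400 dipeptides.
function initialPropensities(positives, negatives) {
  const pos = countDipeptides(positives); // pos.total = Lp
  const neg = countDipeptides(negatives); // neg.total = Ln
  const dps = new Map();
  for (const a of ALPHABET) {
    for (const b of ALPHABET) {
      const d = a + b;
      const p = pos.total > 0 ? (pos.counts.get(d) ?? 0) / pos.total : 0; // P(ij)
      const n = neg.total > 0 ? (neg.counts.get(d) ?? 0) / neg.total : 0; // N(ij)
      dps.set(d, p - n);
    }
  }
  return dps;
}
```

With "ACAC" as the only positive and "CACA" as the only negative, DPS(AC) = 2/3 - 1/3 = 1/3 and DPS(CA) = -1/3.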
  
                         <h2>Weight</h2>
+
                         <h2>- Selection of Weights:</h2>
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The weight values change in every round of the computing loop. The initial weight of each dipeptide is its proportion in the positive dataset minus its proportion in the negative dataset.
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We picked two weight sets (scoring cards) from the population: the one with the highest fitness value or one selected by the roulette method. These two scoring cards were then used for crossover selection.
                            (figure) The other candidates for the IGA round are picked randomly.
+
 
                         </p>
 
                         </p>
  
                         <p>
+
                         <img src="https://static.igem.org/mediawiki/2017/4/42/Ptp_probability.png" width="60%" style="display: block; margin: auto;">
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>Dipeptide propensity scores:</b>
+
                        </p>
+
  
                         <p>
+
                         <div class="latex">$$ \text{Fitness Value} = 0.9\,\mathrm{AUC} + 0.1\,R $$</div>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>DPS(ij) = P(ij) - N(ij)</b>
+
                        </p>
+
 
+
                        <h2>Selection of Weight: Pick up Method</h2>
+
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We picked two weight sets (scoring cards) from the population: the one with the highest fitness value or one selected by the roulette method.
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>R</b> is the correlation coefficient (R-value) between the initial and the optimized propensity scores.
 
                         </p>
 
                         </p>
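The fitness function combines the two terms defined above. This sketch assumes that R is the Pearson correlation coefficient between the initial and optimized propensity-score vectors (400 values each) and that the AUC has already been computed; `pearson` and `fitness` are illustrative names.

```javascript
// Hypothetical sketch of the fitness used to rank scoring cards.

// Pearson correlation coefficient between two equal-length numeric arrays.
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  // R is undefined for constant input; returning 0 there is an assumption.
  return sxx > 0 && syy > 0 ? sxy / Math.sqrt(sxx * syy) : 0;
}

// Fitness = 0.9 * AUC + 0.1 * R, with R comparing initial and optimized propensity scores.
function fitness(auc, initialScores, optimizedScores) {
  return 0.9 * auc + 0.1 * pearson(initialScores, optimizedScores);
}
```

For instance, an AUC of 0.8 with optimized scores perfectly correlated with the initial ones (R = 1) gives a fitness of 0.9 × 0.8 + 0.1 × 1 = 0.82.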
  
                         <p>
+
                         <h2>- AUC:</h2>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Fitness value = 0.9AUC + 0.1R
+
                        </p>
+
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;R is the correlation coefficient (R-value) between the initial and the optimized propensity scores.
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The area under the ROC curve (AUC) is used to evaluate the model. The closer the value is to 1, the better the prediction model separates antifungal from non-antifungal peptides.
 
                         </p>
 
                         </p>
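To avoid drawing the ROC curve explicitly, the AUC can be computed from its equivalent definition: the probability that a randomly chosen positive peptide scores higher than a randomly chosen negative one, with ties counted as one half. This O(n·m) sketch is fine for small test sets; it is an illustration, not the evaluation code we used.

```javascript
// Hypothetical sketch: AUC as the probability that a random positive outscores a random negative.
function auc(posScores, negScores) {
  if (posScores.length === 0 || negScores.length === 0) {
    throw new Error("need at least one positive and one negative score");
  }
  let wins = 0;
  for (const p of posScores) {
    for (const q of negScores) {
      if (p > q) wins += 1;
      else if (p === q) wins += 0.5; // ties count as half
    }
  }
  return wins / (posScores.length * negScores.length);
}
```

A value of 1 means every positive outscores every negative, and 0.5 corresponds to random guessing.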
  
                         <h2>AUC:</h2>
+
                         <h2>- Roulette: </h2>
 
+
                        <p>
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The area under the ROC curve (AUC) is used to evaluate the model. The closer the value is to 1, the better the prediction model separates antifungal from non-antifungal peptides.
+
                        </p>
+
 
+
                        <h2>Roulette: </h2>
+
  
 
                         <p>
 
                         <p>
Line 533: Line 556:
 
                         </p>
 
                         </p>
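Roulette-wheel selection picks each candidate with probability proportional to its fitness. A minimal sketch, assuming non-negative fitness values; the optional `rand` argument is a hypothetical hook added only so that the example can be tested deterministically.

```javascript
// Hypothetical sketch of roulette-wheel (fitness-proportionate) selection.
// Assumes all fitness values are non-negative.
function rouletteSelect(population, fitnesses, rand = Math.random) {
  const total = fitnesses.reduce((s, f) => s + f, 0);
  if (total <= 0) {
    // Degenerate wheel: fall back to a uniform random pick.
    return population[Math.floor(rand() * population.length)];
  }
  let r = rand() * total; // spin: a point on the wheel in [0, total)
  for (let i = 0; i < population.length; i++) {
    r -= fitnesses[i];
    if (r < 0) return population[i];
  }
  return population[population.length - 1]; // guard against floating-point round-off
}
```

With fitness values 1, 0, and 3, the third candidate is chosen about 75% of the time and the zero-fitness candidate is never chosen.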
  
                        <p>
 
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;IGA (intelligent genetic algorithm)
 
                        </p>
 
  
                         <p>
+
                         <h2>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Crossover Selection
+
                             - IGA (Intelligent Genetic Algorithm):
                         </p>
+
                         </h2>
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A randomly chosen pair of parameters is exchanged between the two weight sets.
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>Crossover Selection:</b> A randomly chosen pair of parameters is exchanged between the two weight sets.
 
                         </p>
 
                         </p>
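One reading of this crossover step is sketched below, under the assumption that each scoring card is an array of 400 weights in a fixed dipeptide order and that "a pair of parameters" means the two values at one randomly chosen position, one from each parent. The function name is hypothetical, and the IGA-specific optimization is not reproduced here.

```javascript
// Hypothetical sketch: swap the values at one random position between two scoring cards.
// Each scoring card is assumed to be an array of dipeptide weights in a fixed order.
function swapCrossover(parentA, parentB, rand = Math.random) {
  if (parentA.length !== parentB.length) {
    throw new Error("scoring cards must have the same length");
  }
  const childA = parentA.slice(); // copies, so the parents are left unchanged
  const childB = parentB.slice();
  const i = Math.floor(rand() * childA.length);
  [childA[i], childB[i]] = [childB[i], childA[i]];
  return [childA, childB];
}
```

For example, crossing [1, 2, 3] with [4, 5, 6] at position 1 yields [1, 5, 3] and [4, 2, 6].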
  
 
                         <p>
 
                         <p>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Optimization (developed by Shinn-Ying Ho)
+
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>Optimization (developed by Shinn-Ying Ho):</b> A method for optimizing large numbers of parameters, in which the selection function is designed to reduce the number of different parameter
 +
                            sets that must be evaluated.
 
                         </p>
 
                         </p>
  
                        <p>
+
                    </div>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A method for optimizing large numbers of parameters, in which the selection function is designed to reduce the number of different parameter sets that must be evaluated.
+
                    <div style="display: block; margin: auto;">
 +
                        <p style="text-align: center; margin-top: 50px;">
 +
                            (For details of the algorithm, please see <span><a href="https://2017.igem.org/Team:NCTU_Formosa/Model" target="_blank">Peptide Prediction Model</a></span>.)
 
                         </p>
 
                         </p>
 
 
                     </div>
 
                     </div>
                    <p>
 
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(For details of the algorithm, please see <span><a href="https://2017.igem.org/Team:NCTU_Formosa/Model" target="_blank">Peptide Prediction Model</a></span>.)
 
                    </p>
 
  
 
                     <div><img class="hide_pic" src="https://static.igem.org/mediawiki/2017/c/cb/Ptp_show.png" style="display:block; margin:auto;"></div>
 
                     <div><img class="hide_pic" src="https://static.igem.org/mediawiki/2017/c/cb/Ptp_show.png" style="display:block; margin:auto;"></div>
Line 570: Line 589:
 
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To bring existing antifungal data to both high quantity and high quality, we aggregated related online databases and organized them into a complete and useful antifungal database, the largest available online.
 
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To bring existing antifungal data to both high quantity and high quality, we aggregated related online databases and organized them into a complete and useful antifungal database, the largest available online.
 
                     </p>
 
                     </p>
                     <p class="sublist">Content:</p>
+
                     <div class="sublist">
                    <ol>
+
                        <h2 style="position: relative; top: 10px;">Content:</h2>
                        <li>Connection of data: hosts - pathogens - peptides</li>
+
                        <ol style="font-size: 1.3em">
                        <li>Cross-match: drug repurposing by the integration of databases</li>
+
                            <li>Connection of data: Hosts - Pathogens - Peptides</li>
                    </ol>
+
                            <li>Cross-match: Drug repurposing by the integration of databases</li>
 +
                        </ol>
 +
                    </div>
  
 
                     <div><img class="show2_pic" src="https://static.igem.org/mediawiki/2017/4/4f/Ptp_hide.png" style="display:block; margin:auto;"></div>
 
                     <div><img class="show2_pic" src="https://static.igem.org/mediawiki/2017/4/4f/Ptp_hide.png" style="display:block; margin:auto;"></div>
Line 580: Line 601:
 
                 <div class="hide2">
 
                 <div class="hide2">
  
                     <p>
+
                     <p style="margin-top: 50px;">
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;After finishing our prediction system, our next step was to integrate antifungal databases. Several databases related to fungal infection exist on the internet, but they lack organization and integration. This disorder
+
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;After finishing our prediction system, our next step was to integrate antifungal databases. Several databases related to fungal infection exist on the internet, but they lack organization and integration. This disorder makes it inconvenient to search for complete information and leaves users with a narrow view of the available knowledge.
                        makes it inconvenient to search for complete information and leaves users with a narrow view of the available knowledge.
+
 
                     </p>
 
                     </p>
  
Line 589: Line 609:
 
                     </p>
 
                     </p>
  
                     <div style="margin: 50px 0;">
+
                     <div class="sub_text">
                         <p>
+
                         <h2>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1. Connection of data
+
                             1. Connection of data
                         </p>
+
                         </h2>
  
 
                         <p>
 
                         <p>
Line 616: Line 636:
 
                     </div>
 
                     </div>
  
                     <div style="margin-top: 50px;">
+
                     <div class="sub_text">
                         <p>
+
                         <h2>
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2. Cross-match
+
                             2. Cross-match
                         </p>
+
                         </h2>
  
 
                         <p>
 
                         <p>
Line 625: Line 645:
 
                         </p>
 
                         </p>
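To illustrate the host, pathogen, and peptide connection behind the cross-match, here is a toy sketch. The two maps and every identifier in them (`hostA`, `pathogenX`, `pep1`, and so on) are hypothetical placeholders, not records from Parabase; the idea shown is that peptides already annotated against a pathogen become repurposing candidates for every host that the pathogen infects.

```javascript
// Hypothetical sketch of cross-matching hosts, pathogens, and peptides.
// Returns a Map from peptide to the pathogens of `host` that it is annotated against.
function crossMatch(host, hostPathogens, peptideTargets) {
  const pathogens = new Set(hostPathogens.get(host) ?? []);
  const candidates = new Map();
  for (const [peptide, targets] of peptideTargets) {
    const shared = targets.filter((p) => pathogens.has(p));
    if (shared.length > 0) candidates.set(peptide, shared);
  }
  return candidates;
}

// Toy data with placeholder identifiers (not real Parabase records).
const hostPathogens = new Map([
  ["hostA", ["pathogenX", "pathogenY"]],
  ["hostB", ["pathogenZ"]],
]);
const peptideTargets = new Map([
  ["pep1", ["pathogenX"]],
  ["pep2", ["pathogenZ"]],
  ["pep3", ["pathogenY", "pathogenZ"]],
]);
```

Here `crossMatch("hostB", hostPathogens, peptideTargets)` returns pep2 and pep3, both matched through pathogenZ, so pep3 surfaces as a candidate for hostB as well as for hostA.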
  
                         <p>Shown as a schematic diagram</p>
+
                         <img src="https://static.igem.org/mediawiki/2017/2/2f/Ptp_cross.png" width="40%" style="display: block; margin: auto;">
  
                         <p>
+
                         <p style="margin-bottom: 20px;">
 
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Finally, we set up our Parabase website, which presents the antifungal prediction system and the relationships between validated antifungal peptides and their related data. Please see the final presentation on the <span><a href="https://2017.igem.org/Team:NCTU_Formosa/Demonstrate" target="_blank">Demonstration</a></span> page.
 
                             &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Finally, we set up our Parabase website, which presents the antifungal prediction system and the relationships between validated antifungal peptides and their related data. Please see the final presentation on the <span><a href="https://2017.igem.org/Team:NCTU_Formosa/Demonstrate" target="_blank">Demonstration</a></span> page.
 
                         </p>
 
                         </p>
Line 639: Line 659:
 
             <div class="subtitle">
 
             <div class="subtitle">
 
                 <h6>Results</h6>
 
                 <h6>Results</h6>
                 <h5 style="font-size: 1.3em;">You can click here to view the <span><a href="https://2017.igem.org/Team:NCTU_Formosa/Demonstrate" target="_blank">demonstration</a></span> - <span><a href="http://web.it.nctu.edu.tw/~nctu_formosa/Parabase/" target="_blank">Parabase Website</a></span>.</h5>
+
                 <h5>- You can click here to view the <span><a href="https://2017.igem.org/Team:NCTU_Formosa/Demonstrate" target="_blank">demonstration</a></span> - <span><a href="http://web.it.nctu.edu.tw/~nctu_formosa/Parabase/" target="_blank">Parabase Website</a></span>.</h5>
 
             </div>
 
             </div>
 
             <div class="ptp_result">
 
             <div class="ptp_result">
Line 646: Line 666:
 
                     <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For the antifungal database: the amount of data we have collected</p>
 
                     <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For the antifungal database: the amount of data we have collected</p>
 
                     <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For the antifungal scoring system:</p>
 
                     <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For the antifungal scoring system:</p>
                     <ol style="margin-top: 0;">
+
                     <div class="sublist">
                        <li>The ROC curve and the results of test data</li>
+
                        <ol style="font-size: 1.3em">
                        <li>Visualized antifungal scoring card</li>
+
                            <li>The ROC curve and the results of test data</li>
                        <li>Discussion of the relationships of dipeptides and active sites</li>
+
                            <li>Visualized antifungal scoring card</li>
                    </ol>
+
                            <li>Discussion of the relationships of dipeptides and active sites</li>
 +
                        </ol>
 +
                    </div>
  
 
                     <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For the achievement: the conclusion of what we’ve dedicated to humans</p>
 
                     <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For the achievement: the conclusion of what we’ve dedicated to humans</p>
Line 657: Line 679:
 
                 <div class="hide3">
 
                 <div class="hide3">
  
                     <p>
+
                     <h2>
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1.Antifungal Database (relative antifungal data)
+
                         1.Antifungal Database (relative antifungal data)
                     </p>
+
                     </h2>
  
 
                     <p>
 
                     <p>
Line 669: Line 691:
 
                     </p>
 
                     </p>
  
                     <p>
+
                     <h2>
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.Antifungal Peptide Prediction System
+
                         2.Antifungal Peptide Prediction System:
                     </p>
+
                     </h2>
  
 
                     <p>
 
                     <p>
Line 678: Line 700:
  
 
                     <img src="https://static.igem.org/mediawiki/2017/d/da/Ptp_result_photo1.png" width="60%" style="display: block; margin: auto;">
 
                     <img src="https://static.igem.org/mediawiki/2017/d/da/Ptp_result_photo1.png" width="60%" style="display: block; margin: auto;">
 +
                    <h4>Figure 1:<br> The test accuracy, the overall performance of classifying positive data as positive and negative data as negative, is 76%. The sensitivity, the performance of classifying positive data as positive, is 77%. The specificity,
 +
                    the performance of classifying negative data as negative, is 76%. The suitable threshold value is 354, peptides score higher than this value is considered as antifungal peptide.</h4>
 +
  
                    <p>
 
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The test accuracy, the overall performance of classifying positive data as positive and negative data as negative, is 76%. The sensitivity, the performance of classifying positive data as positive, is 77%. The specificity,
 
                        the performance of classifying negative data as negative, is 76%. The suitable threshold value is 354, peptides score higher than this value is considered as antifungal peptide.
 
                    </p>
 
  
 
                     <p>
 
                     <p>
Line 689: Line 710:
  
 
                     <img src="https://static.igem.org/mediawiki/2017/b/b6/Ptp_result_photo2.png" width="60%" style="display: block; margin: auto;">
 
                     <img src="https://static.igem.org/mediawiki/2017/b/b6/Ptp_result_photo2.png" width="60%" style="display: block; margin: auto;">
 +
                    <h4 style="margin-top: -20px;">Figure 2: The score distribution between positive datasets and negative datasets</h4>
  
 
                     <p>
 
                     <p>
 
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(3)Final antifungal scoring card (dipeptide score)
 
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(3)Final antifungal scoring card (dipeptide score)
 
                     </p>
 
                     </p>
 +
 +
                    <div class="latex">$$ \sum_{i=0}^{400} x_{i}\cdot w_{i}=score $$</div>
  
 
                     <img src="https://static.igem.org/mediawiki/2017/f/fb/Ptp_result_photo3.jpeg" width="60%" style="display: block; margin: auto;">
 
                     <img src="https://static.igem.org/mediawiki/2017/f/fb/Ptp_result_photo3.jpeg" width="60%" style="display: block; margin: auto;">
 +
                    <h4>Figure 3: Final antifungal scoring card</h4>
  
                     <p>
+
                     <h2>
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Score calculation formula:
+
                         3.Discussion
                    </p>
+
                     </h2>
 
+
                    <p>
+
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3.Discussion
+
                     </p>
+
  
 
                     <img src="https://static.igem.org/mediawiki/2017/6/67/Ptp_result_photo4.png" width="60%" style="display: block; margin: auto;">
 
                     <img src="https://static.igem.org/mediawiki/2017/6/67/Ptp_result_photo4.png" width="60%" style="display: block; margin: auto;">
  
                     <p>
+
                     <h4 style=" text-align: center; margin-top: 0; margin-bottom: 50px;">
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The bar graph above showed the single amino acid score calculated from each dipeptide score.
+
                         Figure 4: The bar graph above showed the single amino acid score calculated from each dipeptide score.
                     </p>
+
                     </h4>
  
                     <p>
+
                     <div class="sub_text">
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;single peptide score analysis
+
                         <h2>
                    </p>
+
                            - Single Peptide Score Analysis:
 +
                        </h2>
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;By the score results, the top three amino acids are Cysteine(C), Glycine(G), and Lysine(K), and the five amino acids to have lowest scores are Aspartic acid(D), Glutamic acid(E), Serine(S), Threonine(T), Valine(V).
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;By the score results, the top three amino acids are Cysteine(C), Glycine(G), and Lysine(K), and the five amino acids to have lowest scores are Aspartic acid(D), Glutamic acid(E), Serine(S), Threonine(T), Valine(V).
                    </p>
+
                        </p>
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We interpreted the results as the following reasons:
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We interpreted the results as the following reasons:
                    </p>
+
                        </p>
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;There are many antifungal peptides for plants and mammals that contain lots of Cysteine , such as Thionins, plant defensins, and more. For Glycine, there are also many Glycine-rich peptides from Insect's antifungal peptides.
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;There are many antifungal peptides for plants and mammals that contain lots of Cysteine, such as Thionins, plant defensins, and more. For Glycine, there are also many Glycine-rich peptides from Insect's antifungal peptides.
                    </p>
+
                        </p>
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For the 5 peptides(D, E, S, T, V. ) of the lowest scores, four of them are hydrophilic, while most of the hydrophilic amino acids have a higher score (average score : 362.73 > threshold : 350).
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For the 5 peptides(D, E, S, T, V) of the lowest scores, four of them are hydrophilic, while most of the hydrophilic amino acids have a higher score (average score : 362.73 > threshold : 350).
                    </p>
+
                        </p>
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Additionally, for the top 5 highest amino acids,Cysteine contains a sulfide functional group that can form disulfide bond, and Lysine(K) and Arginine(R) are easy to form hydrogen bond.
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Additionally, for the top 5 highest amino acids,Cysteine contains a sulfide functional group that can form disulfide bond, and Lysine(K) and Arginine(R) are easy to form hydrogen bond.
                    </p>
+
                        </p>
  
                    <p>
+
                        <h2>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;3D structure and active site
+
                            - 3D structure and active site:
                    </p>
+
                        </h2>
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To show the result of the scoring card, we visualized the peptides by drawing the dipeptide score on the peptide 3D structure. The region of a peptide become redder when the dipeptide score there is higher. Otherwise, the
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To show the result of the scoring card, we visualized the peptides by drawing the dipeptide score on the peptide 3D structure. The region of a peptide become redder when the dipeptide score there is higher. Otherwise, the region becomes bluer when the dipeptide score there is lower.
                        region become bluer when the dipeptide score there is lower.
+
                        </p>
                    </p>
+
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;By doing so , we can find the important region of an antifungal peptide.
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;By doing so , we can find the important region of an antifungal peptide.
                    </p>
+
                        </p>
  
                    <p>
+
                        <p style="margin-bottom: 50px">
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We took Rs-AFP2 as an example. Rs-AFP2 was an antifungal peptide from the plant defensin family .
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We took Rs-AFP2 as an example. Rs-AFP2 was an antifungal peptide from the plant defensin family .
                    </p>
+
                        </p>
  
                    <img src="https://static.igem.org/mediawiki/2017/0/0e/Design_photo1.png" width="40%" style="display: block; margin: auto;">
+
                        <img src="https://static.igem.org/mediawiki/2017/1/1c/Design_photo1.gif" width="60%" style="display: block; margin: auto;">
 +
                        <h4>Figure 5: <br>This picture is the 3D rotating gif of the Rs-AFP2 with scoring card visualize score on the peptide.As you can see, the N terminal of the peptide(on the top) and the 3sheet are the reddest part of the peptide. To our scoring system based on the SCM, it indicated that these two regions are important regions that determined the whole peptide sequence as an antifungal peptide or not.</h4>
  
                    <p>
+
                        <p style="margin-top: 50px; margin-bottom: 50px">
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;It seemed that the N term of the peptide and the 2sheet were the reddest. To our antifungal peptide prediction system based on the SCM, it indicated that these two regions were important regions that determined the full peptide
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;It seemed that the N term of the peptide and the 2sheet were the reddest. To our antifungal peptide prediction system based on the SCM, it indicated that these two regions were important regions that determined the full peptide sequence as an antifungal peptide or not.
                        sequence as an antifungal peptide or not.
+
                        </p>
                    </p>
+
  
                    <img src="https://static.igem.org/mediawiki/2017/f/f9/Design_photo2.png" width="40%" style="display: block; margin: auto;">
+
                        <img src="https://static.igem.org/mediawiki/2017/f/f0/Design_photo2.gif" width="60%" style="display: block; margin: auto;">
 +
                        <h4>Figure 6: <br>This is a 3D rotating gif picture of Rs-AFP2 peptide with red color labeled on it’s active region which found in the paper<sup>[1]</sup>.According to this paper, showing that the major active site are between the β2 and β3 loop, from <i>Ala<sup>32</sup></i> and <i>Phe<sup>49</sup></i> and some activity was found in the N-terminal part of the protein.</h4>
  
                    <p>
+
                        <p style="margin-top: 50px;">
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To compare with papers, the paper showed that the active site are the β2−β3 loop, from <i>Ala<sup>31</sup></i> to <i>Phe<sup>49</sup></i>, and some activities were found in the N-terminal part of the
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To compare with papers, the paper showed that the active site are the β2−β3 loop, from <i>Ala<sup>31</sup></i> to <i>Phe<sup>49</sup></i>, and some activities were found in the N-terminal part
                        protein.
+
                            of the protein.
                    </p>
+
                        </p>
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Comparing with the scoring card visualized picture and the real active site, we can find in the picture of score card the 3sheet and the N-termina were also labeled.
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Comparing with the scoring card visualized picture and the real active site, we can find in the picture of score card the 3sheet and the N-termina were also labeled.
                    </p>
+
                        </p>
  
                    <p>
+
                        <p>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;In conclusion, we can say that SCM might possess the ability to show antifungal active sites.
+
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;In conclusion, we can say that SCM might possess the ability to show antifungal active sites.
                    </p>
+
                        </p>
  
                     <h1>Achievement</h1>
+
                     </div>
 +
 
 +
                    <h2>Achievement</h2>
  
 
                     <p>
 
                     <p>
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We created a powerful database that helps iGEMers who aims to solve agricultural problems caused by fungus or even other disease cases by the framework. Our database has a convenient searching tool that can quickly find out
+
                         &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We created a powerful database that helps iGEMers who aims to solve agricultural problems caused by fungus or even other disease cases by the framework. Our database has a convenient searching tool that can quickly find out effective antifungal peptides by searching host species or fungal pathogens. Our database also enables users to find out potential new antifungal peptides by applying the antifungal prediction system.
                        effective antifungal peptides by searching host species or fungal pathogens. Our database also enables users to find out potential new antifungal peptides by applying the antifungal prediction system.
+
 
                     </p>
 
                     </p>
 +
  
 
                     <div><img class="hide3_pic" src="https://static.igem.org/mediawiki/2017/c/cb/Ptp_show.png" style="display:block; margin:auto;"></div>
 
                     <div><img class="hide3_pic" src="https://static.igem.org/mediawiki/2017/c/cb/Ptp_show.png" style="display:block; margin:auto;"></div>
Line 787: Line 811:
 
                 <h6>Reference</h6>
 
                 <h6>Reference</h6>
 
             </div>
 
             </div>
             <p>To be filled in by 夆昌</p>
+
             <p>[1]<small>W. M. M. Schaaper,Synthetic peptides derived from the β2−β3 loop of Raphanus sativus antifungal protein 2 that mimic the active site, http://onlinelibrary.wiley.com/doi/10.1034/j.1399-3011.2001.00842.x/full, 2001</small></p>
 
         </div>
 
         </div>
 
         <div id="fut"></div>
 
         <div id="fut"></div>

Latest revision as of 02:59, 2 November 2017


NCTU_Formosa: Peptide Prediction
Overview

     Our database, a new genesis for artificial intelligence in antifungal research, makes fuller use of large datasets through an antifungal peptide prediction system built on SCM (the Scoring Card Method) with further optimization. A peptide's antifungal character can be evaluated and interpreted from sequence analysis alone.

     Furthermore, we integrated all the related data into a complete antifungal database that can be queried by host, pathogen, and corresponding peptide. Combining the two, we built Parabase, a novel database that supports both new drug discovery and drug repurposing for antifungal peptides.

Antifungal Prediction System

     To evaluate peptide function faster and more intelligently, we built our antifungal peptide prediction system on SCM. With this practical and interpretable tool, we can identify potential target peptides among large numbers of uncharacterized peptides, making the best use of vast data.

Content:

  1. Datasets
  2. The concept of the dipeptide and the weight
  3. Intelligent Genetic Algorithm

     To predict antifungal peptides, we adopted the Scoring Card Method (SCM) and adapted it into our antifungal peptide prediction system. The method's main advantages are its simplicity, interpretability, and acceptable accuracy.

     SCM, based on the Support Vector Machine (SVM), is a method developed by our instructor Shinn-Ying Ho. To measure antifungal activity, we incorporated SCM into our model to evaluate the antifungal function of peptides from a bioinformatics perspective.

- Datasets:

     We obtained our positive data from antifungal peptide databases, such as cAMP and PhytAMP, and from papers we found in PubMed. We collected our negative data from peptides in UniProt that are not annotated as antifungal.

     We created the training and test datasets by first reducing the sequence identity within the positive and negative data, then dividing the data into two portions so that each dataset has an equal number of positive and negative peptides.
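The procedure above can be sketched in Python. This is a minimal illustration under assumptions: real pipelines usually reduce redundancy with an alignment-based tool such as CD-HIT, whereas here `difflib.SequenceMatcher` stands in as a rough similarity measure, and the 0.7 cutoff, the function names, and the toy sequences are all hypothetical.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for alignment-based sequence identity."""
    return SequenceMatcher(None, a, b).ratio()


def reduce_redundancy(seqs: list[str], max_identity: float = 0.7) -> list[str]:
    """Greedily keep a sequence only if it is below max_identity to every kept one."""
    kept: list[str] = []
    for s in seqs:
        if all(similarity(s, k) < max_identity for k in kept):
            kept.append(s)
    return kept


def balanced_split(positives, negatives, seed=0):
    """Split into train/test halves, each with equal numbers of positives and negatives."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    n = min(len(pos), len(neg)) // 2  # per-class size of each half
    train = [(s, 1) for s in pos[:n]] + [(s, 0) for s in neg[:n]]
    test = [(s, 1) for s in pos[n:2 * n]] + [(s, 0) for s in neg[n:2 * n]]
    return train, test


# Hypothetical toy sequences (label 1 = antifungal, 0 = non-antifungal).
pos = reduce_redundancy(["GLFDIVKKVV", "GLFDIVKKVA", "KWKLFKKIGAVLKVL", "RRWCFRVCYRGICYRRCR"])
neg = reduce_redundancy(["MSTNPKPQRK", "AEEAAKKAAE", "DESTVDESTV", "PPGPPGPPGS"])
train, test = balanced_split(pos, neg)
```

In this toy run, "GLFDIVKKVA" is discarded as a near-duplicate of "GLFDIVKKVV", and both halves contain the same number of positive and negative peptides.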

- Dipeptide:

     The premise of this method is the hypothesis that a peptide's function corresponds to its sequence. We treated each pair of adjacent amino acids as the smallest functional unit, called a dipeptide.

     A peptide that contains more potentially antifungal dipeptides is more likely to be an antifungal peptide, and vice versa. The 400 individual dipeptide propensities (one for each of the 20 × 20 amino acid pairs) are obtained by statistically discriminating between the dipeptide compositions of antifungal and non-antifungal peptides.
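A minimal sketch of the 400-dimensional dipeptide composition in Python, assuming dipeptides are counted over overlapping adjacent residue pairs and normalized by the number of pairs; the names `AMINO_ACIDS`, `DIPEPTIDES`, and `dipeptide_composition` are illustrative, not taken from the team's code.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 20 x 20 = 400
INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq: str) -> list[float]:
    """Frequency of each of the 400 dipeptides among the overlapping pairs of seq."""
    counts = [0.0] * len(DIPEPTIDES)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p in INDEX]  # skip pairs with non-standard residues
    for p in valid:
        counts[INDEX[p]] += 1
    if not valid:
        return counts  # too short (or no standard pairs): all zeros
    return [c / len(valid) for c in counts]


comp = dipeptide_composition("GCGC")  # pairs: GC, CG, GC
```

For "GCGC" the composition has GC = 2/3 and CG = 1/3, with every other entry zero.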

- Dipeptide Frequency & Score:

     For each peptide, the frequency of each of the 400 dipeptide types is multiplied by that dipeptide's weight, and the peptide's score is the sum of these products.
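The scoring rule can be written in a few lines. This sketch assumes the same overlapping-pair frequencies and stores the scoring card as a hypothetical dictionary from dipeptide to weight; the toy weights below are made up for illustration and are not the team's final scoring card.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def peptide_score(seq: str, weights: dict[str, float]) -> float:
    """score = sum over dipeptides of (frequency in seq) * (weight on the scoring card)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p in weights]
    if not pairs:
        return 0.0
    counts = Counter(pairs)
    # Dipeptides absent from seq have frequency 0, so only observed pairs contribute.
    return sum(n / len(pairs) * weights[d] for d, n in counts.items())


# Hypothetical scoring card: every dipeptide weighs 300 except GC and CG.
weights = {d: 300.0 for d in DIPEPTIDES}
weights["GC"], weights["CG"] = 900.0, 600.0
score = peptide_score("GCGC", weights)  # (2/3)*900 + (1/3)*600 = 800
```

Because the frequencies sum to 1, a peptide's score is a weighted average of the card's weights, so scores always stay within the range of the weights.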

- Weight:

     The initial weight for each dipeptide is its frequency in the positive dataset minus its frequency in the negative dataset. The weights are then further optimized by an intelligent genetic algorithm (IGA).

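$$ \text{Dipeptide propensity score}(ij) = P(ij) - N(ij) $$
     where P(ij) and N(ij) are the frequencies of dipeptide ij in the positive and negative datasets, respectively.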
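A sketch of the initial weights, assuming P(ij) and N(ij) are pooled frequencies of dipeptide ij over all overlapping pairs in the positive and negative datasets respectively (averaging per-peptide compositions would be an equally plausible reading). The function names and toy datasets are hypothetical, and any rescaling of these values onto the scale of the final scoring card is not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_frequencies(seqs: list[str]) -> dict[str, float]:
    """Pooled frequency of each dipeptide over all overlapping pairs in a dataset."""
    counts = Counter(s[i:i + 2] for s in seqs for i in range(len(s) - 1))
    total = sum(counts[d] for d in DIPEPTIDES) or 1  # avoid division by zero
    return {d: counts[d] / total for d in DIPEPTIDES}


def initial_propensities(positives: list[str], negatives: list[str]) -> dict[str, float]:
    """Initial weight of each dipeptide ij: P(ij) - N(ij)."""
    p = dipeptide_frequencies(positives)
    n = dipeptide_frequencies(negatives)
    return {d: p[d] - n[d] for d in DIPEPTIDES}


# Hypothetical toy datasets.
w0 = initial_propensities(["GCGC", "KCGK"], ["DEDE", "SDES"])
```

Dipeptides enriched in the positive set get positive weights (here CG = 1/3), those enriched in the negative set get negative weights (DE = -1/2), and unseen dipeptides start at 0.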

- Weight Selection:

     We picked two weight sets (scoring cards) from the population, each being either the one with the highest fitness value or one chosen by the roulette method. These two scoring cards were then used as parents for crossover.

$$ \text{Fitness value} = 0.9\,\text{AUC} + 0.1\,R $$

     where R is the correlation coefficient (R-value) between the initial and the optimized propensity scores.
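The fitness function translates directly into code. In this sketch R is computed as the Pearson correlation (an assumption; the text only says "correlation coefficient"), the AUC is passed in as an already computed number, and `pearson_r` and `fitness` are illustrative names.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fitness(auc: float, initial: list[float], optimized: list[float]) -> float:
    """Fitness value = 0.9 * AUC + 0.1 * R."""
    return 0.9 * auc + 0.1 * pearson_r(initial, optimized)


f = fitness(0.80, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # R = 1, so f = 0.72 + 0.10
```

The 0.1 R term presumably rewards optimized weights that stay correlated with the statistically derived initial propensities, which helps keep the final scoring card interpretable, while the 0.9 AUC term drives predictive performance.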

- AUC:

     The area under the ROC curve, used to evaluate the trained model. The closer the value is to 1, the better the prediction model separates antifungal from non-antifungal peptides.
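One simple way to compute the AUC, sketched below, uses its equivalent rank interpretation: the probability that a randomly chosen positive peptide scores higher than a randomly chosen negative one, with ties counted as one half. The quadratic loop is for clarity only, and the scores are toy values.

```python
def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


value = auc([400, 360, 340], [350, 300, 380])  # 6 of 9 pairs ordered correctly
```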

- Roulette:

     A selection method that preserves randomness: candidates with higher fitness are more likely to be selected, but any candidate can still be chosen.
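Roulette-wheel selection can be sketched as follows, assuming non-negative fitness values: each candidate occupies a slice of the wheel proportional to its fitness, so fitter scoring cards are picked more often while weaker ones still have a chance. The function name and candidate labels are hypothetical; Python's `random.choices(population, weights=...)` offers the same behavior in one call.

```python
import random


def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if running > pick:
            return individual
    return population[-1]  # only reached if pick lands exactly on the upper edge


rng = random.Random(42)
draws = [roulette_select(["card_A", "card_B"], [3.0, 1.0], rng) for _ in range(4000)]
share_a = draws.count("card_A") / len(draws)  # expected to be close to 0.75
```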

- IGA (intelligent genetic algorithm):

     Crossover: parameters at randomly chosen positions are exchanged between the two weight sets.

     Optimization (developed by Shinn-Ying Ho): a method for optimizing large numbers of parameters, in which the selection function is designed to reduce the number of parameter combinations that must be evaluated.
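The exchange step of the crossover described above can be sketched as a plain random swap between two parent weight vectors. This covers only that simple exchange, not Ho's intelligent crossover, which is reported to choose the exchanged parameters using orthogonal experimental design; the function name and the `n_swaps` parameter are hypothetical.

```python
import random


def crossover(parent_a, parent_b, n_swaps=1, rng=random):
    """Return two children made by swapping n_swaps randomly chosen positions."""
    child_a, child_b = list(parent_a), list(parent_b)  # parents are left unchanged
    for i in rng.sample(range(len(child_a)), n_swaps):
        child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


a, b = [0.0] * 10, [1.0] * 10  # toy 10-parameter weight vectors
child_a, child_b = crossover(a, b, n_swaps=3, rng=random.Random(0))
```

Each child keeps most of one parent and inherits a few positions from the other, which is the basic mechanism a smarter crossover then refines.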

(For details of the algorithm, please see Peptide Prediction Model.)

Antifungal Database

     To bring existing antifungal data to a high level of both quantity and quality, we aggregated related online databases and organized them into a complete and useful antifungal database, the largest available online.

Content:

  1. Connection of data: Hosts - Pathogens - Peptides
  2. Cross-match: Drug repurposing by the integration of databases

     After finishing our prediction system, our next step was to integrate antifungal databases. Several databases related to fungal infection exist online, but they lack organization and integration. This disorder makes it inconvenient to search for complete information and leaves users with only a narrow view of the available knowledge.

     We therefore aggregated and organized all the related data from different websites and databases into a complete antifungal database, enabling drug repurposing through cross-referencing.

1. Connection of data

     The problem we addressed, fungal diseases in agriculture, involves three factors: hosts, pathogens, and antifungal peptides. The amounts of data we collected are:

     (1) hosts - pathogens: 514 (Phytopath / PHIbase)

     (2) pathogens - peptides: 1334 (cAMP / PhytAMP)

     (3) pathogens - peptides: 57 (literature search)

     Through this processing, we updated nearly 300 peptides and found nearly 70 new antifungal peptides.

2. Cross-match

     After we ordered and assembled the data, cross-referencing produced more relationships than the original datasets contained before they were combined. We call this the cross-match of data.
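The cross-match amounts to joining the host-pathogen table with the pathogen-peptide table on the shared pathogen. The sketch below uses a tiny hypothetical dataset with made-up peptide names (`peptide_A`, `peptide_B`); it only shows how joining two link tables yields host-peptide links that neither source lists directly.

```python
from collections import defaultdict

# Hypothetical toy records (host, pathogen) and (pathogen, peptide).
host_pathogen = [("rice", "Magnaporthe oryzae"),
                 ("tomato", "Botrytis cinerea"),
                 ("grape", "Botrytis cinerea")]
pathogen_peptide = [("Botrytis cinerea", "peptide_A"),
                    ("Magnaporthe oryzae", "peptide_B")]


def cross_match(host_pathogen, pathogen_peptide):
    """Join on pathogen to derive host -> {candidate peptides} links."""
    peptides_by_pathogen = defaultdict(set)
    for pathogen, peptide in pathogen_peptide:
        peptides_by_pathogen[pathogen].add(peptide)
    links = defaultdict(set)
    for host, pathogen in host_pathogen:
        links[host] |= peptides_by_pathogen.get(pathogen, set())
    return dict(links)


links = cross_match(host_pathogen, pathogen_peptide)
```

Here three host-pathogen records and two pathogen-peptide records yield three host-peptide links, none of which appears in either input table.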

     Finally, we set up our Parabase website, which presents the antifungal prediction system and the relationships among validated antifungal peptides and their related data. Please see the final presentation on the Demonstration page.

Results
- You can click here to view the demonstration - Parabase Website.

     Here we show the results of the peptide prediction.

     For the antifungal database: the amount of data we collected

     For the antifungal scoring system:

  1. The ROC curve and the results of test data
  2. Visualized antifungal scoring card
  3. Discussion of the relationships of dipeptides and active sites

     For the achievement: a summary of what we have contributed

1. Antifungal Database (related antifungal data)

     (1) 514 interactions between hosts and pathogens

     (2) 1334 experimentally validated antifungal peptides and their descriptions

2. Antifungal Peptide Prediction System:

     (1) The final ROC curve and the results on the test dataset

Figure 1:
The test accuracy (the overall rate of classifying positive data as positive and negative data as negative) is 76%. The sensitivity (the rate of classifying positive data as positive) is 77%. The specificity (the rate of classifying negative data as negative) is 76%. The chosen threshold value is 354: peptides scoring higher than this value are classified as antifungal peptides.
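The three metrics and the threshold rule can be reproduced with a short function. This sketch assumes a peptide is labeled antifungal when its score is strictly greater than the threshold (354 by default, following Figure 1); the function name is illustrative, and the scores in the example are toy values, not the team's test data.

```python
def evaluate(pos_scores, neg_scores, threshold=354.0):
    """Return (accuracy, sensitivity, specificity) for the rule score > threshold."""
    tp = sum(s > threshold for s in pos_scores)   # positives classified as positive
    tn = sum(s <= threshold for s in neg_scores)  # negatives classified as negative
    sensitivity = tp / len(pos_scores)
    specificity = tn / len(neg_scores)
    accuracy = (tp + tn) / (len(pos_scores) + len(neg_scores))
    return accuracy, sensitivity, specificity


acc, sen, spe = evaluate([400, 380, 360, 340], [300, 320, 370, 350])
```

With balanced positive and negative sets, accuracy equals the mean of sensitivity and specificity, which is consistent with the reported 77% and 76% giving an accuracy of about 76%.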

     (2) The score distributions of the positive and negative datasets

Figure 2: The score distributions of the positive and negative datasets

     (3) Final antifungal scoring card (dipeptide scores)

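$$ \sum_{i=1}^{400} x_{i}\cdot w_{i}=\text{score} $$
     where x_i is the frequency of the i-th dipeptide in the peptide and w_i is its weight on the scoring card.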

Figure 3: Final antifungal scoring card

3. Discussion

Figure 4: Bar graph of the single amino acid scores, calculated from the dipeptide scores.
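The text does not spell out how a single amino acid score is derived from the dipeptide scores. One plausible rule, assumed in the sketch below, is to average the scores of all 39 distinct dipeptides that contain the amino acid in either position; the function name and the toy card are hypothetical.

```python
from itertools import product
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_scores(dipeptide_scores: dict[str, float]) -> dict[str, float]:
    """Assumed rule: mean score of every dipeptide that contains the amino acid."""
    return {aa: mean(s for d, s in dipeptide_scores.items() if aa in d)
            for aa in AMINO_ACIDS}


# Hypothetical scoring card: dipeptides containing C score 600, all others 300.
card = {"".join(p): 300.0 for p in product(AMINO_ACIDS, repeat=2)}
for d in card:
    if "C" in d:
        card[d] = 600.0

aa = amino_acid_scores(card)
ranking = sorted(aa, key=aa.get, reverse=True)  # "C" ranks first
```

Under this rule each other amino acid shares only two dipeptides with C (XC and CX), so its score rises only slightly above 300 (12300 / 39, about 315.4).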

- Single Amino Acid Score Analysis:

     According to the scores, the top three amino acids are cysteine (C), glycine (G), and lysine (K), and the five amino acids with the lowest scores are aspartic acid (D), glutamic acid (E), serine (S), threonine (T), and valine (V).

     We interpret these results as follows:

     Many antifungal peptides from plants and mammals, such as thionins and plant defensins, are rich in cysteine. Likewise, many insect antifungal peptides are glycine-rich.

     Of the five amino acids with the lowest scores (D, E, S, T, and V), four are hydrophilic, even though most hydrophilic amino acids have higher scores (average score: 362.73 > threshold: 350).

     Additionally, among the five highest-scoring amino acids, cysteine contains a thiol (sulfhydryl) group that can form disulfide bonds, and lysine (K) and arginine (R) readily form hydrogen bonds.

- 3D structure and active site:

     To show the results of the scoring card, we visualized peptides by mapping the dipeptide scores onto their 3D structures. Regions with higher dipeptide scores are colored redder, and regions with lower dipeptide scores are colored bluer.

     In this way, we can locate the important regions of an antifungal peptide.
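A sketch of how dipeptide scores could be turned into per-residue colors. It assumes each residue takes the mean weight of the (at most two) dipeptides that cover it and uses a linear blue-to-red ramp; both choices, the function names, and the toy weights are illustrative. In practice the per-residue values could be written into a structure file's B-factor column and colored by B-factor in a molecular viewer.

```python
def residue_scores(seq: str, weights: dict[str, float]) -> list[float]:
    """Mean weight of the dipeptides (i-1, i) and (i, i+1) covering each residue i."""
    pair = [weights.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]
    out = []
    for i in range(len(seq)):
        covering = pair[max(i - 1, 0):i + 1]
        out.append(sum(covering) / len(covering) if covering else 0.0)
    return out


def to_rgb(score: float, lo: float, hi: float) -> tuple[int, int, int]:
    """Linear ramp: lo -> pure blue (0, 0, 255), hi -> pure red (255, 0, 0)."""
    t = 0.0 if hi == lo else (score - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1 - t)))


scores = residue_scores("GCG", {"GC": 900.0, "CG": 600.0})  # [900.0, 750.0, 600.0]
colors = [to_rgb(s, min(scores), max(scores)) for s in scores]
```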

     We took Rs-AFP2, an antifungal peptide from the plant defensin family, as an example.

Figure 5:
A rotating 3D animation of Rs-AFP2 with the scoring-card scores visualized on the peptide. The N-terminus of the peptide (at the top) and the β3 strand are the reddest parts of the peptide. According to our SCM-based scoring system, these two regions are the important regions that determine whether the whole peptide sequence is antifungal.

     In other words, the N-terminus and the β3 strand were the reddest regions, which our SCM-based antifungal peptide prediction system identifies as the regions that determine whether the full peptide sequence is antifungal.

Figure 6:
A rotating 3D animation of the Rs-AFP2 peptide with its active region, as reported in the paper [1], colored red. According to this paper, the major active site lies in the β2−β3 loop, from Ala32 to Phe49, and some activity was also found in the N-terminal part of the protein.

     For comparison, the paper [1] showed that the active site is the β2−β3 loop, from Ala31 to Phe49, and that some activity was found in the N-terminal part of the protein.

     Comparing the scoring-card visualization with the active site reported in [1], we find that the β3 strand and the N-terminus are also highlighted in the scoring-card picture.

     In conclusion, SCM may be able to reveal antifungal active sites.

Achievement

     We created a powerful database that helps iGEMers who aim to solve agricultural problems caused by fungi, and, using the same framework, even other diseases. Our database has a convenient search tool that can quickly find effective antifungal peptides by host species or fungal pathogen. It also enables users to discover potential new antifungal peptides by applying the antifungal prediction system.

Reference

[1] W. M. M. Schaaper, Synthetic peptides derived from the β2−β3 loop of Raphanus sativus antifungal protein 2 that mimic the active site, 2001. http://onlinelibrary.wiley.com/doi/10.1034/j.1399-3011.2001.00842.x/full
