Difference between revisions of "Team:NCTU Formosa/Peptide Prediction"

Revision as of 11:42, 21 October 2017

Peptide Prediction

Our database, a new genesis for artificial intelligence, strengthens the power of large datasets through a peptide prediction system based on the Scoring Card Method (SCM) with additional optimization. With it, the antifungal character of a peptide can be evaluated by sequence analysis alone. Furthermore, we integrated all the related data into a complete antifungal database that supports queries across hosts, pathogens, and their corresponding peptides. Combining the two yields a novel database that serves both new drug discovery and old drug repurposing for antifungal peptides.

(Placeholder: an overview of the expanded context will go here.)

For peptide prediction, we adopted the Scoring Card Method (SCM) and modified it into our antifungal scoring system. The major advantages of the method are its simplicity and acceptable accuracy.

SCM, based on the Support Vector Machine (SVM), is a method originating from our instructor, Shinn-Ying Ho. To measure antifungal activity, we introduced SCM into our model to evaluate peptides’ antifungal function from the perspective of biological information.

Dipeptide
The premise of this method is the hypothesis that the function of a peptide corresponds to its sequence. We treat two adjacent amino acids as a group forming the smallest functional unit, which we define as a dipeptide. (Figure)

We can split a peptide into overlapping dipeptides and predict whether the peptide is antifungal by examining whether each dipeptide is antifungal. A peptide with more potentially antifungal dipeptides is more likely to be an antifungal peptide, and vice versa. The 400 individual dipeptide propensities are obtained by statistical discrimination between the dipeptide compositions of the antifungal and non-antifungal peptides.
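The splitting step above can be sketched in a few lines of Python. This is only an illustration: the names `AMINO_ACIDS`, `DIPEPTIDES`, `split_dipeptides`, and `dipeptide_composition` are hypothetical, and the sketch assumes peptides are written with the 20 standard one-letter residue codes.

```python
from collections import Counter
from itertools import product

# Assumption: peptides use the 20 standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 possible dipeptide types.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def split_dipeptides(peptide):
    """Return the overlapping dipeptides of a peptide, in order."""
    return [peptide[i:i + 2] for i in range(len(peptide) - 1)]


def dipeptide_composition(peptide):
    """Return the relative frequency of each of the 400 dipeptide types."""
    pairs = split_dipeptides(peptide)
    counts = Counter(pairs)
    total = len(pairs)
    return {dp: (counts[dp] / total if total else 0.0) for dp in DIPEPTIDES}
```

A peptide of length n yields n - 1 overlapping dipeptides, so the composition of any peptide with at least two residues sums to 1.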

-Dipeptide Frequency & Score
For each peptide, the frequency of each of the 400 dipeptide types is multiplied by its weight, and the products are summed to give the peptide's score. (Figure)
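The frequency-times-weight scoring can be sketched as follows. The function name `peptide_score` is hypothetical, and the sketch assumes that the per-dipeptide products are summed into a single score and that a dipeptide missing from the weight table contributes zero.

```python
from collections import Counter


def peptide_score(peptide, weights):
    """Sum of (dipeptide frequency x dipeptide weight) over a peptide.

    `weights` maps a dipeptide such as "AC" to its weight; dipeptides
    absent from the table are treated as weight 0 (an assumption).
    """
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    if not pairs:
        return 0.0
    counts = Counter(pairs)
    return sum((n / len(pairs)) * weights.get(dp, 0.0)
               for dp, n in counts.items())
```

For example, with weights `{"AC": 1.0, "CA": -0.5}`, the peptide `"ACAC"` contains AC twice and CA once, so its score is (2/3)(1.0) + (1/3)(-0.5) = 0.5.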

-Weight
The weight values change in every round of the computing loop. The initial weight for each dipeptide is the ratio at which the dipeptide appears in the positive dataset minus the ratio at which it appears in the negative dataset. (Figure) The other candidates in the IGA round are picked randomly.
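One possible reading of the initial weight is sketched below. It assumes that "ratio" means the pooled relative frequency of a dipeptide across all peptides of a dataset; the text does not spell this out, and the names `pooled_composition` and `initial_weights` are hypothetical.

```python
from collections import Counter


def pooled_composition(peptides):
    """Relative frequency of each dipeptide pooled over a whole dataset."""
    counts = Counter()
    for pep in peptides:
        counts.update(pep[i:i + 2] for i in range(len(pep) - 1))
    total = sum(counts.values())
    return {dp: n / total for dp, n in counts.items()} if total else {}


def initial_weights(positives, negatives):
    """Initial weight = ratio in the positive set minus ratio in the negative set."""
    pos = pooled_composition(positives)
    neg = pooled_composition(negatives)
    return {dp: pos.get(dp, 0.0) - neg.get(dp, 0.0)
            for dp in set(pos) | set(neg)}
```

Dipeptides that occur in neither dataset are simply left out of the returned table, which is equivalent to an initial weight of zero.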

-Selection of Weight
We select two weight sets among all candidates: the one with the highest AUC, and one chosen by the Roulette method.

--AUC: the area under the ROC curve, used to evaluate the model that was built. The closer the value is to 1, the more accurate the prediction model.

--Roulette: a selection method that preserves randomness while still making candidates with higher fitness more likely to be selected.
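A minimal sketch of the two selection criteria follows. It assumes the AUC is computed from the scores of the positive and negative peptides by pairwise comparison, that the AUC is also used as the roulette fitness, and that the roulette pick may coincide with the best candidate. The names `auc`, `roulette_pick`, and `select_two` are hypothetical.

```python
import random


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive peptide scores higher; ties count as half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def roulette_pick(candidates, fitnesses, rng=random):
    """Pick one candidate with probability proportional to its fitness."""
    # Fitnesses must be non-negative and not all zero.
    return rng.choices(candidates, weights=fitnesses, k=1)[0]


def select_two(candidates, fitnesses, rng=random):
    """Return the best candidate plus one roulette-selected candidate."""
    best = max(range(len(candidates)), key=lambda i: fitnesses[i])
    return candidates[best], roulette_pick(candidates, fitnesses, rng)
```

Keeping the highest-AUC candidate guarantees that the best weight set found so far is not lost, while the roulette pick still gives weaker candidates some chance of contributing.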

-IGA (intelligent genetic algorithm)
--Crossover Selection
A pair of parameters from the two weight sets is randomly chosen and exchanged.
--Optimization (developed by Shinn-Ying Ho)
A creative method for optimizing a large number of parameters, in which the selection function is designed to reduce the number of different parameter sets.
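The crossover step alone can be illustrated with the sketch below. It is a plain random swap, not the IGA optimization itself, which is not reproduced here. It assumes each weight set is a table keyed by dipeptide and that "a pair of parameters" means the two values stored under the same dipeptide; `crossover` and `n_swaps` are hypothetical names.

```python
import random


def crossover(parent_a, parent_b, n_swaps=1, rng=random):
    """Exchange the values of randomly chosen dipeptides between two weight sets.

    Returns two new dictionaries; the parents are left unchanged.
    """
    child_a, child_b = dict(parent_a), dict(parent_b)
    shared = sorted(set(child_a) & set(child_b))  # sorted for reproducibility
    for key in rng.sample(shared, min(n_swaps, len(shared))):
        child_a[key], child_b[key] = child_b[key], child_a[key]
    return child_a, child_b
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing rounds of the computing loop.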

(Placeholder: an overview of the expanded context will go here.)

After finishing our prediction system, the next step was the integration of antifungal databases. There are several databases related to fungal infection on the internet, yet they lack organization and completeness. This disorder makes it inconvenient to search for full information and leads to a narrow view of the available knowledge.

As a result, we planned to aggregate all the related data from different websites and databases into a complete antifungal database, achieving drug repurposing through cross-reference.

1. Connection of data
To focus on the problem we are dealing with, fungal diseases in agriculture, there are several related factors: hosts, pathogens, and antifungal peptides. Here is the quantity of data we collected:
(1) hosts - pathogens : 514 (Phytopath / PHIbase)
(2) pathogens - peptides : 1525 (cAMP / PhytAMP)
(3) pathogens - peptides : 110 (paper searching)
The relationships among them (hosts - pathogens - antifungal peptides) have been sorted out in the Parabase website we created. Users of the website can select any one of these to search for the others.
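The host - pathogen - peptide chain can be pictured as two lookup tables queried in sequence. The sketch below is not the Parabase implementation, which is not shown in the text; `build_index`, `peptides_for_host`, and the record names used in the usage note are hypothetical.

```python
from collections import defaultdict


def build_index(pairs):
    """Turn (left, right) relation pairs into a left -> set(right) lookup."""
    index = defaultdict(set)
    for left, right in pairs:
        index[left].add(right)
    return index


def peptides_for_host(host, host_to_pathogens, pathogen_to_peptides):
    """Follow host -> pathogens -> peptides and collect the peptides."""
    found = set()
    for pathogen in host_to_pathogens.get(host, ()):
        found |= pathogen_to_peptides.get(pathogen, set())
    return found
```

For instance, if `host_R` is linked to `pathogen_Q` and `pathogen_S`, the query returns every peptide recorded against either pathogen. Building the indexes in the opposite direction (peptide to pathogen, pathogen to host) would support searching from the other end in the same way.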

2. Cross-match
After we ordered and assembled the data, the quantity of data became even larger than the original total before aggregation, because of cross-reference. We call this the cross-match of data.
[For instance, in Database A, only Pathogen Q can infect Plant R. In Database B, Pathogen S can infect Plant R. So when evaluating Plant R under fungal infection there is now another pathogen to consider, and the number of candidate antifungal peptides also grows because of the combination of databases.] (to be shown as a schematic diagram)
Using this method, we have updated 150 entries in cAMP and Phytopath to find more target pathogens.
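The Plant R example can be written out as a small merge. This is a sketch under the assumption that each database is reduced to a host -> set of pathogens table; `cross_match` is a hypothetical name, and the two tables in the usage note only mirror the example in the text.

```python
def cross_match(*databases):
    """Merge host -> pathogens tables, taking the union of pathogens per host."""
    merged = {}
    for database in databases:
        for host, pathogens in database.items():
            merged.setdefault(host, set()).update(pathogens)
    return merged
```

With `{"Plant R": {"Pathogen Q"}}` as Database A and `{"Plant R": {"Pathogen S"}}` as Database B, the merged table lists both pathogens for Plant R, so any peptide recorded against either pathogen becomes a candidate for that host.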

In the end, we set up our Parabase website. Please check out the details in Demonstration.

@@ Line 1: / Line 1: @@
 {{NCTU Formosa/Navigation}}
+{{NCTU Formosa}}
 <html>