<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

<title>Team:HFUT-China</title>

<link href="https://2017.igem.org/Template:HFUT-China/Safety/css/headerCss?action=raw&ctype=text/css" rel="stylesheet" />
<link href="https://2017.igem.org/Template:HFUT-China/Safety/css/stylewikiCss?action=raw&ctype=text/css" rel="stylesheet" />
<link href="https://2017.igem.org/Template:HFUT-China/MainPage/css/styleCss?action=raw&ctype=text/css" rel="stylesheet" />
<style>
body {
background: #f7f5e6;
}
</style>
</head>
− |
| + | |
− | <center>
| + | |
− | <div class="header" style="z-index:999;position:absolute">
| + | |
− | <ul class="nav">
| + | |
− | <li>
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China"> Main page </a>
| + | |
− | </li>
| + | |
− | <li>
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Description"> Project
| + | |
− | <img src="https://static.igem.org/mediawiki/2017/5/53/Xia.png" width="12px" ;> </a>
| + | |
− | <ul>
| + | |
− | <li class="selfnav hef">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Description">
| + | |
− | <font class="hef">Description</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav hef">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Design">
| + | |
− | <font class="hef">Design</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav hef">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Contribution">
| + | |
− | <font class="hef">Contribution</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav hef">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Demonstrate">
| + | |
− | <font class="hef">Demonstrate</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | </ul>
| + | |
− | </li>
| + | |
− | <li>
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Software"> Software
| + | |
− | <img src="https://static.igem.org/mediawiki/2017/5/53/Xia.png" width="12px" ;> </a>
| + | |
− | <ul>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Software">
| + | |
− | <font class="hef" color="#000">Software</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Model">
| + | |
− | <font class="hef" color="#000">Model</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Improve">
| + | |
− | <font class="hef" color="#000">Improve</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | </ul>
| + | |
− | </li>
| + | |
− | <li>
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Notebook"> Documents
| + | |
− | <img src="https://static.igem.org/mediawiki/2017/5/53/Xia.png" width="12px" ;> </a>
| + | |
− | <ul>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Notebook">
| + | |
− | <font class="hef" color="#000">Notebook</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Medals">
| + | |
− | <font class="hef" color="#000">Medals</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Safety">
| + | |
− | <font class="hef" color="#000">Safety</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/User_guide">
| + | |
− | <font class="hef" color="#000">User guide</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | </ul>
| + | |
− | </li>
| + | |
− | <li>
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Team">Team
| + | |
− | <img src="https://static.igem.org/mediawiki/2017/5/53/Xia.png" width="12px" ;>
| + | |
− | </a>
| + | |
− | <ul>
| + | |
− | <li class="selfnav"><a href="https://2017.igem.org/Team:HFUT-China/Team"><font class="hef" color="#000">Members &<br>Attributions</font></a></li>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Collaborations">
| + | |
− | <font class="hef" color="#000">Collaborations</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− |
| + | |
− | </ul>
| + | |
− | </li>
| + | |
− | <li>
| + | |
− | <a href="#"> Human practice
| + | |
− | <img src="https://static.igem.org/mediawiki/2017/5/53/Xia.png" width="12px" ;> </a>
| + | |
− | <ul>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Silver">
| + | |
− | <font class="hef" color="#000">Silver HP</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | <li class="selfnav">
| + | |
− | <a href="https://2017.igem.org/Team:HFUT-China/Gold_Integrated">
| + | |
− | <font class="hef" color="#000">Integrated
| + | |
− | <br>and Gold</font>
| + | |
− | </a>
| + | |
− | </li>
| + | |
− | </ul>
| + | |
− | </li>
| + | |
− | </ul>
| + | |
− | </div>
| + | |

<div style="height:160px"></div>
<div class="title">
<b>
<font color="#555555">Model</font>
</b>
<!-- <div class="subtitle"><a href="http://47.93.11.157" target="blank" style=" text-decoration:none;"><font color="#555555"><br><br><br>Click <font color="#0089a7"><b>here</b></font> to use our software ~ :p</font></a></div> -->

<div style="width: 76%">
<div style="margin-top: 90px;font-family: Light; text-align: left;">
<b>
<font color="#333" size="6">
<br>
<br>
<br>1. Latent Dirichlet Allocation (LDA) model</font>
</b>
</div>
</div>

<div class="p">
<div class="q" style="line-height: 2.5;">
<br>Given the information of all teams under each track, we wanted our computers to “understand” it and classify it into groups automatically. The conventional LDA model is used to explore the keywords of themes across documents, but here we treat it as an unsupervised classifier; “unsupervised” means we do not have to provide any manually labeled data. As a result, it gives us clusters of documents, where documents in the same cluster share the same theme. The picture below illustrates how LDA works.
<br>
<br>
</div>
<!-- <img src="https://static.igem.org/mediawiki/2017/a/ae/Index.png" width="79%" style="box-shadow: 0px 3px 19px #ddd;margin-top:30px;border-radius:20px;"> -->
</div>
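To make the “unsupervised classifier” idea concrete, here is a minimal from-scratch sketch of LDA via collapsed Gibbs sampling in Python. This is an illustration only, not the implementation used in our system, and the toy corpus of “team descriptions” is made up:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic counts."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})                # vocabulary size
    # random initial topic assignment for every token
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]                 # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                                  # tokens per topic
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(n_iter):
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]
                # remove the token's current assignment, then resample it
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return ndk

# hypothetical corpus: two "wet-lab" documents, two "software" documents
docs = [["gene", "protein", "cell"], ["protein", "cell", "dna"],
        ["software", "code", "model"], ["code", "model", "data"]]
counts = lda_gibbs(docs, n_topics=2)
# each document's cluster is its dominant topic
clusters = [max(range(2), key=lambda k: c[k]) for c in counts]
```

No labels were supplied anywhere; the grouping emerges only from which words co-occur in which documents.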

<div style="width: 76%">
<div style="margin-top: 90px;font-family: Light;text-align: left;">
<b>
<font color="#333" size="6">
<br>
<br>2. TF-IDF model</font>
</b>
</div>
</div>

<div class="p">
<div class="q" style="line-height: 2.5;">
<br>TF-IDF stands for Term Frequency–Inverse Document Frequency. Our system uses it to extract the keywords of a document. It consists of two parts: the TF value and the IDF value. First, we calculate the TF value for each document by simply counting how many times each word appears in the document. The IDF value of a word w_i is then calculated according to the following formula:
<center>
<img src="https://static.igem.org/mediawiki/2017/f/fc/Formula0.png" width="60%" style="box-shadow: 0px 3px 19px #ddd;margin-top:60px;margin-bottom:20px;border-radius:20px;">
</center>
<br>The IDF value represents how general a word is: the higher it is, the rarer the word.
<br>
<br>Finally, we combine the TF and IDF values by multiplying them. By doing this, we filter out the overly general words, and the keywords are left as expected.
</div>
</div>
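The two steps above fit in a few lines of Python. The exact IDF formula is the one in the image; the sketch below assumes the common idf(w) = log(N / df(w)) variant, with a raw count as the TF value, and the toy documents are hypothetical:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each word in each document with tf(w, d) * idf(w)."""
    N = len(docs)
    # document frequency: in how many documents each word appears
    df = Counter(w for d in docs for w in set(d))
    return [{w: tf * math.log(N / df[w])        # tf = raw count in the document
             for w, tf in Counter(d).items()}
            for d in docs]

docs = [["cell", "cell", "gene"], ["cell", "code"], ["code", "model"]]
scores = tf_idf(docs)
top_keyword = max(scores[0], key=scores[0].get)
```

Note how the multiplication does the filtering: "cell" occurs twice in the first document but appears in most documents, so the rarer "gene" ends up as the top keyword.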
− |
| + | |
− | <div style="width: 76%;">
| + | |
− | <div style="margin-top: 90px;font-family: Light; text-align: left;">
| + | |
− | <b>
| + | |
− | <font color="#333" size="6px" family="he" weight="bold">3. Word2Vec</font>
| + | |
− | </b>
| + | |
− | </div>
| + | |
− | </div>
| + | |
<div class="p">
<div class="q" style="line-height: 2.5;">
<br>Word2Vec plays an important role in our system. The word vector is an effective and promising substitute for the conventional one-hot encoding used in NLP (Natural Language Processing). To see what one-hot encoding is, take the sentence “I love you so much” as an example. We want a vector to represent each word in this sentence. One-hot encoding assigns “1” to the entry of the vector corresponding to the word’s position in the sentence. For example, “love” in one-hot is “0 1 0 0 0” and “so” is “0 0 0 1 0”. Nonetheless, this kind of encoding does not capture the semantic meaning of a word: if we measure the semantic similarity between two words, the similarity will be zero unless the words are identical. So researchers proposed the “word vector”, a vector that represents the word’s semantic meaning. It takes the context of the word into consideration, and it works remarkably well in practice.
<br>
<br>Thus, the similarity between two words can easily be measured using the L2 norm. Word vectors are computed with neural networks, whose detailed structure can be found here.
<br>
<br>
<center><img src="https://static.igem.org/mediawiki/2017/0/0e/Word2vec.png" width="80%" style="box-shadow: 0px 3px 19px #ddd;margin-top:60px;margin-bottom:20px;border-radius:20px;"></center>
<!-- <div>
<script src="https://2017.igem.org/Template:HFUT-China/js/chartTest?action=raw&ctype=text/javascript" type="text/javascript"></script>
</div> -->
</div>
</div>
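The one-hot example above, and why it cannot express similarity, can be checked directly. In the sketch below, the dense “word vectors” are made-up 2-dimensional toy values chosen for illustration, not real Word2Vec output:

```python
import math

def one_hot(sentence, word):
    """1 at the word's position in the sentence, 0 elsewhere."""
    return [1 if w == word else 0 for w in sentence]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

sentence = "I love you so much".split()
love, so = one_hot(sentence, "love"), one_hot(sentence, "so")
# distinct one-hot vectors are always orthogonal: similarity is exactly 0
zero_sim = cosine(love, so)

# toy dense vectors: semantically close words sit close together, so their
# L2 distance is small and their cosine similarity is high
vec = {"love": [0.9, 0.2], "adore": [0.8, 0.3], "table": [-0.1, 0.9]}
def l2(u, v):
    return math.dist(u, v)
```

With dense vectors both measures become informative: "love" is nearer to "adore" than to "table", which no one-hot encoding can express.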

<div style="width: 76%">
<div style="margin-top: 90px;font-family: Light; text-align: left;">
<b>
<font color="#333" size="6">4. LSI (Latent Semantic Indexing)</font>
</b>
</div>
</div>

<div class="p">
<div class="q" style="line-height: 2.5;">
<br>Word vectors only capture the semantic meaning of individual words; to measure the semantic distance between two documents, we need another approach. That is why we introduced the LSI model. LSI uses SVD (Singular Value Decomposition) to find the latent similarity between documents. SVD can be thought of as the matrix version of factorization: just as the number 12 can be decomposed into 2×2×3, SVD decomposes a matrix into factors. Suppose that we have m documents and n total words. We decompose the feature matrix as follows:
<center>
<img src="https://static.igem.org/mediawiki/2017/a/af/Formula1.png" width="79%" style="box-shadow: 0px 3px 19px #ddd;margin-top:60px;border-radius:20px;">
</center>
<br>where A_(i,j) stands for the feature value, generally the TF-IDF value of word j in document i. We regard U_i, the i-th row vector of the matrix U, as the semantic value of document i. The similarity between documents i and j can then be calculated using cosine similarity, as in the following expression:
<center>
<img src="https://static.igem.org/mediawiki/2017/0/03/Formula2.png" width="79%" style="box-shadow: 0px 3px 19px #ddd;margin-top:60px;border-radius:20px;">
</center>
<br>
</div>
</div>
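The decomposition and the cosine comparison can be sketched with NumPy. The 3×4 matrix A below is a hypothetical TF-IDF feature matrix (3 documents, 4 words), and k is the number of latent dimensions kept; both are illustration choices, not values from our system:

```python
import numpy as np

# hypothetical feature matrix A: rows are documents, columns are words,
# entries are TF-IDF values (docs 0 and 1 share words; doc 2 is disjoint)
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.9, 1.1, 0.0, 0.1],
              [0.0, 0.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
k = 2                                             # latent dimensions kept
doc_sem = U[:, :k] * s[:k]                        # semantic vector of each document

def cos_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_01 = cos_sim(doc_sem[0], doc_sem[1])  # overlapping word use -> high
sim_02 = cos_sim(doc_sem[0], doc_sem[2])  # disjoint word use -> near zero
```

Truncating to the top k singular directions is what makes the similarity "latent": small wording differences fall into the discarded components, while the dominant themes remain.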

<div style="width: 76%">
<div style="margin-top: 90px;font-family: Light; text-align: left;">
<b>
<font color="#333" size="6">Reference</font>
</b>
</div>
</div>

<div class="p">
<div class="q" style="line-height: 2.5;">
<br>
<a href="https://commons.wikimedia.org/w/index.php?curid=3610403">1. By Bkkbrad, own work, GFDL</a>
<br>
<img src="https://static.igem.org/mediawiki/2017/f/fa/Topic_model_scheme.gif">
<br>
<br>
</div>
</div>
</div>
<div style="height:20%;"></div>
<div class="footer">
<div class="foot-icon">
<a href="">
<img src="https://static.igem.org/mediawiki/2017/5/51/Fb.png" class="icon">
</a>
<a href="">
<img src="https://static.igem.org/mediawiki/2017/d/da/Emial.png" class="icon">
</a>
<a href="">
<img src="https://static.igem.org/mediawiki/2017/5/51/Git.png" class="icon">
</a>
</div>
</div>
</center>

</html>