|
|
Line 23: |
Line 23: |
| | | |
| </script> | | </script> |
| + | |
| <link rel="stylesheet" href="/wiki/index.php?title=Template:SYSU-Software/lib/semantic.min.css&action=raw&ctype=text/css"> | | <link rel="stylesheet" href="/wiki/index.php?title=Template:SYSU-Software/lib/semantic.min.css&action=raw&ctype=text/css"> |
| <link href="/wiki/index.php?title=Template:SYSU-Software/css/common.css&action=raw&ctype=text/css" rel="stylesheet"> | | <link href="/wiki/index.php?title=Template:SYSU-Software/css/common.css&action=raw&ctype=text/css" rel="stylesheet"> |
| + | <link |
| <link rel="stylesheet" href="https://2017.igem.org/wiki/index.php?title=Template:SYSU-Software/css/modeling.css&action=raw&ctype=text/css"> | | <link rel="stylesheet" href="https://2017.igem.org/wiki/index.php?title=Template:SYSU-Software/css/modeling.css&action=raw&ctype=text/css"> |
| </head> | | </head> |
Line 174: |
Line 176: |
| <li>When the user enters a new word, we search for similar key words along the KD Tree that we constructed in step 1). (Online)</li> | | <li>When the user enters a new word, we search for similar key words along the KD Tree that we constructed in step 1). (Online)</li> |
| </ol> | | </ol> |
− | <p>Note that KD Tree Algorithm is based on Euclidean distance measure , while in our case , we need to calculate the cosine distance of the word vectors. To solve this problem ,we normalize all the word vectors of key words before constructing KD Tree and normalize the word vector of new words offered by users before letting it travel along the constructed KD Tree. The reason behind the method is that by Law of Cosines:</p> | + | <p>Note that the KD Tree algorithm is based on the Euclidean distance measure, while in our case we need the cosine distance between word vectors. To solve this problem, we normalize all the word vectors of the key words before constructing the KD Tree, and normalize the word vector of any new word offered by a user before letting it travel along the constructed KD Tree. The method works because, by the Law of Cosines:
− | <img src="https://static.igem.org/mediawiki/2017/6/6f/T--SYSU-Software--model_law-of-cos.png" alt="law of cosine" class="centered formula" id="law-of-cos-img"> | + | $$|x_1-x_2|=\sqrt{|x_1|^2+|x_2|^2-2|x_1||x_2|\cos(x_1,x_2)}\,,\,\,x_1,x_2\in R^n$$
− | <p>while since <code>x<sup>1</sup>,x<sup>2</sup></code> are normalized, we have :</p> | + | </p> |
− | <img src="https://static.igem.org/mediawiki/2017/a/a8/T--SYSU-Software--modeling_law-of-cos2.png" alt="law of cosine" class="centered formula" id="law-of-cos2-img"> | + | |
| + | <p>Since <code>x<sub>1</sub>, x<sub>2</sub></code> are normalized, we have:
| + | $$|x_1-x_2| = \sqrt{1+1-2\cos(x_1,x_2)}\,,\,\,x_1,x_2\in R^n, |x_1|=|x_2|=1$$ |
| + | </p> |
| + | |
| <p>Now we see that the calculation of cosine distance between normalized vectors can be replaced by the calculation of Euclidean distance since they can determine each other in this normalization context.</p> | | <p>Now we see that the calculation of cosine distance between normalized vectors can be replaced by the calculation of Euclidean distance since they can determine each other in this normalization context.</p> |
| <p>In our Recommendation System, we use the KD Tree implementation in scikit-learn, a simple and efficient open-source machine learning library in Python. For more details on the KD Tree algorithm, see the <a href="https://en.wikipedia.org/wiki/K-d_tree" target="_blank">wiki page of K-D Tree</a>.</p> | | <p>In our Recommendation System, we use the KD Tree implementation in scikit-learn, a simple and efficient open-source machine learning library in Python. For more details on the KD Tree algorithm, see the <a href="https://en.wikipedia.org/wiki/K-d_tree" target="_blank">wiki page of K-D Tree</a>.</p> |
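The two-step scheme above (offline tree construction, online query) can be sketched with scikit-learn's KD Tree, which the text names as the implementation used. The data here is randomly generated for illustration, not the team's actual key-word vectors; the snippet also checks the normalization trick, since for unit vectors the Euclidean distance is a monotone function of the cosine distance.

```python
# Sketch of the offline/online KNN scheme with normalized word vectors.
# Assumes scikit-learn; the vectors here are random stand-ins.
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
keyword_vectors = rng.normal(size=(100, 50))                       # hypothetical key-word vectors
keyword_vectors /= np.linalg.norm(keyword_vectors, axis=1, keepdims=True)

tree = KDTree(keyword_vectors)                                     # offline: build the KD Tree

query = rng.normal(size=(1, 50))
query /= np.linalg.norm(query)                                     # normalize the new word too
dist, idx = tree.query(query, k=5)                                 # online: 5 nearest key words
```

Because all vectors are normalized, `|x1 - x2|^2 = 2 - 2*cos(x1, x2)`, so the five Euclidean nearest neighbours returned by the tree are exactly the five most cosine-similar key words.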
Line 204: |
Line 210: |
| <p><code>for</code> keyword <code>i</code> in <code>1 to N</code></p> | | <p><code>for</code> keyword <code>i</code> in <code>1 to N</code></p> |
| <h6>Define and initialize <code>v</code></h6> | | <h6>Define and initialize <code>v</code></h6> |
− | <p>Let <code>e<sub>i</sub></code> be the column vector that has 1 in the row for node <code>i</code> and 0’s elsewhere, i.e.</p> | + | <p>Let <code>e<sub>i</sub></code> be the column vector that has 1 in the row for node <code>i</code> and 0’s elsewhere, i.e. |
− | <img src="https://static.igem.org/mediawiki/2017/5/58/T--SYSU-Software--modeling_eij.png" alt="eij" id="eij-img" class="centered formula"> | + | $$e_{ij}=\delta_{ij}= |
| + | \begin{cases} |
| + | 1, & \text{if $i=j$} \\ |
| + | 0, & \text{if $i\neq j$} |
| + | \end{cases}$$ |
| + | </p> |
| + | |
| <p>And <code>v</code> is the column vector whose entries give the probability that the walker stays at each node of the graph; the initial <code>v</code> is set to <code>e<sub>i</sub></code>.</p> | | <p>And <code>v</code> is the column vector whose entries give the probability that the walker stays at each node of the graph; the initial <code>v</code> is set to <code>e<sub>i</sub></code>.</p> |
| <h6>Update <code>v</code> by</h6> | | <h6>Update <code>v</code> by</h6> |
− | <img src="https://static.igem.org/mediawiki/2017/2/2e/T--SYSU-Software--modeling_bmv.png" alt="bmv+(1-b)ei" id="bmv-img" class="centered formula"> | + | <p>$$v'=\beta M v+(1-\beta)e_i$$</p>
− | <p>update code{v} until the terminal condition is satisfied, i.e.</p> | + | <p>update <code>v</code> until the terminal condition is satisfied, i.e.
− | <img src="https://static.igem.org/mediawiki/2017/e/e6/T--SYSU-Software--modeling_term-cond.png" alt="terminal condition" id="time-cond-img" class="centered formula"> | + | $$|v^{(t)}-v^{(t-1)}|<\epsilon,\,\,\text{for some time $t$}$$
| + | </p> |
| + | |
| <h6>Modify <code>v</code></h6> | | <h6>Modify <code>v</code></h6> |
| <p>Since the first n components of <code>v</code> are the affiliations between key word <code>i</code> and the other key words, in which we are not interested, we keep only the last m components of <code>v</code>, which reflect the affiliation between key word <code>i</code> and all the parts.</p> | | <p>Since the first n components of <code>v</code> are the affiliations between key word <code>i</code> and the other key words, in which we are not interested, we keep only the last m components of <code>v</code>, which reflect the affiliation between key word <code>i</code> and all the parts.</p> |
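The iteration described above (initialize <code>v = e_i</code>, update <code>v' = βMv + (1-β)e_i</code> until <code>|v(t) - v(t-1)| &lt; ε</code>, then keep the last m components) can be sketched in a few lines of numpy. The graph, β, and ε below are illustrative stand-ins, not the team's actual data or parameters.

```python
# Random walk with restart from key word i; M must be column-stochastic.
import numpy as np

def walk_with_restart(M, i, beta=0.8, eps=1e-10, max_iter=10_000):
    n = M.shape[0]
    e_i = np.zeros(n)
    e_i[i] = 1.0                              # 1 in the row for node i, 0 elsewhere
    v = e_i.copy()                            # initial v is e_i
    for _ in range(max_iter):
        v_next = beta * (M @ v) + (1 - beta) * e_i
        if np.abs(v_next - v).sum() < eps:    # terminal condition |v(t) - v(t-1)| < eps
            return v_next
        v = v_next
    return v

# Toy graph: nodes 0-1 are key words, nodes 2-3 are parts (assumed adjacency).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
M = A / A.sum(axis=0)                         # column-normalize into transition probabilities
v = walk_with_restart(M, i=0)
affiliation_with_parts = v[2:]                # keep only the last m components
```

Since β &lt; 1, the update is a contraction, so the iteration converges regardless of the starting vector.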
Line 222: |
Line 236: |
| <h5>Train word vector model</h5> | | <h5>Train word vector model</h5> |
| <div class="paragraph"> | | <div class="paragraph"> |
− | <p>We feed our word vector model with Wikimedia corpus . After the training process is done, we have a model that can convert words into numerical vectors, i.e.</p> | + | <p>We feed our word vector model with the Wikimedia corpus. After the training process is done, we have a model that converts words into numerical vectors, i.e.
− | <img src="https://static.igem.org/mediawiki/2017/c/c0/T--SYSU-Software--modeling_f-gamma-r.png" id="f-gamma-r-img" alt="" class="centered formula"> | + | $$f:\Gamma\rightarrow R^n$$ |
| + | </p> |
| <p>where Γ denotes the corpus space.</p> | | <p>where Γ denotes the corpus space.</p> |
| </div> | | </div> |
Line 242: |
Line 257: |
| </div> | | </div> |
| <h5>Transform users input</h5> | | <h5>Transform users input</h5> |
− | <p>We get users input word <code>k</code>, we then convert it into normalized word vector <code>v</code>, i.e.</p> | + | <p>Given the user's input word <code>k</code>, we convert it into the normalized word vector <code>v</code>, i.e. $$v=\frac{f(k)}{|f(k)|}$$</p>
− | <img src="https://static.igem.org/mediawiki/2017/0/00/T--SYSU-Software--modeling_normalize-k.png" alt="" id="normalize-k-img" class="centered formula"> | + | |
| <h5>Search K most similar keywords(KNN)</h5> | | <h5>Search K most similar keywords(KNN)</h5> |
| <p>Letting <code>v</code> travel along the KD Tree in a way like binary search, we end up with the <code>K</code> (in our case, <code>K</code> is set to 5) most similar key words (KNN) to the current word <code>k</code>.</p> | | <p>Letting <code>v</code> travel along the KD Tree in a way like binary search, we end up with the <code>K</code> (in our case, <code>K</code> is set to 5) most similar key words (KNN) to the current word <code>k</code>.</p> |
| <h5>Make Recommendation</h5> | | <h5>Make Recommendation</h5> |
| <div class="paragraph"> | | <div class="paragraph"> |
− | <p>We have calculated the affiliation between key words and parts which shall be used to guide our final recommendation. Let <code>p<sub>i</sub></code> denotes the connection between the <code>i</code>th part and the current word <code>k</code> , we calculate <code>p<sub>i</sub></code> by formula</p> | + | <p>We have calculated the affiliation between key words and parts, which shall be used to guide our final recommendation. Let <code>p<sub>i</sub></code> denote the connection between the <code>i</code>th part and the current word <code>k</code>; we calculate <code>p<sub>i</sub></code> by the formula $$p_i=\sum^K_{n=1}\frac{\text{Affiliation}(w_n,p_i)}{\text{distance}(w_n,k)+\text{bias}}$$
− | <img src="https://static.igem.org/mediawiki/2017/0/00/T--SYSU-Software--modeling_affiliation.png" alt="" id="affi-img" class="centered formula"> | + | </p> |
| + | |
| <p>where \(w_n\) denotes the <code>K</code> most similar key words,</p> | | <p>where \(w_n\) denotes the <code>K</code> most similar key words,</p> |
| <p><code>Affiliation(w<sub>n</sub>, p<sub>i</sub>)</code> denotes the connection between <code>w<sub>n</sub></code> and the <code>i</code>th part, and <code>distance(w<sub>n</sub>, k)</code> denotes the Euclidean distance between the normalized word vectors of <code>w<sub>n</sub></code> and the word <code>k</code>.</p> | | <p><code>Affiliation(w<sub>n</sub>, p<sub>i</sub>)</code> denotes the connection between <code>w<sub>n</sub></code> and the <code>i</code>th part, and <code>distance(w<sub>n</sub>, k)</code> denotes the Euclidean distance between the normalized word vectors of <code>w<sub>n</sub></code> and the word <code>k</code>.</p> |
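The scoring formula for p_i is a short vectorized computation. All the numbers below are hypothetical, chosen only to show the mechanics: K = 3 neighbour key words, 2 candidate parts, and a small bias term that keeps the denominator positive when a neighbour's distance is zero.

```python
# Hypothetical affiliations and distances illustrating
# p_i = sum_n Affiliation(w_n, p_i) / (distance(w_n, k) + bias).
import numpy as np

affiliation = np.array([[0.9, 0.1],    # affiliation of each of K=3 neighbours
                        [0.4, 0.6],    # with each of 2 parts
                        [0.2, 0.8]])
distance = np.array([0.1, 0.5, 0.9])   # Euclidean distance of each neighbour to k
bias = 1e-3                            # guards against division by zero

scores = (affiliation / (distance + bias)[:, None]).sum(axis=0)  # one p_i per part
best_part = int(np.argmax(scores))     # part recommended first
```

Closer neighbours dominate the sum, so a part strongly affiliated with the nearest key word outranks one affiliated only with distant neighbours.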
Line 262: |
Line 278: |
| <h3>Model</h3> | | <h3>Model</h3> |
| <div class="paragraph"> | | <div class="paragraph"> |
− | <p>The interaction among the substances in genetic circuits can be characterized by a <code>n * n</code> relation matrix <code>R</code>, where <code>n</code> is number of substances. The entry in row <code>i</code> and column <code>j</code> has 3 possible values for 3 possible relations, i.e. </p> | + | <p>The interactions among the substances in genetic circuits can be characterized by an <code>n * n</code> relation matrix <code>R</code>, where <code>n</code> is the number of substances. The entry in row <code>i</code> and column <code>j</code> has 3 possible values for 3 possible relations, i.e. $$
− | <img src="https://static.igem.org/mediawiki/2017/6/6c/T--SYSU-Software--model_rij.png" id="rij-img" alt="" class="centered formula"> | + | R_{i,j}= |
| + | \begin{cases} |
| + | 1 & \text{if $i$ promotes $j$} \\ |
| + | -1 & \text{if $i$ inhibits $j$} \\ |
| + | 0 & \text{if $i$ has no effect on $j$}
| + | \end{cases} |
| + | $$</p> |
| + | |
| <p>Therefore, the <code>j</code>th column records all the impacts the system exerts on the <code>j</code>th substance.</p> | | <p>Therefore, the <code>j</code>th column records all the impacts the system exerts on the <code>j</code>th substance.</p> |
| <p>Let <code>x<sub>j</sub>(t)</code> denote the concentration of substance <code>j</code> at time <code>t</code>; we use the following ODE to characterize <code>x<sub>j</sub>(t)</code> for \(j=1,\dots,n\)</p> | | <p>Let <code>x<sub>j</sub>(t)</code> denote the concentration of substance <code>j</code> at time <code>t</code>; we use the following ODE to characterize <code>x<sub>j</sub>(t)</code> for \(j=1,\dots,n\)</p> |
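The relation matrix R and its column-wise reading can be made concrete with a toy three-substance circuit; the circuit below is invented for illustration and does not come from the text.

```python
# Toy relation matrix (illustrative values):
# R[i, j] = 1 if i promotes j, -1 if i inhibits j, 0 if i has no effect on j.
import numpy as np

R = np.array([[ 0,  1,  0],    # substance 0 promotes substance 1
              [ 0,  0, -1],    # substance 1 inhibits substance 2
              [ 1,  0,  0]])   # substance 2 promotes substance 0

# The jth column collects every impact the system exerts on substance j.
impacts_on_2 = R[:, 2]         # only substance 1 acts on substance 2 (inhibition)
```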
Line 308: |
Line 331: |
| <script src="https://2017.igem.org/wiki/index.php?title=Template:SYSU-Software/lib/T--SYSU-Software--jquery.min.js&action=raw&ctype=text/javascript"></script> | | <script src="https://2017.igem.org/wiki/index.php?title=Template:SYSU-Software/lib/T--SYSU-Software--jquery.min.js&action=raw&ctype=text/javascript"></script> |
| <script src="https://2017.igem.org/wiki/index.php?title=Template:SYSU-Software/lib/semantic.min.js&action=raw&ctype=text/javascript"></script> | | <script src="https://2017.igem.org/wiki/index.php?title=Template:SYSU-Software/lib/semantic.min.js&action=raw&ctype=text/javascript"></script> |
| + | <script src="https://2017.igem.org/common/MathJax-2.5-latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script> |
| + | |
| </body> | | </body> |
| | | |
| </html> | | </html> |