Team:SYSU-Software/Applied Design

Synbio is just a S-Din away!

SYSU-Software 2017

Applied Design

Introduction

Words from the designer: Scientific research faces the problem of how to make use of previous work: some of it is ignored, and some of it is underestimated. We have always wanted to make a difference in Synthetic Biology. With the help of our search engine, users can examine previous projects, view them from a different angle, and be inspired by them. The gap between searching and designing is now gone. Once you catch the muse, your design can be done instantly.

Database

We collect data from many channels. Most of the data on previous projects and all of the Parts data come from iGEM. Projects published as articles are collected from various synthetic biology journals. Usage permissions are granted by the publishing groups.

All of these are stored in the S-Din database so that we can meet the needs of users who are designing a Synthetic Biology project.

Specialty

Even though the exponential increase in published papers is a field-wide phenomenon not limited to Biology, we try to provide an experimental and innovative solution specialized for Synthetic Biology instead of a general one. S-Din is built specifically for Synthetic Biology, allowing you to view previous projects and work in a standardized manner. Where S-Din goes beyond a general-purpose search engine such as Google is in its specialization in Synthetic Biology, its seamless searching and designing, and its unique way of inspiring users.

When you search on S-Din, we provide network analysis services to help you find your targets faster and to inspire you along the way. Once you find a project you like, you can edit it right away on our design platform. More parts can be added with the search tools provided on the left, or you can simply drag in circuits that you previously bookmarked as favorites. The design can be uploaded, downloaded or shared for collaboration between different accounts. Once you finish the design, you can run the simulation program to check whether it works well, then export it as a plasmid, and we will generate the DNA sequence for synthesis.

Algorithm

The users of our software are innovative researchers interested in using Synthetic Biology to tackle a range of real-world problems. The purpose of our system is to recommend the most relevant genetic parts to users based on the research interests that our system detects.

The information we use to make recommendations comes from a database powered by NLP and Random Walk, which contains a score between each keyword and each genetic Part. Formally, it is a matrix with one row per keyword and one column per Part. The element at row i and column j is the score between keyword i and Part j, which reflects the connection between them (a higher score suggests a stronger connection).
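
As a minimal sketch of such a score matrix, the snippet below stores scores in a NumPy array and looks up the highest-scoring Parts for one keyword. The keyword list, the Part names (`part_A` and so on) and the score values are all made up for illustration and are not taken from the S-Din database.

```python
import numpy as np

# Hypothetical keywords (rows) and Parts (columns); names are placeholders.
keywords = ["biosensor", "arsenic", "fluorescence"]
parts = ["part_A", "part_B", "part_C", "part_D"]

# scores[i, j] is the score between keywords[i] and parts[j] (illustrative values).
scores = np.array([
    [0.9, 0.1, 0.4, 0.0],
    [0.2, 0.8, 0.0, 0.3],
    [0.5, 0.0, 0.7, 0.6],
])

def top_parts(keyword, k=2):
    """Return the k Parts with the highest score for one known keyword."""
    i = keywords.index(keyword)
    order = np.argsort(scores[i])[::-1][:k]  # column indices, highest score first
    return [(parts[j], float(scores[i, j])) for j in order]

print(top_parts("biosensor"))
```

Reading a row gives every Part's score for one keyword; reading a column would give every keyword's score for one Part.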

The overall strategy of our system is Collaborative Filtering: for an unknown word entered by a user, we first search our database for similar keywords, then recommend the genetic parts that are highly related to those keywords.
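
The two-step strategy can be sketched as follows. This is only an illustration under stated assumptions: the function name `recommend`, the toy score table and the similarity-weighted sum used to combine scores are hypothetical choices, not a description of how S-Din actually combines them.

```python
def recommend(similar_keywords, score_table, k=3):
    """Rank Parts for a query from the keywords found to be similar to it.

    similar_keywords: list of (keyword, similarity) pairs for the query word.
    score_table: dict mapping keyword -> {part: score}.
    Assumption: each Part's total is the similarity-weighted sum of its scores.
    """
    totals = {}
    for keyword, similarity in similar_keywords:
        for part, score in score_table.get(keyword, {}).items():
            totals[part] = totals.get(part, 0.0) + similarity * score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Illustrative data only: placeholder Part names and made-up scores.
score_table = {
    "biosensor": {"part_A": 0.9, "part_C": 0.4},
    "detector": {"part_A": 0.5, "part_B": 0.6},
}
similar = [("biosensor", 1.0), ("detector", 0.5)]

for part, total in recommend(similar, score_table):
    print(part, round(total, 2))
```

A Part that scores well for several similar keywords rises to the top, even if the user's original word never appears in the database.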

How can we find the keywords in our dataset that are most similar to a given unknown word, both accurately and efficiently? Here is our general solution. To quantify the semantic similarity between words accurately, we use Deep Learning techniques to convert words into numerical vectors, so that the cosine similarity between two vectors represents the semantic similarity between the corresponding words. To search for similar keywords efficiently, we use the K-D Tree algorithm, a fast algorithm based on a binary tree, to implement the K Nearest Neighbors strategy.
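
A minimal sketch of this search, assuming SciPy's `KDTree`, is shown below. A K-D tree searches by Euclidean distance, so the vectors are normalized to unit length first; for unit vectors, Euclidean distance and cosine similarity give the same ranking. The 3-dimensional vectors are toy values standing in for real word embeddings, and `nearest_keywords` is a hypothetical helper name.

```python
import numpy as np
from scipy.spatial import KDTree

# Toy 3-dimensional "word vectors"; real embeddings would come from a trained model.
vocab = ["biosensor", "promoter", "fluorescence", "plasmid"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.7, 0.0, 0.7],
    [0.0, 0.6, 0.8],
])

def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# For unit vectors, ||a - b||^2 = 2 - 2 * cos(a, b), so the Euclidean
# nearest neighbors are also the neighbors with the highest cosine similarity.
tree = KDTree(normalize(vectors))

def nearest_keywords(query_vector, k=2):
    """Return the k vocabulary words closest to the query, with cosine similarity."""
    q = normalize(np.asarray(query_vector, dtype=float))
    dist, idx = tree.query(q, k=k)
    dist, idx = np.atleast_1d(dist), np.atleast_1d(idx)
    cosine = 1.0 - dist ** 2 / 2.0
    return [(vocab[i], float(c)) for i, c in zip(idx, cosine)]

print(nearest_keywords([1.0, 0.0, 0.1]))
```

Building the tree once and querying it many times avoids comparing each query against every keyword vector.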

Network Analysis

Search analysis

Scientific journals often require keywords for published articles. Keywords can be used for indexing or searching, and they offer readers a faster way to browse the content of a project and find what interests them.

In fact, in 2015 iGEM tried to collect keywords by asking teams to submit them.

This year, a new data format was created, and its keywords enable our network analysis. First, we used Microsoft Text Analytics Service to extract keywords from the wiki texts of previous projects. We focus only on the most frequently used keywords. Using machine learning, those keywords were converted to word vectors based on a semantic database. A network profiling the relationships among keywords, projects and parts data is built and stored in the S-Din database. When you search, related information is provided based on the network analysis, which may spark a brand-new idea unexpectedly.
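
The filtering and network-building steps can be sketched in a few lines. The project names, keywords and the `min_projects` threshold are hypothetical, and the keywords are assumed to have been extracted already; only the link between keywords and projects is shown, not the Parts data or the word vectors.

```python
from collections import Counter, defaultdict

# Hypothetical keywords already extracted from each project's wiki text.
project_keywords = {
    "project_1": ["biosensor", "arsenic", "fluorescence"],
    "project_2": ["biosensor", "heavy metal"],
    "project_3": ["fluorescence", "biosensor", "quorum sensing"],
}

def build_keyword_network(project_keywords, min_projects=2):
    """Keep keywords used by at least min_projects projects and link them to those projects."""
    counts = Counter(kw for kws in project_keywords.values() for kw in set(kws))
    frequent = {kw for kw, n in counts.items() if n >= min_projects}
    network = defaultdict(set)
    for project, kws in project_keywords.items():
        for kw in kws:
            if kw in frequent:
                network[kw].add(project)
    return dict(network)

def related_projects(network, keyword):
    """Return the projects linked to a keyword, in sorted order."""
    return sorted(network.get(keyword, set()))

network = build_keyword_network(project_keywords)
print(related_projects(network, "fluorescence"))
```

Rare keywords such as "arsenic" are dropped here because they appear in only one project, which keeps the network focused on terms that connect several projects.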

Interaction analysis

When you use interaction analysis for a Part on the design platform, any other parts or proteins that might interact with the chosen Part are displayed and can be added to the platform. We built the interaction database ourselves from two data sources, the iGEM Registry and the STRING database. Scores for parts come from the STRING database.

Simulation & Plasmid Design

S-Din allows our innovative users with fascinating ideas to combine various genetic parts into brand-new genetic circuits that may never have existed before. To help them gain a deeper understanding of how the genetic system they created works, we construct an ODE system to simulate its dynamic behavior mathematically. The main challenge of simulation is the uncertainty of the genetic circuits, since users can construct them arbitrarily. Therefore, we built a rather generic model capable of simulating various genetic circuits.

After finishing your design, you can choose to convert your circuits into plasmids (customizable) for synthesis.
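
To illustrate what an ODE model of an arbitrary circuit can look like, here is a minimal sketch using SciPy's `solve_ivp`. It is not the S-Din model: the Hill-type repression term, the parameter values and the `repressor_of` circuit description are assumptions chosen to keep the example small.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters in arbitrary units, not taken from S-Din.
ALPHA = 10.0  # maximal production rate
DELTA = 1.0   # degradation/dilution rate
K = 1.0       # repression threshold
N = 2.0       # Hill coefficient

def circuit_rhs(t, p, repressor_of):
    """Right-hand side dp/dt for protein concentrations p.

    repressor_of[i] is the index of the protein repressing gene i,
    or None if gene i is expressed constitutively.
    """
    dp = np.empty_like(p)
    for i, r in enumerate(repressor_of):
        production = ALPHA if r is None else ALPHA / (1.0 + (p[r] / K) ** N)
        dp[i] = production - DELTA * p[i]
    return dp

def simulate(repressor_of, t_end=20.0):
    """Integrate the circuit from zero initial concentrations."""
    p0 = np.zeros(len(repressor_of))
    return solve_ivp(lambda t, p: circuit_rhs(t, p, repressor_of),
                     (0.0, t_end), p0,
                     t_eval=np.linspace(0.0, t_end, 201),
                     rtol=1e-6, atol=1e-9)

# Gene 0 is constitutive; its protein represses gene 1.
sol = simulate([None, 0])
print(sol.y[:, -1])  # concentrations at the final time point
```

Because the circuit is passed in as data (`repressor_of`), the same right-hand side can be reused for other wirings, for example `[1, 0]` for two genes that repress each other.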

Techniques

As a web application, S-Din uses Django with Python 3 for the back end, which is fast, robust and friendly to developers; MySQL with Django model bindings gives us solid database support. Algorithms including Word2Vec and K-D Tree are also coded in Python 3 with SciPy. For the front end, we build the views with a customized Semantic UI and control the logic behind the widgets using jQuery. Other open-source JavaScript libraries are also used to build the site, such as Chart.js for charts and jsPlumb for links on the design page. TypeScript was used to build the game BioLab Rescue; we hope you enjoy it!

contact

sysusoftware@126.com

address

135# Xin'gang Rd(W.)
Sun Yat-sen University, Guangzhou, China
