# Team:SYSU-Software/Project


## Synbio is just a S-Din away!

### SYSU-Software 2017

###### Achievement
• 2017 Best Software Project
• 2017 Best Applied Design Nominated
• 2017 Best Model Nominated
• 2017 Gold Medal

## Overview

S-Din (Search engine and Design platform for inspiration with Network analysis) is an integrated search engine and design platform developed specifically for Synthetic Biology. Because the volume of published research is growing exponentially, much of it has never been fully exploited. S-Din integrates searching and designing seamlessly, so users can find exactly what they need and edit it right away. What's more, a tree of words and several analyses were developed to inspire users toward better ideas. A simulation model helps ensure the robustness of a user's design, and a wet-lab validation was conducted to check its reliability. Our source code is available in our GitHub repository.

## Description

#### WHAT IS S-Din?

S-Din is an integrated search engine and design platform developed specifically for Synthetic Biology. Powered by our algorithms, it provides a one-stop solution for starting a project.

To be better,
We don’t have to create more.
We choose S-Din.
Make a better project with S-Din.

#### Features

##### Seamless Search & Design

The moment you're inspired by one of the projects, you can edit it instantly. Combine different functional parts to fulfill your needs, or create a de novo design.

##### Network analysis

We developed a series of algorithms to generate network composites of projects, parts, and keywords for analysis, aiming to help our users explore the treasures buried in the ocean of articles.

##### The Best Design Platform

We carefully considered what designing a circuit requires in order to achieve the best user experience: simple interaction logic and a beautiful UI.

##### The tree of words

Simply click a word to see its subordinate words. After a few repeats, you will reach the projects you want. This function is designed to help you specify your needs and construct your idea better.

##### Interaction Analysis

This function is built into the design platform and tells you which parts may interact with the chosen one. Use your imagination and ponder how to utilize these interactions.

##### Simulation

Even though it's difficult to develop a model that fits every circuit, it is worth a shot. Our modeler developed a general model that allows users to simulate any circuit.

#### WHY S-Din?

##### Background

Bioscience research now faces an increasingly severe problem: many works are ignored and never properly evaluated. A large number of research articles are published every year, and everyone should consider how to use previous work to help solve current problems.

This year, SYSU-Software is trying to ameliorate this situation in Synthetic Biology, exploring treasures buried in the ocean of projects.

To make full use of our predecessors' genius and efforts in Synthetic Biology, we identified two problems.

1. How to find the project you need much faster and more accurately?
2. How to integrate parts of the project you found into your own work?

Following the standardization in Synthetic Biology, we standardized all aspects of previous projects, whether they came from published articles or iGEM projects, into a new data format to give you a clearer, better view when using our search engine.

Network analysis: designed to uncover potential connections between projects, helping users to locate exactly what they need.

Customization: The most exciting part of S-Din is that searching and designing are seamless. Once you find a project that might be useful, you can start editing on the design platform immediately, or simply create your own from scratch.

Extensible: To keep the database up to date, new information can be added manually or by a 'Spider' (web crawler).

#### Inspiration

iGEMers have been complaining that tackling obstacles in a project is doable, but deciding what to do is exasperating. Catching your muse is like grabbing swirling smoke.

We extend S-Din further, not only satisfying the need of starting a project but also guiding you to a clearer path.

Therefore, the tree of words and interaction analysis were developed to inspire users. Part of the reason we made searching and designing seamless is to let you start designing the moment you have your eureka moment.

And here comes S-Din, to inspire you, to do more in Synthetic Biology.

DON'T BE LIMITED, unleash your imagination.
Want to solve the energy crisis with carbon dioxide? Search for circuits that can take in carbon dioxide and combine them with an energy-generating circuit to create your own circuit.

Undoubtedly, Google is the most powerful search engine in the world. Let's see what happens when Google meets S-Din.

## Applied Design

#### Introduction

Words from the designer: Scientific research faces the problem of how to make use of previous work; some of it is ignored, and some of it is underestimated. We've always wanted to make a difference in Synthetic Biology. With the help of our search engine, users can examine previous projects, view them from a different angle, and be inspired by them. The gap between searching and designing is now gone. Once you catch the muse, your design can be done instantly.

#### Database

We collect data from many channels. Most of the previous project data and all of the Parts data come from iGEM. Projects published in articles are collected from various synthetic biology journals. Usage permissions are granted by the publishing groups.

All of these are stored in the S-Din database so that we can meet our users' needs when designing a Synthetic Biology project.

#### Specialty

Even though the exponential increase in published papers is a field-wide phenomenon not limited to Biology, we try to provide an experimental and innovative solution specialized for Synthetic Biology rather than a general one. S-Din is built specifically for Synthetic Biology, allowing you to view previous projects and work in a standardized manner. S-Din surpasses Google in its specialization in Synthetic Biology, its seamless searching and designing, and its unique way of inspiring users.

When you search on S-Din, we provide network analysis services to help you find your targets faster and to inspire you. Once you find a project you like, you can edit it immediately on our design platform. More parts can be added with the search tools on the left, or you can drag in circuits that you previously bookmarked as favorites. The design can be uploaded, downloaded, or shared for collaboration between different accounts. Once you finish the design, you can run the simulation program to check whether it works well, then export it as a plasmid, and we will generate the DNA sequence for synthesis.

#### Algorithm

The users of our software are all innovative researchers who are interested in utilizing Synthetic Biology to tackle many different real-world problems. The purpose of our system is to recommend the most related genetic parts to the users based on users' research interest detected by our system.

The information we use to make recommendations comes from a database powered by NLP and Random Walk, which contains a score for each pair of keyword and genetic part. Formally, it is a matrix with one row per keyword and one column per Part. The element at a given row and column is the score between the corresponding keyword and genetic Part, reflecting the connection between them (a higher score suggests a stronger connection).
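As a rough illustration of how such a score matrix can be queried, here is a minimal sketch. The keyword names, part names, and scores are hypothetical placeholders, not values from the S-Din database, and the helper `top_parts` is our own illustrative name.

```python
import numpy as np

# Hypothetical keywords, parts, and scores, for illustration only.
keywords = ["uv", "detection", "biofuel"]
parts = ["part_A", "part_B", "part_C", "part_D"]

# scores[i, j] is the score between keywords[i] and parts[j];
# a higher score suggests a stronger connection.
scores = np.array([
    [0.9, 0.1, 0.0, 0.3],
    [0.7, 0.2, 0.1, 0.8],
    [0.0, 0.6, 0.9, 0.1],
])


def top_parts(keyword, k=2):
    """Return the k parts with the highest score for a known keyword."""
    row = scores[keywords.index(keyword)]
    order = np.argsort(-row)[:k]  # indices sorted by descending score
    return [(parts[j], float(row[j])) for j in order]


print(top_parts("uv"))
```

Storing the scores as a dense array keeps the lookup for one keyword to a single row read; a sparse format would be a natural alternative if most keyword and part pairs had no connection.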

The overall strategy of our system is Collaborative Filtering: we first search our database for keywords similar to the unknown word supplied by the user, then recommend the genetic parts that are highly related to those keywords.

How can we find, accurately and efficiently, the keywords in our dataset that are most similar to a given unknown word? Here is our general solution. To quantify the semantic similarity between words accurately, we use Deep Learning techniques to convert words into numerical vectors; the cosine similarity between two vectors then represents the semantic similarity between the corresponding words. To search for similar keywords efficiently, we use the K-D Tree algorithm, a fast algorithm based on a binary tree, to implement the K Nearest Neighbors strategy.
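The nearest-keyword search can be sketched as follows. This is a minimal example under stated assumptions, not the S-Din implementation: the vocabulary and the 3-dimensional vectors are made up (real word embeddings have far more dimensions), and `similar_keywords` is a hypothetical helper. Because a K-D tree ranks neighbors by Euclidean distance, the sketch normalizes all vectors to unit length first; for unit vectors, Euclidean distance and cosine similarity give the same ranking.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical vocabulary and toy word vectors, for illustration only.
vocab = ["uv", "light", "sugar", "metal"]
vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.3, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
])


def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


# Build the tree once over the unit-length keyword vectors.
tree = cKDTree(normalize(vectors))


def similar_keywords(query_vector, k=2):
    """Return the k keywords closest to the query, with cosine similarity."""
    query = normalize(np.asarray(query_vector, dtype=float))
    dist, idx = tree.query(query, k=k)
    dist, idx = np.atleast_1d(dist), np.atleast_1d(idx)
    # For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
    cos = 1.0 - dist ** 2 / 2.0
    return [(vocab[i], float(c)) for i, c in zip(idx, cos)]


print(similar_keywords([1.0, 0.2, 0.0]))
```

The keywords returned here would then be used to look up their highest-scoring parts in the keyword and part score matrix.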

#### Network Analysis

##### Search analysis

Scientific journals often require keywords when publishing articles. Keywords can be used for indexing or searching, and they offer readers a faster way to browse the content of a project and find what interests them.

In fact, in 2015, iGEM tried to collect keywords by asking teams to submit them.

This year, a new data format was created, and its keywords enable our network analysis. First, we used the Microsoft Text Analytics Service to extract keywords from the wiki texts of previous projects. Only the most frequently used keywords were kept. Using machine learning, those keywords were converted to word vectors based on a semantic database. A network profiling the relationships among keywords, projects, and parts data was built and stored in the S-Din database. When you search, related information is provided based on the network analysis, which may unexpectedly spark a brand-new idea.
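The frequency filtering and network-building steps might look roughly like the sketch below. It assumes the keywords have already been extracted (the extraction service itself is not called here), and the project names, keyword lists, and helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical keyword lists standing in for keywords extracted from wikis.
project_keywords = {
    "project_1": ["uv", "detection", "sensor"],
    "project_2": ["uv", "biofuel"],
    "project_3": ["detection", "uv", "metal"],
}


def frequent_keywords(keyword_lists, top_n):
    """Keep the top_n keywords, ranked by how many projects use them."""
    counts = Counter(
        kw for kws in keyword_lists.values() for kw in set(kws)
    )
    return [kw for kw, _ in counts.most_common(top_n)]


def keyword_project_edges(keyword_lists, kept):
    """List (keyword, project) edges of the network for the kept keywords."""
    kept = set(kept)
    return sorted(
        (kw, proj)
        for proj, kws in keyword_lists.items()
        for kw in kws
        if kw in kept
    )


kept = frequent_keywords(project_keywords, top_n=2)
print(kept)
print(keyword_project_edges(project_keywords, kept))
```

Two projects that share a kept keyword become neighbors in this network, which is what lets a search surface related projects that do not match the query text directly.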

##### Interaction analysis

When you use interaction analysis for a Part on the design platform, any other parts or proteins that might interact with the chosen part are displayed and can be added to the platform. We built the interaction database ourselves, using the iGEM Registry and the STRING Database as data sources. Scores for parts come from the STRING Database.
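A lookup of this kind can be sketched as a filter over scored interaction records. The records, the 0 to 1 score scale, and the `partners` helper below are assumptions made for illustration; they do not reflect the actual schema of the S-Din interaction database.

```python
# Hypothetical interaction records: (item, partner, score).
interactions = [
    ("part_A", "protein_X", 0.92),
    ("part_A", "part_B", 0.40),
    ("part_C", "part_A", 0.75),
    ("part_B", "protein_Y", 0.66),
]


def partners(chosen, min_score=0.5):
    """Return items interacting with `chosen`, strongest first."""
    hits = []
    for a, b, score in interactions:
        if score < min_score:
            continue  # skip weak interactions
        if a == chosen:
            hits.append((b, score))
        elif b == chosen:
            hits.append((a, score))
    return sorted(hits, key=lambda hit: -hit[1])


print(partners("part_A"))
```

Interactions are treated as undirected here, so a record matches whichever side the chosen part appears on.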

#### Simulation & Plasmid Design

S-Din allows our innovative users with fascinating ideas to combine various genetic parts into brand-new genetic circuits that may never have existed before. To help them gain a deeper understanding of how the genetic system they created works, we constructed an ODE system to simulate the dynamic behavior of the genetic system mathematically. The main challenge of simulation is the uncertainty of genetic circuits, since users can construct them arbitrarily. Therefore, we built a rather generic model capable of simulating various genetic circuits.

After finishing your design, you can convert your circuits into plasmids (customizable) for further synthesis.
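To give a feel for what simulating a circuit with an ODE system involves, here is a minimal sketch of a two-species circuit in which a constitutively produced repressor inhibits a reporter through a Hill function. The model structure and all parameter values are illustrative assumptions, not the generic model used in S-Din.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters (arbitrary units), for illustration only.
ALPHA = 1.0   # repressor production rate
BETA = 5.0    # maximal reporter production rate
K = 1.0       # repressor level giving half-maximal repression
N = 2.0       # Hill coefficient
D_R = 0.5     # repressor degradation rate
D_P = 0.5     # reporter degradation rate


def circuit_rhs(t, y):
    """Right-hand side of the ODE system for [repressor, reporter]."""
    repressor, reporter = y
    d_repressor = ALPHA - D_R * repressor
    d_reporter = BETA / (1.0 + (repressor / K) ** N) - D_P * reporter
    return [d_repressor, d_reporter]


# Integrate from zero initial concentrations over 50 time units.
sol = solve_ivp(
    circuit_rhs, (0.0, 50.0), [0.0, 0.0], t_eval=np.linspace(0.0, 50.0, 101)
)
repressor_end, reporter_end = sol.y[:, -1]
print(repressor_end, reporter_end)
```

With these parameters, the analytical steady state is a repressor level of ALPHA / D_R = 2 and a reporter level of BETA / (1 + 2^2) / D_P = 2, which gives a quick sanity check on the numerical result.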

#### Techniques

As a web application, S-Din uses Django with Python 3 for the back end, which is fast, robust, and friendly to developers. MySQL with Django model bindings gives us solid database support. Algorithms including Word2Vec and K-D Tree are also coded in Python 3 with SciPy. For the front end, we build the views with a customized Semantic UI and control the logic behind the widgets using jQuery. Other open-source JavaScript libraries are also used to build the site, such as Chart.js for charts and jsPlumb for links on the design page. TypeScript was used to build the game BioLab Rescue; we hope you will enjoy it!

## Wet-Lab Validation

#### Overview

At S-Din, we want to help bio-builders design genetic circuits seamlessly and simply. After we finished building the software, we carried out an experiment to validate this workflow. The wet-lab validation tested the efficiency and reliability of S-Din, and the experimental data can also be used to validate the simulation part of our software.

#### Motivation

In recent decades, due to the depletion of the ozone layer, the potential risks of skin cancer and cataracts have been growing. Our journey with S-Din began with curiosity about how to detect UV. But we didn't have a clear idea and didn't know where to start. Luckily, S-Din can light up one's path.

#### Design our circuit

1. Setting up the research field of interest: After logging in, we set up our interest. Because we were curious about UV detection, we chose DETECTION.

2. Browsing on S-Din: When surfing on S-Din, we found UV detection to be a hot area. Annual project statistics are shown on the right side.

3. Searching previous projects: Standing on the shoulders of giants, we wondered what ideas our predecessors could give us. We searched for 'UV detection'. Using the data analysis and scores, we found ETH_Zurich 2012 and marked it.

4. Intelligent recommendation: Based on the database, S-Din recommended that we use a device from Colombia 2014. After browsing its information, we decided to accept the suggestion.

5. Editing the circuit: We deleted some extra parts of the ETH_Zurich device because we decided to use only the UV sensor. The surveillance system showed that the safety level of our circuit is low risk (upper right corner).

6. Simulation: We ran the mathematical models in our software. Picture 1 shows the result.

Picture 1. Our software’s simulation of AmilCP

Parameter explanation

$$\frac{\text{d}\,\text{[UVR8-TetR]}}{\text{d}\,t}=\alpha_1\frac{\text{[UVR8-TetR]}}{1+\text{[UV]}^k}-d_1\text{[UVR8-TetR]}$$

$$\frac{\text{d}\,\text{[Blue]}}{\text{d}\,t}=\beta_1\frac{\text{[amilcp]}}{1+\text{[UVB-TetR]}^{a_1}}\cdot\frac{1}{1+\left(\frac{h_1}{\text{[PSP]}}\right)^{a_2}}-d_2\text{[Blue]}$$

$$\frac{\text{d}\,\text{[PsP3]}}{\text{d}\,t}=\gamma_1\frac{\text{[PsP3]}}{1+\text{[UVB-TetR]}^{a_1}}\cdot\frac{1}{1+\left(\frac{h_1}{\text{[PSP]}}\right)^{a_2}}-d_3\text{[PsP3]}$$

7. The plasmid design: After finishing our design, we clicked the button, and the shape of our plasmid appeared. We now had a system that produces AmilCP, a blue protein, when it detects UV.

#### Validation experiment

##### Confirmation

After we constructed the two plasmids, Escherichia coli DH5α was transformed with them, and positive clones were identified by colony polymerase chain reaction (PCR) and restriction endonuclease digestion. Photos 1 and 2 show the confirmation results. Further confirmation was done by sequencing.

Photo 1. After incubating for about 18 hours, the positive clone turned blue.

Photo 2. Colony PCR showing the result of the transformation.

##### Experimental data

After sequencing confirmation, Escherichia coli strain BL21 (DE3) was transformed with the plasmids. We didn't have a liquid culture system with UV light, so we used plate cultivation. We placed the dish about 30 cm from a low-intensity 10-15 W UV lamp and took samples every few hours. After 22 hours, we obtained a blue bacterial lawn (Photo 3). The bacteria were disrupted with an ultrasonic crusher, and we then used OD588 to measure the concentration of the blue protein (Photo 4).

Photo 3. One of the plates after exposure to UV for 22 hours.

Photo 4. The bacterial mixture after disruption with the ultrasonic crusher.

##### Simulation validation and result analysis

Unfortunately, collecting bacteria from plates may introduce systematic error, and as a result the reproducibility of the OD588 readings was poor, so using them to validate our simulation would be meaningless. Hence, we decided to collaborate with SCUT-China_A and used their project's data to test the accuracy of our simulation. Pictures 2 and 3 show the simulation result and the experimental result for their project.

Picture 2. The simulation result of our software.

Picture 3. The experimental result, data from SCUT-China_A.

Picture 4. Our software simulation shows favorable performance.

Compared with the experimental data of SCUT-China_A, our software simulation performed favorably (Picture 4). The result showed that our simulation works. After 7 hours, the experimental data were higher than the predicted values, potentially because of bacterial reproduction.

## Demonstration

### Installability test

We successfully installed S-Din on Windows and Linux systems.

We recommend that users follow the installation tutorial in our GitHub repository.

#### Hardware requirements

| | Basic Requirements | Highly Recommended |
| --- | --- | --- |
| Storage | 20G | 20G SSD |
| Memory | 4G | 6G |
| CPU | Intel I5-2500k or better | Intel I5-2500k or better |

#### Windows

##### Environment

Windows 10 Enterprise edition with 8GB RAM and Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz, 64-bit operating system

##### Result

The installation on Windows is relatively easy. You just need to run initial.bat and runserver.bat, and it finishes quickly, as the images show.

#### Linux

##### Environment

GNOME ver 3.26.1 with 15.6G RAM and Intel Core i7-4720HQ CPU @ 2.60GHZ x8, 64-bit operating system

#### Mac OS Sierra

##### Environment

MacBook Air (13-inch, Early 2015) with macOS Sierra (Version 10.12.6), 2.2GHz Intel Core i7 and 8GB 1600 MHz DDR3.

#### Successful installation result

After you follow our installation tutorial and successfully install S-Din, you can open our software in a browser.

### Demo

In this part, we will illustrate how to use S-Din to search Synthetic Biology projects and design circuits.

#### 1

If you want to use the design platform, register an account first. The search engine does not require an account.

#### 2

To search for projects, enter some words in the search bar; projects will be presented according to your input.

#### 4

Click "Design" to switch to the design function, where you can start your design.

#### 5

You can search for parts and add them to the design canvas; information about each part is shown.

#### 6

If you don't know the best match between two parts, we provide part interaction information to help you make a choice.

#### 7

After adding parts to the design canvas, you can add relationships between them.

### User studies

The best user experience makes the best software. Even though we did substantial preparation and investigation before starting this project, user studies are also important because they help us adjust the details of our software to make it more user-friendly.

It's a pity that we couldn't expand the scale of our user study due to the long development period and the lack of coders. But we're happy that these users offered their precious time to help us test our software and gave us positive feedback.

##### Shen Dong

Research assistant who helped us with our wet-lab validation;
studies the molecular mechanisms of interaction between pathogen and host.

"It's considerate that you added a safety surveillance system to your design platform. I wonder how many high-risk parts there are in your database? Yeah, maybe not many, but this is still necessary, right?"

"From a researcher's view, I need to say the software still needs to be improved in order to better fit the needs of experts. Though the current version is good enough."

##### Haoquan Zhao

Ph.D. student in bioinformatics, Columbia University

"The engagement of search and design is designed in an elegant way. I have to say I like this design."

"The design platform is fluent and easy to use indeed, but I think it can be more professional or more biological? There are lots of processes needed to reflect on circuits of pathway sketch."

"Beautiful UI, how much money did your team pay the designers?"

##### Jiajin Li

PhD student, University of California, Los Angeles

"You make a great attempt at solving problems like avoiding simply repeating the work that's already been done. It's a nice solution to standardize the data from different sources."

"There are things you need to consider like the updating of your database and how to encourage others to use your software. Well done but keep on!"

### Future works

##### 1. Expand the scale of our database.

Projects in iGEM and in the literature are growing at a remarkable speed, so it is our responsibility to update our database regularly, letting users catch the latest ideas and use them for reproduction and testing. Next year, every team and publisher will be invited to redraw their genetic circuits on our design platform for seamless reuse in future work, which is the fastest and most accurate way we can imagine to expand the database. For display, a wiki plug-in for S-Din will be developed so that all teams next year can use the same editable tools in their wikis. No more PS will be needed, and we will also benefit by gathering first-hand data.

##### 2. Evaluating projects that have not been carried out, using Artificial Intelligence.

In the future, we are going to add a crowdsourcing feature to our software. To evaluate projects, a reliable auto-judging system will be created. The evaluation system will consist not only of simulation results but also of a variety of aspects that may affect how reliably a project functions (not only for competition awards but also for the TRUE scientific and social effect it has). Machine learning and other big-data mining methods will definitely be used in this coming feature.

##### 3. More validation experiments and feedback.

According to the basic principles of engineering, a product can never go through too many iterations of the design-test-redesign cycle. More efficient and user-friendly features will be added during our further human practice in the Synthetic Biology field. In our human practice section, we have demonstrated how we came up with the idea of S-Din and how we enhanced it during the summer. We will do the same for future improvements, to make S-Din more popular and well-featured.

##### 4. Dynamic data flow for synthetic biology.

During brainstorming, we wanted to add a feature that provides dynamic data (news, latest papers) updated automatically every day. Due to limited time, we didn't merge this function into our software. In the future, this feature will also be added.

##### 5. System optimization of the whole genome with the inserted circuits.

As a software team in synthetic biology, we feel responsible for building software that truly uses the power of computers and algorithms to help synthetic biologists accomplish important and complex tasks. Once circuits can be reused seamlessly from search results, considerations at the genome level come into view. As in an electronic system, performance and connections can be optimized for maximum efficiency or robustness.

S-Din is the first step of our ambition to solve real synthetic biology problems; it gives further projects a solid foundation of data collection.

And more

contact
sysusoftware@126.com