Implementation
Server Part
The server of Biohub 2.0 are written in Python 3.6, which is an easy-to-write and cross-platform language and can lower the learning curve of developing plugins. We choose Django to be the basic framework, partly for it allows fast prototyping and large-scale deployment, partly for it already contains a matured plugin system (Django App Framework). However such plugin system is static, which means plugins cannot be loaded/unloaded during runtime. It will be a great deficiency for a website that may frequently alternate its components. To solve the problem, we build our own plugin system on the basis of the one in Django. We carefully monkey-patched Django's underlying implementation to make its core components (URL resolving, data model registration, and etc.) dynamically changeable, whilst avoiding possible memory leaks. We also supplemented Django by adding many new features:
- Hot Reload: Other processes can send
SIGUSR1
signal to Biohub 2.0's worker processes to inform them that the installed plugin list has changed and the server needs to be reloaded. By this way the server can be renewed without stopping. - WebSocket Routing: Websocket is a protocol providing full-duplex communication channels over a single TCP connection, which is supported by most of modern browsers. Biohub 2.0 develops a customized protocol over Websocket, making it easier to route Websocket packets. We have used this protocol to achieve real-time notification sending and notifying accomplishment events in ABACUS plugin.
- Background Tasks: As a website designed for synthetic biology, their must be many computation intensive tasks, which will blocks requests if handled inappropriately. Thus we split out and encapsulate this logic and make it an independent module. Background tasks can use websocket to notify certain events. Currently we have used in ABACUS computing.
The modules are all available for plugins, so developers can take advantages of them to build amazing artworks.
Biohub 2.0 is constructed on the base of Biobricks data, which comes from two channels. We use the data downloaded from offical interface as our initial data. The initial data is actually a snapshot of iGEM, containing most information of the bricks. However certain fields, such as group name or parameters, are missing in it, so we complement it by crawling iGEM's web pages. Such behaviour is "lazy", which means it will not be invoked before deploying, but before a specific brick's data is accessed for the first time. Fetched data will be cached in the database, and be updated per 10 days to keep updated.
The initial data will be imported into a separate database (by default it is named igem
). Biohub will link to it by creating virtual tables using MySQL's database view technology. Using database views may lower the speed of querying, but it provides more flexibility for data upgrading. If newer initial is available, we can upgrade the database simply by reloading igem
database, rather than dropping and recreating the tables in production environment. Also it can prevent data redundancy. If multiple instances with different database configuration are deployed on the same machine, only one copy of initial data needs to exist in the database, saving the space of disks.
Before ranking, we have to filter the initial data, since there exists many apparently useless bricks on iGEM. Such process will occur before deploying. We simply drop those bricks without DNA sequence, and dumped the filtered data into a new table (by default it is named igem.parts_filtered
). At the same time we will pre-process some fields, such as extracting subparts information from edit cache, and pre-calculating some components of the ranking weight. You may refer to updateparts.py to see this process.
Then we will rank the bricks for the first time, using multiple statistical methods. You may refer to refreshweight.py to see such process. At this stage, the bricks are completely ranked by initial data, without any factors from Forum.
After the server starts up, we will recalculate the ranking weights every 30 minutes. The reason for not evaluating them in real time is that the task may update the whole table and become time-consuming. From now on, Biohub will gradually correct the deviation in bricks ranking.
ABACUS is a plugin inherited from last year's project. It consumes great amount of memory during executing, and causes the prossibility to break down the main server. Such accident did happen in testing phase of USTC-Software 2016. To avoid it, we improved ABACUS and enable it to run in two ways: locally or distributedly. If no available executable file is detected, ABACUS will connect to remote slave servers and distribute the computation tasks. This design can largely save the expense of master server, and makes ABACUS infinitely scalable.
The Quine-McCluskey algorithm is a method used for minimizing of boolean functions that was developed by W.V.Quine and extended by Edward J.McCluskey.
Espresso heuristic logic minimizer is a computer program using heuristic and specific algorithms for efficiently reducing the complexity of digital electronic gate circuits.
Because Espresso is more efficient than Q-M algorithm, especially when inputs are more than 6, we choose Espresso for our software.
NetworkX is a Python language software package for the creation, manipulation and study of the structure, dynamics and functions of complex networks.
We use directed graph to characterize biocircuit, and NetworkX has make it more convenient to operate with graphs.
NumPy is the fundamental package for scientific computing with Python.
Pyeda is a Python library for electronic design automation. We once used pyeda to simplify Boolean formulas. However, we found bugs in the process, so we got in touch with its author Chris Drake and submitted the issue. Therefore, we didn't use pyeda directly but make some changes to his espresso model. We are very grateful to his excellent work.
Frontend Part
To improve the user experience, we build Biohub 2.0 as a single page application (or short for SPA), using the advanced MVVM framework Vue.js. Vue.js relies on webpack (not forcibly, but recommended) for pre-compiling, which encapsulating everything into a single file. This is inconvinient for a website with a plugin system, so we analyzed the code generated by webpack and developed an approach to load components dynamically. It is the theoretical basis of Biohub 2.0's plugin system. You can refer to Main.vue to see such mechanism.
We choose Bootstrap 3 as our basic framework. Bootstrap is a UI framework developed by Twitter team, with multiple elegant components and the ability to prototype quickly. Based on it, we added many customized styles and components to provides better experience.
To visualize the data, we use serveral data representing frameworks. For example, d3.js for DNA sequance displaying in Forum and ngl.js for protein structure illustration. Such frameworks transform certain data into graphics, and make it easier for users to learn the content.
Based on the customized Websocket protocol, we encapsulate a handy library to handle Websocket packets tranferring. You may refer to it via websocket.js.