SOFTWARE
Overview
About our software and why iGEM Nottingham chose to produce it
Along with modelling all the possible combinations, another issue the team faced was that they would not be able to compare the fluorescence spectra that were read back against the mother colony's spectra with the time and resources available, as that would have meant creating, testing and implementing a brand new construction.
One solution to this problem was to conduct the verification process through software. The spectra are read by a fluorescence reader to generate a range of data points, which can either be graphed and compared visually, pixel by pixel, or compared directly. Both options were implemented.
Another issue the team faced was identifying whether Key.Coli was a user-friendly product. To test how users would respond to using Key.Coli in practice, a virtual everyday environment that showed a person's everyday life was developed within Unity. Users' actions were monitored and fed back to the team to see how easy Key.Coli was to operate, as well as to identify any practical problems we would face.
Lastly, to put Key.Coli into practice, the team wrote a security layer on top of Linux on a Raspberry Pi, so that the computer can only be accessed with the Key.Coli password. This protects the computer from intruders, completing the objective of Key.Coli.
Image Comparison Software
Comparing images of spectra from two different colonies to check for similarity
A major issue the team faced was comparing the fluorescence of the mother colony with that of the Key.Coli mechanism. One option considered was to use image comparison to check how similar the spectra were.
Using data from the wet lab, a graph can be produced. This graph can be compared with another using an image similarity algorithm to check the difference between a data set from one time point and a data set from another. This was very important to the project as it allowed us to compare the fluorescence spectra of one random construction at different time periods.
The image (a bitmap) is scanned pixel by pixel and written into a temporary file, where it is checked for similarity with another image using the Damerau-Levenshtein distance algorithm, which was coded in C#. This can be represented as such 1 :
Figure 1
If the distance is below a threshold, the user is given access, as denoted by an "ACCESS GRANTED" message. Otherwise, they are shown an error message. The threshold value was proportional to the time since the first reading of the spectra; the longer the sample has been away from the mother colony, the higher the threshold needed.
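The comparison and threshold check described above can be sketched as follows. This is a minimal illustration in Java rather than the team's C# code. It assumes each image has already been flattened into an array of pixel values, and it uses the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance. The class name, the method names and the perHour growth rate are hypothetical.

```java
// Sketch only: names and the per-hour growth rate are assumptions, not the team's code.
final class SpectrumImageCompare {

    // Restricted Damerau-Levenshtein (optimal string alignment) distance
    // between two sequences of pixel values.
    static int distance(int[] a, int[] b) {
        int[][] d = new int[a.length + 1][b.length + 1];
        for (int i = 0; i <= a.length; i++) d[i][0] = i; // delete every pixel of a
        for (int j = 0; j <= b.length; j++) d[0][j] = j; // insert every pixel of b
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1); // deletion, insertion
                best = Math.min(best, d[i - 1][j - 1] + cost);          // substitution
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);         // adjacent transposition
                }
                d[i][j] = best;
            }
        }
        return d[a.length][b.length];
    }

    // Threshold proportional to the time since the first reading (assumed linear rate).
    static int thresholdAt(double hoursSinceFirstReading, double perHour) {
        return (int) Math.ceil(hoursSinceFirstReading * perHour);
    }

    // Access is granted only when the distance is below the threshold.
    static boolean accessGranted(int[] mother, int[] key, int threshold) {
        return distance(mother, key) < threshold;
    }

    public static void main(String[] args) {
        int[] mother = {0, 0, 255, 255, 0};
        int[] key = {0, 255, 0, 255, 0}; // two adjacent pixels swapped
        int threshold = thresholdAt(2.0, 1.0);
        System.out.println("distance = " + distance(mother, key));
        System.out.println(accessGranted(mother, key, threshold) ? "ACCESS GRANTED" : "error");
    }
}
```

The table used by this approach grows with the product of the two pixel counts, which is consistent with large images taking noticeably longer to compare.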
Figure 2 - Software when first opened
Figure 3 - When files are selected and "Compare" is clicked
Figure 4 - The files are too dissimilar! Access isn't allowed.
Figure 5 - Changing the threshold might accommodate this difference
One major issue with image comparison is that the images had to be bitmaps, which is an uncommon image type, and exactly the same size for it to work. Another issue is that the comparison takes longer for very large images. Comparing the raw data directly would be an improvement over this method, as no images would be involved.
1 Image taken from Wikipedia
Key.Coli Verification
Comparing the raw data of two different colonies straight from the fluorescence reader
Another method of comparing fluorescence spectra is by taking raw data and comparing them cell by cell. During the development of the software, the team found that the data held the same format in terms of spacing when outputted by the fluorescence reader. This made it far easier to write a data comparison algorithm.
By using Java and working with the libraries which support the spreadsheet format, the team was able to directly compare sets of data by calling for values from each cell and calculating the difference. Each difference was then checked against a threshold value; if it was above the threshold value, the check failed and the user was locked out.
A threshold value is how much variation a colony can have from the mother colony before it is no longer valid. One issue is that the threshold value has to change over time to allow a larger variation, because the longer the colony is away from the mother colony, the more different it becomes. To calculate a threshold value at any given time, a polynomial fit of order 3 is calculated using the data from the mother colony. To calculate the polynomial fit, Figure 6 was translated into Java code.
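A minimal sketch of the cell-by-cell check is given below. The spreadsheet library is not named here, so reading the file is left out and the cell values are assumed to be already loaded into arrays of equal length. The class and method names are hypothetical.

```java
// Sketch only: assumes the cell values have already been read from the spreadsheets.
final class CellComparison {

    // Returns the index of the first cell whose absolute difference from the
    // mother colony exceeds the threshold, or -1 when every cell passes.
    static int firstFailingCell(double[] mother, double[] key, double threshold) {
        if (mother.length != key.length) {
            throw new IllegalArgumentException("data sets must have the same number of cells");
        }
        for (int i = 0; i < mother.length; i++) {
            if (Math.abs(mother[i] - key[i]) > threshold) {
                return i;
            }
        }
        return -1;
    }

    // The user is locked out as soon as one cell fails the check.
    static boolean passes(double[] mother, double[] key, double threshold) {
        return firstFailingCell(mother, key, threshold) == -1;
    }

    public static void main(String[] args) {
        double[] mother = {10.0, 20.5, 30.0};
        double[] key = {10.4, 20.0, 33.0};
        System.out.println(passes(mother, key, 1.0) ? "pass" : "locked out");
    }
}
```

Returning the failing index rather than only a yes/no answer makes it easier to see which part of the spectrum drifted.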
Figure 6
$$ \color{white}{ y_{i} = \beta_{0} + \beta_{1}x_{i} + \beta_{2}x^2_{i} + \dots + \beta_{m}x^m_{i} + \varepsilon_{i} \quad (i = 1,2,\dots,n) } $$
Which can be expanded in matrix notation:
$$ \color{white}{ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x^2_1 & \dots & x^m_1 \\ 1 & x_2 & x^2_2 & \dots & x^m_2 \\ 1 & x_3 & x^2_3 & \dots & x^m_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x^2_n & \dots & x^m_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix} } $$
Which can be simplified to
$$ \color{white}{ \vec{y} = X \vec{\beta} + \vec{ \varepsilon } }$$Where...
$$ \color{white}{ \vec{y} \text{ holds the values of the dependent variable} } $$ $$ \color{white}{ X \text{ is the design matrix, whose rows hold the powers of each } x_i } $$ $$ \color{white}{ \vec{\beta} \text{ holds the coefficients of the fit, with } \beta_0 \text{ as the y-intercept} } $$ $$ \color{white}{ \vec{\varepsilon} \text{ holds the random errors} } $$
This was implemented using for loops to cycle through an array of data points. Only the mother colony's data was used, to create a threshold for each data point. The polynomial fit outputs an equation with a variable X as its input; substituting for X the data point value that was used to create the fit gives a threshold value. This threshold value could be adjusted by adding to or subtracting from it. The Key.Coli intensity was compared to this threshold value using selection statements; if the key colony's data point wasn't within the upper and lower limits, the user was locked out. The team decided polynomial fitting was appropriate as it was found to follow the points the closest when graphed in Excel.
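One way to turn the matrix form into loops is sketched below. It is an illustration, not the team's implementation. It assumes the coefficients are found by ordinary least squares, solving the normal equations (X^T X) beta = X^T y with Gaussian elimination. It also assumes x is the position of each reading (for example a wavelength or time point) and y is the mother colony's intensity. The class name, method names and the margin parameter are hypothetical.

```java
// Sketch only: least-squares polynomial fit via the normal equations (an assumption).
final class PolynomialThreshold {

    // Returns beta_0..beta_m for a polynomial of the given order fitted to (x, y).
    static double[] fit(double[] x, double[] y, int order) {
        int n = order + 1;
        double[][] a = new double[n][n + 1]; // augmented matrix [X^T X | X^T y]
        for (int k = 0; k < x.length; k++) {
            for (int r = 0; r < n; r++) {
                for (int c = 0; c < n; c++) {
                    a[r][c] += Math.pow(x[k], r + c);
                }
                a[r][n] += y[k] * Math.pow(x[k], r);
            }
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double sum = a[r][n];
            for (int c = r + 1; c < n; c++) sum -= a[r][c] * beta[c];
            beta[r] = sum / a[r][r];
        }
        return beta;
    }

    // Evaluates the fitted polynomial at x using Horner's method.
    static double evaluate(double[] beta, double x) {
        double value = 0.0;
        for (int i = beta.length - 1; i >= 0; i--) value = value * x + beta[i];
        return value;
    }

    // The key colony passes when its intensity lies within +/- margin of the fit.
    static boolean withinLimits(double[] beta, double x, double keyIntensity, double margin) {
        double expected = evaluate(beta, x);
        return keyIntensity >= expected - margin && keyIntensity <= expected + margin;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] motherIntensity = {1, 0, 5, 22, 57};
        double[] beta = fit(x, motherIntensity, 3);
        System.out.println(withinLimits(beta, 2, 5.5, 1.0) ? "within limits" : "locked out");
    }
}
```

The normal equations are adequate for a low-order fit on a handful of points; for higher orders or widely spread x values a QR-based solver would be more numerically stable.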
Figure 8: The system won't let the user in as the colonies are too different
Figure 9: The system lets the user in as the colonies are nearly identical and fall within the threshold
Fluorescence Spectra Simulation
Simulating fluorescence spectra from given protein concentrations
The software was written because the team wanted to see what the spectra would look like after a certain amount of time, as well as what the fluorescence would be at a certain time point. This is very useful as it allows the team to know what to expect during the constructions, as well as to test multiple conditions in a short time.
To find out more about how we developed this software using models, click here
The simulation was written in C and compiled to Linux binaries, so it will only work on Linux systems. However, the source code is available, so users can recompile it for their OS of choice (the code needed to compile it is included). To use it, navigate in Terminal to the folder where you are keeping the program files and type ./loader, which starts the program. You then type in how much of each protein you are expecting (for example, 0.1 micrograms of sfGFP) and then the wavelength of the laser you are using.
Figure 6
The simulation supports a command line interface for these inputs. When the parameters are set, the program calculates the expected fluorescence over time. This is output as shown below.
Figure 7
The spectra shown should represent the spectra that would be expected to be produced at those concentrations and wavelengths. When comparing to real data, there were differences and the model had to be refined to accommodate this. This process is extensively discussed in the Modelling section 1 .
1 See Relationship between Max Fluorescence and Protein Concentration
Random Number Generation
Generating random numbers from our randomly constructed colonies
When speaking to our industry contacts about Key.Coli, they were very interested in seeing Key.Coli's capabilities as a Random Number Generation tool. After gaining results from the random constructions, the team set about finding out if the values could be used to generate a string of random numbers. Central to Key.Coli's core value of information security is the ability to produce a persistent but randomly generated state.
There were two ways to generate a string of random numbers from the colonies: either generate a string of numbers from three colonies (one acting as a Minimum and another as a Maximum for the range) or treat each colony as a random number by itself.
On investigating the first method, out of 3 colonies, one was assigned the role of the Minimum, having the lowest fluorescence intensity, and another the Maximum, having the highest. Using the equation INT(((MED[...] - MIN[...]) / MAX[...]) * 255), the team could generate a string of random numbers by inputting the fluorescence values over time. INT(...) converts the number to an integer, so no decimal places appear. MIN returns the lowest of the three colony intensities, MAX returns the highest, and MED returns the median. The result was multiplied by 255 to produce a number out of 255, which is the largest value of a byte. The results are shown in Figure 8.
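The scaling step can be sketched in Java as follows. This is an illustration under assumptions: the three intensities at one time point are passed in any order, intensities are non-negative with a positive maximum, and the multiplication by 255 happens before the integer conversion (otherwise the result would almost always be 0). The class and method names are hypothetical.

```java
import java.util.Arrays;

// Sketch only: maps three colony intensities at one time point to a value in 0..255.
final class ColonyByte {

    static int toByte(double first, double second, double third) {
        double[] v = {first, second, third};
        Arrays.sort(v); // v[0] = MIN, v[1] = MED, v[2] = MAX
        if (v[0] < 0 || v[2] <= 0) {
            throw new IllegalArgumentException("intensities must be non-negative with a positive maximum");
        }
        // INT(((MED - MIN) / MAX) * 255), with INT rounding down.
        return (int) Math.floor(((v[1] - v[0]) / v[2]) * 255);
    }

    // Applies toByte to every time point; each row holds the three colony readings.
    static int[] toBytes(double[][] readingsOverTime) {
        int[] out = new int[readingsOverTime.length];
        for (int t = 0; t < readingsOverTime.length; t++) {
            double[] r = readingsOverTime[t];
            out[t] = toByte(r[0], r[1], r[2]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toByte(100, 250, 400)); // prints 95
    }
}
```

Because MED - MIN can never exceed MAX, the result always stays within 0 to 255.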
Figure 8
For comparison, the other values are taken from other pseudo-random and random number generators 1 .
The graph shows that Key.Coli-generated numbers tend towards a lower range. This was confirmed when checking for a normal distribution: the set of numbers was found to be biased towards the top and bottom of the set ranges, which suggests that three colonies scaled over time cannot be used to generate a set of random numbers.
In the other method, the team found that one colony can be used for one number (in our case, the numbers were 18, 128 and 125). Theoretically, these are random as the colonies were constructed in a random fashion using Brownian motion 2 . However, due to time and resource constraints, it is currently impossible to create the 200,000 colonies required for testing, but this may be achievable through automation.
This is still very useful: it means a colony's value can be used as a random seed for a random number generator. Furthermore, one way to get a set of numbers from one colony is to break it up. The team did this and found each colony, despite being genetically similar, had varying levels of fluorescence from each other.
Future projects are welcome to use Key.Coli to generate true random numbers. In the future, the team would like to investigate the random nature of the Key.Coli system more thoroughly.
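As a small illustration of the seeding idea, the sketch below packs colony values into a single seed for Java's standard generator. The packing scheme and the names are assumptions made for this example. java.util.Random is used only to show the mechanics; it is not cryptographically secure, so a real security application would need a different generator.

```java
import java.util.Random;

// Sketch only: shows how colony values could seed a pseudo-random generator.
final class ColonySeed {

    // Packs colony values (each treated as one byte, 0..255) into a long,
    // first value in the most significant position. More than eight values
    // would shift the earliest ones out of the 64-bit seed.
    static long seedFrom(int... colonyValues) {
        long seed = 0L;
        for (int value : colonyValues) {
            seed = (seed << 8) | (value & 0xFF);
        }
        return seed;
    }

    public static void main(String[] args) {
        long seed = seedFrom(18, 128, 125); // the three colony values reported above
        Random generator = new Random(seed);
        for (int i = 0; i < 5; i++) {
            System.out.println(generator.nextInt(256)); // pseudo-random values in 0..255
        }
    }
}
```

The same seed always reproduces the same sequence, which matches the idea of a persistent but randomly generated state.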
1 RAND was generated using =INT(NORM.INV(RAND(),XX,XX)) in Excel, Atmospheric Noise was taken from Random.org, and Fortuna is used in some Unix-based operating systems to generate security keys
2 See Modelling's Are Our Constructions Random
Linux Key.Coli Security Layer
Porting our comparison software to low end hardware to safeguard a system
As a final wrap-up for the project, all the software and modelling were put together to create an additional security layer on top of Linux for the Raspberry Pi. A Raspberry Pi is a very low-budget, low-end computer favoured by enthusiasts and computer hobbyists, and it is designed to be easy to program for, as the hardware comes unlocked. We chose it because it would give us the fewest issues when it came to editing the security protocols of Linux.
Figure 9
This was done to show people how Key.Coli could be used to secure a computer from strangers who don't have the Key.Coli but might know the password, as well as to give us a physical demonstration of Key.Coli to show at the Jamboree.
The system was designed as a program that loads when Linux boots up. It locks the user out of all files except a temporary file, where the spectra from the colonies go. These are compared 1 and, if the result is within the threshold, the computer unfreezes.
This was achieved by taking the Key.Coli Verification software and modifying it to support the file system on the Pi. To "unlock" the computer, the user would need to connect the Raspberry Pi to two different fluorescence readers: one for the mother colony, and one for the Key.Coli mechanism. Both readings would be stored in a temporary file space and compared for similarity.
However, due to health and safety regulations, for the Jamboree we read data from conditions in the lab and stored them on USB sticks, which acted as the Key.Coli and the mother colony. This is similar to how the actual system would work, except that it uses USB drives instead of fluorescence readers.
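The unlock decision could be sketched as below. This is a rough illustration, not the code that ran on the Pi. It assumes each reading is stored as a plain text file with one intensity per line and that a single fixed threshold is applied to every value. The file format, names and messages other than "ACCESS GRANTED" are hypothetical, and the step that actually unfreezes the system is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: decides whether to unlock from two stored readings.
final class UnlockCheck {

    // Reads one intensity per line, skipping blank lines (assumed file format).
    static double[] readReadings(Path file) throws IOException {
        return Files.readAllLines(file).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    // Unlocks only when both files hold the same number of readings and
    // every pair of readings differs by no more than the threshold.
    static boolean shouldUnlock(Path motherFile, Path keyFile, double threshold) throws IOException {
        double[] mother = readReadings(motherFile);
        double[] key = readReadings(keyFile);
        if (mother.length != key.length) {
            return false;
        }
        for (int i = 0; i < mother.length; i++) {
            if (Math.abs(mother[i] - key[i]) > threshold) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: UnlockCheck <mother file> <key file> <threshold>");
            return;
        }
        boolean unlock = shouldUnlock(Paths.get(args[0]), Paths.get(args[1]), Double.parseDouble(args[2]));
        System.out.println(unlock ? "ACCESS GRANTED" : "LOCKED OUT");
    }
}
```

Treating a length mismatch as a failed check keeps the default outcome locked, which is the safer choice for a security layer.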