Difference between revisions of "Team:Heidelberg/Sandbox256"

Revision as of 13:21, 1 November 2017

Software

SafetyNet

In large-scale, automated directed evolution experiments, manually checking every sequence in the library is impossible. However, thorough background and quality checks on sequences are crucial in an automated context, because the experimenter has no direct control over the process. This holds especially for in silico evolution, where the immediate effect of a mutation cannot be assessed. To safeguard our in vivo and in silico directed evolution experiments we developed SafetyNet. SafetyNet is a web-accessible, neural-network-based sequence check. It not only infers the function and species of origin of an input sequence, but also reports the biosafety level assigned to the species of origin and the potential harm of the sequence. We applied SafetyNet throughout our directed evolution experiments to keep sequence improvement safe and to prevent the unintended emergence of harmful traits.

Method

SafetyNet rests on two algorithmic pillars. The first is a BLAST search of the input sequence against the Swiss-Prot database, performed through the NCBI web API. The request is POSTed to the NCBI server and the result is fetched with a GET request. The result is then parsed for the protein IDs of all non-redundant matches. Next, each retrieved protein ID is used in a GET request to the UniProt database to obtain the entry of the protein in question. The entry is parsed for key information: the assigned GO terms, the species of origin and the gene of origin. The information collected for each entry is then combined and looked up in the SafetyNet internal databases, which list GO terms associated with cytotoxic, viral or pathogenic functions or pathways. We also included the functional terms for proteases and nucleases to account for destructive intracellular potential. The biosafety level of the species of origin is determined by a lookup in the biosafety database of the German Federal Office of Consumer Protection and Food Safety (BVL, roughly the German counterpart of the FDA).
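The request flow and the GO-term lookup described above can be sketched as follows. This is a minimal illustration, not the SafetyNet source: the parameter names follow the public NCBI BLAST URL API as we understand it, the helper names are hypothetical, and the small `FLAGGED_GO_TERMS` table is only a stand-in for the internal databases.

```python
import re

# Endpoint of the NCBI BLAST URL API (assumed to be the API the text refers to).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_params(sequence: str) -> dict:
    """Form fields for the POST that submits a blastp search against Swiss-Prot."""
    return {"CMD": "Put", "PROGRAM": "blastp", "DATABASE": "swissprot", "QUERY": sequence}


def build_get_params(rid: str) -> dict:
    """Query parameters for the GET that fetches the result of request `rid`."""
    return {"CMD": "Get", "RID": rid, "FORMAT_TYPE": "XML"}


def parse_rid(put_response: str) -> str:
    """Extract the request ID from the QBlastInfo block of the POST response."""
    match = re.search(r"QBlastInfoBegin\s+RID\s*=\s*(\S+)", put_response)
    if match is None:
        raise ValueError("no RID found in BLAST response")
    return match.group(1)


# Hypothetical stand-in for the internal GO-term databases (illustrative subset).
FLAGGED_GO_TERMS = {
    "GO:0090729": "toxin activity",
    "GO:0008233": "peptidase activity",
    "GO:0004518": "nuclease activity",
}


def flag_entry(go_terms):
    """Return the flagged functions found among the GO terms of one UniProt entry."""
    return sorted(FLAGGED_GO_TERMS[t] for t in go_terms if t in FLAGGED_GO_TERMS)
```

In a real client the two parameter dictionaries would be passed to an HTTP library, and the GET would be repeated until the server reports that the search has finished; polling and XML parsing are left out here to keep the sketch short.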
The second algorithmic pillar runs a DeeProtein implementation in the browser. On user request, neural network inference can be enabled in addition to the BLAST search to support function classification. This is especially useful because the network can detect latent or "hidden" potential: it has learned the sequence-to-function relation across the whole respective functional domain, whereas the BLAST search is limited to direct sequence similarity.
The browser-integrated neural network is implemented in DeeplearnJS and supports GPU acceleration. It is a ResNet30, similar to the architecture of DeeProtein, that outputs class probabilities for 886 classes. Because the ResNet weights are about 100 MB in size, we offer a selection mode so that the BLAST part alone can be used on mobile connections.
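How per-class probabilities might be turned into a short list of predicted functions can be sketched in a few lines. The sketch assumes a multi-label output with one independent sigmoid per class; the text does not state which output layer SafetyNet uses, and the function names, `k` and `threshold` are hypothetical.

```python
import math

NUM_CLASSES = 886  # number of output classes stated in the text


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def top_classes(logits, labels, k=3, threshold=0.5):
    """Return up to k (label, probability) pairs with probability >= threshold."""
    probs = [sigmoid(x) for x in logits]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, p) for label, p in ranked[:k] if p >= threshold]
```

For the real network, `logits` and `labels` would each have `NUM_CLASSES` entries; the function itself works for any matching pair of lists.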
Finally, the collected information is combined and presented in an easily understandable color-coded scheme.
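One way such a color code could be derived is sketched below. The three-color mapping, the thresholds and the function name are assumptions for illustration only; the text does not specify the actual scheme.

```python
def safety_color(risk_group: int, flagged_functions: list) -> str:
    """Map the risk group of the origin species and flagged functions to a color.

    Hypothetical rule: risk groups 3 and 4 are red; risk group 2 or any
    flagged function (e.g. toxin, protease, nuclease) is yellow; else green.
    """
    if risk_group >= 3:
        return "red"
    if risk_group == 2 or flagged_functions:
        return "yellow"
    return "green"
```

Keeping the rule in a single pure function makes it easy to adjust the thresholds without touching the lookup or inference code.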

Parameter overview

Table 1: Variables and parameters used to calculate the glucose and E. coli concentrations. List of all parameters and variables used in the numerical solution of this model.

Symbol	Value and Unit	Explanation
\(c_{G_{T} }\)	[g/ml] or [mmol/ml]	Glucose concentration in Turbidostat
\(c_{G_{M} }\)	[g/ml] or [mmol/ml]	Glucose concentration in medium
\(c_{G_{L} }\)	[g/ml] or [mmol/ml]	Glucose concentration in lagoon
\(t\)	[min]	Time
\(\Phi_{T}\)	[ml/min]	Flow rate through Turbidostat
\(\Phi_{L}\)	[ml/min]	Flow rate through Lagoon
\(c_{E}\)	[cfu/ml] or OD600	E. coli concentration
\(q\)	\([g_{glucose} \: g_{DW}^{-1} h^{-1}]\)	Glucose consumption by E. coli [Neubauer2001]
\(t_{E}\)	[min]	E. coli generation time
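Two of the symbols above, \(t_{E}\) and \(\Phi_{T}\), are linked in a turbidostat at steady state, where the dilution rate equals the specific growth rate \(\ln 2 / t_{E}\). The sketch below uses this standard continuous-culture relation; whether the tool applies exactly this form is an assumption, and the function names and the vessel volume argument are hypothetical.

```python
import math


def dilution_rate_per_hour(generation_time_min: float) -> float:
    """Dilution rate [volumes/h] that balances growth with generation time t_E [min]."""
    return math.log(2) / generation_time_min * 60.0


def flow_ml_per_min(dilution_rate_h: float, vessel_volume_ml: float) -> float:
    """Convert a dilution rate [volumes/h] into a flow rate [ml/min]."""
    return dilution_rate_h * vessel_volume_ml / 60.0
```

For example, a generation time of 30 min corresponds to roughly 1.39 vessel volumes per hour.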

       

            
    

    
        
[Interactive calculator: Get the ideal concentration]
Inputs: glucose concentration \(c_{G}\) [g/l] or [mmol/l]; flow rate \(\Phi_{T}\) [Volumes/h]; generation time \(t_{E}\) [min]; E. coli titer \(c_{E}\) [OD600] or \([g_{DW}/l]\); lagoon volume \(V_{L}\) [ml]; lagoon flow rate \(\Phi_{L}\) [ml/min]; glucose degradation \(q\) \([g_{glucose} \: g_{DW}^{-1} h^{-1}]\); time \(t\) [min]; E. coli capacity \(c_{c}\) [g/L] or [OD600].
Options: New, Exponential, Choose File.
[Plot: Changes in E. coli and glucose concentration over time]

           
    

    
        
            
            

Theory behind this tool

}}

@@ Line 20: / Line 20: @@
      }}
      {{Heidelberg/templateus/Contentsection|
-        {{#tag:html|
+             {{#tag:html|
-             {{Heidelberg/formblank|{{#tag:html|
+<h2>Method</h2>
-            <a href="https://2017.igem.org/Team:Heidelberg/Model/Mutagenesis_Induction#glucose">
+SafetyNet rests on two algorithmic pillars. The first is a BLAST search of the input sequence against the Swiss-Prot database, performed through the NCBI web API. The request is POSTed to the NCBI server and the result is fetched with a GET request. The result is then parsed for the protein IDs of all non-redundant matches. Next, each retrieved protein ID is used in a GET request to the UniProt database to obtain the entry of the protein in question. The entry is parsed for key information: the assigned GO terms, the species of origin and the gene of origin. The information collected for each entry is then combined and looked up in the SafetyNet internal databases, which list GO terms associated with cytotoxic, viral or pathogenic functions or pathways. We also included the functional terms for proteases and nucleases to account for destructive intracellular potential. The biosafety level of the species of origin is determined by a lookup in the biosafety database of the <a href="https://www.bvl.bund.de/DE/06_Gentechnik/gentechnik_node.html;jsessionid=2D686A5A41B8FAC09A9583968D64A398.1_cid340">German Federal Office of Consumer Protection and Food Safety</a> (BVL, roughly the German counterpart of the FDA).<br>
-<img padding="50" height="30" src="https://static.igem.org/mediawiki/2017/thumb/2/2e/T--Heidelberg--Team_Heidelberg_2017_modeling-logo.png/320px-T--Heidelberg--Team_Heidelberg_2017_modeling-logo.png">
+The second algorithmic pillar runs a DeeProtein implementation in the browser. On user request, neural network inference can be enabled in addition to the BLAST search to support function classification. This is especially useful because the network can detect latent or "hidden" potential: it has learned the sequence-to-function relation across the whole respective functional domain, whereas the BLAST search is limited to direct sequence similarity.<br>
-Theory behind this tool
+The browser-integrated neural network is implemented in DeeplearnJS and supports GPU acceleration. It is a ResNet30, similar to the architecture of DeeProtein, that outputs class probabilities for 886 classes. Because the ResNet weights are about 100 MB in size, we offer a selection mode so that the BLAST part alone can be used on mobile connections.<br>
-</a>
+Finally, the collected information is combined and presented in an easily understandable color-coded scheme.
- }}|#005498|||}}
+}}
-            To save the user from having to convert values, some of the input is redundant. For the glucose concentration \(c_{G}\), the <i>E. coli</i> concentration \(c_{E}\) and the <i>E. coli</i> capacity, two units are accepted. For the glucose concentration the value in mmol has the higher priority; for the <i>E. coli</i> values, the gram dry weight value has the higher priority.
-            <br>
-            The generation time \(t_{E}\) is redundant to the flow rate \(\Phi_{T}\) in the context of a turbidostat. Here, \(t_{E}\) is prioritised.
-            <br>
-            The different buttons trigger different calculations. The first word on each button specifies the vessel whose glucose concentration is calculated; the word in parentheses specifies the vessel whose glucose concentration is given in \(c_{G}\).
-            <br>
-            A new, clean plot can be generated using the "New" option; all data plotted so far is then lost.
-            <br>
-            The exponential mode of the model can be enabled with the "Exponential" option. The default is logistic growth.
-            <br>
-            When a CSV file is uploaded via the "Choose File" button, the data from the file is plotted into the graph.
-            The csv must have the following format:<br>
-                            <code class="html">t;   E. coli 1; Glucose 1</code><br>
-                            <code class="html">0;   0.01;            1.0</code><br>
-                            <code class="html">20;  0.02;            0.9</code><br>
-                            <code class="html">40;  0.04;            0.8</code><br>
-                            <code class="html">60;  0.08;            0.7</code><br>
-                            <code class="html">80;  0.16;            0.6</code><br>
-                            <code class="html">100; 0.32;            0.5</code><br>
-            The first row is the header and provides names for all columns. The first column is interpreted as the time. All other rows are interpreted as data points for the point in time specified in the first column. The column separator is <code class="html">';'</code>, the row separator is a newline.
              {{Heidelberg/boxopen|Parameter overview|
