Our degradation model contained a set of biochemical reactions, as other metabolic models might be. We generated several ordinary differential equations from these reactions with unknown kinetic constants. These equation sets might not have analytic solutions and numeric methods always fail when dealing with unknown parameters in differential equations. One way to fix such problem is to make an approximation. For example, if a reaction X is not highly reversible, the kinetic constant kX for that reaction would be regarded as zero. However, if any of the reaction in a chain process is not that irreversible, such approximations could also fail. This year, we involved a parameter optimizing method and combined the optimization with numeric solvers, which allowed us to get more accurate values for the kinetic constants from an initial guess. As soon as we get the accurate values for the kinetic constants, the degradation model could then predict the degradation rate of our pollutant. We’ve developed this method (we call it a numeric solving – parameter optimizing method) for our future work – since the enzyme could deal with many kinds of pollutant (theoretically), we couldn’t test them all and fit the model with experimental data.
Once the population of the engineered E. coli reached a stable level (and the expression of enzymes would reach a stable level as well), the degradation rate of the pollutant could be measured. The degradation process contains the following progress:
$\mathrm{TfdB}+\Phi\rightleftharpoons\mathrm{TfdB}\Phi$ $k_{1+},k_{1-}$
$\mathrm{TfdB}\Phi\rightleftharpoons\mathrm{TfdB}\Phi'$ $k_{2+},k_{2-}$
$\mathrm{TfdB}\Phi'\rightleftharpoons\mathrm{TfdB}+\Phi'$ $k_{3+},k_{3-}$
$\mathrm{CphA_{-1}}+\Phi'\rightleftharpoons\mathrm{CphA_{-1}}\Phi'$ $k_+',k_-'$
$\mathrm{CphA_{-1}}\Phi'\rightarrow\mathrm{CphA_{-1}}+\Omega$ $k'$
Where Φ represents the pollutant, Φ' represents the intermediate product (the production from TfdB) and Ω represents the final production.
Parameters
Variable | Explanation |
$[\mathrm{substance}]$ | concentration of the substance |
$t$ | time |
Constant | Explanation |
$[\mathrm{TfdB}]$ | concentration of TfdB |
$[\mathrm{CphA_{-1}}]$ | concentration of CphA-1 |
$k_{?}$ | rate constant, see reaction equations |
Equations
$\begin{cases}
\dfrac{\mathrm{d}[\Phi]}{\mathrm{d}t}=k_{1-}[\mathrm{TfdB}\Phi]-k_{1+}[\mathrm{TfdB}][\Phi]\\\\
\dfrac{\mathrm{d}[\mathrm{TfdB}\Phi]}{\mathrm{d}t}=k_{1+}[\mathrm{TfdB}][\Phi]+k_{2-}[\mathrm{TfdB}\Phi']-(k_{2+}+k_{1-})[\mathrm{TfdB}\Phi]\\\\
\dfrac{\mathrm{d}[\mathrm{TfdB}\Phi']}{\mathrm{d}t}=k_{3-}[\mathrm{TfdB}][\Phi']+k_{2+}[\mathrm{TfdB}\Phi]-(k_{3+}+k_{2-})[\mathrm{TfdB}\Phi']\\\\
\dfrac{\mathrm{d}[\Phi']}{\mathrm{d}t}=k_{3+}[\mathrm{TfdB}\Phi']+k_-'[\mathrm{CphA_{-1}}\Phi']-k_+'[\mathrm{CphA_{-1}}][\Phi']-k_{3-}[\mathrm{TfdB}][\Phi']\\\\
\dfrac{\mathrm{d}[\mathrm{CphA_{-1}}\Phi']}{\mathrm{d}t}=k_+'[\mathrm{CphA_{-1}}][\Phi']-(k'+k_-')[\mathrm{CphA_{-1}}\Phi']\\\\
\dfrac{\mathrm{d}[\Omega]}{\mathrm{d}t}=k'[\mathrm{CphA_{-1}}\Phi']
\end{cases}$
This model could be used on a wide range of substances -- not only various kinds of phenol, but also indole and many other pollutants. For the model fitting part, our aim was not to fit this model for a specific kind of pollutant -- we aimed to involve a fitting method which would be used in our future work.
We combined Runge-Kutta method and simulated annealing method. Runge-Kutta method is a widely used numeric method for differential equation solving, this method could only work with given parameter values. Simulated annealing is a optimization method based on a physical process of annealing.
We planned to start the fitting with initial guesses for the kinetic constants -- we could get this guess from given homologous protein kinetic rules. The initial values of kinetic constants may not fit our datas well, however, we could randomly change the values and compare them with our experimental datas and calculate the degradition kinetic curve with the RK-4 algorithm.
Based on the least squares calculations, we could give up some of the changes (or, we can say 'annealing try'). The annealing tries lead to extremely big least squares values would be given up. The rule to judge whether an annealing try should be abandoned or not is to compare it with the former result.
If the former values for the kinetic constants lead to greater least squares, the current try would be accepted to take the place of former values; if the current try could not give smaller least squares, we have to calculate whether it is 'worse enough' -- we might accept it with a 'certain probability':
$p=e^{-\frac{(\Delta L)}{T}}$
where $p$ referes to the possibility, $e$ is the base of natural logs, $\Delta L$ is the difference between the least squares from the current values and the former values, $T$ is the 'current temperature'. Current temperature is involved here to avoid bad annealing tries for the last steps --
in the physical world, though the beginning annealing tries would move particals unexpectedly (to avoid the local best traps -- particals might get through energy barriers to reach their stable positions), particals might not move so far away to an unstable position again in the last several steps.
In the last several steps, since $T$ would have a quite small value ($T$ decrease with the progress of annealing), any of positive values of $\Delta L$ will cause a high possibility of rejection.
We designed our whole algorithm with Matlab®, however, to show the whole principle of our algorithm, we didn't use Matlab build-in functions for the key steps in our work. Click here to view our codes.
function [co,ls]=SAF_NFODE(varargin)
%%--------------------------------------------------------------------------------------------------------------------------------------------------
%
% SAF_NFODE - Simulated annealing fitting for coefficients in nonanalytic first-order ODE models.
% This function fits coefficients in first-order ordinary differential equations with given datas using simulated annealing method.
%
%--------------------------------------------------------------------------------------------------------------------------------------------------
%
% [ coef ] = SAF_NFODE ( eqns , var_list , var_ini , coef_list , coef_ini , der_list , region , NAME , RESULTS , ... , TAG , SET , ... )
% [ coef , l_squ ] = SAF_NFODE ( eqns , var_list , var_ini , coef_list , coef_ini , der_list , region , NAME , RESULTS , ... , TAG , SET , ... )
%
%--------------------------------------------------------------------------------------------------------------------------------------------------
%
% Inputs:
%
% eqns: first-order ordinary differential equations.
% E.g. { 'Dy1 == k1 * x + y1 * ( y2 + k2 )' , 'v2 == k2 * y1 + y2 - k1' } (in which Dy1 represents dy1/dx, v2 represents dy2/dx)
%
% var_list: list of variables, has to be symbolic, helps to tell variables from coefficients.
% E.g. { 'y1' , 'y2' , 'x' }
% (notice that 'RK4_STEP''RK4_TOL''SA_BGT''SA_EDT''SA_STEP''SA_TRYTIME''SA_CHANGE' are illegal variable names here)
% the independent variable has to be the last term on this list
%
% var_ini: initial values for all variables, helps to set the initial point for numerical solving.
% E.g. [ y1_val , y2_val , x_val ] (notice that values must be presented in the same order as var_list)
%
% coef_list: name list of coefficients, helps to tell coefficients from variables.
% E.g. { 'k1' , 'k2' }
%
% coef_ini: initial guess for the value of coefficients, not essential, but will help to reach the best fitting value faster.
% E.g. [ k1_val , k2_val ]
%
% der_list: list of derivatives, must be in the same order as var_list.
% E.g. { 'Dy1' , 'v2' }
%
% region: region for the independent variable, helps the ode solver to identify its' solving interval.
% E.g. [ x_min , x_max ]
%
% NAME: name list of measured variables. At least two terms should be measured in the experiment.
% RESULTS: experimental results.
% E.g. 'x' , [ x_1 , x_2 , x_3 ] , 'y1' , [ y1_1 , y1_2 , y1_3 ] (notice that RESULT must be given in the same order)
%
% TAG: additional options.
% SET: settings for options.
% E.g. 'RK4_STEP' , 0.005 -- sets the Runge-Kutta original step to 0.005.
% E.g. 'RK4_TOL' , 1E-14 -- sets the Runge-Kutta truncation error tolerance to 1E-14, step will be reset due to the tolerance
% E.g. 'SA_BGT' , 700 -- sets the simulated annealing beginning temperature to 700.
% E.g. 'SA_EDT' , 5 -- sets the simulated annealing ending temperature to 5.
% E.g. 'SA_STEP' , 0.5 -- sets the simulated annealing temperature changing to 0.5 per step.
% E.g. 'SA_TRYTIME' , 15 -- sets the simulated annealing try times to 15 before each step of cooling down.
% E.g. 'SA_CHANGE', 0.01 -- sets the maximum range of coefficient changings to 0.01
%
%--------------------------------------------------------------------------------------------------------------------------------------------------
%
% Outputs:
%
% coef: fitting results of the coefficients.
% l_squ: least squares for the distances from the experimental points to the fitting curves.
%
%--------------------------------------------------------------------------------------------------------------------------------------------------
%
% For more information, visit our GitHub: iGEM2017JLU.
% Written by Shuai Wang
% Jilin University iGEM 2017
%
%--------------------------------------------------------------------------------------------------------------------------------------------------
%% Robustness (input check)
if nargin<6
error('Input error: not enough arguments!');
else
odes=varargin{1};varName=varargin{2};varInitial=varargin{3};coefName=varargin{4};
if ~(iscell(odes)&&iscell(varName)&&isnumeric(varInitial)&&iscell(coefName))
error('Input error: illegal data type!');
end
if length(varName)~=length(varInitial)
error('Input error: unable to set initial point!');
end
end
if isnumeric(varargin{5})
coefInitial=varargin{5};
if nargin<7
error('Input error: not enough arguments!');
else
if ~iscell(varargin{6})
error('Input error: illegal data type!');
else
derName=varargin{6};
end
end
input_subscript=7;
else
warning('Input warning: unable to get coeficients initial values.');
coefInitial=rand(size(coefName));
if ~iscell(varargin{5})
error('Input error: illegal data type!')
else
derName=varargin{5};
end
input_subscript=6;
end
if isnumeric(varargin{input_subscript})&&(prod(size(varargin{input_subscript})==[1,2]))
reg=varargin{input_subscript};
else
error('Input error: illegal data type!');
end
rk4step=0.05;rk4tol=1E-6;sa_bgt=400;sa_edt=25;sa_step=5;sa_tt=15;sa_cr=0.002;
if mod(nargin-input_subscript,2)
error('Input error: inputs don''t match!');
else
input_subscript=input_subscript+1;
end
datacount=0;
for i=input_subscript:2:nargin
switch varargin{i}
case 'RK4_STEP'
rk4step=varargin{i+1};
case 'RK4_TOL'
rk4tol=varargin{i+1};
case 'SA_BGT'
sa_bgt=varargin{i+1};
case 'SA_EDT'
sa_edt=varargin{i+1};
case 'SA_STEP'
sa_step=varargin{i+1};
case 'SA_TRYTIME'
sa_tt=varargin{i+1};
case 'SA_CHANGE'
sa_cr=varargin{i+1};
otherwise
for j=1:length(varName)
if strcmp(varargin{i},varName{j})
datacount=datacount+1;
datasubscript(datacount)=j;
try
data(datacount,:)=varargin{i+1};
catch
error('Input error: unable to match up experimental results!');
end
end
end
end
end
%% Optimization
coef_o=coefInitial;
for T=sa_bgt:-sa_step:sa_edt
for j=1:sa_tt
coef_c=coef_o-sa_cr/2+rand(size(coef_o))*sa_cr;
%% solve the equations with rk4 (non-stiff equations only), and calculate the distance from experimental datas to the model curve.
var_value_o=varInitial;var_value_c=varInitial;x_NotAIndVarName=reg(1,1);len=length(var_value_o);steps=1;
clear('solution_list_o'); solution_list_o(steps,:)=var_value_o;
clear('solution_list_c'); solution_list_c(steps,:)=var_value_c;
while x_NotAIndVarNamerk4tol/15)&&(max(abs(derC_NotADerNameWm-derC_NotADerNameWd))>rk4tol);
flag_2=(max(abs(derO_NotADerNameWm-derO_NotADerNameWh))rand(1)
coef_o=coef_c; distance_o=distance_c;
end
end
end
co=coef_o; ls=distance_o;
end
function [derO_NotADerName,derC_NotADerName]=der_cal(odes,coefName,varName,derName,coef_o,coef_c,var_value_o,var_value_c)
for i=1:length(varName)
eval(['syms ',varName{i},';']);
end
for i=1:length(coefName)
eval(['syms ',coefName{i},';']);
end
for i=1:length(derName)
eval(['syms ',derName{i},';']);
end
cmdstr_o='der_o_NotADerName=solve('; cmdstr_c='der_c_NotADerName=solve(';
for i=1:length(odes)
cmdstr_o=[cmdstr_o,odes{i},',']; cmdstr_c=[cmdstr_c,odes{i},','];
end
for i=1:length(coefName)
cmdstr_o=[cmdstr_o,coefName{i},'==',num2str(coef_o(i)),','];
cmdstr_c=[cmdstr_c,coefName{i},'==',num2str(coef_c(i)),','];
end
for i=1:length(varName)
cmdstr_o=[cmdstr_o,varName{i},'==',num2str(double(var_value_o(i))),','];
cmdstr_c=[cmdstr_c,varName{i},'==',num2str(double(var_value_c(i))),','];
end
for i=1:length(derName)
cmdstr_o=[cmdstr_o,derName{i},',']; cmdstr_c=[cmdstr_c,derName{i},','];
end
for i=1:length(coefName)
cmdstr_o=[cmdstr_o,coefName{i},',']; cmdstr_c=[cmdstr_c,coefName{i},','];
end
for i=1:length(varName)-1
cmdstr_o=[cmdstr_o,varName{i},',']; cmdstr_c=[cmdstr_c,varName{i},','];
end
cmdstr_c=[cmdstr_c,varName{length(varName)},');'];
cmdstr_o=[cmdstr_o,varName{length(varName)},');'];
try
eval(cmdstr_o);
for i=1:length(derName)
eval(['derO_NotADerName(',num2str(i),')=der_o_NotADerName.',derName{i},';']);
end
catch
error(['Calculation Ends: unable to solve derivatives']);
end
try
eval(cmdstr_c);
for i=1:length(derName)
eval(['derC_NotADerName(',num2str(i),')=der_c_NotADerName.',derName{i},';']);
end
end
end