Team:Pasteur Paris/Model

Modeling

Introduction

Models guided us from the scientific design of the process to the actual design of our device. First, physics-based models helped us understand that only indoor air pollution can realistically be addressed with purifiers. Then, data analysis provided an ideal target for our proof of concept: volatile organic compounds.

We designed a machine learning-based algorithm to predict the level of indoor pollution when the outdoor pollution and other parameters are known. We built a mathematical model to scale our device for a domestic setting, given the size and the pollution level of a room. As the advection-diffusion equation governing pollutant transport cannot be solved analytically in the general case, numerical refinements were necessary to take air flow effects into account and adapt the design of the device accordingly.

Finally, the data gathered through our application and analysis of the filters are to be processed in mathematical models to better localize risks and guide public policies.

I. Identifying our target

1. The choice of indoor air pollution

At the beginning of our project, we were not sure whether to target indoor or outdoor air pollution. Since the two share many features, it seemed to us that we could target both. There have been many attempts to design depolluting systems for outdoor use, from physical processes to trapping carbon dioxide with algae. One depollution method, titanium dioxide, seemed especially promising. It is a photocatalyst: when exposed to ultraviolet radiation, it allows the degradation of several pollutants. However, titanium dioxide nanoparticles are suspected to be toxic. On top of that, they are not fit for indoor air depollution, as their activity requires natural light. As our research into solutions progressed, it became increasingly clear to us that, for outdoor air, there is only one real solution: reducing emissions dramatically.

Indeed, the main problem with outdoor air is its mobility: depolluting it significantly would require huge surfaces. Meteorological study of the boundary layer of the atmosphere shows that it is very well mixed. The underlying process is turbulence, created by surface effects and temperature differences. The thickness of this turbulent layer varies dramatically: from 10 meters to 1,000 meters. Whatever its precise thickness, turbulence very efficiently stirs the surface layer of the atmosphere, where most of the pollutants are emitted. Members of the team conducted a side project on turbulent mixing with vortex networks, and its results agree with this theory.

Indoor air can also be turbulent, as shown in several papers dealing with the subject. However, to compare the two environments, we need to compare their source-to-volume ratios. Pollution sources are essentially a surface effect, in both indoor and outdoor environments. The ratio of emitting surface to turbulent volume is clearly much higher indoors than outdoors. This means that indoor air is well mixed and that, because of the confinement, surface pollution weighs heavily there. It is a rather general result that surface effects matter more at smaller scales: this is why small chunks burn before big chunks are cooked, and this is how insects walk on water.
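The scale argument can be made concrete with a rough calculation. The sketch below is only an illustration under stated assumptions: the room is a 4 m x 4 m x 2.5 m box whose six faces all emit, and the outdoor case is a flat ground surface under a mixed layer 10 m to 1,000 m thick. The function names are hypothetical.

```python
def surface_to_volume_room(length_m, width_m, height_m):
    """Emitting surface (walls, floor, ceiling) over room volume, in 1/m."""
    surface = 2 * (length_m * width_m + length_m * height_m + width_m * height_m)
    return surface / (length_m * width_m * height_m)


def surface_to_volume_outdoor(layer_thickness_m):
    """Ground surface over the volume of the mixed layer above it, in 1/m."""
    return 1.0 / layer_thickness_m


# Assumed room: 4 m x 4 m x 2.5 m, i.e. 72 m2 of surface for 40 m3 of air.
room_ratio = surface_to_volume_room(4.0, 4.0, 2.5)
# Assumed outdoor cases: thinnest and thickest turbulent layers quoted above.
thin_layer_ratio = surface_to_volume_outdoor(10.0)
thick_layer_ratio = surface_to_volume_outdoor(1000.0)

print(f"room:                {room_ratio:.3f} 1/m")
print(f"10 m mixed layer:    {thin_layer_ratio:.3f} 1/m")
print(f"1,000 m mixed layer: {thick_layer_ratio:.3f} 1/m")
```

Under these assumptions the room has 18 to 1,800 times more emitting surface per unit of mixed volume than the outdoor layer.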

2. The choice of volatile organic compounds (VOC)

To decide which category of pollutants our proof-of-concept device would target, we built statistical models. They allowed us to determine a suitable target. We used a dataset provided by Gams, a company that works on indoor air monitoring solutions. The dataset consisted of indoor and outdoor pollutant concentrations (co, no2, o3_1h, o3_8h, pm10, pm25, so2) and data about the indoor conditions (temperature, humidity), recorded hour by hour from November 2016 to March 2017 in Shanghai Jingan. We started our data analysis by visualizing the features of our dataset to check whether there were obvious correlations.

As we can see, indoor and outdoor PM2.5 concentrations seem to be correlated. We performed a linear regression for the concentrations of different pollutants and obtained the following results:

Conclusion: these pollutants come from outside the house. This finding led us to target pollutants that are mostly emitted inside the house: VOC, coming from heating or cooking with solid fuels, paints, etc.
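For readers who want to reproduce this kind of check, the linear regression of indoor against outdoor concentration can be sketched as follows. This is a minimal Python illustration, not the original analysis code: the data are synthetic (an assumed slope of 0.6 plus noise), and `fit_line` is a hypothetical helper implementing ordinary least squares.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic stand-in for hourly outdoor and indoor PM2.5 readings (assumption).
random.seed(0)
outdoor = [random.uniform(10.0, 150.0) for _ in range(500)]
indoor = [0.6 * o + 5.0 + random.gauss(0.0, 5.0) for o in outdoor]

slope, intercept, r_squared = fit_line(outdoor, indoor)
print(f"slope={slope:.2f} intercept={intercept:.1f} R2={r_squared:.2f}")
```

An R squared close to 1 means that most of the indoor variation is explained by the outdoor level, which is the signature of a pollutant that mainly comes from outside.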

Then, we tried to find the features influencing the VOC concentration and designed a basic machine learning algorithm to predict the indoor VOC concentration.

There is clearly a drop in VOC concentration at night, and an increase in the day. This parameter seems to be important.

It looks like there is a correlation between VOC concentration and humidity. The higher the humidity, the lower the concentration of VOC.

There are also variations in VOC concentration as a function of temperature. It seems that the higher the temperature, the higher the VOC concentration (apart from T = [18;19]).
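Trends like these are typically read from bin averages: group the samples by hour, humidity or temperature, then compare the mean VOC concentration in each bin. A minimal sketch, with a hypothetical `mean_by_bin` helper and made-up sample values that only show the mechanics:

```python
from collections import defaultdict


def mean_by_bin(values, targets, bin_width):
    """Average the targets over bins of the given width along `values`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, target in zip(values, targets):
        key = int(value // bin_width) * bin_width  # lower edge of the bin
        sums[key] += target
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Made-up samples: hour of the day and a VOC reading in arbitrary units.
hours = [0, 0, 12, 12, 13, 23]
voc = [0.1, 0.3, 0.8, 1.0, 0.9, 0.2]

voc_by_hour = mean_by_bin(hours, voc, 1)
for hour, mean_voc in voc_by_hour.items():
    print(f"{hour:02d}h: {mean_voc:.2f}")
```

The same helper works for temperature with a bin width of 1 °C, which is how an interval such as [18;19] would appear as a single bin.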

With these three parameters, we designed a machine learning algorithm using a gradient boosting method. We used the R language and built a regression with the XGBoost package. XGBoost is short for eXtreme Gradient Boosting. It is an efficient and scalable implementation of the gradient boosting framework of Friedman et al. (2000) and Friedman (2001). Two solvers are included:

linear model;

tree learning algorithm.

It supports various objective functions, including regression, classification and ranking. The package is designed to be extensible, so users can easily define their own objective functions.

Our dataset was composed of 3 features: humidity, hour, temperature. Our purpose was to predict the VOC concentration. We trained the model on the first 100 000 samples of the dataset and tested it on the last 36 000 samples.
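To show the principle behind this approach, here is a from-scratch sketch of gradient boosting for squared error, using one-split regression trees (stumps). It is written in Python rather than R, it is not the XGBoost package and not the original code, and it runs on synthetic data: the relation between hour, humidity, temperature and VOC below is invented for the example. Only the ordered split (first samples for training, last ones for testing) mirrors the procedure described above.

```python
import math
import random


def fit_stump(rows, residuals):
    """Find the single split (feature, threshold) that best fits the residuals."""
    best = None
    for f in range(len(rows[0])):
        thresholds = sorted({row[f] for row in rows})
        for thr in thresholds[:-1]:  # keep both sides non-empty
            left = [r for row, r in zip(rows, residuals) if row[f] <= thr]
            right = [r for row, r in zip(rows, residuals) if row[f] > thr]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr, left_mean, right_mean)
    return best[1:]


def stump_value(stump, row):
    f, thr, left_mean, right_mean = stump
    return left_mean if row[f] <= thr else right_mean


def fit_boosting(rows, targets, n_rounds=30, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(targets, predictions)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_value(stump, row)
                       for row, p in zip(rows, predictions)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, row) for s in stumps)


def mse(model, rows, targets):
    return sum((predict(model, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


# Synthetic samples (assumption): features are (hour, humidity, temperature).
random.seed(1)
rows, targets = [], []
for _ in range(300):
    hour = random.randint(0, 23)
    humidity = random.randint(30, 80)
    temperature = random.randint(15, 28)
    voc = (0.5 + 0.3 * math.sin(math.pi * hour / 24)
           - 0.004 * humidity + 0.01 * temperature + random.gauss(0.0, 0.02))
    rows.append((hour, humidity, temperature))
    targets.append(voc)

# Ordered split: first samples for training, last ones for testing.
split = 220
train_rows, train_targets = rows[:split], targets[:split]
test_rows, test_targets = rows[split:], targets[split:]

model = fit_boosting(train_rows, train_targets)
baseline = (sum(train_targets) / len(train_targets), 0.3, [])  # mean-only model

train_mse = mse(model, train_rows, train_targets)
test_mse = mse(model, test_rows, test_targets)
baseline_train_mse = mse(baseline, train_rows, train_targets)
baseline_test_mse = mse(baseline, test_rows, test_targets)
print(f"train MSE {train_mse:.5f} (mean-only {baseline_train_mse:.5f})")
print(f"test MSE  {test_mse:.5f} (mean-only {baseline_test_mse:.5f})")
```

Comparing the test MSE with the mean-only baseline is a simple way to judge whether the three features carry real predictive information.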

Our algorithm makes predictions from the three features of the dataset (hour, humidity, temperature). Its performance is quite satisfactory, since the predicted values follow the trend of the measurements, with an MSE of approximately 0.01. It could nevertheless be greatly improved by adding new features. This is the objective of the beta test in our scenario: through a thorough study of the beta testers' environment, we can add features to our dataset (type of housing, lifestyle, ventilation frequency, presence of an animal, etc.).

This data analysis allowed us to conclude that VOC are a relevant target for our device. The choice of polyaromatic hydrocarbons (PAHs) among VOC species was guided by our discussions with experts and our literature review.

II. Modeling for design

1. Scaling our device

Uniform concentration model
Without any source, we modelled the activity of the filter by the following equation:
dC/dt = -Q r/V * C(t)
C(t)=C0 exp(-t/τ) with τ=V/(Q*r)

Where V is the volume of the room, Q is the volume flow rate of the device, and r is the fraction of pollutants removed in one pass through the filter. This model allowed us to aim for Q = 4 liters per second, which is the typical flow rate of a computer fan. As our filter is still at a pre-prototype stage, we could not determine its depollution factor r experimentally. If we assume r = 10%, then 10% of the pollutants that enter the filter get trapped. We computed the model for a room with the following dimensions: V = 4 m * 4 m * 2.5 m = 40 m³. So τ = 40 / (0.004 * 0.1) = 100 000 s ≈ 28 hours.
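The arithmetic of the uniform concentration model can be checked with a few lines of Python. This is a minimal sketch using the values quoted above (V = 40 m³, Q = 4 L/s, r = 10%); the function names are hypothetical.

```python
import math


def time_constant_s(volume_m3, flow_m3_per_s, removal_fraction):
    """tau = V / (Q * r), in seconds."""
    return volume_m3 / (flow_m3_per_s * removal_fraction)


def concentration(t_s, c0, tau_s):
    """C(t) = C0 * exp(-t / tau), without any source."""
    return c0 * math.exp(-t_s / tau_s)


V = 4.0 * 4.0 * 2.5   # room volume, m3
Q = 4.0e-3            # 4 liters per second, expressed in m3/s
r = 0.10              # assumed fraction removed per pass

tau = time_constant_s(V, Q, r)
remaining_after_one_tau = concentration(tau, 1.0, tau)

print(f"tau = {tau:.0f} s = {tau / 3600:.1f} h")
print(f"fraction left after one tau: {remaining_after_one_tau:.3f}")
```

After one time constant about 37% of the initial pollutant remains, so with these values the device needs more than a day to act noticeably on a room without sources.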

The emission of VOC is characterized by a constant flux on the surfaces: Jpoll = 0.015 µg/s/m². This hypothesis represents, for instance, the emission of pollutants by the walls of a freshly painted room.
We take C0 = 1000mg/m3, the health risk threshold value. With a source on a surface S, the balance becomes dC/dt = -C(t)/τ + Jpoll S/V, whose solution is: C(t) = C0 exp(-t/τ) + (Jpoll S τ/V) (1 - exp(-t/τ))

Jpoll S τ = 0.015*10^-6 * (2.5*4*4+4*4) * 100 000 = 0.084 g, so the long-time concentration is Jpoll S τ/V = 0.084/40 = 0.0021 g/m³
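The source term can be checked numerically in the same way. The sketch below assumes that the surface emission enters the balance as Jpoll S/V, so that the emitted mass Jpoll S τ is spread over the room volume V, and it uses S = 56 m² (four walls of 2.5 m x 4 m plus one 4 m x 4 m face), τ = 100 000 s and C0 = 1000 mg/m³ = 1 g/m³. The function names are hypothetical.

```python
import math


def steady_state_g_per_m3(emission_g_per_s_m2, surface_m2, tau_s, volume_m3):
    """Long-time concentration Jpoll * S * tau / V."""
    return emission_g_per_s_m2 * surface_m2 * tau_s / volume_m3


def concentration_with_source(t_s, c0, tau_s, steady):
    """Solution of dC/dt = -C/tau + Jpoll*S/V with C(0) = c0."""
    return steady + (c0 - steady) * math.exp(-t_s / tau_s)


J_POLL = 0.015e-6                      # 0.015 micrograms/s/m2, in g/s/m2
SURFACE = 2.5 * 4 * 4 + 4 * 4          # emitting surface, m2
TAU = 100_000.0                        # time constant of the filter, s
VOLUME = 40.0                          # room volume, m3
C0 = 1.0                               # 1000 mg/m3, in g/m3

emitted_mass_g = J_POLL * SURFACE * TAU
steady = steady_state_g_per_m3(J_POLL, SURFACE, TAU, VOLUME)

print(f"Jpoll * S * tau = {emitted_mass_g:.3f} g")
print(f"long-time concentration = {steady * 1000:.1f} mg/m3")
for hours in (0, 28, 56, 280):
    c = concentration_with_source(hours * 3600.0, C0, TAU, steady)
    print(f"t = {hours:3d} h: C = {c * 1000:.1f} mg/m3")
```

Under these assumptions the concentration relaxes from C0 toward a floor of about 2 mg/m³ set by the balance between wall emission and filtration.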

2. Refinement: taking air flow into account

Our uniform concentration model allowed us to identify the right size and air flow of the device under a few very simple hypotheses. However, it cannot predict how the pollutants are distributed within the room, which is essential for the design of our device.
There was a good reason not to look at this behavior too closely in the first place: it is really complicated to predict. Indeed, the concentration of pollutants C in a room with an air flow v follows the advection-diffusion equation:
D ∆C = ∂C/∂t + v · grad(C)
Where D is the diffusion coefficient of the pollutant, v is the air velocity field, and ∆ is the Laplacian operator.

This equation admits no analytical solution in the general case, so additional information is required to model the behavior of the pollutants. In particular, assumptions are needed concerning the sources of pollutants (boundary conditions). The air velocity field is unknown: computing it involves advanced fluid dynamics, beyond our reach. There may also be chemical reactions between pollutants: this is a complex field of study, which is also beyond our simple model.

Thus, we made the following assumptions in our model:

-No interaction between the pollutants

-The pollution sources are the walls, which release particles uniformly

-The air velocity field is created only by the ventilator included in our device

We then solved the model for a square box in two dimensions. (The details of the model design are available as a PDF.)
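To give an idea of what a numerical treatment of the advection-diffusion equation involves, here is a minimal explicit finite-difference sketch on a two-dimensional square grid. It is not the scheme described in the PDF. It assumes a uniform, constant air velocity with non-negative components, periodic boundaries instead of emitting walls, no filter, and illustrative values for D, v, the grid spacing and the time step. It only shows how advection (first-order upwind) and diffusion (central differences) are combined in one time step.

```python
def step(field, vx, vy, diffusivity, dx, dt):
    """Advance C by one explicit time step on a periodic square grid.

    Advection uses first-order upwind differences (valid for vx, vy >= 0),
    diffusion uses the usual five-point Laplacian.
    """
    n = len(field)
    cx = vx * dt / dx
    cy = vy * dt / dx
    d = diffusivity * dt / dx ** 2
    # Stability and positivity condition of this explicit scheme.
    assert cx + cy + 4 * d <= 1.0, "time step too large"
    new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = field[i][j]
            west = field[i - 1][j]           # index -1 wraps around: periodic
            east = field[(i + 1) % n][j]
            south = field[i][j - 1]
            north = field[i][(j + 1) % n]
            new[i][j] = (c
                         - cx * (c - west)
                         - cy * (c - south)
                         + d * (west + east + south + north - 4 * c))
    return new


# Illustrative values (assumptions): 4 m box, 20 x 20 cells.
N = 20
DX = 0.2            # m
DT = 1.0            # s
D = 1.0e-5          # m2/s, order of magnitude of a gas diffusion coefficient
VX, VY = 0.1, 0.05  # m/s, uniform air velocity

field = [[0.0] * N for _ in range(N)]
field[5][5] = 1.0   # initial puff of pollutant in one cell

mass_before = sum(map(sum, field)) * DX ** 2
for _ in range(50):
    field = step(field, VX, VY, D, DX, DT)
mass_after = sum(map(sum, field)) * DX ** 2

peak = max(map(max, field))
lowest = min(map(min, field))
print(f"mass before {mass_before:.6f}, after {mass_after:.6f}, peak {peak:.4f}")
```

With periodic boundaries and no sink the total mass is conserved, which gives a simple sanity check on the scheme before walls, sources and a filter are added. The upwind scheme also smears the puff more than physical diffusion alone would, so quantitative results call for a finer grid or a higher-order scheme.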

Here is what we observed:

When we do not consider the concentration in the room as uniform, we see that the number of particles absorbed per unit of time decreases more rapidly. In other words, once we drop the uniformity hypothesis, which is not very realistic, our filter is less efficient. As the figure below shows, the concentration still follows a decreasing exponential.

However, this model can be criticized because it does not consider the saturation of the filter, and thus does not allow us to evaluate its lifetime. Experiments and additional data will be needed to calibrate the model. We conceived our scenario and application with this idea: with the rise of data science, gathering data offers new opportunities to tackle air pollution.

References

1. https://tel.archives-ouvertes.fr/tel-01131382/file/2014Rahmeh57730.pdf

2. https://github.com/twairball/gams-dataset

3. S. P. Arya, Air Pollution Meteorology and Dispersion, 1999