Team:ZJU-China/Model/Coculture

Modeling

CoCulture

Overview

The VOC device is designed to judge whether a tobacco plant is healthy or infected. Since this is an exploratory experiment, data-analysis algorithms are used widely in our modeling. We perform data preprocessing, data analysis, and algorithm optimization on the data collected by the VOC device. Finally, we use logistic regression and detect infected tobacco with 91% confidence.

Data preprocessing

First we clean up the fragmented raw input data and reorganize it into a matrix. The 10 VOC factors serve as features, and the status (healthy or infected) serves as the label to be predicted.
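The reorganization step can be sketched as follows. This is a minimal illustration, not the team's actual pipeline: the record layout (sample id, sensor index, value), the function name build_matrix, and the choice to drop samples with missing readings are all assumptions made for the example.

```python
# Hypothetical raw record layout: (sample_id, sensor_index, value).
# The real device output format is not described in the text.
N_FEATURES = 10  # one feature per VOC factor


def build_matrix(records, labels):
    """Group scattered readings into one row of N_FEATURES values per sample.

    records: iterable of (sample_id, sensor_index, value) tuples
    labels:  dict mapping sample_id -> 0 (healthy) or 1 (infected)
    Returns (X, y), where X is a list of rows and y the matching labels.
    """
    rows = {}
    for sample_id, sensor_index, value in records:
        row = rows.setdefault(sample_id, [None] * N_FEATURES)
        row[sensor_index] = value

    # Assumption: samples with any missing reading are discarded.
    complete = [s for s in sorted(rows) if None not in rows[s]]
    X = [rows[s] for s in complete]
    y = [labels[s] for s in complete]
    return X, y
```

Each row of X then holds the 10 VOC readings of one sample, and y holds the healthy/infected label at the same position.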

Data analysis

Our goal is to build a model that predicts a tobacco plant's status from the 10 input features. This is a classic binary classification problem, and several algorithms can solve it. We sample the data with cross-validation and score each model with the ridit test.
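Cross-validation can be sketched as below. The function names, the fold count, and the fit/predict callables are hypothetical. The sketch also scores each fold by plain accuracy as a simple stand-in; it does not implement the ridit test used in our evaluation.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(X, y, fit, predict, k=5):
    """Return the mean held-out accuracy over k folds.

    fit(X_train, y_train) -> model and predict(model, row) -> label
    are supplied by the caller (hypothetical interface).
    Requires len(X) >= k so that no fold is empty.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Because every sample is held out exactly once, the averaged score reflects performance on data the model did not see during fitting.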

Decision Tree

First we use decision trees. An ID3 tree, which is grounded in information theory, chooses the split that maximizes information gain, while a CART tree chooses the split that minimizes the Gini index. The two algorithms perform almost identically: R = 0.83.
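The two split criteria can be compared with a short sketch. The helper names are hypothetical, and splitting a continuous sensor reading at a threshold is an assumption for illustration (classic ID3 splits on categorical attributes).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits, the impurity measure behind information gain."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index, the impurity measure used by CART."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_score(values, labels, threshold, impurity):
    """Impurity reduction from splitting one feature at `threshold`.

    With impurity=entropy this is the information gain;
    with impurity=gini it is the decrease in the Gini index.
    """
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * impurity(part) for part in (left, right) if part)
    return impurity(labels) - weighted
```

Both criteria reward splits that leave each branch dominated by one class, which is one reason the two trees tend to choose similar splits.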

Summary

In this model, we try different algorithms to obtain a robust, interpretable, and accurate solution that predicts whether the tobacco is infected from only 4 features, with 91% confidence. Since the other 6 VOC sensors contribute nothing to this model, the device can also be simplified by removing them.
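The overview names logistic regression as the final classifier. A minimal sketch of such a model, trained by batch gradient descent, is given below. The function names, learning rate, and epoch count are assumptions for illustration, not the settings used in our experiments.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            # Prediction error for this sample: p(infected) - true label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(model, row):
    """Estimated probability that a sample is infected (label 1)."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
```

In such a model, a feature whose fitted weight stays close to zero has little influence on the prediction, which is the kind of evidence that would support dropping the corresponding sensor.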