Forecasting Air Pollution from Construction Using Machine Learning

Diana George

3 years ago

The backbone of every economy is its infrastructure growth. But construction sites generate high levels of pollution, and over a long period of time this can cause unpredictable and destabilizing problems. According to Delhi Pollution Control Committee (DPCC) officials, 30% of air pollution is caused by dust that emanates from construction sites. In this situation, we must think seriously about the most effective solution for air pollution and about how to manage its future after-effects. Is banning construction the solution? No, not at all. Then what? The problem remains that hazardous chemicals in the atmosphere affect the overall quality of people’s lives, particularly in metropolitan areas.

Based on my study and analysis, I suggest one of the most effective and long-lasting solutions for pollution caused by construction: efficient data management. Being able to model, predict, and monitor air quality in real time is becoming more and more relevant. Hence the use of efficient machine learning algorithms to gather real-time air quality data and forecast it.

Machine learning (ML) is one technique that can be used to predict air pollutant levels and suggest solutions. Machine learning has become an integral part of computer science. It is hard to imagine a future without it, and data, the backbone of machine learning, is the key to that future. A collection of raw data is not of much use in itself, but once combined with machine learning algorithms it can be used to accomplish a task that has forever been a dream of the human species: predicting the future. This can be done with algorithms such as regression and classification. Because of its strong track record, regression is used here to forecast particulate pollutant levels, and the random forest method is used to predict the air quality index (AQI).

In an era of increasing connectivity and shrinking cost and size of technology, the concept of the Internet of Things (IoT) is gaining significant importance. New architectures based on IoT sensors are proposed here. Data gathered from sensors can play a vital role in measuring and managing air quality. With the help of sensor-generated data, decisions can be made much faster and more easily than before.

The goal is an accurate and scalable architecture that obtains real-time air pollutant concentrations from sensors and uses this data set to forecast the future concentrations of those pollutants and their respective index series. Here, we use air quality data from two different sources. The first is a sensor network that measures pollutant concentrations. The second is treated as the base reference data set (from valid local sources). Because of its proven track record among machine learning classification algorithms in dealing with real-time data, the random forest method is chosen to classify the monitored sensor data into its corresponding index range, using the local data set as the base for the mapping. Regression methods are then applied to predict the future concentration of the pollutants, taking observations from previous sensor time steps as input. The results are promising: they indicate that these algorithms can be very efficient at predicting and classifying the air quality index, with an average accuracy of 94.475%.
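To make "observations from previous sensor time steps as input" concrete, here is a minimal sketch of how a series of readings could be turned into input/target pairs. The function name `make_lagged_pairs`, the window length, and the sample values are illustrative assumptions, not details from the study.

```python
def make_lagged_pairs(series, n_lags):
    """Pair each reading with the n_lags readings that came just before it."""
    pairs = []
    for t in range(n_lags, len(series)):
        inputs = series[t - n_lags:t]   # the previous n_lags time steps
        target = series[t]              # the value to be predicted
        pairs.append((inputs, target))
    return pairs


# Made-up hourly concentrations, for illustration only.
hourly = [41.0, 43.5, 47.0, 52.0, 50.5]
pairs = make_lagged_pairs(hourly, n_lags=2)
# pairs[0] is ([41.0, 43.5], 47.0)
```

Each pair can then be fed to a regression model: the list of earlier readings is the input and the following reading is the value to predict.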

Air quality data is collected from a valid source and also from sensors. Data from different sources may contain inconsistencies, missing values, and repeated records. To get a proper prediction result, the data set must be cleaned: missing values must be handled, either by deleting the affected records or by filling them with mean values or some other method. Redundant data must also be removed to avoid errors in the results.
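The cleaning steps described above can be sketched as follows. This is a minimal illustration that assumes each reading is a dictionary and that a missing value is stored as `None`; the function name `clean_readings` and the sample rows are hypothetical.

```python
from statistics import mean


def clean_readings(rows, fields):
    """Drop exact duplicate rows, then fill missing values with the column mean."""
    # Remove repeated records, keeping the first occurrence in order.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    # Mean imputation per field, computed from the observed values only.
    for f in fields:
        observed = [r[f] for r in unique if r.get(f) is not None]
        if not observed:
            continue  # nothing to impute from; leave the gaps as they are
        fill = mean(observed)
        for r in unique:
            if r.get(f) is None:
                r[f] = fill
    return unique


# Made-up readings, for illustration only.
readings = [
    {"temperature": 30.0, "humidity": 40.0},
    {"temperature": 30.0, "humidity": 40.0},  # repeated record
    {"temperature": None, "humidity": 60.0},  # missing temperature
    {"temperature": 34.0, "humidity": 50.0},
]
cleaned = clean_readings(readings, ["temperature", "humidity"])
```

Here the duplicate row is dropped and the missing temperature is filled with 32.0, the mean of the two observed temperatures. Mean imputation is only one of the options mentioned above; deleting incomplete records is the other.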

Data pre-processing may affect the way in which outcomes of the final data processing can be interpreted. This aspect should be carefully considered when the interpretation of the results is a key point, such as in the multivariate processing of chemical data.

The aim is to predict the required air pollutant level and the corresponding AQI from eight attributes in the data set: temperature, maximum temperature, minimum temperature, humidity, moisture content, visibility, wind rate, and maximum wind rate.

The data produced by imputation is used to build the model: 70% of it is used for training and the remaining 30% for testing.
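A 70/30 split can be sketched in a few lines. The article does not say whether the split was random or chronological, so the shuffling below is an assumption; the function name `train_test_split` and the seed are likewise illustrative.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


samples = list(range(100))  # stand-in for 100 cleaned records
train, test = train_test_split(samples)
# 70 records end up in train and 30 in test
```

For time-ordered sensor data, a chronological cut (the earliest 70% for training, the latest 30% for testing) is a common alternative, because it avoids testing on readings that precede the training data.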

The average pollutant concentrations over a particular period of time can be accessed through a data API. The scraper uses this data API to build the data set. This is the main layer, responsible for storing and analyzing data. One of the efficient forecasting algorithms in machine learning, linear regression (LR), is used to predict the future concentration of pollutants. LR is a supervised learning algorithm that performs a regression task. Given the independent variables, it produces a predicted target value; it is mostly used for finding relationships among variables and for forecasting.
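As a rough illustration of linear regression used for forecasting, the sketch below fits a straight line by ordinary least squares, using the previous reading as the single independent variable. The one-feature setup, the function name `fit_line`, and the sample series are assumptions made for brevity; the study itself works with several attributes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up hourly concentrations, for illustration only.
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
previous = series[:-1]   # independent variable: reading at hour t-1
current = series[1:]     # target: reading at hour t
slope, intercept = fit_line(previous, current)

# Forecast the next hour from the latest reading.
next_hour = slope * series[-1] + intercept   # 22.0 for this series
```

With real data the relationship is not perfectly linear, so the fitted line gives an estimate with some error, which is what the test portion of the data is used to measure.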

The model is trained and tested until it predicts effectively with minimal error, and it is tuned from time to time to improve its accuracy. Finally, for this project, a random forest is used for the final classification of the regressor output into the corresponding air quality index. Random forest is a bagging method, not a boosting method. The trees in a random forest are built in parallel, and there is no interaction among them while they are being built. The method works by constructing a multitude of decision trees at training time and outputting either the class that is the mode of the individual trees' classes (classification) or their average prediction (regression). A random forest is a meta-estimator (i.e., it combines the results of many predictions) that aggregates many decision trees, with some useful improvements. The number of features that can be considered for a split at each node is limited to a certain percentage of the total (a hyperparameter). Each tree also draws a random sample from the original data set when generating its splits, adding another element of randomness that helps prevent overfitting.
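The three ingredients described above (a bootstrap sample per tree, a random subset of features, and a majority vote) can be seen in a deliberately small sketch. As a simplification, each "tree" here is a single-split decision stump rather than a full decision tree, and all names, labels, and values are hypothetical; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Find the best single-threshold split over the allowed features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Train independent stumps on bootstrap samples and random feature subsets."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
        feats = rng.sample(range(n_features), max_features)   # feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote (the mode) over the individual trees' predictions."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Made-up two-feature readings and hypothetical category labels.
rows = [[20, 1.0], [25, 1.2], [30, 1.1], [80, 3.0], [90, 3.5], [85, 3.2]]
labels = ["good", "good", "good", "poor", "poor", "poor"]
forest = fit_forest(rows, labels)
low = predict(forest, [15, 0.9])
high = predict(forest, [95, 3.6])
```

Because each stump is trained without reference to the others, the loop in `fit_forest` could run in parallel, which is the bagging property noted above.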

Finally, given an input pollutant concentration, the model predicts the pollutant concentration for horizons ranging from the next hour to the coming years, and this forecast value is passed to a random forest classifier to classify the AQI.

Once the requisite forecast information about air pollution is available, the construction process can be planned accordingly. If the area is prone to future pollution and its adverse effects, either reconsider the plan or adopt more environment-friendly procedures and allot the budget accordingly. For any construction work, the ultimate goal should be saving the environment and protecting its treasured assets. Based on the machine learning outcomes, builders must explore solutions that have minimum impact on the environment and maximum energy-saving capacity.