Development of gradient boosting-assisted machine learning data-driven model for free chlorine residual prediction
Author affiliations: School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, China
Publication: Frontiers of Environmental Science & Engineering
Year, volume, issue: 2024, Vol. 18, No. 2
Pages: 35-46
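Subject classification: 12 [Management]; 1201 [Management: Management Science and Engineering (degrees may be awarded in Management or Engineering)]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (degrees may be awarded in Engineering or Science)]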
Funding: Supported by the US Department of Agriculture’s National Institute of Food and Agriculture, Agriculture and Food Research Initiative, Water for Food Production Systems (No. 2018-68011-28371); the National Science Foundation (USA) (Nos. 1936928, 2112533); the US Department of Agriculture’s National Institute of Food and Agriculture (No. 2020-67021-31526); and the US Environmental Protection Agency (No. 840080010)
Topics: Machine learning; Data-driven modeling; Drinking water treatment; Disinfection; Chlorination
Abstract: Chlorine-based disinfection is ubiquitous in conventional drinking water treatment (DWT) and serves to mitigate threats of acute microbial disease caused by pathogens that may be present in source water. An important index of disinfection efficiency is the free chlorine residual (FCR), a regulated disinfection parameter in the US that indirectly measures disinfectant power for prevention of microbial recontamination during DWT and distribution. This work demonstrates how machine learning (ML) can be implemented to improve FCR forecasting when supplied with water quality data from a real, full-scale chlorine disinfection system in Georgia, USA. More precisely, a gradient-boosting ML method (CatBoost) was developed from a full year of DWT plant-generated chlorine disinfection data, including water quality parameters (e.g., temperature, turbidity, pH) and operational process data (e.g., flowrates), to predict FCR. Four gradient-boosting models were implemented, with the best-performing model achieving a coefficient of determination, R², of 0.937. SHapley Additive exPlanations (SHAP) values were used to interpret the model’s results, revealing that standard DWT operating parameters, although non-intuitive and theoretically non-causal, vastly improved prediction performance. These results provide a base case for data-driven DWT disinfection supervision and suggest process monitoring methods that give plant operators better information for implementing safe chlorine dosing to maintain optimum FCR.
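To make the abstract's two central ideas concrete (gradient boosting for regression, and R² as the performance measure), here is a minimal sketch in plain Python. It is not the authors' method: it swaps CatBoost for a simple decision-stump booster with squared-error loss. The three normalized features and the coefficients that generate the target are invented purely for illustration, not plant data. All function names (`r2_score`, `fit_stump`, `fit_boosting`, `predict`) are hypothetical helpers defined here.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_boosting(X, y, n_rounds=60, lr=0.5):
    """Gradient boosting with squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # initial prediction is the training mean
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        preds = [p + lr * (lm if row[j] <= thr else rm) for row, p in zip(X, preds)]
    return base, lr, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)


# Synthetic, made-up data: three normalized features standing in for inputs such
# as temperature, turbidity, and flowrate; the target is an invented stand-in for FCR.
rng = random.Random(0)


def make_rows(n):
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [1.0 + 0.8 * a - 0.5 * b + 0.3 * c + rng.gauss(0, 0.05) for a, b, c in X]
    return X, y


X_train, y_train = make_rows(100)
X_test, y_test = make_rows(40)

model = fit_boosting(X_train, y_train)
test_r2 = r2_score(y_test, [predict(model, row) for row in X_test])
print(f"held-out R^2 on synthetic data: {test_r2:.3f}")
```

The R² printed here describes only this toy data set and says nothing about the reported value of 0.937. A production library such as CatBoost adds deeper trees, regularization, and native categorical-feature handling on top of the same residual-fitting idea.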