Citation: LIU Jie, HAO Shu-xin, WAN Hong-yan, LIU Yue, XU Dong-qun. Comparison of three machine learning models for air quality level prediction: a case study of Baoding[J]. Journal of Environmental Hygiene, 2024, 14(3): 264-269,272. DOI: 10.13421/j.cnki.hjwsxzz.2024.03.013

    Comparison of three machine learning models for air quality level prediction: a case study of Baoding

      Abstract:
      Objective To build models that predict the air quality level in Baoding, China, for each of the next three days using three machine learning methods, support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP), and to identify the best of the three by tuning parameters and comparing prediction results.
      Methods Using daily mean concentrations of air pollutants and concurrent meteorological data for Baoding from 2014 to 2022, SVM, RF, and MLP models were built that use data from the preceding four days to forecast the air quality level on each of the next three days, with a separate model for each forecast day, and the importance of the feature variables was assessed. Model parameters were tuned, 10-fold cross-validation was used for validation, and model performance was evaluated with indicators including accuracy and the area under the curve (AUC).
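The windowing and validation scheme described in the Methods can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each day is already a list of numeric features (pollutant concentrations plus meteorological variables) paired with an integer air quality level, and that the four input days are simply concatenated into one vector. The names `make_lagged_dataset` and `k_fold_indices` are hypothetical, and the seeded shuffle before forming 10 folds is an assumption, since the abstract does not say how the folds were built.

```python
import random
from typing import List, Sequence, Tuple

# One training sample: flattened lagged features and the target level.
Window = Tuple[List[float], int]


def make_lagged_dataset(
    features: Sequence[Sequence[float]],
    levels: Sequence[int],
    lags: int = 4,
    horizon: int = 1,
) -> List[Window]:
    """Pair the previous `lags` days of features with the level `horizon` days ahead.

    Hypothetical helper: day t is the last observed day, the target is the
    level on day t + horizon. Assumes features[i] and levels[i] describe day i.
    """
    samples: List[Window] = []
    for t in range(lags - 1, len(features) - horizon):
        # Flatten days t-lags+1 .. t into one vector, oldest day first.
        x = [v for day in features[t - lags + 1 : t + 1] for v in day]
        samples.append((x, levels[t + horizon]))
    return samples


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train, test) index pairs after a seeded shuffle.

    Hypothetical helper; the shuffle is an assumption, not taken from the paper.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin into k disjoint folds.
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Every index outside fold i goes into the training set.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits
```

Calling `make_lagged_dataset` with `horizon=1`, `2`, and `3` yields three separate training sets, one per forecast day, which mirrors the per-day models in the abstract. Because the data are time-ordered, a shuffled split lets each fold mix earlier and later days; a blocked, time-ordered split is an alternative when that kind of leakage matters.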
      Results For the SVM model, the accuracy rates for the next three days were 69.8%, 63.5%, and 62.3%, respectively, and the AUC values were 0.774, 0.708, and 0.707, respectively. For the RF model, the accuracy rates were 75.9%, 68.2%, and 67.1%, respectively, with AUC values of 0.84, 0.74, and 0.72. For the MLP model, the accuracy rates were 73.2%, 66.4%, and 65.7%, respectively, and the AUC values were 0.83, 0.74, and 0.73. Overall, the RF model performed best. Air quality feature variables were more important than meteorological feature variables.
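Accuracy and AUC, the two indicators reported in the Results, can be computed as in the sketch below. The abstract does not state how AUC was aggregated over several air quality levels; this sketch assumes a macro-averaged one-vs-rest AUC, a common choice for multi-class problems, and the function names and per-class probability dictionaries are hypothetical. Computed this way, AUC is a probability and always lies between 0 and 1.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of days whose predicted level equals the observed level."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation), so the result is in [0, 1].
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], proba: Sequence[Dict[int, float]]) -> float:
    """Average one-vs-rest AUC over the classes that appear in y_true.

    Assumption: proba[i] maps each level to the model's probability for day i.
    """
    classes = sorted(set(y_true))
    aucs: List[float] = []
    for c in classes:
        # Treat level c as positive and every other level as negative.
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row.get(c, 0.0) for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)
```

For example, `accuracy([1, 2, 2, 3], [1, 2, 3, 3])` returns 0.75, since three of the four predicted levels match the observed ones.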
      Conclusion The comparison shows that the RF model can predict the next day's air pollution level relatively effectively and can support early warning of air quality levels.