Citation: ZHOU Zhen, ZHANG Ya-yi, WANG Qing, GAO Xiang-wei, MA Run-mei, BAN Jie, LU Kai-lai. Estimation of the concentration of PM2.5-bound composition NO3- based on random forest model[J]. Journal of Environmental Hygiene, 2022, 12(3): 177-183. DOI: 10.13421/j.cnki.hjwsxzz.2022.03.004

    Estimation of the concentration of PM2.5-bound composition NO3- based on random forest model

      Abstract:
      Objective To establish a random forest model for estimating the concentrations of PM2.5 components, using the nitrate ion (NO3-) as an example, and to identify the factors that most influence NO3- concentration and characterize its continuous time series.
      Methods Meteorological, land use, and emission inventory data, together with air quality monitoring data for PM2.5, NO2, PM10, SO2, and CO from 2013 to 2017, were used as the independent variables, and NO3- concentration was the dependent variable. The data sets were standardized by extracting values to points, inverse distance weighted interpolation, and 1 km buffers. A random forest model was built, and its fit was validated by ten-fold cross-validation.
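As an illustration of the inverse distance weighted interpolation named in the Methods, here is a minimal sketch in plain Python. The function name `idw_interpolate`, the `power=2.0` exponent, and the station readings are assumptions made for the example; the abstract does not state the exponent or the software used.

```python
import math

def idw_interpolate(stations, target, power=2.0):
    """Estimate a value at `target` from (x, y, value) station tuples.

    Each station is weighted by 1 / distance**power. The default power of 2
    is a common choice, not a value taken from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            # The target coincides with a station: use its value directly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total

# Hypothetical NO2 readings: (x in km, y in km, concentration in ug/m3).
stations = [(0.0, 0.0, 40.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
estimate = idw_interpolate(stations, target=(5.0, 0.0))
```

Nearby stations dominate the estimate, and the result always lies between the smallest and largest station values.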
      Results Model validation showed relatively good agreement between simulated and observed values, with R2 of 0.61, 0.77, and 0.83 for daily, monthly, and annual mean concentrations, respectively. In the feature importance ranking for NO3- concentration, PM2.5 mass concentration had the highest importance score (0.387), and meteorological factors such as albedo lagged by 2 days (lag2), albedo lagged by 1 day (lag1), 10 m meridional wind speed, and boundary layer height were closely related to changes in NO3- concentration. In addition, primary PM2.5 emissions from the transportation, residential, industrial, and power sectors all ranked among the 20 most important features.
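The rise in R2 from daily to monthly and annual means is what one would expect when day-to-day errors partly cancel under averaging. The sketch below shows the effect on made-up numbers. It assumes R2 is the coefficient of determination (the abstract does not say whether the authors used this or a squared correlation), and the helper names and records are hypothetical.

```python
from collections import defaultdict

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def monthly_means(records):
    """Average (month, observed, predicted) records within each month."""
    groups = defaultdict(list)
    for month, obs, pred in records:
        groups[month].append((obs, pred))
    months = sorted(groups)
    obs_means = [sum(o for o, _ in groups[m]) / len(groups[m]) for m in months]
    pred_means = [sum(p for _, p in groups[m]) / len(groups[m]) for m in months]
    return obs_means, pred_means

# Hypothetical daily records: (month, observed NO3-, held-out prediction).
records = [
    ("2015-01", 10.0, 12.0), ("2015-01", 14.0, 12.0),
    ("2015-02", 6.0, 8.0), ("2015-02", 10.0, 8.0),
    ("2015-03", 2.0, 4.0), ("2015-03", 6.0, 4.0),
]
daily_r2 = r_squared([r[1] for r in records], [r[2] for r in records])
monthly_r2 = r_squared(*monthly_means(records))
```

In this toy data every daily prediction is off by 2, yet the errors cancel within each month, so `monthly_r2` is higher than `daily_r2`.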
      Conclusion The multi-parameter random forest model has certain advantages for simulating PM2.5 components. PM2.5 mass concentration, NO2, 10 m meridional wind speed, and primary PM2.5 emissions from residential and traffic sources have a large influence on the simulated NO3- concentration. NO3- concentration shows some seasonal variation.

       
