ZHANG Qing, DUAN Li-yao, LIU Yan-xiang, JIANG Ping, CHEN Zi-xuan, LIU Bo. Study on the prediction model for asthma risk with integration of various machine learning algorithms[J]. Journal of Environmental Hygiene, 2024, 14(2): 113-120. DOI: 10.13421/j.cnki.hjwsxzz.2024.02.003

    Study on the prediction model for asthma risk with integration of various machine learning algorithms

      Abstract:
      Objective  To build a prediction model for asthma risk by ensembling four machine learning algorithms, providing a basis for health meteorological forecasting services and public protection.
      Methods  Daily visit records of asthma patients at a grade A tertiary hospital in Tianjin from 2012 to 2018 were collected, together with meteorological, environmental, and pollen data for the same period. Principal component analysis was used to select the optimal factors, and the Stacking ensemble learning method was used to combine four machine learning algorithms: Decision Tree, Random Forest, XGBoost, and LightGBM. Model performance was optimized by tuning the risk-level thresholds, introducing a time lag, and modeling each season separately.
      Results  Random Forest predicted better than Decision Tree, XGBoost, and LightGBM. The ensemble built on the four sub-models improved forecasting ability for the "prone" and "frequent" risk levels by about 13% relative to the Random Forest model. With a time lag of 2-3 days and season-specific modeling, the predictive ability of the model improved further.
      Conclusion  A multi-model ensemble method that jointly considers meteorological, environmental, and pollen factors can be applied to operational meteorological forecasting and services for asthma.
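
      The time-lag and risk-level steps mentioned under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the cut points, the "low" level name, the function names `risk_level` and `lagged_pairs`, and all data values are assumptions made for illustration, since the abstract does not report them.

```python
from bisect import bisect_right

# Hypothetical level names: "prone" and "frequent" follow the abstract,
# "low" is an assumed name for the lowest level.
LEVELS = ["low", "prone", "frequent"]


def risk_level(visits, thresholds=(10, 20)):
    """Map a daily visit count to a risk level using ascending cut points.

    The cut points (10, 20) are assumed values, not taken from the paper.
    """
    return LEVELS[bisect_right(thresholds, visits)]


def lagged_pairs(predictors, visits, lag):
    """Pair each day's risk level with the predictor row from `lag` days earlier."""
    if lag < 0 or len(predictors) != len(visits):
        raise ValueError("lag must be >= 0 and both series must have equal length")
    return [
        (predictors[t - lag], risk_level(visits[t]))
        for t in range(lag, len(visits))
    ]


# Made-up data: (temperature in deg C, pollen count) for five consecutive days.
predictors = [(12.0, 30), (14.5, 80), (9.0, 120), (11.0, 60), (13.0, 20)]
visits = [5, 8, 12, 25, 9]

# With lag=2, day 3's risk level is paired with day 1's predictors, and so on.
pairs = lagged_pairs(predictors, visits, lag=2)
for features, level in pairs:
    print(features, level)
```

      Pairs built this way could then be fed to any classifier. Comparing lags of 0 to 3 days on held-out data is one way to check whether a 2-3 day lag helps on a given dataset.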

       
