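SU Yu-teng, LYU Shi-yun, XIE Wen-han, LI Yuan, OU'YANG Yi-xin, XUE Yong-xi, HU Mei-ling, LI Shu-ting, ZHOU Hang, LIU Xiang-tong. A risk factor analysis for type 2 diabetes mellitus based on LASSO regression and random forest algorithm[J]. Journal of Environmental Hygiene, 2023, 13(7): 485-495. DOI: 10.13421/j.cnki.hjwsxzz.2023.07.002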

    A risk factor analysis for type 2 diabetes mellitus based on LASSO regression and random forest algorithm

      Abstract:
      Objective To analyze the risk factors for type 2 diabetes mellitus using LASSO regression and the random forest algorithm, and to inform clinical decision-making.
      Methods A cohort study was conducted using the 2011 and 2015 waves of the China Health and Retirement Longitudinal Study (CHARLS) to investigate risk factors for type 2 diabetes mellitus in adults aged 45 years and above. A total of 3 803 participants were included. Demographic variables, lifestyle habits, blood biochemical indicators, and meteorological and air quality monitoring data from 2010 to 2015 served as the independent variables, and diabetes outcome as the dependent variable. Feature selection combined LASSO regression with random forest importance ranking. A random forest prediction model was then constructed and its performance was evaluated.
      Results Random forest analysis ranked fasting glucose, relative humidity, waist circumference, body mass index, black carbon, nitrate, wind speed, total cholesterol, temperature, and heating fuel as the 10 most important risk factors for type 2 diabetes mellitus. A prediction model was constructed by combining LASSO regression with the random forest variable importance ranking. Under ten-fold cross-validation, the model achieved a sensitivity of 62.1%, a specificity of 98.8%, an accuracy of 95.4%, a positive predictive value of 89.6%, a negative predictive value of 96.0%, and an area under the receiver operating characteristic curve (AUC) of 84.8%. The decision curve showed that the model had a relatively high net benefit at threshold probabilities between 0 and 0.85.
      Conclusion Advanced age, female sex, obesity, abnormal blood test results, a history of hypertension or stroke, and exposure to environmental pollutants may indicate the occurrence and progression of type 2 diabetes mellitus. These findings may give clinicians a reference for early intervention in people at high risk of diabetes mellitus.
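
The performance measures reported above (sensitivity, specificity, accuracy, positive and negative predictive value, and the net benefit plotted in a decision curve) all follow from the confusion matrix at a chosen probability threshold. The sketch below illustrates those standard definitions only. It is not the authors' code, the function names are hypothetical, and the example labels and probabilities are invented for illustration rather than taken from the study.

```python
import math


def confusion_from_predictions(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN, calling a case positive when its
    predicted probability is at or above the threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(c):
    """Standard threshold-dependent metrics from confusion-matrix counts."""
    n = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    return {
        "sensitivity": _ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": _ratio(c["tn"], c["tn"] + c["fp"]),
        "accuracy": _ratio(c["tp"] + c["tn"], n),
        "ppv": _ratio(c["tp"], c["tp"] + c["fp"]),
        "npv": _ratio(c["tn"], c["tn"] + c["fn"]),
    }


def net_benefit(c, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt),
    defined for threshold probabilities 0 <= pt < 1."""
    n = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    return c["tp"] / n - c["fp"] / n * threshold / (1 - threshold)


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    y_true = [1, 1, 0, 0, 1]
    y_prob = [0.9, 0.4, 0.6, 0.1, 0.7]
    counts = confusion_from_predictions(y_true, y_prob, threshold=0.5)
    print(counts)
    print(classification_metrics(counts))
    print(net_benefit(counts, threshold=0.5))
```

A decision curve is obtained by recomputing the confusion matrix and `net_benefit` over a grid of thresholds and plotting the result against the threshold. A pattern of high specificity with modest sensitivity, as in the abstract, typically arises when positive cases are a small share of the sample.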

       
