SU Yu-teng, LYU Shi-yun, XIE Wen-han, LI Yuan, OU'YANG Yi-xin, XUE Yong-xi, HU Mei-ling, LI Shu-ting, ZHOU Hang, LIU Xiang-tong. A risk factor analysis for type 2 diabetes mellitus based on LASSO regression and random forest algorithm[J]. Journal of Environmental Hygiene, 2023, 13(7): 485-495. DOI: 10.13421/j.cnki.hjwsxzz.2023.07.002

    A risk factor analysis for type 2 diabetes mellitus based on LASSO regression and random forest algorithm

    • Objective To analyze the risk factors for type 2 diabetes mellitus using LASSO regression and the random forest algorithm, and to provide a reference for clinical decision-making.
      Methods A cohort study was conducted using the 2011 and 2015 waves of the China Health and Retirement Longitudinal Study (CHARLS) database to investigate the risk factors for type 2 diabetes mellitus in adults aged 45 years and above. A total of 3 803 participants were included. Feature selection was performed using LASSO regression and random forest importance ranking, with demographic variables, living habits, blood biochemical indicators, and meteorological and air quality monitoring data from 2010 to 2015 as the independent variables and diabetes outcome as the dependent variable. A random forest prediction model was then constructed and its performance was evaluated.
      Results Random forest analysis ranked fasting glucose, relative humidity, waist circumference, body mass index, black carbon, nitrates, wind speed, total cholesterol, temperature, and heating fuel as the top 10 risk factors for type 2 diabetes mellitus. A prediction model was constructed by combining LASSO regression with random forest variable importance ranking. Ten-fold cross-validation showed a sensitivity of 62.1%, a specificity of 98.8%, an accuracy of 95.4%, a positive predictive value of 89.6%, a negative predictive value of 96.0%, and an area under the receiver operating characteristic curve of 84.8%. The decision curve showed that the prediction model had a relatively high net benefit at threshold probabilities between 0 and 0.85.
      Conclusion Advanced age, female sex, obesity, abnormal blood test results, a history of hypertension or stroke, and exposure to environmental pollutants may indicate the occurrence and progression of type 2 diabetes mellitus. These findings can provide clinicians with a reference for early intervention in people at high risk for diabetes mellitus.
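
    The performance measures reported in the Results (sensitivity, specificity, accuracy, positive and negative predictive value) and the net benefit plotted in a decision curve can all be derived from a confusion matrix at a chosen threshold probability. The following is a minimal sketch of those standard definitions, not the authors' code: the function names and the ten toy observations are hypothetical and are not study data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def confusion(y_true, y_prob, threshold):
    """Count outcomes when a case is called positive at y_prob >= threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def metrics(c):
    """Standard classification metrics; assumes no denominator is zero."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def net_benefit(c, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt), for 0 <= pt < 1."""
    n = c.tp + c.fp + c.tn + c.fn
    return c.tp / n - (c.fp / n) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = diabetes, 0 = no diabetes, with predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.4, 0.3]

if __name__ == "__main__":
    c = confusion(y_true, y_prob, threshold=0.5)
    for name, value in metrics(c).items():
        print(f"{name}: {value:.3f}")
    print(f"net benefit at 0.5: {net_benefit(c, 0.5):.3f}")
```

    A decision curve repeats the net benefit calculation over a range of threshold probabilities and compares the model against the "treat all" and "treat none" strategies; the sketch evaluates a single threshold only.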