Abstract:
Objective To construct a stroke risk prediction model based on the random forest algorithm using data from hospitalized patients with stroke in Nanjing, China, to identify risk factors for stroke, and to provide a reference for early intervention and clinical treatment of the disease.
Methods A total of 5 357 patients hospitalized at a grade A tertiary hospital in Nanjing from May 2014 to December 2022 were included. Of these, 3 104 patients had a discharge diagnosis of stroke from the department of neurology (stroke group) and 2 253 were non-stroke patients treated during the same period (control group). The cases were randomly divided at an 8∶2 ratio into a training set (4 285 cases) and a test set (1 072 cases). Demographic data, clinical and laboratory indicators, meteorological data, and environmental data were incorporated to identify stroke-associated risk factors. The model was optimized through 5-fold cross-validation and parameter tuning. Its predictive performance was assessed by accuracy, precision, recall rate, F1 score, and the area under the curve (AUC). Shapley additive explanations (SHAP) values were used to quantify the contribution of each feature to the prediction.
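The data partition described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_and_folds`, the seed, the rounding-up of the test set size, and the interleaved fold assignment are all assumptions made for illustration, and model fitting and parameter tuning are omitted. Under these assumptions the 8∶2 split of 5 357 cases gives the reported set sizes.

```python
import math
import random


def split_and_folds(n_cases, test_fraction=0.2, n_folds=5, seed=0):
    """Shuffle case indices, hold out a test set, and cut the rest into folds.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)

    # Assumption: the test set size is rounded up to a whole case.
    n_test = math.ceil(n_cases * test_fraction)
    test, train = indices[:n_test], indices[n_test:]

    # Fold k takes every n_folds-th training index starting at k,
    # so fold sizes differ by at most one case.
    folds = [train[k::n_folds] for k in range(n_folds)]
    return train, test, folds


train, test, folds = split_and_folds(5357)
print(len(train), len(test))    # 4285 1072
print([len(f) for f in folds])  # [857, 857, 857, 857, 857]
```

In each round of 5-fold cross-validation, one fold would serve as the validation set and the remaining four would be used for fitting, so that the held-out test set is touched only once, for the final evaluation.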
Results The top ten risk factors for stroke were systolic blood pressure, age, glucose, neutrophil count, albumin, potassium, diastolic blood pressure, total protein, total cholesterol, and apolipoprotein A1. The performance parameters of the random forest-based stroke risk prediction model were as follows: accuracy, 0.78; precision, 0.76; F1 score, 0.72; recall rate, 0.69; and AUC, 0.85.
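For readers less familiar with these metrics, the sketch below shows the standard definitions computed from confusion-matrix counts. The function names are hypothetical and no counts from the study are used, since the abstract does not report them. Because the F1 score is the harmonic mean of precision and recall, the reported values can be checked against one another.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted stroke cases that are true
    recall = tp / (tp + fn)     # share of true stroke cases that are detected
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check using the precision and recall reported above.
print(round(f1_from(0.76, 0.69), 2))  # 0.72
```

The recomputed value agrees with the reported F1 score of 0.72.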
Conclusion The random forest-based prediction model can assist in the early identification of patients with stroke and provide key information for timely intervention, indicating good potential for clinical application.