Laboratory Medicine, 2024, Vol. 39, Issue 12: 1190-1195. DOI: 10.3969/j.issn.1673-8640.2024.12.010

• Original Article •

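Preliminary construction of a stroke recurrence prediction model based on routine laboratory tests using machine learning algorithms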

沈展1, 卞晓波2, 黄莺1, 汪思阳1, 沈婷婷1, 张娴1, 宋云霄2, 谢连红1

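  1. Geriatrics Department, Shanghai Xuhui District Central Hospital, Shanghai 200237
  2. Department of Clinical Laboratory, Shanghai Xuhui District Central Hospital, Shanghai 200237
  • Received: 2024-04-30  Revised: 2024-09-11  Publication date: 2024-12-30  Release date: 2025-01-06
  • Corresponding author: XIE Lianhong, E-mail: xielianhongly@163.com
  • About the authors: SHEN Zhan, female, born in 1979, associate chief physician, mainly engaged in the clinical diagnosis and treatment of geriatric diseases;
    BIAN Xiaobo, female, born in 1993, technologist-in-charge, mainly engaged in clinical biochemistry and immunology testing. SHEN Zhan and BIAN Xiaobo contributed equally to this study and are co-first authors.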

Stroke recurrence prediction model based on machine learning algorithms using routine blood test

SHEN Zhan1, BIAN Xiaobo2, HUANG Ying1, WANG Siyang1, SHEN Tingting1, ZHANG Xian1, SONG Yunxiao2, XIE Lianhong1

  1. Geriatrics Department, Shanghai Xuhui District Central Hospital, Shanghai 200237, China
  2. Department of Clinical Laboratory, Shanghai Xuhui District Central Hospital, Shanghai 200237, China
  • Received: 2024-04-30  Revised: 2024-09-11  Online: 2024-12-30  Published: 2025-01-06

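Abstract (translated from the Chinese):

Objective To preliminarily construct a stroke recurrence prediction model based on routine laboratory tests using machine learning algorithms. Methods A total of 437 stroke patients treated at Shanghai Xuhui District Central Hospital from January 2010 to December 2023 were enrolled. All patients were followed up retrospectively: those who had another stroke during follow-up were assigned to the recurrence group and those who did not were assigned to the non-recurrence group. Patients were randomly divided into a training set and a validation set at a ratio of 7:3. Blood lipids and routine blood test results at the time of the first stroke were measured for all patients. Prediction models were built in the training set using 5-fold cross-validation, with random forest (RF), XGboost, Adaboost, K-nearest neighbors (KNN) and Logistic regression (LR) as the machine learning algorithms. Receiver operating characteristic (ROC) curves and precision-recall curves were used to evaluate how well the models identified stroke recurrence. Results The mean follow-up time of the 437 stroke patients was 6.2 years, and 184 patients had a recurrent stroke. In the training set, red blood cell (RBC) count, hemoglobin (Hb), mean corpuscular volume (MCV), absolute lymphocyte count (LYMPH#), total cholesterol (TC) and triglyceride (TG) were all higher in the recurrence group than in the non-recurrence group (P<0.05); the differences in the other indicators between the 2 groups were not statistically significant (P>0.05). In the validation set, RBC count, Hb, MCV, TC and TG were all higher in the recurrence group than in the non-recurrence group (P<0.05); the differences in the other indicators between the 2 groups were not statistically significant (P>0.05). In the training set, both the area under the ROC curve (AUC) and the area under the precision-recall curve (PRAUC) of the XGboost algorithm for identifying stroke recurrence were higher than those of the RF, Adaboost, KNN and LR algorithms. In the validation set, the prediction model built with the XGboost algorithm identified stroke recurrence with an AUC of 0.86 and a PRAUC of 0.82. Conclusions The stroke recurrence prediction model built on blood lipid and routine blood test items has good clinical application value.

Key words: Blood lipid, Routine blood test, Machine learning, Prediction model, Stroke, Recurrence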

Abstract:

Objective To construct a prediction model for stroke recurrence based on machine learning algorithms using routine laboratory tests. Methods A total of 437 stroke patients admitted to Shanghai Xuhui District Central Hospital from January 2010 to December 2023 were retrospectively followed up. Patients with stroke recurrence during the follow-up period were assigned to the recurrence group, while those without stroke recurrence were assigned to the non-recurrence group. The dataset was randomly divided into a training set and a validation set at a 7:3 ratio. Blood lipid and routine blood test parameters at the initial stroke occurrence were collected. Five-fold cross-validation was used to develop prediction models in the training set with the following machine learning algorithms: random forest (RF), XGboost, Adaboost, K-nearest neighbors (KNN) and Logistic regression (LR). The performance of the models in predicting stroke recurrence was evaluated using receiver operating characteristic (ROC) curves and precision-recall (PR) curves. Results The average follow-up duration for the 437 stroke patients was 6.2 years, during which 184 patients experienced stroke recurrence. In the training set, red blood cell (RBC) count, hemoglobin (Hb), mean corpuscular volume (MCV), absolute lymphocyte count (LYMPH#), total cholesterol (TC) and triglyceride (TG) were higher in the recurrence group than in the non-recurrence group (P<0.05). The other parameters showed no statistically significant differences between the 2 groups (P>0.05). In the validation set, RBC count, Hb, MCV, TC and TG were higher in the recurrence group (P<0.05), with no statistically significant differences in the other parameters (P>0.05). In the training set, XGboost performed best in predicting stroke recurrence, with a higher area under the ROC curve (AUC) and a higher area under the precision-recall curve (PRAUC) than RF, Adaboost, KNN and LR. In the validation set, the prediction model constructed using XGboost achieved an AUC of 0.86 and a PRAUC of 0.82.
Conclusions The stroke recurrence prediction model based on blood lipid and routine blood test parameters demonstrates promising clinical application value.
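
The evaluation workflow named in the Methods (7:3 split, 5-fold cross-validation on the training set, AUC and PRAUC on the validation set) can be sketched roughly as follows. This is an illustrative sketch and not the authors' code. It makes several assumptions: the data are synthetic (only the counts 437 and 184 come from the abstract), the split is stratified by outcome, a toy nearest-centroid scorer stands in for XGboost and the other algorithms, and PRAUC is estimated as average precision, since the abstract does not state how it was computed. All function names are hypothetical.

```python
import random

def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pr_auc(y_true, scores):
    """Average precision: a step-wise estimate of the area under the PR curve."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / hits

def stratified_split(y, test_frac, rng):
    """Shuffle indices within each class, then hold out test_frac of each class."""
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        cut = round(len(idx) * test_frac)
        test += idx[:cut]
        train += idx[cut:]
    return train, test

def k_folds(indices, k):
    """Yield (fit_idx, held_out_idx) pairs for k-fold cross-validation."""
    for f in range(k):
        held = indices[f::k]
        held_set = set(held)
        yield [i for i in indices if i not in held_set], held

def centroid_scores(X, y, fit_idx, eval_idx):
    """Toy stand-in model: higher score = closer to the positive-class centroid."""
    def centroid(label):
        rows = [X[i] for i in fit_idx if y[i] == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    return [sq_dist(X[i], c0) - sq_dist(X[i], c1) for i in eval_idx]

rng = random.Random(0)
# Synthetic stand-in data: 437 "patients", 184 labelled recurrent, 3 made-up features.
y = [1] * 184 + [0] * 253
X = [[rng.gauss(0.8 * label, 1.0) for _ in range(3)] for label in y]

# 7:3 split into training and validation sets.
train_idx, valid_idx = stratified_split(y, test_frac=0.3, rng=rng)

# 5-fold cross-validation inside the training set only.
cv_aucs = []
for fit_idx, held_idx in k_folds(train_idx, k=5):
    fold_scores = centroid_scores(X, y, fit_idx, held_idx)
    cv_aucs.append(roc_auc([y[i] for i in held_idx], fold_scores))

# Final check on the held-out validation set.
valid_scores = centroid_scores(X, y, train_idx, valid_idx)
valid_y = [y[i] for i in valid_idx]
print("mean CV AUC:", sum(cv_aucs) / len(cv_aucs))
print("validation AUC:", roc_auc(valid_y, valid_scores))
print("validation PRAUC:", pr_auc(valid_y, valid_scores))
```

Keeping cross-validation inside the training set means the validation set is touched only once, for the final AUC and PRAUC. In practice the toy scorer would be replaced by a real classifier and the metrics by a tested library implementation.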

Key words: Blood lipid, Routine blood test, Machine learning, Prediction model, Stroke, Recurrence
