Laboratory Medicine (检验医学), 2025, Vol. 40, Issue 12: 1190-1196. DOI: 10.3969/j.issn.1673-8640.2025.12.009

• Original Article •

A machine learning early warning model for acute promyelocytic leukemia based on blood cell analyzer parameters

CHANG Nan, WEI Yali, LU Qifeng, LI Tian, HOU Tingting, LI Yuan, ZHU Mengyu, SHEN Yajuan

  1. Department of Clinical Laboratory, Shandong Provincial Hospital, Shandong First Medical University, Jinan 250021, Shandong, China
  • Received: 2025-05-06 Revised: 2025-10-29 Online: 2025-12-30 Published: 2025-12-26
  • Corresponding author: SHEN Yajuan, E-mail: shenyajuanchina@126.com
  • About the author: CHANG Nan, female, born in 1989, master's degree, technologist-in-charge; mainly engaged in artificial intelligence research in clinical hematology and body fluid analysis.
  • Funding:
    General Program of the Natural Science Foundation of Shandong Province (ZR2021MH295); Horizontal Research Project of Shandong Provincial Hospital Affiliated to Shandong First Medical University (1665387000050)

Abstract:

Objective To construct a machine learning (ML) early warning model for acute promyelocytic leukemia (APL) based on blood cell analyzer parameters, and to identify the parameter characteristics specific to APL, as a reference for early clinical diagnosis.

Methods Blood cell analyzer data were collected at Shandong Provincial Hospital of Shandong First Medical University from January 2018 to September 2024 for 958 patients with hematologic malignancies [APL, other acute myeloid leukemia (AML), or lymphoid neoplasms] and 985 healthy controls. Data from January 2018 to December 2023 formed the modeling set, and data from January to September 2024 formed the external validation set. Modeling parameters were screened with three algorithms: Lasso, support vector machine recursive feature elimination, and random forest. APL early warning models were built with five algorithms: light gradient boosting machine, support vector machine, multi-layer perceptron, multinomial logistic regression, and random forest. The best model was selected by the area under the receiver operating characteristic curve (AUC) and then assessed by ten-fold cross-validation and external validation. Feature contributions were interpreted with SHAP.

Results Among the 5 ML models, the model built with the light gradient boosting machine had the best overall performance in the test set (AUC 0.974). For predicting APL in the test set, its AUC was 0.976 and its area under the precision-recall curve (PR-AUC) was 0.906. In ten-fold cross-validation, the AUC and PR-AUC both exceeded 0.835. In the external validation set, the AUC for predicting APL was 0.969 and the PR-AUC was 0.718. SHAP analysis showed that platelet (PLT) count, side scatter intensity in the monocyte region (MO_X), and mean corpuscular hemoglobin concentration (MCHC) contributed most to the model's predictions.

Conclusions The APL early warning model built from blood cell analyzer parameters with the light gradient boosting machine algorithm can effectively support the early identification and diagnosis of APL.
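The date-based split and the two reported metrics can be illustrated with a small Python sketch. It is not the authors' code: the function names, the record layout and the toy scores are all hypothetical, and the abstract does not say which PR-AUC estimator was used (average precision is assumed here). The toy data show one general property of the two metrics: adding more negatives with the same score distribution leaves the AUC unchanged but lowers the PR-AUC.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Split records by sample date: before cutoff -> modeling, otherwise validation."""
    modeling = [r for r in records if r["date"] < cutoff]
    validation = [r for r in records if r["date"] >= cutoff]
    return modeling, validation

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.
    Ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision, a step-wise estimate of the PR-AUC.
    Simplification: tied scores are kept in input order, not grouped."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this positive's rank
    return total / n_pos

# Hypothetical records: label 1 = APL, 0 = not APL; "score" is a model output.
records = [
    {"date": date(2023, 12, 31), "label": 1, "score": 0.9},
    {"date": date(2024, 1, 1), "label": 0, "score": 0.2},
]
modeling_set, validation_set = temporal_split(records, cutoff=date(2024, 1, 1))

# Toy set A: 2 positives, 4 negatives.
labels_a = [1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.6, 0.8, 0.3, 0.2, 0.1]

# Toy set B: same positives, each negative duplicated (lower prevalence).
labels_b = [1, 1] + [0] * 8
scores_b = [0.9, 0.6] + [0.8, 0.3, 0.2, 0.1] * 2

auc_a, ap_a = roc_auc(labels_a, scores_a), average_precision(labels_a, scores_a)
auc_b, ap_b = roc_auc(labels_b, scores_b), average_precision(labels_b, scores_b)
print(f"A: AUC={auc_a:.3f} AP={ap_a:.3f}")  # A: AUC=0.875 AP=0.833
print(f"B: AUC={auc_b:.3f} AP={ap_b:.3f}")  # B: AUC=0.875 AP=0.750
```

Whether a change in class mix explains the drop in PR-AUC from 0.906 in the test set to 0.718 in the external validation set cannot be determined from the abstract.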

Key words: Blood cell analyzer parameter, Acute promyelocytic leukemia, Machine learning, Early warning model, Hematologic malignancy
