Laboratory Medicine (检验医学), 2024, Vol. 39, Issue 7: 668-672. DOI: 10.3969/j.issn.1673-8640.2024.07.009

• Original Article •

Pulmonary tuberculosis diagnosis model for blood routine test based on machine learning algorithms

HUANG Ying1, ZHOU Ying1, SONG Yunxiao2, MAO Junjie1, GUAN Chao1, ZHAO Jinyan1, NI Peiqing1

  1. Department of General Practice, Shanghai Xuhui Central Hospital, Shanghai 200237, China
  2. Department of Clinical Laboratory, Shanghai Xuhui Central Hospital, Shanghai 200237, China
  • Received: 2024-01-06 Revised: 2024-04-16 Online: 2024-07-30 Published: 2024-07-31
  • Corresponding authors: ZHOU Ying, E-mail: yingzhouwui@hotmail.com; SONG Yunxiao, E-mail: xzxsh@sina.com
  • About the author: HUANG Ying, female, born in 1976, associate chief physician, mainly engaged in clinical diagnosis and treatment in general practice.
  • Funding:
    Jiangsu University Medical Education Collaborative Innovation Fund (JDYY2023091)

Abstract:

Objective To construct a pulmonary tuberculosis diagnosis model from routine blood test data using machine learning algorithms, and to analyze its clinical application value. Methods A total of 469 newly diagnosed patients with pulmonary tuberculosis (pulmonary tuberculosis group) at Shanghai Xuhui Central Hospital from January 2019 to December 2022 were enrolled, and 506 healthy subjects undergoing physical examination, matched by age and sex, were enrolled as the healthy control group. Data on 22 routine blood test items and demographic parameters were collected for all subjects. Collinearity was assessed by LASSO regression analysis. The dataset was randomly divided into a training set (75%, used to construct the machine learning models) and a test set (25%, used to evaluate model performance). Four machine learning algorithms were tested: distributed random forest (DRF), deep learning, gradient boosting machine and generalized linear model, with 5-fold cross-validation. The diagnostic performance of the models was evaluated by receiver operating characteristic (ROC) curve. Results Feature importance was ranked based on the results of logistic regression analysis and LASSO regression analysis, and 10 non-collinear indicators were selected. DRF was the optimal machine learning algorithm for constructing the pulmonary tuberculosis diagnosis model. In the training set and test set, the areas under the curve of the DRF model were 0.9921 and 0.8474, the sensitivities were 99.16% and 92.04%, the specificities were 80.91% and 55.22%, and the accuracies were 89.84% and 72.06%, respectively. Conclusions The pulmonary tuberculosis diagnosis model built from routine blood test data with machine learning algorithms is an effective diagnostic tool, but its clinical application value needs further validation.
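The data partitioning described in the Methods (a random 75%/25% split followed by 5-fold cross-validation) can be sketched as below. This is a minimal illustration only and not the authors' code: the function names `split_train_test` and `k_fold`, the fixed seed, and the unstratified shuffling are assumptions, since the abstract does not state how the split was randomized or whether it was stratified by group.

```python
import random


def split_train_test(n_samples, train_fraction=0.75, seed=0):
    """Shuffle sample indices and split them into training and test lists.

    The seed and the unstratified shuffle are assumptions for illustration.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold(indices, k=5):
    """Yield (fit, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


# 469 patients + 506 controls = 975 subjects (counts taken from the abstract).
train_idx, test_idx = split_train_test(469 + 506)
folds = list(k_fold(train_idx, k=5))
print(len(train_idx), len(test_idx), [len(v) for _, v in folds])
```

In this sketch the cross-validation folds are drawn only from the training indices, so the 25% test set stays untouched until the final performance evaluation.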

Key words: Machine learning, Diagnostic model, Pulmonary tuberculosis, Blood routine test
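The sensitivity, specificity, and accuracy quoted in the abstract all follow from a 2×2 confusion matrix. The sketch below shows the arithmetic. The abstract does not report the underlying counts, so the values used here are hypothetical: they are one set of integers (113 patients and 134 controls, 247 subjects, roughly 25% of 975) that happens to reproduce the reported test-set percentages. The function name `diagnostic_metrics` is likewise an illustrative choice.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy as percentages."""
    sensitivity = 100 * tp / (tp + fn)   # true positives among all patients
    specificity = 100 * tn / (tn + fp)   # true negatives among all controls
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical test-set counts, back-calculated to match the reported
# percentages; they are NOT given in the abstract.
test_counts = dict(tp=104, fn=9, tn=74, fp=60)
sens, spec, acc = diagnostic_metrics(**test_counts)
print(f"{sens:.2f}% {spec:.2f}% {acc:.2f}%")  # prints: 92.04% 55.22% 72.06%
```

Under these assumed counts, a specificity of 55.22% would correspond to 60 of 134 healthy controls being flagged as positive, which illustrates why the authors call for further validation before clinical use.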

CLC number: