Application of a machine learning model based on routine inflammatory markers to distinguish the severity of community-acquired pneumonia

Abstract


Objective: To develop and validate a machine learning model that assesses the severity of community-acquired pneumonia (CAP) from 11 routinely measured clinical inflammatory markers.

Methods: A total of 3,674 patients newly diagnosed with CAP at Xuhui District Central Hospital between January 2016 and November 2024 were retrospectively enrolled. According to the time of diagnosis, patients were divided into a training cohort (January 2016 to December 2021; 1,363 mild and 1,320 severe cases) and a validation cohort (January 2022 to November 2024; 563 mild and 428 severe cases). Clinical data and measurements of the 11 inflammatory markers were collected for each patient. Six machine learning algorithms, namely decision tree (DT), K-nearest neighbors (KNN), logistic regression (LR), random forest (RF), support vector machine (SVM) and extreme gradient boosting (XGBoost), were applied to the training cohort to build models for distinguishing mild from severe CAP. The optimal model was selected by the area under the receiver operating characteristic (ROC) curve (AUC) and was then evaluated in the validation cohort.

Results: In both the training and validation cohorts, the mild and severe CAP groups differed significantly in white blood cell (WBC) count, platelet (PLT) count, absolute neutrophil count (NEUT#), absolute lymphocyte count (LYMPH#), absolute monocyte count (MO#), neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), lymphocyte-to-monocyte ratio (LMR), systemic immune-inflammation index (SII), C-reactive protein (CRP) and procalcitonin (PCT) (P<0.001; for PLT count in the validation cohort, the table reports P=0.020). No statistically significant differences were found for the remaining characteristics, that is, age, sex, smoking and alcohol history, and comorbidities (P>0.05). Among the 6 models, XGBoost performed best in the training cohort, with an AUC of 0.95 and an accuracy of 89%. In the validation cohort, XGBoost distinguished mild from severe CAP with an AUC of 0.91, an accuracy of 86%, a sensitivity of 81% and a specificity of 90%. In subgroup analysis, the AUC of XGBoost for distinguishing severity was 0.92 in bacterial CAP and 0.90 in viral CAP.

Conclusions: The XGBoost model based on routine inflammatory markers can effectively differentiate mild from severe CAP and offers a practical tool for clinical severity assessment.
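The model-selection step described in the Methods (rank candidate models by AUC and keep the best one) can be illustrated with a minimal, dependency-free sketch. The labels and scores below are made-up toy values, not study data, and `auc` and `pick_best` are hypothetical helper names. The AUC is computed from its rank interpretation: the probability that a randomly chosen severe case receives a higher score than a randomly chosen mild case, with ties counted as one half.

```python
from itertools import product


def auc(labels, scores):
    """AUC via pairwise comparison: P(score_severe > score_mild), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # severe cases
    neg = [s for y, s in zip(labels, scores) if y == 0]  # mild cases
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def pick_best(labels, scores_by_model):
    """Return (model_name, auc) for the model with the highest AUC."""
    aucs = {name: auc(labels, s) for name, s in scores_by_model.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Toy example (illustrative values only): 1 = severe CAP, 0 = mild CAP.
y = [0, 0, 0, 1, 1, 1]
toy_scores = {
    "model_a": [0.10, 0.40, 0.35, 0.80, 0.65, 0.30],
    "model_b": [0.20, 0.30, 0.25, 0.90, 0.70, 0.60],
}
best_name, best_auc = pick_best(y, toy_scores)
print(best_name, round(best_auc, 3))
```

The pairwise loop is quadratic in cohort size, which is fine for illustration; a production pipeline would normally rely on a library routine and on held-out or cross-validated scores rather than scores from the data used for fitting.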

Key words: Inflammatory marker, Community-acquired pneumonia, Disease severity, Machine learning, Model

CLC Number: R446.1

GUAN Chao, HUANG Ying, SONG Yunxiao, ZHOU Ying. Application of a machine learning model based on routine inflammatory markers to distinguish the severity of community-acquired pneumonia[J]. Laboratory Medicine, 2025, 40(7): 680-686.


Baseline characteristics and inflammatory markers of the mild and severe CAP groups in the training cohort

Group	n	Age (years)	Sex (male/female, n)	Smoking history (n)	Alcohol use history (n)	Diabetes (n)	Hypertension (n)	Heart failure (n)
Mild	1,363	67.35±11.45	415/948	502	260	368	143	75
Severe	1,320	67.29±12.56	415/905	478	272	358	158	79
Statistic		0.129	1.811	0.106	2.091	0.007	2.920	0.077
P value		0.538	0.178	0.745	0.148	0.936	0.087	0.781

Group	Liver disease (n)	Kidney disease (n)	Cerebrovascular disease (n)	Neurological disease (n)	WBC count (×10^9/L)	PLT count (×10^9/L)	LYMPH# (×10^9/L)	MO# (×10^9/L)
Mild	123	145	82	20	6.49±2.19	213.02±58.37	1.86±0.79	0.37±0.14
Severe	115	140	92	18	9.62±5.00	224.19±101.99	1.19±1.12	0.54±0.45
Statistic	0.823	0.165	1.372	0.234	20.910	3.460	17.580	13.120
P value	0.364	0.685	0.242	0.629	<0.001	<0.001	<0.001	<0.001

Group	NEUT# (×10^9/L)	PLR	NLR	LMR	SII	CRP (mg/L)	PCT (ng/L)
Mild	4.10±1.97	114.24±27.37	2.21±0.85	5.24±1.37	469.32±115.38	5.02±1.46	0.39±0.13
Severe	7.71±4.81	188.32±23.07	6.48±1.63	2.32±0.72	1,459.74±471.26	12.46±3.21	0.47±0.16
Statistic	25.410	12.110	15.450	75.870	74.050	76.700	14.190
P value	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
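As a plausibility check on the tabulated test statistics, a t statistic can be recomputed from the reported group means, standard deviations and sample sizes. The sketch below assumes an unequal-variance (Welch) two-sample t statistic; the source does not state which test produced the tabulated values, and `welch_t` is a hypothetical helper name. With the age figures from the table it gives approximately 0.129, in line with the reported value. Because the inputs are rounded summaries, other entries will be reproduced only approximately.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with unequal variances, from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Age as tabulated: mild 67.35±11.45 (n = 1363), severe 67.29±12.56 (n = 1320).
t_age = welch_t(67.35, 11.45, 1363, 67.29, 12.56, 1320)
print(f"t (age) = {t_age:.3f}")
```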


Baseline characteristics and inflammatory markers of the mild and severe CAP groups in the validation cohort

Group	n	Age (years)	Sex (male/female, n)	Smoking history (n)	Alcohol use history (n)	Diabetes (n)	Hypertension (n)	Heart failure (n)
Mild	563	63.2±12.5	166/397	212	113	145	56	25
Severe	428	65.1±11.8	126/302	160	82	105	49	21
Statistic		-1.615	0.532	0.0004	0.023	0.749	0.817	1.672
P value		0.106	0.466	0.985	0.880	0.388	0.367	0.197

Group	Liver disease (n)	Kidney disease (n)	Cerebrovascular disease (n)	Neurological disease (n)	WBC count (×10^9/L)	PLT count (×10^9/L)	LYMPH# (×10^9/L)	MO# (×10^9/L)
Mild	47	37	28	6	6.49±2.19	213.02±58.37	1.86±0.79	0.37±0.14
Severe	34	40	26	5	9.62±5.00	224.19±101.99	1.19±1.12	0.54±0.45
Statistic	0	0.533	0.350	0.367	15.327	2.326	12.573	9.124
P value	1	0.466	0.554	0.546	<0.001	0.020	<0.001	<0.001

Group	NEUT# (×10^9/L)	PLR	NLR	LMR	SII	CRP (mg/L)	PCT (ng/L)
Mild	4.10±1.97	114.50±22.45	2.20±0.56	5.03±2.12	469.6±89.21	5.09±1.44	0.37±0.12
Severe	7.71±4.81	188.40±36.88	6.48±1.89	2.20±0.96	1,452.5±322.25	12.80±3.65	0.46±0.15
Statistic	17.235	43.215	52.367	28.941	7.892	41.342	10.184
P value	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
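The ratio-based markers in the table (NLR, PLR, LMR, SII) are derived from the blood-count columns. The sketch below uses the commonly cited definitions: NLR = neutrophils / lymphocytes, PLR = platelets / lymphocytes, LMR = lymphocytes / monocytes, and SII = platelets × neutrophils / lymphocytes. The source does not spell out its formulas, so these definitions, the `derived_indices` helper and the example values are assumptions. The indices are computed per patient, so group means of the indices will generally differ from ratios of the group means.

```python
def derived_indices(neut, lymph, mono, plt):
    """Compute NLR, PLR, LMR and SII from absolute counts (all in 10^9/L)."""
    if lymph <= 0 or mono <= 0:
        raise ValueError("lymphocyte and monocyte counts must be positive")
    return {
        "NLR": neut / lymph,        # neutrophil-to-lymphocyte ratio
        "PLR": plt / lymph,         # platelet-to-lymphocyte ratio
        "LMR": lymph / mono,        # lymphocyte-to-monocyte ratio
        "SII": plt * neut / lymph,  # systemic immune-inflammation index
    }


# Hypothetical single patient (illustrative values, not study data).
patient = derived_indices(neut=4.0, lymph=2.0, mono=0.5, plt=200.0)
print(patient)
```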


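Performance of the six machine learning models in the training cohort

Model	AUC	Sensitivity (%)	Specificity (%)	PPV	NPV	Accuracy (%)	F1 score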
DT	0.82	82	85	0.84	0.83	83	0.83
KNN	0.91	81	90	0.89	0.83	86	0.85
RF	0.93	84	90	0.89	0.86	87	0.87
XGBoost	0.95	88	89	0.89	0.88	89	0.88
SVM	0.93	83	90	0.88	0.84	86	0.86
LR	0.93	78	92	0.90	0.81	85	0.83
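Apart from the AUC, the columns of this table all follow from a 2×2 confusion matrix. The sketch below is a minimal illustration with severe CAP taken as the positive class, which is an assumption. The confusion-matrix counts are not reported in the source; they are back-calculated here from the XGBoost row (sensitivity 88%, specificity 89%) and the training-cohort sizes given in the abstract (1,320 severe and 1,363 mild cases), so they are approximate. `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Approximate counts back-calculated from the reported percentages
# (an assumption for illustration, not reported data).
n_severe, n_mild = 1320, 1363
tp = round(0.88 * n_severe)  # severe cases classified as severe
tn = round(0.89 * n_mild)    # mild cases classified as mild
metrics = classification_metrics(tp, n_severe - tp, tn, n_mild - tn)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these reconstructed counts, the rounded values agree with the XGBoost row (PPV 0.89, NPV 0.88, accuracy 89%, F1 0.88), which suggests the reported columns are internally consistent.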