Laboratory Medicine (检验医学), 2025, Vol. 40, Issue (3): 253-258. DOI: 10.3969/j.issn.1673-8640.2025.03.009

• Original Article •

Predicting early colorectal tumor risk using a deep learning model based on multiple serum tumor markers

YU Jiajie1, ZHANG Zhizhi1, LUO Qingqiong1, KE Xing2

  1. Department of Clinical Laboratory, Shanghai Dermatology Hospital, Shanghai 200443, China
  2. Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200092, China
  • Received: 2024-04-02  Revised: 2024-12-26  Online: 2025-03-30  Published: 2025-04-10
  • Corresponding author: KE Xing, E-mail: satoshiohno@sjtu.edu.cn
  • About the first author: YU Jiajie, male, born in 1989, technician, bachelor's degree, engaged mainly in clinical laboratory work.

Abstract:

Objective To build a random forest (RF) model, based on deep learning, from multiple serum tumor markers and to evaluate its role in assessing the risk of early colorectal tumors.

Methods A total of 176 patients with colorectal cancer (CRC group), 110 patients with precancerous lesions (precancerous lesion group) and 72 healthy individuals attending routine health check-ups (healthy control group) were enrolled at Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, between March 2023 and February 2024. Serum levels of nine tumor markers were measured in all subjects: carcinoembryonic antigen (CEA), carbohydrate antigen (CA) 50, CA125, CA15-3, CA19-9, CA242, CA72-4, cytokeratin 19 fragment (CYFRA 21-1) and alpha-fetoprotein (AFP). Within each group, subjects were randomly assigned to a training set and a validation set at a ratio of 7:3. The model was constructed with the deep learning-based RF algorithm, and its performance in screening for early CRC and precancerous lesions was evaluated with receiver operating characteristic (ROC) curves.

Results Patients with CRC and patients with precancerous lesions were combined into a colorectal tumor group (n = 286). Serum CA50, AFP, CEA, CA19-9, CA125, CA72-4 and CYFRA 21-1 levels were significantly higher in the colorectal tumor group than in the healthy control group (P<0.05). After feature processing and RF modeling, a combined diagnostic model for CRC and precancerous lesions, named CR7, was built from 7 features: CEA, CA50, CA15-3, CA242, CYFRA 21-1, CA72-4 and gender. With the healthy control group as the reference, the area under the curve (AUC) of CR7 for diagnosing colorectal tumor disease was 0.997 in the training set and 0.931 in the validation set. CR7 distinguished early CRC patients from healthy controls with an AUC of 0.983, and patients with precancerous lesions from healthy controls with an AUC of 0.991.
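The abstract does not report the software, hyperparameters or feature-processing steps, so the following is only a minimal, standard-library sketch of the validation protocol described in the Methods, under stated assumptions. Each group is shuffled and split 7:3 on its own (a stratified split). A random forest, which is a bagged ensemble of decision trees, is fitted on the training portion. AUC on the held-out portion is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), which equals the area under the empirical ROC curve. To keep the example short, every tree is a depth-1 stump drawn on a bootstrap sample with a random subset of about √d features. All function names, the rounding rule, the number of trees and the synthetic data are hypothetical and are not the authors' CR7 implementation.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Shuffle and split sample indices train_frac : (1 - train_frac) within each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, g in enumerate(labels):
        groups[g].append(i)
    train, valid = [], []
    for g in sorted(groups):
        idx = groups[g][:]
        rng.shuffle(idx)
        k = round(len(idx) * train_frac)  # rounding rule is an assumption
        train += idx[:k]
        valid += idx[k:]
    return sorted(train), sorted(valid)


def _gini(ys):
    """Gini impurity of a list of 0/1 labels."""
    if not ys:
        return 0.0
    p = sum(ys) / len(ys)
    return 2 * p * (1 - p)


def _fit_stump(X, y, features):
    """Depth-1 tree: the threshold with the lowest weighted Gini over the candidate features.

    Returns (feature, threshold, p_left, p_right), where p_* is the share of
    positives (label 1) that fall in each leaf.
    """
    best = None
    for f in features:
        vals = sorted({row[f] for row in X})
        for a, b in zip(vals, vals[1:]):
            thr = (a + b) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:  # guard against float rounding of thr
                continue
            cost = len(left) * _gini(left) + len(right) * _gini(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # candidate features are constant: fall back to a single leaf
        p = sum(y) / len(y)
        return (0, math.inf, p, p)
    return best[1:]


def fit_forest(X, y, n_trees=50, seed=0):
    """Random forest of stumps: bootstrap rows, random sqrt(d) features per tree."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, round(math.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), m)  # one split per stump, so one feature draw
        forest.append(_fit_stump([X[i] for i in boot], [y[i] for i in boot], feats))
    return forest


def predict_proba(forest, row):
    """Risk score: mean of the leaf probabilities reached in every tree."""
    return sum(pl if row[f] <= thr else pr for f, thr, pl, pr in forest) / len(forest)


def roc_auc(y_true, scores):
    """AUC = P(score of a case > score of a control), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 2 informative "markers" + 1 noise column; 1 = tumor, 0 = control.
    X, y = [], []
    for _ in range(200):
        tumor = rng.random() < 0.6
        X.append([rng.gauss(2.0 if tumor else 0.0, 1.0),
                  rng.gauss(1.5 if tumor else 0.0, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(int(tumor))
    train, valid = stratified_split(y)
    forest = fit_forest([X[i] for i in train], [y[i] for i in train])
    auc = roc_auc([y[i] for i in valid], [predict_proba(forest, X[i]) for i in valid])
    print(f"validation AUC = {auc:.3f}")
```

Labelling the merged CRC and precancerous-lesion patients as 1 and the healthy controls as 0 mirrors the tumor-versus-control comparison in the Results. The printed AUC describes only the synthetic data and says nothing about CR7. In practice, maintained implementations such as scikit-learn's `RandomForestClassifier`, `train_test_split(..., stratify=...)` and `roc_auc_score` would normally replace these hand-written helpers.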
Conclusions The colorectal tumor screening model constructed with the RF algorithm offers a new and effective approach to laboratory-assisted diagnosis and can be used to assess the risk of early CRC and precancerous lesions.

Key words: Tumor marker, Deep learning, Random forest, Early diagnosis, Colorectal cancer
