Laboratory Medicine, 2025, Vol. 40, Issue (3): 253-258. DOI: 10.3969/j.issn.1673-8640.2025.03.009


Predicting early colorectal tumor risk using a deep learning model based on multiple serum tumor markers

YU Jiajie1, ZHANG Zhizhi1, LUO Qingqiong1, KE Xing2

1. Department of Clinical Laboratory, Shanghai Dermatology Hospital, Shanghai 200443, China
2. Department of Clinical Laboratory, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200092, China
Received: 2024-04-02; Revised: 2024-12-26; Online: 2025-03-30; Published: 2025-04-10

Abstract:

Objective To establish a deep learning-based random forest (RF) model using multiple serum tumor markers and to investigate its value in assessing the risk of early colorectal tumors.
Methods A total of 176 patients with colorectal cancer (CRC) (CRC group), 110 patients with precancerous lesions (precancerous lesion group) and 72 healthy subjects (healthy control group) were enrolled at Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, from March 2023 to February 2024. Serum levels of the tumor markers carcinoembryonic antigen (CEA), carbohydrate antigen (CA) 50, CA125, CA15-3, CA19-9, CA242, CA72-4, cytokeratin 19 fragment (CYFRA21-1) and alpha-fetoprotein (AFP) were determined. The subjects were randomly divided into a training set and a validation set at a ratio of 7∶3. A model was constructed using the deep learning-based RF algorithm, and its performance in screening for CRC and precancerous lesions was evaluated by receiver operating characteristic (ROC) curve analysis.
Results The 286 patients with CRC or precancerous lesions were combined into a colorectal tumor group. Serum levels of CA50, AFP, CEA, CA19-9, CA125, CA72-4 and CYFRA21-1 in the colorectal tumor group were higher than those in the healthy control group (P<0.05). Using feature processing and the RF algorithm, a comprehensive diagnostic model named CR7 was developed, incorporating 7 key features: CEA, CA50, CA15-3, CA242, CYFRA21-1, CA72-4 and gender. With the healthy control group as the control, the CR7 model achieved an area under the curve (AUC) of 0.997 for diagnosing colorectal tumors in the training set and an AUC of 0.931 in the validation set. The CR7 model also achieved an AUC of 0.983 for differentiating patients with early CRC from healthy subjects and an AUC of 0.991 for differentiating patients with precancerous lesions from healthy subjects.
Conclusions The colorectal tumor screening model constructed with the RF algorithm provides a new and effective method for laboratory-assisted diagnosis and can be used to assess the risk of early CRC and precancerous lesions.
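The two evaluation steps named in the Methods, a random 7:3 split of the subjects and ROC analysis summarized by the AUC, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `split_70_30` and `roc_auc` are hypothetical, the split is assumed to be a simple unstratified shuffle, and the AUC is computed through its standard equivalence with the Mann-Whitney statistic. Fitting the RF model itself is not shown; its predicted probabilities (for example from scikit-learn's `RandomForestClassifier.predict_proba`, an assumed tooling choice) would be passed in as `scores`.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not taken from the paper.


def split_70_30(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign n subject indices to a training set (~70%)
    and a validation set (~30%). Assumes a plain, unstratified shuffle."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    cut = round(0.7 * n)              # size of the training set
    return idx[:cut], idx[cut:]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive case (label 1, tumor) scores higher than a randomly
    chosen negative case (label 0, healthy control); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For the 358 enrolled subjects, `split_70_30(358)` yields 251 training and 107 validation indices. Read through the same equivalence, a validation AUC of 0.931 means roughly a 93% chance that a randomly chosen tumor case receives a higher model score than a randomly chosen healthy control.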

Key words: Tumor marker, Deep learning, Random forest, Early diagnosis, Colorectal cancer
