Laboratory Medicine, 2021, Vol. 36, Issue 1: 60-68. DOI: 10.3969/j.issn.1673-8640.2021.01.013


Construction of a prognostic risk model for colon cancer patients based on alternative splicing events from the TCGA SpliceSeq database

LEI Ming, GUO Mengyue, WANG Ruoying, NI Xiaomei, SHI Qiong

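  1. Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Center, Kunming 650118, Yunnan, China
  • Received: 2019-09-22 Online: 2021-01-30 Published: 2021-02-05
  • About the author:
    LEI Ming, male, born in 1978, master's degree, technologist-in-charge, mainly engaged in immunological laboratory testing.
  • Funding:
    National Natural Science Foundation of China (81760426); Yunnan Provincial Basic Research Fund [2018FE001(-314)]; Science Foundation of the Yunnan Provincial Department of Education (2019J1276)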

Construction of colon cancer prognostic model based on TCGA SpliceSeq alternative splicing events

LEI Ming, GUO Mengyue, WANG Ruoying, NI Xiaomei, SHI Qiong

  1. Medical Clinical Laboratory, The Tumor Hospital of Yunnan Province, The Third Affiliated Hospital of Kunming Medical University, Kunming 650118, Yunnan, China
  • Received: 2019-09-22 Online: 2021-01-30 Published: 2021-02-05

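Abstract (from the Chinese-language version):

Objective To construct a prognostic risk model through genome-wide analysis to predict the prognosis of patients with colon adenocarcinoma (COAD).
Methods RNA-seq data and clinical information for COAD patients were downloaded from The Cancer Genome Atlas (TCGA) database; seven types of alternative splicing events were downloaded from the TCGA SpliceSeq database; splicing factor (SF) data were downloaded from the SpliceAid 2 database. Univariate Cox regression analysis was used to identify prognosis-associated alternative splicing events (PASE), Lasso regression was used to select variables, and multivariate Cox regression was used to calculate risk scores and construct the risk model. The Cytoscape Reactome FI plug-in was used to build an interaction network and identify hub nodes; Gene Ontology (GO) enrichment and KEGG pathway analysis were used for gene function annotation and pathway analysis; Kaplan-Meier and receiver operating characteristic (ROC) curves were used to evaluate the PASE risk model; and a prognosis-related interaction network was constructed from SFs and the PASE of other genes.
Results Among 398 COAD patients, 35 391 alternative splicing events occurred in 9 085 genes, of which 2 015 PASE occurred in 1 811 genes. In the prognostic risk model composed of 8 PASE, patients were divided into a high-risk group and a low-risk group using 0.919 as the optimal cut-off value; the difference between the two groups was statistically significant (P<0.001), and the area under the ROC curve was 0.860 (1-year survival). In univariate Cox regression analysis, tumor invasion, lymph node metastasis, distant metastasis, clinical stage, and the prognostic risk model were all significantly negatively correlated with overall survival time (P<0.001). After multivariate adjustment, the prognostic risk model remained significantly negatively correlated with overall survival time (P<0.001). In the prognostic risk model, the 8 PASE showed no correlation with the mRNA expression of their corresponding genes (P>0.05).
Conclusion The effect of PASE on prognosis was analyzed through genome-wide analysis of TCGA-COAD, and a risk model that can be used to predict COAD prognosis was constructed.

Key words: Colon cancer, Alternative splicing event, Splicing factor, Prognosis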
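
The risk-score step described in the abstract can be illustrated with a minimal sketch. It assumes the risk score is the linear predictor of the multivariate Cox model (the sum of each coefficient times the splicing level of its event) and that patients above the cut-off are labeled high risk; the abstract does not state either detail. The coefficient values, the patient values, and the names `COEFFICIENTS`, `risk_score`, and `risk_group` are hypothetical. Only the cut-off of 0.919 and the number of events (8) come from the abstract.

```python
# Hypothetical Cox coefficients for the 8 splicing events (illustrative only;
# the abstract does not report the fitted coefficients).
COEFFICIENTS = [0.8, -1.2, 0.5, 1.1, -0.7, 0.9, 0.3, -0.4]
CUTOFF = 0.919  # optimal cut-off value reported in the abstract


def risk_score(splicing_levels, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * splicing level over the events."""
    if len(splicing_levels) != len(coefficients):
        raise ValueError("expected one splicing level per model coefficient")
    return sum(b * x for b, x in zip(coefficients, splicing_levels))


def risk_group(score, cutoff=CUTOFF):
    """Label a patient 'high' above the cut-off, otherwise 'low' (assumed rule)."""
    return "high" if score > cutoff else "low"


# Two made-up patients with splicing levels in [0, 1] for the 8 events.
patient_a = [0.9, 0.1, 0.8, 0.7, 0.2, 0.9, 0.5, 0.1]
patient_b = [0.2, 0.8, 0.3, 0.1, 0.9, 0.2, 0.4, 0.7]

for name, levels in (("A", patient_a), ("B", patient_b)):
    score = risk_score(levels)
    print(name, round(score, 3), risk_group(score))
```

In practice the coefficients would come from a fitted Cox model rather than being typed in by hand; the sketch only shows how a fitted model turns per-event values into a single score and a group label.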

Abstract:

Objective To establish a genome-wide alternative splicing event risk model to predict the prognosis of colon adenocarcinoma (COAD).
Methods COAD RNA-Seq data and clinical information were downloaded from The Cancer Genome Atlas (TCGA), and alternative splicing events were obtained from the TCGA SpliceSeq database. Splicing factor (SF) data were downloaded from the SpliceAid 2 database. Univariate Cox regression analysis was used to identify prognosis-associated alternative splicing events (PASE). Lasso regression was used to screen variables. Multivariate Cox regression was performed to calculate the risk score and construct the model. The Cytoscape Reactome FI plug-in was used to build an interaction network and find core nodes. GO and KEGG analyses were used for gene function annotation and pathway analysis. Kaplan-Meier and receiver operating characteristic (ROC) curves were used to evaluate the PASE risk model. SFs and the PASE of other genes were used to construct a prognosis-related interaction network.
Results A total of 35 391 mRNA alternative splicing events occurred in 9 085 genes from 398 COAD patients, and 2 015 survival-associated PASE occurred in 1 811 genes. In the risk model built from 8 PASE, patients were divided into a high-risk group and a low-risk group according to the optimal cut-off value of 0.919, and the difference between the groups was statistically significant (P<0.001). The area under the ROC curve of the risk model was 0.860 (one-year survival). In univariate Cox regression analysis, tumor invasion, lymph node metastasis, distant metastasis, clinical stage, and the risk model were significantly negatively correlated with overall survival time (P<0.001). After adjusting for multiple factors, the risk model still showed a significant negative correlation with the overall survival time of patients (P<0.001). In the risk model, there was no correlation between the 8 PASE and the mRNA expression of their corresponding genes (P>0.05).
Conclusion The effect of PASE on COAD prognosis was investigated by TCGA-COAD genome-wide analysis, and a risk model that can be used to predict clinical prognosis was constructed.

Key words: Colon adenocarcinoma, Alternative splicing event, Splicing factor, Prognosis
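
The Kaplan-Meier evaluation mentioned in the abstract compares survival curves of the high-risk and low-risk groups. The sketch below is a plain-Python version of the standard product-limit estimator, not the authors' code; the follow-up times, the event indicators, and the function name `kaplan_meier` are made up for illustration. The statistical comparison between groups (for example a log-rank test) is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up times (months) and event indicators for two risk groups.
high_risk = kaplan_meier([3, 5, 5, 8, 12], [1, 1, 0, 1, 0])
low_risk = kaplan_meier([6, 10, 14, 20, 24], [0, 1, 0, 0, 1])

for label, curve in (("high", high_risk), ("low", low_risk)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

Censored patients lower the number at risk without producing a step in the curve, which is why the estimator tracks `deaths` and `leaving` separately.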
