Construction and analysis of a lysosome-dependent cell death score-based prediction model for non-small cell lung cancer
Background Non-small cell lung cancer (NSCLC) is the most common form of lung cancer, which is the leading cause of cancer-related deaths worldwide. Although treatment strategies such as immune checkpoint inhibitors and chemotherapy have advanced, heterogeneity among NSCLC patients leads to substantial variability in treatment outcomes. Studies have shown that some patients respond poorly to immune checkpoint inhibitors, indicating that treatment response depends on multiple factors. Predictive models that stratify patients by gene expression and clinical characteristics are therefore needed to support precision therapy.

Objective This study aims to construct a stratified prognostic model for NSCLC patients based on lysosome-dependent cell death (LDCD) scoring by integrating single-cell RNA sequencing (scRNA-seq) and bulk RNA sequencing data. By analyzing the immune-related characteristics of the high-risk and low-risk groups, we further explored the impact of cell death patterns on lung cancer and identified potential therapeutic targets.

Methods We obtained scRNA-seq data and gene expression data from NSCLC patients and normal lung tissues from the GEO and TCGA databases. We used R packages such as Seurat and CellChat for data preprocessing and analysis, and performed dimensionality reduction and visualization with principal component analysis (PCA) and UMAP. LASSO regression analysis was used to construct the predictive model, followed by cross-validation and ROC curve analysis. The model was validated through survival analysis and immune microenvironment analysis.

Results The proportion of monocytes was significantly increased in NSCLC tissues, suggesting an important role in cancer progression. Cell communication analysis indicated that macrophages, smooth muscle cells, and myeloid cells exhibit strong intercellular communication during cancer progression. Using the constructed prognostic model based on 12 LDCD-related genes, we found significant differences in overall survival and immune microenvironment between the high-risk and low-risk groups.
Keywords Non-small cell lung cancer • Lysosome-dependent cell death • Single-cell
1 Introduction
Lung cancer is the most common tumor globally and the leading cause of cancer-related deaths. Non-small cell lung cancer (NSCLC) accounts for 85% of all lung cancers. According to WHO guidelines, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the most common subtypes [1, 2]. Many factors contribute to the progression of lung cancer, including age, gender, living environment, and smoking status. With ongoing research into NSCLC, treatment strategies have evolved to include immune checkpoint-based immunotherapy and chemotherapy [3]. However, the heterogeneity of NSCLC patients significantly affects treatment outcomes, and studies show that some patients exhibit minimal response to immune checkpoint inhibitors [4]. This suggests that treatment response depends on various factors and that not all NSCLC patients benefit from current treatment strategies. It is therefore necessary to develop predictive models that stratify patients by gene expression and clinical characteristics. Patient stratification makes it possible to identify responses to different treatment strategies and to apply appropriate treatments to different patient groups, in line with the principles of precision therapy and rational drug use.
Lysosome-dependent cell death is a distinct mode of cell death with great significance for cellular life activities. Lysosomes, as cellular recycling centers, contain many hydrolytic enzymes that can degrade most cellular macromolecules. Lysosomal membrane permeabilization and the consequent leakage of lysosomal contents into the cytosol lead to so-called “lysosome-dependent cell death”. This form of cell death is mainly carried out by lysosomal cathepsin proteases and can have necrotic, apoptotic, or apoptosis-like features depending on the extent of leakage and the cellular context [5]. Many studies have demonstrated that lysosome-dependent cell death plays an important role in tumor therapy [6], and that tumorigenesis can be inhibited by inducing lysosome-dependent cell death in tumors.
Single-cell RNA sequencing (scRNA-seq) reveals the highly complex cellular composition of the tumor microenvironment (TME) at high resolution [7]. This technique can uncover developmental changes and cell interaction information within tumors with great precision, providing new insights into tumor biology. It is also a powerful tool for exploring common characteristics and key differences among immune cell subsets in the TME [8]. Meanwhile, machine learning, an important branch of artificial intelligence (AI), focuses on enabling computer systems to learn from data and make predictions or decisions. By developing and applying algorithms, machine learning allows computers to recognize patterns and regularities in data and thus improve performance without explicit programming. In the biomedical field, researchers use machine learning to analyze clinical data and develop diagnostic and prognostic models for diseases [9]. By combining the tumor microenvironment features revealed by scRNA-seq with machine learning, we can build stable prognostic models based on the clinical characteristics of NSCLC patients and explore the role of lysosome-dependent cell death in lung carcinogenesis.

In summary, the aim of this study was to construct a prognostic model for stratifying NSCLC patients based on lysosome-dependent cell death scores by integrating scRNA-seq and bulk RNA sequencing (bulk RNA-seq) data with clinical features. Specifically, we analyzed the scRNA-seq and bulk RNA-seq data separately and compared the high-risk and low-risk groups in detail, including their immune-related characteristics. This approach enabled us to gain deeper insights into the impact of different cell death modes on lung cancer and to identify potential therapeutic targets.
Through this multi-level data integration and analysis, we were able not only to predict the prognosis of NSCLC patients more accurately but also to reveal the specific mechanisms of lysosome-dependent cell death in lung cancer progression. This provides an important basis for the development of personalized treatment plans and helps to discover new and effective therapeutic targets, thereby improving the treatment outcomes and quality of life for NSCLC patients. In conclusion, this study offers a new perspective on the prognostic evaluation of NSCLC and provides important theoretical and practical foundations for exploring the impact of cell death modes on cancer development.
2 Methods and materials
2.1 Data collection and preprocessing
Single-cell RNA sequencing data from NSCLC tissue (GSM5938737, GSM5938738) and normal lung tissue (GSM5938739, GSM5938740) were obtained from the GSE198099 dataset in the GEO database (https://www.ncbi.nlm.nih.gov/). In addition, gene expression profiles of 585 and 272 NSCLC patients were obtained from the TCGA database (https://portal.gdc.cancer.gov/) and the GEO dataset GSE30219, respectively. Their clinical characteristics, such as survival status, survival time, and TNM stage, are shown in Supplementary Tables 1 and 2. The combined data were batch-corrected using the “ComBat” function with the “limma” (PMC4402510) and “sva” R packages. TCGA was used as the training set and GSE30219 as the test set.
2.2 Processing of scRNA-seq data
Single-cell RNA sequencing data were read from 10X files, and a “Seurat” object was created. Low-quality cells were filtered out using the following criteria: a minimum of 200 genes, a maximum of 4000 genes, and a mitochondrial gene proportion of 20%. Highly variable genes were selected using the “FindVariableFeatures()” function, and a plot of these genes was generated using the “VariableFeaturePlot()” function, with the top 10 genes labeled on the plot for further analysis. The data were scaled using the “ScaleData()” function to standardize gene expression levels. Principal component analysis (PCA) was performed on the highly variable genes to reduce dimensionality, and the results were visualized using the “tSNE” and “UMAP” algorithms. For cell communication analysis, a “CellChat” object was created with the “createCellChat” function, overexpressed genes and ligand-receptor pairs were identified, and ligands and receptors were mapped onto the protein-protein interaction network.
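The filtering criteria above can be illustrated with a short sketch. The study applied them in R with Seurat; the Python below is only a language-neutral illustration of the thresholds, and the `CellQC` record, the toy barcodes, and the choice of inclusive bounds are assumptions made for the example rather than details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell quality metrics (hypothetical record used for illustration)."""
    barcode: str
    n_genes: int           # number of genes detected in the cell
    mito_fraction: float   # fraction of counts from mitochondrial genes (0-1)


def passes_qc(cell, min_genes=200, max_genes=4000, max_mito=0.20):
    """Return True if the cell meets the gene-count and mitochondrial thresholds.

    Bounds are treated as inclusive here; that is an assumption.
    """
    return min_genes <= cell.n_genes <= max_genes and cell.mito_fraction <= max_mito


# Toy cells, invented for the example.
cells = [
    CellQC("AAAC", 150, 0.05),   # too few genes detected
    CellQC("AAAG", 2500, 0.08),  # passes all filters
    CellQC("AAAT", 5200, 0.04),  # too many genes detected
    CellQC("AACA", 1800, 0.35),  # high mitochondrial fraction
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to compare alternative cutoffs on the same data.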
2.3 Identification of monocytes and communication analysis
Single-cell RNA sequencing (scRNA-seq) data were analyzed using the Seurat package. t-SNE was first used for dimensionality reduction and to visualize the cells by tissue type. Monocytes were isolated and reclustered after scaling and principal component analysis (PCA). Key myeloid marker genes were identified using dot plots, and cell types were annotated accordingly to obtain monocyte subsets. Monocytes from NSCLC tissue were integrated into the monocyte dataset for further analysis. Intercellular communication was analyzed using the CellChat package, focusing on secreted signaling interactions. We identified overexpressed genes and interactions, calculated communication probabilities, and visualized the results as network diagrams. We also analyzed and visualized network centrality measures and signaling roles using heatmaps, highlighting outgoing and incoming signaling patterns.
2.4 Analysis of monocyte subpopulations
Packages such as “reshape2”, “ggplot2” and “dplyr” were used to organize and visualize the data, compute cell type statistics, and generate bar charts. Subsequently, we used the Seurat package for dimensionality reduction, clustering, and identification of monocyte subsets. Batch effects were corrected using the “harmony” algorithm, and the data were visualized after dimensionality reduction with the UMAP algorithm. For cell communication analysis, we constructed cell communication networks using the “CellChat” package and identified and analyzed ligand-receptor pairs. In addition, we performed visual analysis of the topology and signaling pathways of the cell communication network.
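The cell type statistics behind such bar charts amount to per-group proportions. A minimal sketch follows, written in Python rather than the R packages used in the study; the cell labels and counts are invented purely for illustration.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return the fraction of cells assigned to each cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Toy annotations per group (hypothetical, not the study's data).
nsclc = ["Monocyte", "Monocyte", "T cell", "Macrophage"]
control = ["Monocyte", "T cell", "T cell", "B cell"]

props = {
    "NSCLC": cell_type_proportions(nsclc),
    "Control": cell_type_proportions(control),
}
for group, fractions in props.items():
    print(group, fractions)
```

Comparing proportions rather than raw counts avoids differences that only reflect how many cells were captured per sample.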
Fig. 1 Flow chart illustrating the main ideas and design steps of the study

2.5 Modularization and network analysis of monocyte scRNA-seq data using the “hdWGCNA” method
We preprocessed and cleaned the raw data using the “hdWGCNA” and “Seurat” packages in R. We then retained genes expressed in at least 5% of cells, constructed “metacells”, and normalized the “metacell” expression matrix. Next, we determined an appropriate soft-thresholding power by testing a range of soft threshold values and constructed a co-expression network. Based on this network, we generated a dendrogram of the co-expression network and obtained the topological overlap matrix (TOM) for subsequent analysis. We also calculated the module eigengenes and performed inter-modular connectivity analysis. Additionally, we generated module feature plots ranked by gene kME and identified hub genes. We saved the key analysis steps and produced various visualizations, including correlation plots between modules and a “dotplot” of module features. Finally, we selected the 30 most significant genes from each of the green, blue, and turquoise modules, giving a total of 90 core genes.
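The role of the soft-thresholding power in a weighted co-expression network can be sketched as follows. This is not the hdWGCNA implementation: it is a plain-Python illustration of the standard WGCNA signed adjacency, a_ij = ((1 + cor_ij) / 2)^power. The signed network type, the power of 8 used in the example, and the three toy genes are assumptions for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def signed_adjacency(expr, power):
    """Signed WGCNA-style adjacency for every ordered pair of distinct genes."""
    genes = list(expr)
    return {
        (g, h): ((1 + pearson(expr[g], expr[h])) / 2) ** power
        for g in genes
        for h in genes
        if g != h
    }


# Toy metacell expression (hypothetical values).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with geneA
}
adj = signed_adjacency(expr, power=8)
print(adj[("geneA", "geneB")], adj[("geneA", "geneC")])
```

Raising the rescaled correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is why the power is tuned before the network is built.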
2.6 Pseudo-temporal analysis of monocytes and enrichment analysis of differentially expressed genes
In addition to modularization and network analysis, we conducted pseudo-temporal analysis to better understand the gene expression changes of monocytes across different states. Using the single-cell transcriptomic data, we first performed pseudo-temporal analysis with the “monocle” package and characterized the distribution of monocytes along states and pseudo-time trajectories. We then integrated the results of the pseudo-temporal analysis with the previously constructed co-expression network to further explore key genes associated with pseudo-temporal dynamics. Finally, we generated a pseudo-temporal heatmap illustrating the expression patterns of these key genes in monocytes. Moreover, we conducted enrichment analysis of 90 differentially expressed genes using the g:Profiler website (https://biit.cs.ut.ee/gprofiler/gost).
2.7 Construction and validation of the 12-LDCDRGs model using LASSO analysis
From the results of the “hdWGCNA” analysis, 150 differentially expressed genes were obtained. The “GetHubGenes” function was used to retrieve important genes. Genes were then selected based on module specification, followed by univariate logistic regression to identify significant genes. Univariate Cox proportional hazards regression analysis was conducted on candidate genes in the training set to select feature genes associated with prognosis. Variables with p-values < 0.05 were included in the least absolute shrinkage and selection operator (LASSO) regression analysis. This analysis was performed using the “glmnet” package in R (PMC2929880) to reduce the number of genes in the final risk model. The prognostic model was constructed using the following formula:

$$\text{Risk score} = \text{Exp}_1 \times \beta_1 + \text{Exp}_2 \times \beta_2 + \ldots + \text{Exp}_n \times \beta_n$$

where $\text{Exp}_i$ is the expression value of gene $i$ and $\beta_i$ is the corresponding LASSO regression coefficient. Standardization was applied to both the training and testing sets. Classification task objects were created using the “mlr3” package in R, and various learning algorithms were initialized, such as logistic regression, linear discriminant analysis, and support vector machines. Cross-validation was performed to evaluate each learning algorithm, and the performance of the different algorithms was visualized with AUC curves and box plots. The naive_bayes model, which performed best, was selected for final testing, and ROC curves were plotted with the AUC values calculated.
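The risk score is a weighted sum of expression values, which the following sketch makes concrete. The gene names, coefficients, and patient values are hypothetical and are not the 12 genes or LASSO coefficients from the study. Splitting patients at the median score is likewise an assumption, since the text here does not state how the high-risk and low-risk groups were defined.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of expression value times coefficient over the model genes."""
    return sum(expression[gene] * beta for gene, beta in coefficients.items())


def stratify(scores, cutoff=None):
    """Label each patient 'high' or 'low'; the cutoff defaults to the median (an assumption)."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical two-gene model and toy patients.
coefficients = {"GENE1": 0.5, "GENE2": -0.25}
patients = {
    "P1": {"GENE1": 2.0, "GENE2": 4.0},
    "P2": {"GENE1": 4.0, "GENE2": 0.0},
    "P3": {"GENE1": 1.0, "GENE2": 2.0},
    "P4": {"GENE1": 6.0, "GENE2": 4.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = stratify(scores)
print(scores)
print(groups)
```

A positive coefficient means higher expression raises the score, and a negative coefficient means higher expression lowers it.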
2.8 Prognostic analysis and nomogram construction for the 12-LDCDRGs model
Survival analysis was conducted using the “survival” and “survminer” packages in R. Survival curves of the high-risk and low-risk groups were compared in both the training and testing sets, and the corresponding p-values were calculated. The “bioForest” function was used to create a risk forest plot, and the “indep” function was used to perform univariate and multivariate Cox regression analyses and generate the corresponding forest plots. Subsequently, the “coxph” function was used to fit the Cox proportional hazards model, and risk curve plots were generated. The model was then calibrated, and calibration curves for 1, 3, and 5 years were constructed. Finally, a nomogram was constructed based on the Cox proportional hazards model to visualize the ability of the 12-LDCDRGs model to predict 1-year, 3-year, and 5-year survival probabilities.
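The survival curves compared here are Kaplan-Meier estimates. The sketch below shows the estimator in plain Python as an illustration of what the R “survival” package computes; it is not the study's code, and the follow-up times and event indicators are invented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of survival.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Toy cohort (hypothetical): times in years, 1 = death, 0 = censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Running the function separately on the high-risk and low-risk groups yields the two curves that a log-rank test then compares.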
2.9 Analysis of immune microenvironment
First, we preprocessed the gene expression data of the TCGA dataset, excluding unexpressed genes and normalizing the remaining data with the “voom” function in the limma package. Cell type deconvolution was then performed on the normalized data using the CIBERSORT algorithm with 1000 permutations and quantile normalization. The resulting cell type proportions were filtered to include only samples with a p-value less than 0.05. These proportions were further analyzed using boxplots to visualize differences between the low-risk and high-risk groups. In addition, single-sample gene set enrichment analysis (ssGSEA) was performed using the GSVA package. The ssGSEA scores were standardized and correlated with the risk scores. We also analyzed the association between the expression level of PLEKHM1 and immune cell components and visualized the results with ggplot2.
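Standardizing scores and correlating them with the risk score can be sketched as below. The text does not say which correlation coefficient was used, so Pearson correlation is an assumption here, and the ssGSEA and risk values are invented. The sketch computes Pearson correlation as the averaged product of z-scores, which ties the standardization step to the correlation step.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_from_z(x, y):
    """Pearson correlation computed from z-scores (sample version, n - 1 denominator)."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Toy values (hypothetical): one immune signature score and one risk score per sample.
ssgsea_scores = [0.10, 0.20, 0.30, 0.40]
risk_scores = [4.0, 3.0, 2.0, 1.0]

r = pearson_from_z(ssgsea_scores, risk_scores)
print(round(r, 3))
```

A negative coefficient, as in this toy example, would mean the signature is lower in samples with higher risk scores.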
2.10 Enrichment analysis
We extracted symbolic gene sets from the MSigDB database and performed gene set variation analysis (GSVA). Subgroup differences with an adjusted p < 0.05 were considered statistically significant. We used the Metascape website (https://metascape.org/) for enrichment analysis of the 12-LDCDRGs.
2.11 PPI-based screening of core genes and single-gene survival analysis
We collected protein-protein interaction (PPI) data from the STRING website (https://string-db.org/) and visualized the topology of the PPI network using Cytoscape software. The top 10 core genes were selected by degree. In addition, we performed survival analysis of these 10 genes using the “survival” and “survminer” R packages.
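Ranking genes by degree means counting how many distinct interaction partners each node has in the network. The sketch below illustrates that idea in plain Python; the study performed this step in Cytoscape, and the edge list and node names are hypothetical.

```python
from collections import Counter


def top_by_degree(edges, k=10):
    """Return the k nodes with the most distinct interaction partners.

    Duplicate edges (in either direction) and self-loops are ignored.
    Ties are broken alphabetically so the result is deterministic.
    """
    seen = set()
    degree = Counter()
    for a, b in edges:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy undirected interaction list (hypothetical node names).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "B")]
hubs = top_by_degree(edges, k=2)
print(hubs)
```

Degree is only one of several centrality measures; it is used here because it is the criterion named in the text.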
Fig. 4 We extracted differential subpopulations of monocytes from the NSCLC group and the control group and analyzed the communication between NSCLC-associated endothelial cell groups and other cell groups. The differences between these subpopulations are displayed as bar plots, highlighting potential biomarkers or functional characteristics specific to NSCLC-associated monocytes (A). Bar plots were used to compare the number of cells in the cell groups of the NSCLC group and the normal group (B and C). We processed the data of the endothelial cell groups to obtain the PCA dimensionality reduction visualization and elbow plot (D), and generated UMAP plots for the CT and NSCLC groups (E). Additionally, we used bar plots to show the expression differences of cell subpopulations between these two groups (F). We extracted subpopulations 0, 1, 4, 5, 6, 7, and 8 as characteristic subpopulations of NSCLC and performed quantitative and weighted communication analysis with other cell groups (G), displaying the specific pathways of intercellular communication in a heatmap (H). The communication among different subpopulations in the MIF signaling pathway is shown (I). The communication among different subpopulations in the TNF signaling pathway is shown (J). The incoming and outgoing signals of the different subpopulations are represented in a heatmap (K). Furthermore, we used a bubble plot to show the incoming and outgoing signals of different cells (L)

2.12 Statistical analysis
Statistical analyses were performed using R software, versions 4.3.1 and 4.1.3. We obtained data from the TCGA and GEO databases for quality control and batch effect correction, used a variety of techniques for dimensionality reduction and visualization, and applied different algorithms for cell subpopulation analysis, modularity and network analysis, feature selection and model building, and immune infiltration analysis. Finally, machine learning methods were used to build prediction models, including kknn, lda, log_reg, naive_bayes, ranger, rpart and svm. Kaplan-Meier (KM) survival curves and the log-rank test were used to compare overall survival (OS) between the high-risk and low-risk groups. We used the Wilcoxon test to explore differences in tumor-infiltrating immune cells and immune function between the two cohorts. Analysis of variance was also used for statistical analysis; a p value and a false discovery rate (FDR) q value less than 0.05 were considered statistically significant. These methods provide comprehensive data processing and analysis support for the research and contribute to a deeper understanding of the pathogenesis and potential therapeutic targets of non-small cell lung cancer.
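An FDR q value is obtained by adjusting a set of p values for multiple testing. The text does not name the adjustment method; the sketch below assumes the Benjamini-Hochberg procedure, a common choice, and uses invented p values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each q value is the minimum of
    # p * m / rank over all ranks at or above its own (enforces monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Toy p values (hypothetical).
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print([round(q, 4) for q in q_values])
print(significant)
```

In this toy example three tests have p < 0.05 but only one has q < 0.05, which shows why requiring both thresholds is stricter than using the p value alone.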
3 Results
3.1 Single-cell data quality control
The overall workflow of the study is shown in Fig. 1. Four samples from normal and NSCLC tumor tissues of patients were collected from public databases. The Seurat package was used for quality control of the single-cell data, retaining cells with a total RNA count greater than 200 and less than 4000 and a mitochondrial RNA proportion of less than 15% (Fig. 2A, B), in order to exclude low-quality or damaged cells and those potentially affected by excessive mitochondrial RNA or cellular stress. Next, based on principal component clustering analysis (Fig. 2C), the cells were divided into 25 clusters (Fig. 2D). By annotating these 25 clusters, we identified ten cell types: T cells, monocytes, B cells, macrophages, endothelial cells, plasma cells, myeloid cells, epithelial cells, smooth muscle cells, and mast cells (Fig. 2E-G). Additionally, we analyzed intercellular communication among the cell types (Fig. 2H). Macrophages, smooth muscle cells, and myeloid cells exhibited strong intercellular communication with other cells, highlighting their activity in cancer progression. We then mapped the distribution of the different cell types in normal and cancerous tissues (Fig. 2I). The number of immune cells, such as monocytes, was increased in NSCLC tissues, indicating significant immune infiltration in the cancerous tissues.
3.2 Exploring monocyte communication patterns
To explore the role of immunity in non-small cell lung cancer (NSCLC), we conducted an in-depth analysis of monocytes. We extracted the monocyte population for PCA dimensionality reduction and clustering (Fig. 3A), which identified 11 clusters (Fig. 3B). Based on the expression levels of marker genes in each cluster (Fig. 3C), we assigned these 11 clusters to four cell types: monocytes, monocyte-derived macrophages, alveolar macrophages, and mast cells (Fig. 3D). To investigate the differences between these four cell types and the previously identified macrophage populations, we performed cell communication analysis by counting the communication receptors and ligands for each cell type (Fig. 3E). The results showed strong communication between mast cells and both macrophages and alveolar macrophages. We then analyzed the communication intensity between them (Fig. 3F), which also indicated more pronounced communication between mast cells and the different macrophages than with other cells. To further explore the communication patterns of mast cells, we used a heatmap to display the incoming and outgoing communication patterns of cells under different signaling pathways (Fig. 3G). Mast cells exhibited a large amount of signal output across signaling pathways and predominantly received signals through MIF, EGF, CXCL, and IL1. Finally, we analyzed the proportion of each cell type in the NSCLC and control groups using bar charts (Fig. 4A, B). The bar charts showed a significant increase in the proportion of monocytes in the NSCLC group compared to the control group.
3.3 Revealing the total cellular communication landscape 3.3 揭示整个蜂窝通信格局
Similarly, given the significant increase in epithelial cells in NSCLC samples (Fig. 4A), we extracted epithelial cells for PCA dimensionality reduction clustering (Fig. 4B-D). Using UMAP, we displayed the clustering results of epithelial cells from different sources (Fig. 4E) and visualized the proportion differences among different clusters (Fig. 4F). To further reveal the landscape of cell communication, we conducted a cell communication analysis of all cell types. Figure 4G shows the number of receptors and ligands as well as the intensity of cell communication between different cell types, highlighting the strong interactions between epithelial cells and other cells. To explore the details of cell communication further, we used a heatmap to display each cell type under various communication pathways (Fig. 4H). We found that the MIF pathway mediated strong cell communication, while the TNF pathway showed a decrease in communication intensity, indicating the critical roles of the MIF and TNF pathways in cell communication during NSCLC development. 同样,鉴于 NSCLC 样本中上皮细胞的显着增加(图 4A),我们提取了上皮细胞用于 PCA 降维聚类(图 4B-D)。使用 UMAP,我们展示了来自不同来源的上皮细胞的聚类结果(图 4E),并可视化了不同聚类之间的比例差异(图 4F)。为了进一步揭示细胞通讯的前景,我们对所有细胞类型进行了细胞通讯分析。图 4G 显示了受体和配体的数量以及不同细胞类型之间的细胞通讯强度,突出了上皮细胞与其他细胞之间的强烈相互作用。为了进一步探索细胞通讯的细节,我们使用热图来显示各种通讯途径下的每种细胞类型(图 4H)。我们发现 MIF 通路介导强细胞通讯,而 TNF 通路显示通讯强度降低,表明 MIF 和 TNF 通路在 NSCLC 发育过程中细胞通讯中的关键作用。
In Fig. 4I and J, we detailed the cell interactions under the MIF and TNF pathways. In the MIF pathway, monocytes associated with NSCLC were the primary signal transmitters. In the TNF pathway, B cells mediated the signal transmission. We used a heatmap to illustrate the incoming and outgoing communication patterns of all cell types under different signaling pathways (Fig. 4K). In signal transmission, epithelial cells, macrophages, myeloid cells, and NSCLC-associated monocytes played major roles. In signal reception, B cells, macrophages, and myeloid cells were the main cell types receiving signals. Figure 4L provides a more intuitive quantification of the cells involved in signal reception, showing that macrophages and myeloid cells predominantly contributed to signal communication.
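The outgoing and incoming patterns described above amount to summing inferred interactions by sender and by receiver. The following is a minimal Python sketch of that aggregation only, not the CellChat implementation used in this study. It assumes the inferred interactions are available as (sender, receiver, pathway, strength) tuples; the function name summarize_communication is hypothetical, and the values are toy placeholders rather than results from this work.

```python
from collections import defaultdict

def summarize_communication(interactions):
    """Aggregate interaction counts and strengths per cell type.

    interactions: iterable of (sender, receiver, pathway, strength) tuples.
    Returns (outgoing, incoming); each maps cell type -> {"count", "strength"}.
    """
    outgoing = defaultdict(lambda: {"count": 0, "strength": 0.0})
    incoming = defaultdict(lambda: {"count": 0, "strength": 0.0})
    for sender, receiver, _pathway, strength in interactions:
        # A sender contributes to outgoing signaling, a receiver to incoming.
        outgoing[sender]["count"] += 1
        outgoing[sender]["strength"] += strength
        incoming[receiver]["count"] += 1
        incoming[receiver]["strength"] += strength
    return dict(outgoing), dict(incoming)

# Toy, hypothetical values for illustration only (not study data).
toy_interactions = [
    ("Epithelial", "Macrophage", "MIF", 0.5),
    ("Epithelial", "B cell", "MIF", 0.25),
    ("B cell", "Macrophage", "TNF", 0.125),
]
outgoing, incoming = summarize_communication(toy_interactions)
top_sender = max(outgoing, key=lambda c: outgoing[c]["strength"])
top_receiver = max(incoming, key=lambda c: incoming[c]["strength"])
```

Ranking cell types by summed outgoing or incoming strength is one simple way to read a heatmap such as Fig. 4K; restricting the input tuples to a single pathway gives the per-pathway view.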
3.4 Analysis of hdWGCNA in epithelial cells
To explore the key role of epithelial cells, we conducted hdWGCNA analysis to identify potential markers of epithelial cells. After setting the soft threshold to 8, we identified 5 modules (Fig. 5A). As shown in Fig. 5B and C, a total of 6 gene modules were obtained, and the 10 most influential genes were listed according to hdWGCNA. Additionally, the yellow module exhibited a strong positive correlation with the brown module (Fig. 5D). Moreover, the UMAP plot displayed the distribution of the turquoise and green modules in epithelial cells, which overlapped significantly with the epithelial cells in subcluster 5 (Fig. 5E). Interestingly, we found that the turquoise and blue modules were highly expressed in the epithelial cells of subcluster 5 (Fig. 5F). Therefore, we propose that the turquoise and green modules may represent
3.5 Pseudotime analysis of epithelial cells
To identify the characteristics of epithelial cell marker genes at different developmental stages, we conducted pseudotime analysis. Cells with similar states were grouped together, and branch points divided the cells into different states. Notably, epithelial cells in clusters 2, 3, and 4 were primarily located at the end of the pseudotime trajectory (Fig. 6A, B). Additionally, the changes in potential marker genes during differentiation were detected based on the gene expression levels of different epithelial cell subclusters (Fig. 6C). To further explore the functional roles behind the regulation of NSCLC by epithelial cells, we used enrichment analysis to investigate the module genes identified by hdWGCNA (Fig. 6D).
We found that epithelial cells play major roles in pathways such as enzyme binding, enzyme inhibitor activity, protein binding, and oxidoreductase activity, suggesting their potential impact on protein functions in NSCLC, particularly affecting enzyme-mediated energy metabolism processes.
3.6 Machine learning reveals prognostic value of epithelial cells
To further explore the clinical value of epithelial cell marker genes, we obtained 585 NSCLC samples from the TCGA database and 272 NSCLC patient samples from GEO. First, 90 genes from the hdWGCNA module, taken in intersection with the lysosome-dependent death gene set, were screened by LASSO regression, which yielded 12 marker genes related to lysosome-dependent death of epithelial cells (Fig. 7A, B). Next, we used seven machine learning methods to model the 12 genes and compared their performance. The naive_bayes model had the largest AUC among the methods compared, suggesting that it extracts the most prognostic value from the 12 selected genes (Fig. 7C). We further illustrated the prognostic performance of the naive_bayes model by plotting ROC curves (Fig. 7D and E).
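Comparing models by AUC, as described above, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in this study: the function names roc_auc and best_model_by_auc, the model names, and the labels and scores are all hypothetical toy inputs. The AUC is computed from its rank interpretation, the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def best_model_by_auc(labels, model_scores):
    """Return (model_name, auc) for the model with the largest AUC."""
    aucs = {name: roc_auc(labels, scores) for name, scores in model_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]

# Toy, hypothetical outcome labels and predicted risks (not study data).
labels = [1, 1, 0, 0, 1, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.3, 0.2, 0.7, 0.4],  # separates the two classes fully
    "model_b": [0.6, 0.4, 0.5, 0.2, 0.7, 0.3],  # one positive ranks below a negative
}
best_name, best_auc = best_model_by_auc(labels, model_scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; on cohorts of the size used here, a rank-based implementation from a standard library would be the usual choice.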
In this step, we integrated transcriptome samples and used machine learning to establish a robust 12-gene NSCLC prognostic model to explore the clinical value of single-cell analysis. To evaluate the accuracy of the prognostic model, we plotted survival curves and performed calibration curve analysis at different time points. The results consistently indicated that the survival rate of the high-risk group was significantly lower than that of the low-risk group at various time points, with p < 0.05 indicating statistical significance. The constructed model effectively distinguished between high-risk and low-risk groups (Fig. 8A, B, and F). Additionally, calibration curves at different time points showed high predictive accuracy of the model.
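The risk stratification and survival curves described above can be illustrated with a short Python sketch. Two assumptions are made that the text does not confirm: patients are split at the median risk score, and survival is estimated with the Kaplan-Meier method. The function names split_by_median and kaplan_meier are hypothetical, and all numbers are toy values rather than study data.

```python
from statistics import median

def split_by_median(risk_scores):
    """Label a sample 'high' if its risk score exceeds the cohort median, else 'low'."""
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival_probability) at each observed event time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Toy, hypothetical inputs (not study data).
groups = split_by_median([0.2, 0.9, 0.4, 0.7])
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

Running kaplan_meier separately on the patients labelled high and low gives the two curves that a log-rank test would then compare.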
We conducted univariate and multivariate analyses to assess the impact of different clinical indicators on outcomes (Fig. 8C and D), and the results indicated that tumor stage was a significant risk factor. We also created a nomogram to visualize the results of our model (Fig. 8E).
3.7 Immune cell infiltration and analysis of the tumour microenvironment
To explore the relationship between the model, immune cell infiltration, and the tumor microenvironment, we performed deconvolution analysis using the CIBERSORT algorithm. This analysis revealed significant differences between the high-risk and low-risk groups in terms of plasma cells, macrophages M2, resting CD4 memory T cells, macrophages M0, CD8 T cells, naive B cells, Tregs, activated CD4 memory T cells, monocytes, and resting mast cells (Fig. 9A). Additionally, there were differences in immune functions, such as type II IFN response, between the two risk groups (Fig. 9B).
We also analyzed the 12 feature genes included in the model. Each gene was divided into high and low expression groups, and the differences in immune cells between these groups were observed to reveal the immune functions of these genes (Fig. 9C-J). Furthermore, we identified pathway differences between the high-risk and low-risk groups through KEGG enrichment analysis, which showed significant differences between the two groups. GO enrichment analysis indicated functional differences in regulating mast cell degranulation, immune processes, and bone marrow follicular B cells, highlighting the importance of immune function differences between the two groups (Fig. 10A-F).
Finally, we depicted the association of the 12 feature genes through PPI network analysis, with the results showing that BLK and KIT genes held key positions among all the genes (Fig. 11A, B). Additionally, survival analysis of individual genes revealed the crucial roles of these 12 key genes in tumor prognosis (Fig. 11C-K).
4 Discussion
Non-small cell lung cancer (NSCLC) is a complex disease involving the interplay of genetic background and environmental factors, and is associated with various abnormalities such as metabolic dysregulation and apoptosis. Epithelial cells play a crucial role in the progression of NSCLC. They act as surface barriers and perform secretory functions, with some epithelial cells undergoing morphological changes in response to stimuli as a reaction to external disturbances. This process, known as epithelial-mesenchymal transition (EMT), is significant for cancer processes such as tumor invasion and metastasis. Through EMT, epithelial cells acquire invasive potential, transforming into migratory mesenchymal cells associated with tumor cell invasion [10]. EMT also leads to the upregulation of anti-apoptotic signals, making the cells more tumorigenic and less responsive to treatment. Concurrently, extracellular vesicles released by NSCLC cells drive the invasion and permeability of non-tumorigenic lung epithelial cells [11].
In this study, we described the communication of epithelial cells through single-cell analysis. We found that epithelial cells predominantly output signals rather than receive them in cell communication, with a focus on the MK and ANNEXIN signaling pathways. This suggests that epithelial cells have a unique communication pattern in the progression of NSCLC. In the hdWGCNA analysis, we identified modules most associated with NSCLC epithelial cells, thereby obtaining functional gene groups. Enrichment analysis revealed that these genes are concentrated in functions such as enzyme-linked reactions, protein maturation, and extracellular vesicles, highlighting the critical role of epithelial cells in protein synthesis and enzymatic reactions.
Using machine learning, we constructed a prognostic model comprising 12 lysosome-dependent death genes. Studies have reported that the BLK gene is ectopically expressed in various malignancies, including breast cancer, kidney cancer, and lung cancer, and may serve as a potential therapeutic target [12]. The activation of CD84 leads to the expression of PD-L1, inhibiting T cell function and acting as a regulator of the immunosuppressive microenvironment [13]. Knockdown of FUCA1 significantly alleviates p53-dependent, chemotherapy-induced apoptotic death [14]. Silencing GAB2 suppresses the proliferation and invasion of NSCLC cells [15], and regulating KIT expression reduces NSCLC cell resistance to cisplatin [16]. All of these lysosome-dependent death genes are closely associated with cancer development and regulation of the tumor microenvironment, while the other marker genes were newly associated with NSCLC in our study. Enrichment analysis of all marker genes indicated that these genes are mainly involved in various immune processes, including mast cell degranulation.
Collectively, our results suggest that epithelial cells influence NSCLC progression by affecting multiple immune processes and protein changes. Characterizing the tumor microenvironment (TME) at single-cell resolution can provide insights into potential novel therapeutic targets. By integrating transcriptome multi-modal analysis and combining single-cell and tissue transcriptomics, we aim to identify key mechanisms within the complex network of NSCLC. Understanding the extent to which tumor cells shape their microenvironment and how the microenvironment influences tumor cells is crucial, as these mechanisms determine the clinical response to targeted or immunotherapy. In the future, single-cell methods incorporating spatial information and surface protein expression will help complete this picture [17].
In summary, we first identified a critical role for epithelial cells in non-small cell lung cancer through lysosome-dependent death. This study has limitations. We did not thoroughly explore the expression patterns of these genes in epithelial cells. Additionally, we lack a large clinical cohort to investigate the diagnostic value of these characteristic genes. Nevertheless, our findings may provide new insights into the prognosis and tumor microenvironment of NSCLC. Further research on the specific mechanisms and regulatory pathways of epithelial cell-mediated immunity in NSCLC development will enhance our understanding of the pathogenesis of NSCLC [18-20].
5 Conclusion
This study constructed a prognostic model for NSCLC based on LDCD scoring by integrating single-cell RNA sequencing and machine learning techniques. This model can effectively predict the prognosis of NSCLC patients, providing an important basis for precision therapy and rational medication. Additionally, the results indicate the critical role of monocytes in NSCLC progression, providing a theoretical foundation for the future development of novel therapeutic targets.
Author contributions JF, YC, JL, MT and FW conceived the study. RL, JW, GW, YR, FW, YG, MB, PW and FW drafted the manuscript. JF, YC, JL and FW performed the literature search and collected the data. JF, YC, JL and MT analyzed and visualized the data. YR, FW, YG, MB and FW helped with the final revision of this manuscript. All authors reviewed and approved the final manuscript.
Funding The Scientific Research Fund of Technology Bureau in Dazhou (22ZDYF0020), the Scientific Research Fund of Technology Bureau in Dazhou (22YYJC0014).
Data availability Single-cell RNA sequencing (scRNA-seq) data from patients were retrieved from the Gene Expression Omnibus (GEO) database. Bulk RNA-seq data were obtained from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/) and the GEO database.
Declarations
Competing interests The authors declare that this research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
References
Bloom GS. Amyloid-β and tau: the trigger and bullet in Alzheimer disease pathogenesis. JAMA Neurol. 2014;71:505-8.
Duma N, Santana-Davila R, Molina JR. Non-small cell lung cancer: epidemiology, screening, diagnosis, and treatment. Mayo Clin Proc. 2019;94:1623-40.
Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J Clin. 2021;71:7-33.
Huang MY, Jiang XM, Wang BL, Sun Y, Lu JJ. Combination therapy with PD-1/PD-L1 blockade in non-small cell lung cancer: strategies and mechanisms. Pharmacol Ther. 2021;219:107694.
Aits S, Jäättelä M. Lysosomal cell death at a glance. J Cell Sci. 2013;126:1905-12.
Kundu ST, Grzeskowiak CL, Fradette JJ, Gibson LA, Rodriguez LB, Creighton CJ, Scott KL, Gibbons DL. TMEM106B drives lung cancer metastasis by inducing TFEB-dependent lysosome synthesis and secretion of cathepsins. Nat Commun. 2018;9:2731.
Huang WZ, Luo MZ. A cellulose acetate membrane counter immunoelectrophoresis test for identification of the host source of mosquito blood meals. Ji Sheng Chong Xue Yu Ji Sheng Chong Bing Za Zhi. 1986;4:186-8.
Lambrechts D, Wauters E, Boeckx B, Aibar S, Nittner D, Burton O, Bassez A, Decaluwé H, Pircher A, Van den Eynde K, Weynand B, Verbeken E, De Leyn P, Liston A, Vansteenkiste J, Carmeliet P, Aerts S, Thienpont B. Phenotype molding of stromal cells in the lung tumor microenvironment. Nat Med. 2018;24:1277-89.
Xu J, Gou S, Huang X, Zhang J, Zhou X, Gong X, Xiong J, Chi H, Yang G. Uncovering the impact of aggrephagy in the development of Alzheimer’s disease: insights into diagnostic and therapeutic approaches from machine learning analysis. Curr Alzheimer Res. 2023;20:618-35.
Hasan H, Sohal IS, Soto-Vargas Z, Byappanahalli AM, Humphrey SE, Kubo H, Kitdumrongthum S, Copeland S, Tian F, Chairoungdua A, Kasinski AL. Extracellular vesicles released by non-small cell lung cancer cells drive invasion and permeability in non-tumorigenic lung epithelial cells. Sci Rep. 2022;12:972.
Petersen DL, Berthelsen J, Willerslev-Olsen A, Fredholm S, Dabelsteen S, Bonefeld CM, Geisler C, Woetmann A. A novel BLK-induced tumor model. Tumour Biol. 2017;39:1010428317714196.
Lewinsky H, Gunes EG, David K, Radomir L, Kramer MP, Pellegrino B, Perpinial M, Chen J, He TF, Mansour AG, Teng KY, Bhattacharya S, Caserta E, Troadec E, Lee P, Feng M, Keats J, Krishnan A, Rosenzweig M, Yu J, Caligiuri MA, Cohen Y, Shevetz O, Becker-Herman S, Pichiorri F, Rosen S, Shachar I. CD84 is a regulator of the immunosuppressive microenvironment in multiple myeloma. JCI Insight. 2021;6:e141683.
Baudot AD, Crighton D, O’Prey J, Somers J, Gonzalez PS, Ryan KM. p53 directly regulates the glycosidase FUCA1 to promote chemotherapy-induced cell death. Cell Cycle. 2016;15:2299-308.
Yu S, Geng S, Hu Y. Mir-486-5p inhibits cell proliferation and invasion through repressing GAB2 in non-small cell lung cancer. Oncol Lett. 2018;16:3525-30.
Li P, Ma L, Zhang Y, Ji F, Jin F. MicroRNA-137 down-regulates KIT and inhibits small cell lung cancer cell proliferation. Biomed Pharmacother. 2014;68:7-12.
Fan T, Jiang L, Zhou X, Chi H, Zeng X. Deciphering the dual roles of PHD finger proteins from oncogenic drivers to tumor suppressors. Front Cell Dev Biol. 2024;12:1403396.
You Y, et al. Mediation role of recreational physical activity in the relationship between the dietary intake of live microbes and the systemic immune-inflammation index: a real-world cross-sectional study. Nutrients. 2024;16:777. https://doi.org/10.3390/nu16060777.
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Jiangping Fu, Yaohua Chen, Jie Li and Ming Tan have contributed equally to this work.