Abstract
Although immunotherapy combined with chemotherapy (ICT) is the standard treatment for advanced non-small cell lung cancer (NSCLC), identification of reliable prognostic biomarkers remains challenging. In this multicenter study, we performed next-generation sequencing of tumor samples from 162 patients receiving first-line ICT at the Chinese PLA General Hospital and collected their pathological image information. First, we established a model to predict the risk of tumor progression based on genomic characteristics. Furthermore, a deep learning method was employed to recognize different cell types from pathological images, which significantly improved the accuracy of progression-free survival (PFS) and overall survival (OS) prediction. In summary, we constructed a Prognostic Multimodal Classifier for Progression (PMCP) that possesses the capability to precisely forecast PFS and OS. Patients with the PMCP1 subtype exhibit a low risk of progression and demonstrate a higher proportion of epithelial cells. PMCP highlighted the potential value of multimodal biomarkers in guiding clinical decisions regarding ICT. The area under curve (AUC) for predicting PFS was 0.807. This study revealed the importance of integrating genomic and pathological data to improve prognostic accuracy and enable personalized treatment for patients with advanced NSCLC.
Introduction
Based on statistics from 2022, 2,480,301 newly diagnosed cases of lung cancer were reported worldwide, along with 1,817,172 deaths attributed to the disease1. Lung cancer is recognized as a major global health issue and is predominantly non-small cell lung cancer (NSCLC), which accounts for approximately 80–85% of all cases2. Regrettably, the diagnosis of NSCLC often occurs when the disease has already progressed to an advanced stage, resulting in a 5-year survival rate of typically less than 10%3.
Traditional treatment options for patients with advanced NSCLC were limited to cytotoxic chemotherapy. However, these treatments typically lead to a median overall survival (OS) of approximately 8–12 months4. Moreover, chemotherapy often leads to systemic toxicity and may become less effective over time, owing to the development of drug resistance5. The emergence of immunotherapy has gradually improved treatment outcomes for patients with lung cancer compared to the traditional era of chemotherapy and radiation therapy. Multiple clinical trials have demonstrated that programmed cell death ligand 1 (PD-L1) expression6 and tumor mutation burden (TMB)7 can predict the efficacy of immune checkpoint inhibitors (ICI) in NSCLC. However, only a minority of patients benefit significantly from ICI monotherapy. Patients with high TMB and PD-L1 expression of 50% or higher exhibited an objective response rate (ORR) to programmed death 1 (PD-1)/PD-L1 inhibition as high as 57%, in contrast to the mere 8.7% ORR observed in patients with low TMB and PD-L1 expression of <1%8. Despite their potential, biomarkers such as PD-L1 expression and TMB currently exhibit limitations in their clinical utility for patient selection in ICI therapy9,10.
Combining immunotherapy with chemotherapy (ICT) not only directly targets tumor cells, but also enhances their immunogenicity, overcomes immune suppression, and modulates immune responses, thereby exerting anti-tumor effects. Immunotherapy, with or without platinum-based doublet chemotherapy, has become the established standard initial treatment for most patients with advanced-stage NSCLC lacking molecular driver alterations11,12. The results from the KEYNOTE-18912 and KEYNOTE-40713 trials revealed that pembrolizumab combined with chemotherapy significantly increased ORR compared with chemotherapy alone (48.3% vs. 19.9%; 57.9% vs. 38.4%) and extended progression-free survival (PFS) (9.0 vs. 4.9 months, hazard ratio (HR) = 0.50; 6.4 vs. 4.8 months, HR = 0.56) and OS (22.0 vs. 10.6 months, HR = 0.60; 15.9 vs. 11.3 months, HR = 0.64). In addition, pembrolizumab combined with platinum-containing chemotherapy resulted in better long-term survival than platinum-containing chemotherapy alone. The 5-year OS rates of patients with advanced non-squamous NSCLC and squamous NSCLC who received the combination as first-line treatment were 19.4% and 18.4%, respectively, whereas the corresponding rates with chemotherapy alone were 11.3% and 9.7%. Results of the AK105-302 study showed that penpulimab combined with chemotherapy significantly increased ORR (69.7% vs. 26.3%) and prolonged PFS (7.0 vs. 4.2 months; HR = 0.6, 95% CI: 0.29–0.54)14. To facilitate comparison, the relevant research data mentioned above are summarized in Supplementary Table 1.
The utilization of combination chemotherapy and immunotherapy is extended to a wider patient population than monotherapy with ICIs. This broader applicability encompasses patients irrespective of high TMB or PD-L1 expression levels, resulting in a greater proportion of first-line treatment in advanced NSCLC. Although ICT has become the predominant strategy for the initial treatment of advanced NSCLC, the factors influencing its efficacy remain poorly understood. Advancements in genomic sequencing technology, bioinformatics, and artificial intelligence (AI) have significantly enhanced the identification of clinically relevant genetic biomarkers15,16,17,18. With the continuous growth of tumor sequencing data from clinical research articles, the cBioPortal platform, and The Cancer Genome Atlas (TCGA), an increasing number of studies have focused on exploring biomarkers and signatures associated with cancer prognosis and treatment to address clinical challenges and advance personalized medicine.
Recent studies have investigated the predictive power of genomic mutations on the efficacy of immunotherapy in NSCLC. Bai et al. developed a genomic mutation signature (GMS) to predict the response to anti-PD-L1 therapy19. Similarly, Pan et al. identified a compound mutational signature from a 52-gene panel that predicted responses to immune checkpoint blockade therapy20. Additionally, the PathwayTMB score, which is based on mutational pathways, has been proposed as a predictor of survival outcomes in cancer immunotherapy. These DNA-based omics approaches offer the potential for personalized treatment strategies by enhancing the predictive accuracy of immunotherapy responses in NSCLC patients21.
With the shift towards using immunotherapy as the first-line treatment for most patients with NSCLC, assessment of the tumor microenvironment (TME) has become paramount. The quantity, density, type, functional state, and spatial distribution of immune cells within the TME are closely associated with the occurrence, progression, prognosis, and anti-tumor immune response in NSCLC22. Traditional methods for studying the TME are often time-consuming, expensive, complex to interpret, and may involve sample destruction. Advancements in AI technology have enabled researchers to use pathological images to identify the TME. DuCote et al. utilized AI techniques to identify six different cell types in hematoxylin and eosin (H&E)-stained tumor slides to study the heterogeneity of the TME in NSCLC23.
We enrolled 162 patients with advanced NSCLC lacking EGFR/ALK driver alterations who underwent first-line ICT treatment. Traditional indicators of immunotherapy efficacy, including TMB and PD-L1 expression, showed limited predictive value in our cohort. By integrating genomics and cellular features identified by H&E-stained pathological images, we established a prognostic multimodal classifier for progression (PMCP), which enables the effective prediction of prognosis in patients with advanced NSCLC receiving first-line ICT (Fig. 1A).
A The analysis workflow of the study. B Mutation oncoplot of 162 NSCLC patients for the top 30 genes ranked by alteration frequency. Top histogram: number of mutations per sample; Middle track: clinical information per sample. HRS high-risk, LRS low-risk, H-Epi high proportion of epithelial cells, L-Epi low proportion of epithelial cells, HED HLA class I evolutionary divergence, NE not evaluated, SD stable disease, PR partial response, PD progressive disease, NX unknown.
Results
Genomic and clinicopathological characteristics in the immunotherapy combined with chemotherapy multicenter biomarker study (ICMBS)
This prospective multicenter immune-positive cohort was designated the Immunotherapy combined with Chemotherapy Multicenter Biomarker Study (ICMBS). A total of 162 Chinese patients with advanced NSCLC were enrolled after receiving first-line ICT (Fig. 1B). Of these, 7 patients were not treated at our center, and follow-up PFS data were unavailable for them. Table 1 presents the clinical characteristics of the patients. Among the 162 NSCLC samples, 77 (47%) were diagnosed as lung adenocarcinoma and 79 (49%) as lung squamous cell carcinoma. Additionally, 6 cases (4%) were classified as indeterminate or unclear subtypes from a pathological perspective. The details of the clinicopathological characteristics are summarized in Supplementary Table 2.
The ICMBS samples were analyzed using ChosenOne® testing, which utilizes a panel of 1123 genes. The top 10 mutated genes in the ICMBS cohort were TP53 (80%), LRP1B (38%), KMT2D (28%), KRAS (26%), PTPRD (25%), ARID1A (17%), KEAP1 (17%), CDKN2A (16%), APTA1 (15%), and CACNA1E (14%) (Fig. 1B). In the ICMBS cohort, 4 de novo signatures were identified using non-negative matrix factorization (NMF). The cosine similarities between these 4 signatures and the 30 Catalogue of Somatic Mutations in Cancer (COSMIC) signatures are shown in Supplementary Fig. 1A. Signature 1 exhibited the strongest resemblance (cosine similarity of 0.811) to COSMIC_2, which is linked to the activity of the AID/APOBEC family of cytidine deaminases. Signature 2 showed the highest similarity (cosine similarity of 0.921) to COSMIC_4 and was linked to smoking. Signature 3 was similar to COSMIC_5 (cosine similarity of 0.743), which is widely observed in most cancer samples; however, its etiology remains unclear. Signature 4 was similar to COSMIC_10 (cosine similarity of 0.849). It has been suggested that the mutational process driving this signature involves modified activity of the error-prone polymerase POLE (Supplementary Fig. 1B). Kaplan–Meier (KM) survival analysis revealed no notable differences in OS (Supplementary Fig. 1C) or PFS (Supplementary Fig. 1D) among the 4 signature groups. In addition, we found that TMB was relatively high in the APOBEC and smoking signature groups (Supplementary Fig. 1E), whereas the expression level of PD-L1 was relatively low in the smoking group (Supplementary Fig. 1F). The APOBEC signature was significantly enriched in females, and there were no significant differences in mutation signatures among smoking, drinking, age, human leukocyte antigen (HLA)-group, and HLA class I evolutionary divergence (HED) groups (Fig. 2A).
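The signature-matching step described above assigns each de novo NMF signature to its closest COSMIC signature by cosine similarity. The following is a minimal sketch of that comparison, not the pipeline used in the study: the function names `cosine_similarity` and `best_match` are illustrative, and the 4-channel vectors are toy values standing in for the 96 trinucleotide-context channels of a real single-base-substitution profile.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length non-negative vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no defined cosine similarity")
    return dot / (norm_a * norm_b)

def best_match(signature, references):
    """Return (name, similarity) for the closest reference signature."""
    return max(
        ((name, cosine_similarity(signature, ref)) for name, ref in references.items()),
        key=lambda pair: pair[1],
    )

# Toy 4-channel profiles (hypothetical values; real profiles have 96 channels).
references = {
    "REF_A": [0.7, 0.1, 0.1, 0.1],
    "REF_B": [0.1, 0.1, 0.1, 0.7],
}
de_novo = [0.6, 0.2, 0.1, 0.1]
name, score = best_match(de_novo, references)
```

Because mutational profiles are non-negative, the similarity lies between 0 and 1, which is why values such as 0.921 can be read as a close match and 0.743 as a looser one.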
In this cohort, patients whose best therapeutic outcome was assessed as complete response (CR) or partial response (PR) were defined as the "Response" group (N = 107), while those with progressive disease (PD) or stable disease (SD) were defined as the "nonResponse" group (N = 44). The median TMB was higher in the "Response" group compared to the "nonResponse" group (12.02 vs. 10.3 mut/Mb). The "Response" group was enriched with a higher proportion of TMB high (TMB-H) individuals (64.4%) compared to the "nonResponse" group (50.8%) (Fig. 2B). The TMB-H group exhibited significantly better PFS than the TMB low (TMB-L) group (HR = 0.52; 95% CI: 0.32–0.87; p = 0.01, Fig. 2C). However, there were no notable differences in OS between the TMB-H and TMB-L groups (Fig. 2D). The "Response" group was enriched with a higher proportion of tumor proportion score (TPS) ≥ 50% individuals (42.3%) compared to the "nonResponse" group (3%) (Fig. 2E). However, although PD-L1 expression has emerged as a robust predictor of the efficacy of immune checkpoint therapy, it did not show significant prognostic value in terms of PFS (Fig. 2F) or OS (Fig. 2G) in the ICMBS cohort.
A Clustering of NSCLC patients based on proportions of mutation signatures. Each cluster group is named according to the dominant mutation signature (APOBEC, unknown, smoking, and POLE). B TMB analysis between the "Response" group and the "nonResponse" group. Kaplan–Meier survival analysis for PFS (C) and OS (D) between the TMB-H group and TMB-L group. E PD-L1 analysis between the "Response" group and the "nonResponse" group. Kaplan–Meier survival analysis for PFS (F) and OS (G) in different PD-L1 groups. *p < 0.05.
We selected 85 samples for whole-exome sequencing and analyzed them to identify significant copy number alterations (CNA) encompassing both amplified and deleted peaks at the arm level in each sample (Supplementary Table 3 and Supplementary Fig. 2A). CNA-peakNum is defined as the total number of genomic regions showing peaks of variation, including amplifications and deletions. In ICMBS patients, the median CNA-peakNum was 80. Patients with a CNA-peakNum greater than or equal to 80 were classified as the "CNA-peakNum High" group, while those with a CNA-peakNum less than 80 were classified as the "CNA-peakNum Low" group. Survival analysis revealed no statistically significant difference in PFS (HR = 0.78, 95% CI: 0.38–1.59; p = 0.5, Supplementary Fig. 2B), whereas OS was significantly longer in the high group compared with the low group (HR = 0.32; 95% CI: 0.11–0.92; p = 0.026; Supplementary Fig. 2C). In the present study, no significant correlation was observed between TMB/PD-L1 and CNV (Supplementary Fig. 2D, E).
In the ICMBS study, fusion genes were identified using DNA- and RNA-based methods. DNA detection methods revealed three fusion genes in three patients: KMT2A-TMPRSS4, MYC-SPIDR, and ALK-KIF13AR (Supplementary Table 4). These fusion genes were newly discovered in this study. Based on the RNA testing, 87 fusions were identified in 50 patients. Patient 148 was found to have a class I SDC4-ROS1 (exon32-exon35) fusion and did not undergo targeted therapy, thus meeting the inclusion criteria for the ICMBS study. Additionally, patient 154 exhibited a class II FGFR3 (exon17)-TACC3 fusion, whereas the remaining 81 fusions were categorized as class III fusion variants (Supplementary Table 4).
Association of genomic mutations with treatment response and survival outcomes in advanced NSCLC patients receiving ICT
We initially compared the mutation landscape of the "Response" and "nonResponse" groups (Supplementary Fig. 3A). Using univariate logistic regression, we discovered differences in ARID1A (mutation [MT], N = 27), LZTR1 (MT, N = 5), and SERPINB3 (MT, N = 6) (p < 0.05) between the two groups (Supplementary Fig. 3B and Supplementary Table 5). Furthermore, we explored the association between the response and PD-L1 or gene mutation status. Multivariate logistic regression revealed that ARID1A-MT and high PD-L1 were significantly associated with ICT response (Supplementary Table 6). In the ARID1A-MT group, 77.8% of patients were classified in the TMB-H subgroup, with 50% of patients exhibiting PD-L1 levels ≥ 50%. In contrast, the ARID1A wild-type (WT) group showed lower proportions of 54.8% and 28%, respectively (Supplementary Fig. 3C-D).
Univariate and survival analyses of the mutation statuses of all genes in relation to OS (Supplementary Table 7) and PFS (Supplementary Table 8) were conducted. KDM6A (MT, N = 8), NBN (MT, N = 5), PTPN14 (MT, N = 1), and CDKN1A (MT, N = 1) were associated with OS, with a false discovery rate (FDR) of less than 0.05. Notably, patients with KDM6A-MT had significantly shorter OS (HR = 12.928, 95% CI: 3.33–50.147; p < 0.0001; Supplementary Fig. 4A) and a trend toward shorter PFS (HR = 2.628, 95% CI: 0.801–8.624; p = 0.097; Supplementary Fig. 4B) compared with KDM6A-WT. In the KDM6A-MT group, 87.5% of patients were categorized into the TMB-H subgroup, and 37.5% of patients exhibited PD-L1 expression ≥ 50%. Conversely, in the KDM6A-WT group, these proportions were relatively low at 57.1% and 31.2%, respectively (Supplementary Fig. 4C-D). Moreover, KDM6A-MT patients had shorter OS (HR = 2.006, 95% CI: 0.995–4.046; p = 0.047, Supplementary Fig. 4E) and higher TMB (p = 7.8e-07, Supplementary Fig. 4F) in an independent cohort from the MSKCC (MSK, Nature Medicine 2022, N = 738)24.
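Screening every gene against OS involves many simultaneous tests, which is why an FDR threshold is applied to the per-gene p-values. The text does not state which FDR procedure was used; the sketch below assumes the widely used Benjamini-Hochberg adjustment, and both the function name `benjamini_hochberg` and the example p-values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
adj = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
```

A gene is then reported when its adjusted value falls below the chosen threshold (0.05 in the analysis above).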
Impact of RTK-RAS pathway mutations and co-mutation patterns on treatment response and survival in advanced NSCLC patients receiving ICT
Supplementary Table 9 shows the mutation frequencies of genes and samples within cancer-related pathways. Univariate logistic regression analysis revealed that RTK-RAS was significantly associated with ICT response (Fig. 3A). A total of 125 patients harbored mutations in 39 genes within the RTK-RAS pathway, of which KRAS was the most frequently mutated (Supplementary Fig. 5). In the "Response" group, 82.2% of patients had at least one mutation in the RTK-RAS pathway genes (Fig. 3B). Compared to the RTK-RAS WT patients, the RTK-RAS MT patients had significantly longer PFS (HR = 0.46; 95% CI: 0.26–0.81; p = 0.0059, Fig. 3C), whereas there was no significant relationship between RTK-RAS mutation status and OS (HR = 0.59; 95% CI: 0.27–1.28; p = 0.18; Fig. 3D). Moreover, RTK-RAS MT and TMB exhibited a significant positive correlation (Fig. 3E), but there was no correlation with the expression level of PD-L1 (Fig. 3F).
A The number of samples exhibiting genetic alterations in 21 pathways between the "Response" group and "nonResponse" group. B Comparing the frequency of genetic alterations in the RTK-RAS pathway between the "Response" group and "nonResponse" group. Kaplan–Meier survival analysis for PFS (C) and OS (D) between the RTK-RAS_MT group and RTK-RAS_WT group. The distribution of TMB (E) and PD-L1 (F) between the RTK-RAS_MT group and RTK-RAS_WT group.
To analyze the co-mutation patterns in ICMBS, we performed survival and co-mutation pattern analyses of the top 20 high-frequency genes and 21 core pathways (Supplementary Table 10). The results are presented in Supplementary Fig. 6A. The occurrence of co-mutations between ARID1B and the DNA damage response (DDR) pathway was notably associated with poorer PFS (HR = 3.03; 95% CI: 1.48–6.22; p = 0.0015; Supplementary Fig. 6B) and OS (HR = 3.01; 95% CI: 1.24–7.33; p = 0.011; Supplementary Fig. 6C) compared to cases where such co-mutations did not occur.
Development and validation of a prognostic genomic mutation risk score (RS) model predicting PFS of ICMBS cohort
To explore the genomic mutation features associated with PFS in ICMBS, we examined 208 genes (with mutations observed in >3 patients) and 21 core pathways as potential predictors in univariate Cox regression analysis. A total of 9 features with a significance level of p < 0.05 were identified (Supplementary Table 11). Subsequently, a LASSO regression model was applied to further refine the selection process, ultimately identifying the optimal hub signature set (Fig. 4A, B), which included ARID1B, ABCC2, NIPBL, DYNC2H1, SETD2, FAF1, and the RTK-RAS pathway. Figure 4C depicts the relationship between the prognostic signature and PFS identified through univariate analysis. The PFS-related RS for each patient was calculated using the following Eq. 1:
A LASSO coefficient profiles of mutation genes and pathways. B Partial likelihood deviance for the LASSO coefficient profiles. C Forest plot depicting the relationship between the prognostic signature and PFS identified through univariate analysis. Data are presented in the form of hazard ratios, with error bars indicating 95% confidence intervals. HRs were calculated using Cox proportional hazards regression modeling. D Time-dependent ROC curves and AUC values of the model for predicting survival status at 6, 8, 12, 16, and 24 months. E PFS improvements for LRS versus HRS. F OS improvements for LRS versus HRS.
(Supplementary Table 12). Based on the median value (−0.097), the 162 patients with advanced NSCLC were divided into two subgroups: high-risk (HRS) (N = 72) and low-risk (LRS) (N = 90). We used time-dependent receiver operating characteristic (ROC) curves at 6, 8, 12, 16, and 24 months and calculated the areas under the curve (AUC) for the RS model as 0.706, 0.791, 0.709, 0.734, and 0.777, respectively (Fig. 4D). As depicted in Fig. 4E and Fig. 4F, patients in the HRS group exhibited shorter PFS (HR = 3.44; 95% CI: 2.04–5.82; p < 0.0001) and OS (HR = 2.09; 95% CI: 1.05–4.18; p = 0.032).
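A risk score derived from a LASSO-penalized Cox model is conventionally the model's linear predictor: the sum over selected features of the fitted coefficient multiplied by the feature value (here 1 for mutated, 0 for wild type), followed by a split at the cohort median. The sketch below illustrates that convention only. The coefficients are placeholders rather than the fitted values from this study, only two of the seven features are shown, and the rule that scores strictly above the median are labeled HRS is an assumption.

```python
from statistics import median

# Placeholder coefficients for illustration only; these are NOT the fitted
# values from the study, and only two of the seven features are shown.
EXAMPLE_COEFFICIENTS = {
    "ARID1B": 0.5,
    "RTK-RAS": -0.4,
}

def risk_score(mutation_status, coefficients):
    """Linear predictor: sum of coefficient * status (1 = mutated, 0 = wild type)."""
    return sum(beta * mutation_status.get(feature, 0)
               for feature, beta in coefficients.items())

def stratify(scores):
    """Label a patient HRS if the score exceeds the cohort median, else LRS."""
    cutoff = median(scores.values())
    return {pid: ("HRS" if s > cutoff else "LRS") for pid, s in scores.items()}

# Hypothetical mutation calls for three patients.
patients = {
    "P1": {"ARID1B": 1, "RTK-RAS": 0},
    "P2": {"ARID1B": 0, "RTK-RAS": 1},
    "P3": {"ARID1B": 0, "RTK-RAS": 0},
}
scores = {pid: risk_score(status, EXAMPLE_COEFFICIENTS)
          for pid, status in patients.items()}
groups = stratify(scores)
```

With binary mutation features, many patients can share the same score, so a median cut need not produce equal group sizes, which is consistent with the 72/90 split reported above.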
Given the strong performance of the RS model, we assessed its independent predictive capability. Univariate Cox regression analysis demonstrated a significant correlation between PFS and both TMB and RS (p < 0.05) (Fig. 5A). Conversely, TMB demonstrated limited predictive power in the multivariate Cox regression analysis (Fig. 5B), whereas the RS model consistently achieved a significant p-value of < 0.001. The multivariate model for TMB combined with RS, RS versus TMB alone, and the results of the log-likelihood ratio test comparing the nested Cox models are presented in Supplementary Table 13. These results highlight the significant independent effect of RS, with a p-value of 3.95e-06.
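The log-likelihood ratio test mentioned above compares two nested Cox models: the statistic is twice the difference in maximized partial log-likelihood, referred to a chi-square distribution whose degrees of freedom equal the number of added covariates. The sketch below covers the one-added-covariate case (for example, TMB alone versus TMB plus RS) using only the standard library, relying on the identity that the chi-square survival function with 1 degree of freedom equals erfc(sqrt(x/2)). The log-likelihood values are hypothetical and the function name `lrt_one_df` is illustrative.

```python
import math

def lrt_one_df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). For 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model cannot fit worse than the nested model")
    return statistic, math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical partial log-likelihoods: reduced model vs. model with one more covariate.
stat, p = lrt_one_df(loglik_reduced=-310.0, loglik_full=-300.0)
```

A small p-value indicates that the added covariate improves the fit beyond what the reduced model already explains, which is the sense in which RS is described as having an independent effect.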
Univariate Cox regression analysis (A) and multivariate regression analysis (B) for PFS of RS and clinicopathological features. A hazard ratio > 1 indicates that the feature is a risk factor, and a hazard ratio < 1 indicates that the feature is a protective factor. p < 0.05 was considered statistically significant.
To ascertain the prognostic specificity of the first-line ICT based on the RS model, an additional retrospective dataset derived from our center's electronic medical record system, named the ICMBS_TEST cohort (n = 35), was included as an internal validation set. We observed that the median PFS (6.1 months vs. 9.4 months, HR = 2.32; 95% CI: 1.05–5.09; p = 0.032; Supplementary Fig. 7A) and the median OS (15.7 months vs. 27.2 months, HR = 4.1; 95% CI: 1.11–15.11; p = 0.022; Supplementary Fig. 7B) of the HRS group were significantly shorter than those of the LRS group. Detailed sample information is provided in Supplementary Table 14.
Genetic mutation data and survival data of stage III-IV NSCLC patients were derived from two datasets in the cBioPortal database and pooled for external validation of the RS model: 90 cases from the CPTAC set (https://www.cbioportal.org/study/summary?id=lusc_cptac_gdc) and 235 cases from the OncoSG set (https://www.cbioportal.org/study/summary?id=luad_oncosg_2020). We observed that the median OS of the HRS group was shorter than that of the LRS group (70.6 months vs. NA, HR = 1.47; 95% CI: 1.00–2.17; p = 0.05; Supplementary Fig. 7C).
Differences in the DDR pathway between the HRS and LRS groups
We explored the association between the RS model and the DDR pathway. The mutation frequency of DDR sub-pathways did not differ significantly between the "Response" and "nonResponse" groups. Univariate Cox analysis suggested that the mutation status of these DDR pathways did not correlate with PFS in the ICMBS cohort (Supplementary Fig. 8A). However, the frequency of mutations in the Fanconi anemia (FA) pathway was noticeably higher in patients with HRS (N = 33, p = 0.008; Supplementary Fig. 8B).
The differences in the TME between the HRS and LRS groups in TCGA-NSCLC
To ascertain the prognostic specificity of the first-line ICT based on the RS model, we conducted an extensive analysis of genetic mutations and mRNA expression data from patients with TCGA-NSCLC. The RS calculation was applied to 984 NSCLC patients using Eq. 1, stratifying them into HRS (N = 379) and LRS (N = 605) groups based on the median cut-off. Furthermore, we analyzed immune microenvironment differences between HRS and LRS patients with TCGA-NSCLC (including LUAD and LUSC) using paired RNA-seq data. Differential expression analysis of immune checkpoint genes and HLA-related genes revealed that the HRS group tended towards an immunosuppressive state (Supplementary Fig. 9A-D), characterized by elevated expression of immunosuppressive regulatory genes such as ADORA2A and IL10RB (Supplementary Fig. 9E-F). In contrast, the LRS group exhibited immune activation with high expression of genes such as HLA-A, HLA-G, and HLA-DQB2 (Supplementary Fig. 9G-I). Survival data of stage III-IV NSCLC patients in TCGA-NSCLC (n = 500) were derived (Supplementary Table 14), and no survival differences were observed between the HRS and LRS groups in this cohort (42.9 months vs. 49.8 months, HR = 0.93; 95% CI: 0.69–1.27; p = 0.67, Supplementary Fig. 7D), which may be attributed to the lack of explicit immunotherapy exposure among these patients.
Improved prediction capability of RS using a deep learning model based on H&E-stained pathological image in advanced NSCLC Patients Receiving ICT
Using the supervised deep learning method HoVer-Net (workflow shown in Fig. 6), we analyzed H&E-stained pathological images from 162 patients. Using this approach, we identified and counted five different cell types: inflammatory, neoplastic, soft-tissue, dead, and epithelial. After obtaining the counts for all cell types, we calculated the proportion of each cell type to reflect the cell distribution characteristics in the TME (Supplementary Table 15). Univariate Cox regression analysis indicated that a higher proportion of inflammatory cells (HR = 0.00138; 95% CI: 4.25e-06 to 0.453; p = 0.026) acted as a protective factor and was associated with longer PFS. The proportions of epithelial cells (HR = 0.012; 95% CI: 7.49e-05 to 2.07; p = 0.09), neoplastic cells (HR = 2.262; 95% CI: 0.705–0.285; p = 0.171), soft-tissue cells (HR = 0.932; 95% CI: 0.291–2.989; p = 0.906), and dead cells (HR = 0.918; 95% CI: 0.046–18.287; p = 0.955) were not correlated with PFS.
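The conversion from per-type cell counts to proportions described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the counts are hypothetical, and the dictionary keys are labels chosen here to mirror the five cell types named in the text.

```python
CELL_TYPES = ("inflammatory", "neoplastic", "soft_tissue", "dead", "epithelial")

def cell_proportions(counts):
    """Convert per-type cell counts for one slide into proportions that sum to 1."""
    total = sum(counts.get(t, 0) for t in CELL_TYPES)
    if total == 0:
        raise ValueError("no cells detected on this slide")
    return {t: counts.get(t, 0) / total for t in CELL_TYPES}

# Hypothetical counts for a single slide.
slide_counts = {
    "inflammatory": 200,
    "neoplastic": 500,
    "soft_tissue": 150,
    "dead": 50,
    "epithelial": 100,
}
props = cell_proportions(slide_counts)
```

Since a proportion ranges only from 0 to 1, a Cox hazard ratio fitted on it describes the effect of moving across that entire range, which may explain why the reported hazard ratios and confidence intervals for cell proportions look extreme.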
We compared the prognostic AUC of the RS model with that of other indicator models and observed the robust performance of the RS model (AUC = 0.734, Supplementary Fig. 10A). The AUC of the proportion of epithelial cells trained using deep learning followed closely (AUC = 0.674). Some individual biomarkers for immunotherapy, such as TMB (AUC = 0.628), PD-L1 (AUC = 0.66), HED (AUC = 0.479), and the co-occurrence of mutations involving Serine/Threonine Kinase 11 (STK11) (AUC = 0.588), exhibited slightly diminished performance in the ICMBS cohort (Supplementary Fig. 10A).
The addition of the AI model-predicted epithelial cell proportion improved the accuracy of RS in predicting PFS, yielding an AUC of 0.807 at 16 months (Supplementary Fig. 10B). Meanwhile, the epithelial cell proportion was also found to be a significant influencing factor for PFS (HR = 0.59; 95% CI: 0.35–0.99; p = 0.045; Supplementary Fig. 10C). The log-likelihood ratio test results for epithelial cells and RS are provided in Supplementary Table 16; the combined model obtained a smaller p-value (2.90E-07). The PMCP thus integrates tumor mutation characteristics with image features identified using AI technology to predict and classify disease progression in patients. The results from the multivariate Cox regression analysis and the time-dependent ROC curve indicate that there was a significant correlation between PMCP and PFS (Fig. 7A-B). We found that the survival of patients in the PMCP1 subgroup was significantly longer than that of the other two subgroups (HR = 0.45; 95% CI: 0.32–0.64; p < 0.001; Fig. 7C and HR = 0.52; 95% CI: 0.32–0.83; p = 0.022; Fig. 7D), while there was no association between PMCP and TMB (Supplementary Fig. 10D). Table 2 presents the clinicopathological characteristics of the patients by PMCP subgroup. Furthermore, the chi-square test results revealed a significant association between stage and PMCP groups.
Discussion
In this study, we conducted a comprehensive analysis of genomic features associated with the prognosis and progression of patients with advanced NSCLC who received first-line ICT across multiple centers. Our findings highlight several crucial aspects of disease progression, treatment response, and potential prognostic biomarkers for the stratification of advanced NSCLC.
The extensive heterogeneity in patients with advanced lung cancer poses a significant challenge in the choice of therapy. Additionally, chemotherapy plays a crucial role in inhibiting cancer cell proliferation and eradicating tumor cells, exerting complex effects on both the TME and immune system. This complexity renders existing biomarkers insufficient to fully reflect the TME status, as well as to assess the response to combination therapy. For instance, TMB25, PD-L1 expression26, tumor-infiltrating lymphocytes (TILs)27, HED28, and the DDR pathway29,30 have been extensively studied in single-agent immunotherapy regimens; however, their prognostic significance in ICT regimens has been unsatisfactory. Therefore, by employing an integrative approach that incorporates machine and deep learning methodologies, our objective was to develop effective multi-modal prognostic models to elucidate the intricate molecular landscape of NSCLC and its implications for clinical management.
In the first step of this study, we proposed a mutation-based RS model constructed from tumor mutation patterns, comprising 7 core features: the RTK-RAS pathway as well as ARID1B, ABCC2, NIPBL, DYNC2H1, SETD2, and FAF1. Some of these signatures have been previously implicated in regulating innate cellular immunity and cancer immunotherapy. For example, Li et al. carried out a comprehensive analysis using TCGA datasets and discovered a significant correlation between DYNC2H1 gene mutations and higher levels of CD8+ T cells within NSCLC31. In gemcitabine-resistant pancreatic cancer cells, overexpression of linc-DYNC2H1-4 increases cell proliferation and migration and regulates epithelial-mesenchymal transition (EMT)-related genes32; however, further evidence is required to explore the interaction mechanism between linc-DYNC2H1-4 and DYNC2H1 in lung cancer. Kim et al. indicated that SETD2 mutations lead to the development of cellular resistance to cisplatin33, and Zeng et al. reported that mutations in SETD2 may confer benefits from immunotherapy and radiotherapy34. Patients with altered RTK-RAS signaling have a higher ORR after ICI treatment35,36. Some studies have reported that ARID1A can also serve as a biomarker for the efficacy of ICI therapy in lung cancer. Specifically, ARID1A mutations have been associated with TMB, PD-L1 expression, and increased neoantigens, typically indicating higher immunogenicity and stronger immune responses37,38,39. Furthermore, in NSCLC, there have been reports indicating that co-mutations of ARID1B and ARID1A, which are mutually exclusive isoforms within the SWI/SNF chromatin remodeling complex, can enhance the immunotherapeutic benefits for patients40. In the ICMBS cohort, the proportion of co-mutations was 14.8% (4/27 patients). At the last follow-up, these 4 individuals had no recurrence or progression.
Among the remaining patients with only ARID1B mutation, 10/14 had progressed, with a median PFS of 127.5 days. Furthermore, we observed that co-mutations in ARID1B and the DDR pathway led to decreased survival rates. Studies have suggested that the loss of ARID1B expression can result in alterations in the TME41, and high expression of ARID1B has been identified as a biomarker of ineffective adjuvant chemotherapy in urothelial carcinoma42. Currently, there is no direct evidence to suggest that co-mutations involving the DDR pathway and ARID1B influence the response to ICT. Further investigations are required to elucidate this mechanism. However, the co-occurrence of DDR and ARID1B mutations may alter the TME, potentially contributing to a decrease in the ORR to ICT.
We found that TCGA-NSCLC patients with LRS exhibited higher immunogenicity, upregulation of the HLA-A gene43, and a higher proportion of infiltration of CD56dim natural killer (NK) cells44,45 and subtypes of helper T cells. These findings support the involvement of these cells in a wide spectrum of immune responses and inflammatory processes, potentially contributing to elevated immunogenicity46. The observed indistinguishability between HRS and LRS in terms of prognosis in the TCGA may be due to the heterogeneity of the patient population and the differences in treatment regimens. We also observed more mutations in the Fanconi anemia (FA) pathway in patients with HRS. Disruptions or mutations in the FA pathway can result in the buildup of DNA interstrand crosslink damage, heightening cell susceptibility and fragility, and consequently increasing the risk of cancer47. Furthermore, deficiency in the FA pathway can lead to EMT and enhance the intrinsic inflammatory signaling of keratinocytes, which contributes to the pronounced invasiveness observed in squamous cell carcinoma48. EMT is considered a critical process that drives tumor progression, metastasis, and resistance to chemotherapy. These findings suggest that patients with LRS may show good clinical outcomes after immunotherapy. Conversely, patients with HRS may have a shorter PFS.
Furthermore, we employed a deep learning algorithm (HoVer-Net) to identify TME-related cellular features. This revealed that the proportion of epithelial cells has significant prognostic value, potentially because a reduction in epithelial cell morphology is often associated with increased tumor invasion and metastatic capabilities. A reduction in the number of cells with epithelial morphology within the TME may imply easier tumor dissemination through the vascular or lymphatic systems, resulting in a worse prognosis49. During malignant progression, cancer cells lose epithelial cell characteristics. Moreover, EMT is linked to the proliferation of immunosuppressive cells, particularly myeloid-derived suppressor cells (MDSCs). In addition, EMT correlates with the upregulation of PD-L1, which also plays a role in the facilitation of immune inhibitory gene expression through chemokine production, contributing to the formation of a TME with immunosuppressive properties50. Wang et al. discovered that epithelial marker genes are typically negatively correlated with cytolytic activity (CYT) and that EMT is associated with worse survival and immune suppression50. These results suggest that a higher abundance of cells with epithelial morphology may promote tumor immune responses, enhancing the body's immune surveillance and clearance of tumors, thereby leading to an immune-activated state. In a study of patients with NSCLC who received neoadjuvant PD-1 blockade combined with chemotherapy, it was found that epithelial cells play a significant role in the reconstruction of normal lung structure after immunotherapy51. Additionally, studies have shown that cancer-associated fibroblasts (CAFs) promote immunosuppression by secreting factors such as transforming growth factor-β (TGF-β) and interleukin-6 (IL-6), which induce EMT in cancer cells, thereby increasing chemotherapy resistance52,53.
Therefore, the efficacy of ICT in NSCLC is closely related to the interactions between epithelial cells and fibroblasts. Consequently, a deeper understanding of the role of epithelial cells in the TME is crucial for optimizing ICT strategies.
In clinical practice, stratified management of patients has significant research implications. Patients in the ICMBS cohort with PMCP1 may derive the greatest benefit from ICT, because they combine two favorable factors: a low genomic risk of progression and a high proportion of epithelial cells on pathological images. ICT capitalizes on the relatively higher tumor immunogenicity of these patients, while chemotherapy further enhances the effectiveness of immunotherapy. Conversely, patients classified as PMCP3 may not derive significant therapeutic benefit from ICT, because their tumor immunogenicity is low and chemotherapy may not effectively enhance the effects of immunotherapy. Therefore, alternative treatment options should be considered for PMCP3 patients to achieve better treatment outcomes. The current results are promising for identifying biomarkers for ICT treatment in advanced NSCLC. However, they are constrained by the lack of external validation of tumor mutations and pathological features in patients who have undergone first-line ICT. Our next research plan is to actively collect information from such patients. Nonetheless, the AI model reduced costs and ensured timely analysis, further promoting the clinical application of stratified management for advanced NSCLC.
Methods
Samples
Sample collection involved the enrolment of 207 patients with advanced NSCLC between January 2021 and December 2023. Of these, 162 patients met the inclusion criteria: they received first-line ICT, were able to adhere to scheduled follow-up visits and necessary treatments at the hospital, and underwent next-generation sequencing (NGS) testing using the ChosenMed 1123-gene panel (ChosenOne®). Patients were excluded if they received treatments other than the specified protocol (N = 37), had incomplete clinical data (N = 5), or did not undergo NGS (N = 3). This study was performed in six medical centers of the Chinese PLA General Hospital (the first, third, fourth, fifth, sixth and seventh medical centers) and approved by the Ethics Committee of Chinese PLA General Hospital (S2021-462-01). All participants gave written informed consent. This study was conducted in accordance with the Declaration of Helsinki. For clinical follow-up, OS was calculated from the date of diagnosis until either the date of death or the last follow-up. Patients were assigned a survival state of 0 if they were still alive or lost to follow-up and a survival state of 1 if they were deceased. PFS was similarly calculated from the date of diagnosis until the date of tumor progression or deterioration, death from any cause, or the last follow-up. Clinical information, including age at diagnosis, sex, tumor site, tumor node metastasis (TNM), histological type of treatment, and follow-up results, was collected for each patient. In this study, 74% (120/162) of patients underwent PD-L1 expression assessment using immunohistochemistry (IHC), of whom 37 (31%) were tested with the 22C3 antibody and 24 (20%) with the SP263 antibody.
The name of the antibody used was not documented in the medical records of the remaining 59 patients. All of these patients were assessed for tumor proportion score (TPS), which represents the proportion of tumor cells expressing PD-L1, and they were subsequently categorized into three groups: TPS < 1%, 1% < TPS ≤ 49%, and TPS ≥ 50%.
DNA Isolations and Sequencing
Genomic DNA was extracted from formalin-fixed paraffin-embedded (FFPE) tissues using the Concert FFPE DNA extraction kit (ConcertBio, Xiamen, Cat: RC1004). A blood gDNA purification kit (Concert, Xiamen, China; Cat: RC1001) was used for DNA isolation from peripheral blood lymphocytes. DNA samples were fragmented using an ultrasonic disruptor (Covaris M220) and used to generate libraries for subsequent DNA sequencing using the KAPA HyperPrep Kit (KAPA Biosystems, Cat: KK8504). After library preparation, quantification was performed using a Qubit 3.0 Fluorometer (Life Technologies). Subsequently, 1 μg of DNA libraries was hybridized and targeted using a custom probe from the ChosenMed® 1123 panel, covering 2.2 Mb, following the Fast Hybridization and Wash Kit (Twist Bioscience, Cat: 101175). The samples were then subjected to 100 bp paired-end sequencing on an MGI-2000 sequencer (BGI), and the raw BCL files were converted into fastq files using the Bcl2fastq Conversion Software.
Data processing
Clean reads were aligned to the human reference genome (UCSC hg19) using BWA (version 0.7.11). Somatic mutations, including small insertions, deletions, and single nucleotide polymorphisms (SNPs), were identified using GATK (version 3.6) and VarScan with the default settings. All variants were annotated using ANNOVAR to ensure precise identification of single nucleotide variants (SNVs) and indels. Next, we implemented rigorous filters, excluding variants situated within intronic regions and those exhibiting a population frequency exceeding 1% in both the 1000 Genomes Project (version 1000gAUG_2015ALL) and ExAC databases. We selected variations supported by more than 8 reads, while excluding synonymous mutations. The following seven types of variations were retained for further analysis: frameshift deletions, frameshift insertions, in-frame deletions, in-frame insertions, missense mutations, nonsense mutations, and splice-site mutations.
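The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, and the region and class labels are hypothetical, and the population-frequency rule is read literally (a variant is dropped only when it exceeds 1% in both databases).

```python
from dataclasses import dataclass

# The seven variant classes retained for analysis, as listed in the text.
# The label spellings are hypothetical; real ANNOVAR output uses its own terms.
RETAINED_CLASSES = {
    "frameshift_deletion", "frameshift_insertion",
    "inframe_deletion", "inframe_insertion",
    "missense", "nonsense", "splice_site",
}


@dataclass
class Variant:
    gene: str
    region: str      # e.g. "exonic", "splicing", "intronic" (hypothetical labels)
    var_class: str   # one of RETAINED_CLASSES, or e.g. "synonymous"
    af_1000g: float  # allele frequency in 1000 Genomes (0.0 if absent)
    af_exac: float   # allele frequency in ExAC (0.0 if absent)
    alt_reads: int   # number of reads supporting the variant


def keep_variant(v: Variant, max_pop_af: float = 0.01, min_reads: int = 8) -> bool:
    """Apply the filters described in the text to a single annotated variant."""
    if v.region == "intronic":
        return False
    # Literal reading of the text: frequency above 1% in BOTH databases.
    if v.af_1000g > max_pop_af and v.af_exac > max_pop_af:
        return False
    # "Supported by more than 8 reads": exactly 8 reads is not enough.
    if v.alt_reads <= min_reads:
        return False
    # Synonymous and any other unlisted classes are excluded.
    return v.var_class in RETAINED_CLASSES
```

For example, `keep_variant(Variant("GENE_X", "exonic", "missense", 0.0, 0.0, 12))` returns `True`, whereas the same variant with 8 supporting reads is rejected. A stricter pipeline might drop a variant that is common in either database; that would be a one-word change from `and` to `or`.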
Somatic variants, TMB analysis, copy number alteration, and fusion gene
TMB quantifies the genetic diversity of mutations in tumors. TMB is typically calculated as the number of non-synonymous somatic mutations found in the coding regions of the whole exome. A TMB value exceeding 10 mut/Mb was classified as "TMB-H," whereas a value below 10 mut/Mb was classified as "TMB-L"54. Copy number alteration (CNA) regions were identified using GISTIC software (version 2.0.23) with the parameters outlined in a previous publication55. Fusion genes were identified using DNA- and RNA-based methods. To detect fusion genes, specialized software tools, such as STAR-Fusion56, GeneFuse57, FACTERA58, and in-house pipelines were utilized. Functional annotations and relevant details of the fusion genes were obtained using the Ensembl and RefSeq databases.
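As a worked illustration of the TMB cutoff, the sketch below normalizes a non-synonymous mutation count by the sequenced territory and applies the 10 mut/Mb threshold. It assumes that the count is divided by the 2.2 Mb covered by the panel, which the text does not state explicitly, and it assigns a value of exactly 10 mut/Mb to TMB-L, which the text leaves open. The function names are hypothetical.

```python
PANEL_SIZE_MB = 2.2  # panel coverage reported in the sequencing section


def tmb(nonsynonymous_count: int, panel_mb: float = PANEL_SIZE_MB) -> float:
    """Mutations per megabase (assumes normalization by panel size)."""
    if panel_mb <= 0:
        raise ValueError("panel size must be positive")
    return nonsynonymous_count / panel_mb


def tmb_class(value: float, cutoff: float = 10.0) -> str:
    """TMB-H above the cutoff; a value equal to the cutoff is treated as TMB-L
    (an assumption, since the text does not assign this boundary case)."""
    return "TMB-H" if value > cutoff else "TMB-L"
```

Under these assumptions, a sample with 33 retained non-synonymous mutations has a TMB of about 15 mut/Mb and is labelled TMB-H, while a sample with 11 such mutations (about 5 mut/Mb) is labelled TMB-L.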
Mutational signature analysis and oncogenic functional pathways
Cancer arises from DNA mutagenesis, and the underlying mutational processes can be inferred from somatic mutation signatures by analyzing genome sequences. To detect signatures in the 162 ICMBS patients, the pan-cancer catalogue of 30 signatures was obtained from the COSMIC database (http://cancer.sanger.ac.uk/cosmic). Mutation signature detection was based on six substitution subtypes (C > A, C > G, C > T, T > A, T > C, and T > G) and on clustering of the 96 mutation types occurring in ICMBS samples. Non-negative matrix factorization (NMF) can be used to decompose matrix A into four non-negative matrices. Additionally, detailed information on the 21 core cancer-related pathways is provided in Supplementary Table 10.
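The 96 mutation types arise from the six substitution subtypes combined with the four possible bases on each side of the mutated position (6 × 4 × 4 = 96). The sketch below shows one conventional way to assign a single-nucleotide variant to its type, reporting purine-reference mutations on the opposite strand so that the reference base is always C or T. It is an illustration only; the function names and the `A[C>T]G` label format are assumptions rather than the output of the tools used in the study.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTIONS = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# All 96 labels: 6 substitutions x 4 upstream bases x 4 downstream bases.
ALL_CHANNELS = [
    f"{left}[{sub}]{right}"
    for sub in SUBSTITUTIONS
    for left in "ACGT"
    for right in "ACGT"
]


def channel(five_prime: str, ref: str, alt: str, three_prime: str) -> str:
    """Return the pyrimidine-centred label of an SNV, e.g. 'A[C>T]G'."""
    bases = (five_prime, ref, alt, three_prime)
    if any(b not in COMPLEMENT for b in bases) or ref == alt:
        raise ValueError(f"not a valid SNV context: {bases}")
    if ref in ("A", "G"):
        # Purine reference: describe the mutation on the reverse-complement strand.
        five_prime, three_prime = COMPLEMENT[three_prime], COMPLEMENT[five_prime]
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


def count_channels(snvs):
    """Count SNVs per channel; `snvs` yields (five_prime, ref, alt, three_prime)."""
    counts = dict.fromkeys(ALL_CHANNELS, 0)
    for snv in snvs:
        counts[channel(*snv)] += 1
    return counts
```

Stacking one such 96-entry count vector per sample gives the samples-by-types matrix on which signature extraction is usually performed.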
Development of a risk score (RS) model based on genomic features
Univariate Cox regression analysis was conducted to assess the correlation between PFS and the mutation status of all genes and hub cancer-related pathways within the ICMBS cohort. Subsequently, lasso regression, as implemented in the "glmnet" package, was employed to identify core hub features. Equation 2 was used to calculate the risk score of each patient:
$$\mathrm{RS} = \sum_{\mathrm{sig}} \beta_{\mathrm{sig}} \times \mathrm{Mut}_{\mathrm{sig}} \qquad (2)$$
In this context, βsig represents the coefficients of core candidate features derived from the lasso regression analysis, while Mutsig indicates the mutation status of candidate features within the ICMBS cohort, with 0 denoting no mutation and 1 representing a meaningful mutation event. Patients were categorized into HRS and LRS groups based on the median RS derived from this cohort.
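A minimal sketch of the risk score and the median split, assuming the linear form RS = Σ βsig × Mutsig implied by the definitions above. The feature names and coefficient values used in the example are hypothetical placeholders, not the fitted model, and assigning patients with an RS exactly equal to the median to the LRS group is also an assumption.

```python
from statistics import median


def risk_score(mutations: dict, coefficients: dict) -> float:
    """Sum of lasso coefficients over the features mutated in one patient.

    `mutations` maps feature name -> 0 (no mutation) or 1 (mutated);
    features missing from the dict are treated as unmutated.
    """
    return sum(beta * mutations.get(name, 0) for name, beta in coefficients.items())


def split_by_median(scores: dict) -> dict:
    """Label each patient HRS (above the cohort median) or LRS (otherwise)."""
    cut = median(scores.values())
    return {pid: ("HRS" if s > cut else "LRS") for pid, s in scores.items()}


# Hypothetical coefficients and patients, for illustration only.
example_coefficients = {"GENE_A": 0.8, "PATHWAY_B": -0.5}
example_patients = {
    "P1": {"GENE_A": 1},
    "P2": {"PATHWAY_B": 1},
    "P3": {"GENE_A": 1, "PATHWAY_B": 1},
    "P4": {},
}
example_scores = {
    pid: risk_score(m, example_coefficients) for pid, m in example_patients.items()
}
example_groups = split_by_median(example_scores)
```

With these placeholder values, P1 and P3 fall above the cohort median and are labelled HRS, while P2 and P4 are labelled LRS.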
Application of an AI-driven model based on whole slide imaging for characterizing cellular composition in tumor microenvironment
In this study, we employed advanced methodologies and state-of-the-art (SOTA) models to improve the accuracy of cell identification in H&E-stained pathological images. We utilized the HoVer-Net model59, which combines cell segmentation and classification. However, to address the inconsistencies in the input and output sizes observed in the original implementation, we utilized PathML60 to seamlessly integrate HoVer-Net into our workflow, thereby resolving challenges such as input patch cropping and convolution without padding. For training and evaluation, we leveraged the PanNuke dataset (https://doi.org/10.48550/arXiv.2003.10778), which is a comprehensive resource introduced by Gamper et al. This dataset consists of 7904 patches of 256*256 pixels, each accompanied by the corresponding masks. With annotations for 189,744 nuclei across 19 tissue types, including clinically significant nuclear classes, such as neoplastic, inflammatory, epithelial, dead, and soft tissue cells, the PanNuke dataset provides a robust foundation for our study.
To ensure robust model performance, we partitioned the PanNuke dataset into training, validation, and test sets at a ratio of 7:1:2. The model was trained for 60 epochs using an optimizer with an initial learning rate of 2 × 10⁻⁴ and a MultiStepLR learning rate scheduler with milestones set to [10, 25, 40, 50]. The training was conducted on a machine equipped with 8 NVIDIA A800 80 GB GPUs. During the inference phase, we employed a multichannel image processing approach to efficiently segment the tissue regions. Subsequently, these segmented regions were divided into 256 × 256 px patches for input into the model. The model output counted five types of cells within each patch: inflammatory, neoplastic, soft tissue, necrotic, and epithelial cells. To accelerate the inference, MPI parallel technology was applied to 20 nodes. Based on the AI-driven model, five cellular composition proportions were calculated for the 162 patients from the Chinese PLA General Hospital. Subsequently, univariate Cox regression analysis was used to assess the correlation between the proportion of each cell type and PFS. The samples were divided into two groups based on the median values. For example, "hEpi" indicates that the proportion of epithelial cells in a patient is greater than or equal to the median value in ICMBS, while "lEpi" indicates that the proportion of epithelial cells is less than the median. This approach was also applied to classify the four other cell types.
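The step from per-patch cell counts to the hEpi/lEpi labels can be sketched as follows. The sketch assumes that a patient's proportion for a cell type is the count of that type divided by the total number of classified cells summed over all patches; the text does not spell out the aggregation, so this is one plausible reading. The dictionary keys and function names are hypothetical.

```python
from statistics import median

CELL_TYPES = ("inflammatory", "neoplastic", "soft_tissue", "necrotic", "epithelial")


def cell_proportions(patch_counts):
    """Pool per-patch counts for one patient and return a proportion per cell type.

    `patch_counts` is an iterable of dicts mapping cell type -> count in a patch.
    """
    totals = dict.fromkeys(CELL_TYPES, 0)
    for patch in patch_counts:
        for cell_type in CELL_TYPES:
            totals[cell_type] += patch.get(cell_type, 0)
    n_cells = sum(totals.values())
    if n_cells == 0:
        raise ValueError("no classified cells in any patch")
    return {cell_type: totals[cell_type] / n_cells for cell_type in CELL_TYPES}


def dichotomize(values: dict, tag: str = "Epi") -> dict:
    """Label patients 'h<tag>' if value >= cohort median, else 'l<tag>'."""
    cut = median(values.values())
    return {pid: (f"h{tag}" if v >= cut else f"l{tag}") for pid, v in values.items()}
```

The `>=` comparison follows the definition in the text, under which a proportion equal to the cohort median is assigned to the high group. Calling `dichotomize` with a different `tag` applies the same rule to the other four cell types.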
Construction of a prognostic multimodal classifier for progression (PMCP)
Patients were assigned to one of three PMCP subgroups by combining the genomic risk group (HRS or LRS) with the epithelial cell group (hEpi or lEpi). Specifically, patients in the PMCP1 subgroup have a low risk of progression and a high proportion of epithelial cells (hEpi_LRS); the PMCP2 subgroup includes patients with a high proportion of epithelial cells but a high risk of progression (hEpi_HRS), as well as patients with a low proportion of epithelial cells but a low risk of progression (lEpi_LRS); and patients in the PMCP3 subgroup exhibit a high risk of progression and a low proportion of epithelial cells (lEpi_HRS). Statistical analyses, including Kaplan–Meier survival analysis and Cox proportional hazards regression, were conducted to evaluate the prognostic significance of the PMCP classification. The predictive performance of the model was assessed using the AUC for predicting PFS.
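The subgroup definitions amount to a four-entry lookup table, sketched below. The labels are those used in the text; the function name is hypothetical.

```python
# (epithelial group, genomic risk group) -> PMCP subgroup, as defined in the text.
PMCP_TABLE = {
    ("hEpi", "LRS"): "PMCP1",
    ("hEpi", "HRS"): "PMCP2",
    ("lEpi", "LRS"): "PMCP2",
    ("lEpi", "HRS"): "PMCP3",
}


def pmcp_subtype(epi_group: str, risk_group: str) -> str:
    """Combine the pathological and genomic labels into a PMCP subgroup."""
    try:
        return PMCP_TABLE[(epi_group, risk_group)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {epi_group!r}, {risk_group!r}"
        ) from None
```

For instance, `pmcp_subtype("hEpi", "LRS")` gives `"PMCP1"`, and both discordant combinations map to `"PMCP2"`.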
Statistical analysis and visualization
All statistical evaluations were performed using R software. Categorical variables were assessed using either the chi-squared test or logistic regression using the "stats" R package. Kaplan–Meier (KM) analysis was used for survival analysis, and group comparisons were performed using the log-rank test. Cox proportional hazards regression was used to assess the relationship between the clinicopathological factors and OS or PFS. Additionally, univariate Cox proportional hazards regression was used to investigate the association of genomic factors and specific genes with PFS or OS, adjusting for false discovery rates where necessary. All reported p-values were two-tailed, and statistical significance was set at p < 0.05. All figures were plotted using R version 3.4.2 and packages including "ggplot2", "maftools", "ComplexHeatmap", and "ggsurv".
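To make the survival coding concrete, the following sketch implements the Kaplan–Meier product-limit estimator in plain Python, using the same convention as the Samples section (1 for an event, 0 for a patient who is censored). The analyses in the study were run in R; this is only an illustration of the estimator, and the function names are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    `times` are follow-up durations (e.g. days); `events` are 1 for an
    observed event and 0 for censoring, in the same order as `times`.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events plus censorings at each time point
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

With follow-up times `[5, 8, 8, 12, 20]` and event indicators `[1, 1, 0, 1, 0]`, the estimated survival steps to 0.8, 0.6, and 0.3 at times 5, 8, and 12, and the median survival time is 12.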
Data availability
All analysis data are available in the supplementary information. The raw tissue datasets are available in the GSA database (Genome Sequence Archive, CNCB-NGDC) under project number PRJCA03441. Any additional information required to reanalyze the data reported in this paper is available from the corresponding author.
Code availability
The underlying code supporting this study is available at the following GitHub repository: https://github.com/jackBQ/advanced-non-small-cell-lung-cancer.
References
Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73, 17–48 (2023).
SEER*Explorer: An interactive website for SEER cancer statistics [Internet]. Surveillance Research Program, National Cancer Institute; 2025 Jul 2. [cited 2025 Jul 22]. Available from: https://seer.cancer.gov/statistics-network/explorer/.
Chen, R. et al. Emerging therapeutic agents for advanced non-small cell lung cancer. J. Hematol. Oncol. 13, 58 (2020).
Davodabadi, F. et al. Cancer chemotherapy resistance: mechanisms and recent breakthrough in targeted drug delivery. Eur. J. Pharmacol. 958, 176013 (2023).
Herbst, R. S. et al. Atezolizumab for first-line treatment of PD-L1-selected patients with NSCLC. N. Engl. J. Med. 383, 1328–1339 (2020).
Samstein, R. M. et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat. Genet. 51, 202–206 (2019).
Ricciuti, B. et al. Association of high tumor mutation burden in non-small cell lung cancers with increased immune infiltration and improved clinical outcomes of PD-L1 blockade across PD-L1 expression levels. JAMA Oncol. 8, 1160–1168 (2022).
Yamaguchi, H., Hsu, J. M., Yang, W. H. & Hung, M. C. Mechanisms regulating PD-L1 expression in cancers and associated opportunities for novel small-molecule therapeutics. Nat. Rev. Clin. Oncol. 19, 287–305 (2022).
Peters, S., Reck, M., Smit, E. F., Mok, T. & Hellmann, M. D. How to make the best use of immunotherapy as first-line treatment of advanced/metastatic non-small-cell lung cancer. Ann. Oncol. Off. J. Eur. Soc. Med. Oncol. 30, 884–896 (2019).
Paz-Ares, L. et al. First-line nivolumab plus ipilimumab combined with two cycles of chemotherapy in patients with non-small-cell lung cancer (CheckMate 9LA): an international, randomised, open-label, phase 3 trial. Lancet Oncol. 22, 198–211 (2021).
Gandhi, L. et al. Pembrolizumab plus chemotherapy in metastatic non-small-cell lung cancer. N. Engl. J. Med. 378, 2078–2092 (2018).
Paz-Ares, L. et al. Pembrolizumab plus chemotherapy for squamous non-small-cell lung cancer. N. Engl. J. Med. 379, 2040–2051 (2018).
Zhong, H. et al. First-line penpulimab combined with paclitaxel and carboplatin for metastatic squamous non-small-cell lung cancer in China (AK105-302): a multicentre, randomised, double-blind, placebo-controlled phase 3 clinical trial. Lancet. Respir. Med. https://doi.org/10.1016/s2213-2600(23)00431-9 (2024).
Malapelle, U. et al. Standardized and simplified reporting of next-generation sequencing results in advanced non-small-cell lung cancer: Practical indications from an Italian multidisciplinary group. Crit. Rev. Oncol./Hematol. 193, 104217 (2024).
Restrepo, J. C., Dueñas, D., Corredor, Z. & Liscano, Y. Advances in genomic data and biomarkers: revolutionizing NSCLC diagnosis and treatment. Cancers 15, https://doi.org/10.3390/cancers15133474 (2023).
Dwivedi, K. et al. An explainable AI-driven biomarker discovery framework for Non-Small Cell Lung Cancer classification. Computers Biol. Med. 153, 106544 (2023).
Li, Y., Wu, X., Fang, D. & Luo, Y. Informing immunotherapy with multi-omics driven machine learning. NPJ Digit. Med. 7, 67 (2024).
Bai, X. et al. Development and validation of a genomic mutation signature to predict response to PD-1 inhibitors in non-squamous NSCLC: a multicohort study. J. Immunother. Cancer 8, https://doi.org/10.1136/jitc-2019-000381 (2020).
Pan, D., Hu, A. Y., Antonia, S. J. & Li, C.-Y. A gene mutation signature predicting immunotherapy benefits in patients with NSCLC. J. Thorac. Oncol. 16, 419–427 (2021).
Li, X. et al. PathwayTMB: a pathway-based tumor mutational burden analysis method for predicting the clinical outcome of cancer immunotherapy. Mol. Ther. Nucleic Acids 34, 102026 (2023).
Genova, C. et al. Therapeutic implications of tumor microenvironment in lung cancer: focus on immune checkpoint blockade. Front. Immunol. 12, 799455 (2021).
DuCote, T. J. et al. Using artificial intelligence to identify tumor microenvironment heterogeneity in non-small cell lung cancers. Lab. Investig. J. Tech. Methods Pathol. 103, 100176 (2023).
Jee, J. et al. Overall survival with circulating tumor DNA-guided therapy in advanced non-small-cell lung cancer. Nat. Med. 28, 2353–2363 (2022).
Tran, E. et al. Immunogenicity of somatic mutations in human gastrointestinal cancers. Science 350, 1387–1390 (2015).
Doroshow, D. B. et al. PD-L1 as a biomarker of response to immune-checkpoint inhibitors. Nat. Rev. Clin. Oncol. 18, 345–362 (2021).
Creelan, B. C. et al. Tumor-infiltrating lymphocyte treatment for anti-PD-1-resistant metastatic lung cancer: a phase 1 trial. Nat. Med. 27, 1410–1418 (2021).
Jiang, T. et al. HLA-I evolutionary divergence confers response to PD-1 blockade plus chemotherapy in untreated advanced non-small cell lung cancer. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 29, 4830–4843 (2023).
Wang, Z. et al. Comutations in DNA damage response pathways serve as potential biomarkers for immune checkpoint blockade. Cancer Res. 78, 6486–6496 (2018).
Teo, M. Y. et al. Alterations in DNA damage response and repair genes as potential marker of clinical benefit from PD-1/PD-L1 blockade in advanced urothelial cancers. J. Clin. Oncol. Off. J. Am. Soc. Clin. Oncol. 36, 1685–1694 (2018).
Li, C., Zhao, L., Yu, Y., Xiao, M. & Qi, C. DYNC2H1 mutation as an indicator stratified patients benefit from Immune Checkpoint Inhibitors in NSCLC. J. Clin. Oncol. 40, e21052-e21052, https://doi.org/10.1200/JCO.2022.40.16_suppl.e21052 (2022).
Gao, Y. et al. Linc-DYNC2H1-4 promotes EMT and CSC phenotypes by acting as a sponge of miR-145 in pancreatic cancer cells. Cell Death Dis. 8, e2924 (2017).
Kim, I. K. et al. Acquired SETD2 mutation and impaired CREB1 activation confer cisplatin resistance in metastatic non-small cell lung cancer. Oncogene 38, 180–193 (2019).
Zeng, Z. et al. SETD2 mediates immunotherapy and radiotherapy efficacy via regulating DNA damage responses and genomic stability in lung adenocarcinoma. Genes Dis. 10, 336–339 (2023).
Han, S. et al. Alterations in the RTK/Ras/PI3K/AKT pathway serve as potential biomarkers for immunotherapy outcome of diffuse gliomas. Aging 13, 15444–15458 (2021).
Li, J. et al. Integrative clinical and molecular analysis of advanced biliary tract cancers on immune checkpoint blockade reveals potential markers of response. Clin. Transl. Med. 10, e118 (2020).
Jin, F. et al. ARID1A mutations in lung cancer: biology, prognostic role, and therapeutic implications. Trends Mol. Med. 29, 646–658 (2023).
Hu, G., Tu, W., Yang, L., Peng, G. & Yang, L. ARID1A deficiency and immune checkpoint blockade therapy: From mechanisms to clinical application. Cancer Lett. 473, 148–155 (2020).
Wang, L. et al. Effect and biomarker of immune checkpoint blockade therapy for ARID1A deficiency cancers. Biomed. Pharmacother. 130, 110626 (2020).
Helming, K. C. et al. ARID1B is a specific vulnerability in ARID1A-mutant cancers. Nat. Med. 20, 251–254 (2014).
Zhu, G. et al. ARID1B deficiency leads to impaired DNA damage response and activated cGAS-STING pathway in non-small cell lung cancer. J. Cancer 15, 2601–2612 (2024).
Wang, B. et al. Expression of ARID1B is associated with poor outcomes and predicts the benefit from adjuvant chemotherapy in bladder urothelial carcinoma. J. Cancer 8, 3490–3497 (2017).
Berner, F. et al. Keratinocyte differentiation antigen-specific T cells in immune checkpoint inhibitor-treated NSCLC patients are associated with improved survival. Oncoimmunology 10, 2006893 (2021).
Ma, W. et al. Dynamic evaluation of blood immune cells predictive of response to immune checkpoint inhibitors in NSCLC by multicolor spectrum flow cytometry. Front. Immunol. 14, 1206631 (2023).
Xu, X. et al. A nomogram model based on peripheral blood lymphocyte subsets to assess the prognosis of non-small cell lung cancer patients treated with immune checkpoint inhibitors. Transl. Lung Cancer Res. 10, 4511–4525 (2021).
Ivanov, I. I. et al. The orphan nuclear receptor RORgammat directs the differentiation program of proinflammatory IL-17+ T helper cells. Cell 126, 1121–1133 (2006).
Garaycoechea, J. I. et al. Alcohol and endogenous aldehydes damage chromosomes and mutate stem cells. Nature 553, 171–177 (2018).
Webster, A. L. H. et al. Genomic signature of Fanconi anaemia DNA repair pathway deficiency in cancer. Nature 612, 495–502 (2022).
de Visser, K. E. & Joyce, J. A. The evolving tumor microenvironment: from cancer initiation to metastatic outgrowth. Cancer Cell 41, 374–403 (2023).
Taki, M. et al. Tumor immune microenvironment during epithelial-mesenchymal transition. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 27, 4669–4679 (2021).
Hu, J. et al. Tumor microenvironment remodeling after neoadjuvant immunotherapy in non-small cell lung cancer revealed by single-cell RNA sequencing. Genome Med. 15, 14 (2023).
Abulaiti, A. et al. Interaction between non-small-cell lung cancer cells and fibroblasts via enhancement of TGF-β signaling by IL-6. Lung Cancer 82, 204–213 (2013).
Shi, J. et al. Targeted blockade of TGF-β and IL-6/JAK2/STAT3 pathways inhibits lung cancer growth promoted by bone marrow-derived myofibroblasts. Sci. Rep. 7, 8660 (2017).
Fang, W. et al. Comprehensive genomic profiling identifies novel genetic predictors of response to anti-PD-(L)1 therapies in non-small cell lung cancer. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 25, 5015–5026 (2019).
Chen, J. et al. Genomic landscape of lung adenocarcinoma in East Asians. Nat. Genet. 52, 177–186 (2020).
Haas, B. J. et al. STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq. https://doi.org/10.1101/120295 (2017).
Chen, S. et al. GeneFuse: detection and visualization of target gene fusions from DNA sequencing data. Int. J. Biol. Sci. 14, 843–848 (2018).
Newman, A. M. et al. FACTERA: a practical method for the discovery of genomic rearrangements at breakpoint resolution. Bioinformatics (Oxf., Engl.) 30, 3390–3393 (2014).
Graham, S. et al. Hover-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).
Rosenthal, J. et al. Building tools for machine learning and artificial intelligence in cancer research: best practices and a case study with the pathML toolkit for computational pathology. Mol. Cancer Res. 20, 202–206 (2022).
Acknowledgements
We thank all the patients and their families for their participation. We would like to thank Editage (www.editage.cn) for English language editing.
Author information
Contributions
Conceptualization: Y.H. and B.F.N.; methodology: Y.J.H., X.Q.L., J.X.M., J.L.W. and Z.F.L.; investigation: Y.J.H., J.X.M., J.L.W., Z.F.L., L.J.W., F.Z., D.H., B.Y., H.T.T., S.Y.L., J.F.H. and B.F.N.; formal analysis: Y.J.H., J.L.Z. and Y.Z.; data curation: Y.J.H., S.Y.L., J.C. and J.F.H.; writing – original draft: Y.J.H., X.Q.L., J.X.M. and S.Y.L.; writing – review & editing: Y.H., B.F.N., L.J.W., F.Z., D.H., J.C. and B.Y.; funding acquisition: Y.H. and B.F.N.; resources: W.H.X., H.W., J.Y.W., H.F.Q., H.J.G., X.S.L., Z.W.H., D.W.S., B.F.N. and Y.H.; supervision: Y.J.H., X.Q.L. and D.H.; visualization: S.Y.L., J.F.H., J.L.Z., Y.Z., J.Y.S. and J.C. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare the following competing interests: S.Y.L., J.L.Z., Y.Z., J.Y.S. and J.C. are employed by Beijing ChosenMed Clinical Laboratory Co., Ltd. but declare no non-financial competing interests. All other authors declare no financial or non-financial competing interests.
Additional information
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Han, Y., Ma, J., Liu, Z. et al. Integrating genomic and pathological characteristics to enhance prognostic precision in advanced NSCLC. npj Precis. Onc. 9, 271 (2025). https://doi.org/10.1038/s41698-025-01056-8
DOI: https://doi.org/10.1038/s41698-025-01056-8









