Introduction

Kidney tumors represent the third most common urinary malignancy, accounting for 2 ~ 3% of all adult cancers1. More than 90% of these tumors are renal cell carcinomas (RCCs) arising from renal tubular epithelial cells, with clear cell renal cell carcinoma being the most prevalent subtype (~75%)2. In RCC, tumors can extend into the renal vein or inferior vena cava and form tumor thrombi (TTs), affecting 4 ~ 10% of patients with RCC3. The presence of TTs significantly impacts the clinical stage, surgical strategy, and prognosis of RCC4. Previous studies have established TT as an adverse prognostic factor that is strongly linked to worse survival5,6 and higher recurrence risk7,8. Moreover, the high level of TT increases surgical complexity and risk, further leading to increased perioperative morbidity (34%) and mortality (10%)9. Neoadjuvant therapy has emerged as a promising strategy because of its potential to simplify surgery and improve surgical outcomes by reducing the level and length of TT. However, no definitive consensus currently exists on neoadjuvant therapy for RCC with TT due to conflicting results across multiple clinical studies10,11,12. Thus, a detailed exploration of the mechanisms underlying TT formation and progression is crucial for identifying potential biomarkers and therapeutic targets to guide effective neoadjuvant therapy strategies.

Recent genomic studies have revealed that TT likely originates from subclones of the primary tumor (PT) with minimal additional mutations13,14,15. Notably, PTs from patients with TT presented a significantly higher mutational burden and genomic instability, particularly within chromatin-remodeling genes such as BAP1 and SETD2, than those without TT14,16. Bulk transcriptomic analysis revealed similar expression profiles between PTs and TTs, whereas PTs from TT-positive patients presented profiles distinct from those of TT-negative patients14. Specifically, PTs from patients with TT exhibited upregulation of angiogenic and epithelial-mesenchymal transition (EMT) pathways, along with increased cancer-associated fibroblast (CAF) abundance and reduced T cell and B cell infiltration14. While these findings suggest a potentially distinct multi-cellular ecosystem within the PT of RCC with TT, which contributes to TT formation, bulk sequencing methods lack the resolution to fully characterize the heterogeneity of cell populations and their functional states within the tumor microenvironment (TME). Although a recent single-cell study has elucidated the cellular and molecular heterogeneity between PTs and TTs17, a comprehensive understanding of the distinct TME landscape within PTs from RCC patients with and without TT remains lacking. Additionally, spatial transcriptomic studies have provided valuable insights into cell-cell communication and spatial distribution in RCC; for example, some studies revealed fibroblast-guided B cell maturation in tertiary lymphoid structures (TLSs)18 and macrophage-induced EMT via IL1B at the tumor-normal interface19. However, how these spatially organized cell populations within the TME orchestrate TT formation and progression remains largely unknown.

To address these knowledge gaps, we employ single-cell and spatial transcriptomic analyses in a cohort of 36 RCC patients (25 with TT and 11 without TT). We perform single-cell RNA sequencing (scRNA-seq) of 71 surgical samples, including adjacent tissues (ATs), primary tumors (PTs), and tumor thrombi. This includes 56 samples from 22 patients in our discovery dataset and 15 additional samples from 6 patients in our validation dataset. Moreover, we conduct spatial transcriptomics sequencing on 4 PT and 4 TT samples from 8 RCC patients with TT. These data are further integrated with 40 public spatial transcriptomics samples from diverse RCC tumor regions. This enables the comprehensive delineation of a high-resolution cell atlas and spatial landscape of RCC. We characterize critical cell populations that contribute to TT formation and progression, including their cellular compositions, spatial organizations, developmental trajectories, intercellular interactions, and underlying transcriptional regulation mechanisms. Our results not only shed light on the mechanisms underlying TT formation and progression but also provide potential biomarkers and drug targets for neoadjuvant therapy, paving the way for improved clinical management of RCC with TT.

Results

Single-cell profiling of tumor microenvironment in RCC with/without tumor thrombi

We performed single-cell RNA sequencing of 56 fresh surgical resections from 22 unique individuals with RCC (14 with TT, 8 without TT) using the BD Rhapsody platform. The samples included ATs from all 22 RCC patients, PTs from all 22 RCC patients, and TTs from 12 RCC patients with TT (discovery dataset, Fig. 1A). All patients were treatment-naïve and underwent radical nephrectomy. Tumor thrombectomy was performed in patients with TT. To control for tumor burden between RCC patients with and without TT, only individuals with preoperative tumor diameters <10 cm on magnetic resonance imaging were included. After histopathological examination, tumors from 19 out of the 22 patients were evaluated as clear cell renal cell carcinoma; two (P10 and P19) were evaluated as papillary renal cell carcinoma, and one (P21) was evaluated as sarcomatous renal cell carcinoma (Supplementary Data 1). Following stringent quality filtering (see Methods for details), 163,762 high-quality cells were retained for subsequent analysis. We integrated scRNA-seq data from all samples after batch correction (at both the patient and tissue levels) and performed graph-based clustering analysis, and defined 25 major cell types via canonical cell lineage markers20 (Fig. 1B, C, Supplementary Fig. 1A–C, Supplementary Data 2). Owing to the high heterogeneity in cancer cells, inferring copy-number variations (inferCNV)21 was used to further distinguish malignant from normal renal epithelial cells in each patient (Supplementary Fig. 1D). These cell types can be organized into four broad classes: immune, endothelial, epithelial, and mesenchymal cells. Following our single-cell dissociation protocol, immune cells accounted for the largest proportion (72.3%), followed by endothelial cells (14.1%), mesenchymal cells (8.5%), and epithelial cells (5.1%) (Supplementary Fig. 1E). To validate the annotations, we calculated the specificity scores of all genes in each cell type using genesorteR22 and compared our dataset with those of recent studies on kidney fibrosis23 and renal cell carcinoma24, and the results demonstrated high concordance (Supplementary Fig. 1F).

Fig. 1: scRNA-seq analysis revealed the significant TME remodeling in RCC with TT.
figure 1

A Schematic diagram of sampling strategy and scRNA-seq analysis workflow for our discovery dataset. B UMAP plot of 163,762 high-quality single cells from 56 freshly collected tissue samples (n = 22 AT, 22 PT, 12 TT samples) collected from 22 RCC patients in our discovery dataset, colored by cell type annotations. Cells were processed using the BD Rhapsody platform following mechanical and enzymatic dissociation, and filtered based on standard quality control criteria (mitochondrial content >20%, gene number <200 or >5000, UMI count <1000 or >20000; see Methods for details). Major cell types were defined based on canonical markers expression following graph-based clustering. C Heatmap showing representative marker genes in each of cell types identified in the discovery scRNA-seq dataset. D Systematic evaluation of alterations in cell-type proportion based on the comparison strategy shown in Supplementary Fig. 2A. Left: Intra-patient comparison of cell-type proportions among AT (n = 14), PT (n = 14), and TT (n = 12) samples from RCC patients with TT (the two-sided paired t-test). Right: Inter-patient comparison of cell-type proportions in the same tissue region (AT or PT) between RCC patients with and without TT (n = 14 and 8, respectively; the two-sided unpaired t-test). The dashed line indicates a false discovery rate (FDR) threshold of 0.2. Full results are provided in Supplementary Data 3. E ssGSEA analysis showing signature scores of fibroblasts (upper) and NK cells (lower) in PTs from patients with (n = 99) and without (n = 431) TT in TCGA-KIRC dataset. Patients were stratified by TT status manually annotated based on the TCGA pathology reports (Supplementary Data 4). P-values were determined using the two-sided Wilcoxon rank-sum test. Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. F Box plots showing the proportions of fibroblasts (upper) and NK cells (lower) in primary tumors (PTs) from patients with and without TT in the discovery dataset (n = 14 and 8, respectively) and the public scRNA-seq dataset (n = 19, without TT only) from Yu et al.25. As Yu et al.‘s dataset does not include patients with TT, three groups were established: PTs_with_TT (our data), PTs_without_TT (our data) and PTs_without_TT (Yu et al.). P-values were determined using two-sided unpaired t-test to evaluate pairwise difference among the three groups. Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. G Representative immunofluorescence images (left) and quantification (right) of fibroblast abundance in tumor sections from RCC patients with (n = 8) and without (n = 8) TT. DCN (green) marks fibroblasts, and DAPI (blue) stains nuclei. Representative images from each clinical subgroup are shown to visualize fibroblast abundance at the tissue level. Fibroblast quantification was performed using QuPath-based cell annotation. This analysis serves as orthogonal validation for Fig. 1D. Scale bar = 200 µm; Scale bar inset = 50 µm. P-values were determined using the two-sided Wilcoxon rank-sum test. Box plots show the distribution of the proportion of DCN+ cells across groups. The box represents the interquartile range (IQR, 25th–75th percentile), with the horizontal line indicating the median. Whiskers denote minimum and maximum values. Source data are provided as a Source Data file.

Tumor microenvironment shifts in RCC with TT

We initially explored the differences in cell-type composition among three tissue regions (AT, PT, and TT) within the same RCC patients with TT, using a paired-sample comparison strategy (Supplementary Fig. 2A, Supplementary Data 3). Substantial differences were observed between ATs and tumors (PTs or TTs) (Fig. 1D left, green and purple points). Normal renal epithelial cells, glomerular capillary ECs, and mesangial cells were specifically enriched in ATs, whereas cancer cells, tumor EC 1/2 and pericytes were predominantly present in tumors, reflecting tumorigenesis and angiogenesis in RCC. Notably, dendritic cells, NK cells, and B cells were also preferentially enriched in ATs, whereas regulatory T cells, CD8 + T cells, cycling T cells, and macrophages were more prominent in PTs and TTs, suggesting that altered immune responses are associated with malignancy. However, no statistically significant differences (all p > 0.05 and all FDR > 0.2) in cell type proportions were observed between PTs and TTs (Fig. 1D, left, orange points).

We further compared cell-type compositions within the same tissue region (AT or PT) between RCC patients with and without TT (Supplementary Fig. 2A, Supplementary Data 3). Intriguingly, patients with TT presented a higher proportion of fibroblasts (p = 0.033, FDR = 0.141) and a lower proportion of NK cells (p = 0.014, FDR = 0.075) in PTs compared to patients without TT (Fig. 1D right). These findings were confirmed by reanalyzing bulk RNA-seq data from The Cancer Genome Atlas (TCGA) KIRC dataset using ssGSEA (Fig. 1E, Supplementary Data 4) and through integrating two published scRNA-seq RCC datasets20,25 with our dataset (Fig. 1F, Supplementary Fig. 2B-D). We also independently validated the elevated abundance of fibroblasts in RCC with TT using multiplex immunofluorescence combined with QuPath-based quantification in an additional RCC cohort (Fig. 1G). Importantly, to minimize potential clinicopathological confounders (e.g., histological subtype, metastasis, TT level and sex), we performed a refined analysis in a homogeneous subset of localized clear-cell RCC male patients, which confirmed the robustness of our findings (Supplementary Fig. 2E). Furthermore, both increased fibroblast levels and decreased NK cell levels were associated with a poor prognosis in RCC patients (Supplementary Fig. 2F). Intricate cell-cell communication in the TME is closely associated with tumor progression26, and differential cell-type composition indicated potentially distinct patterns of cellular crosstalk. Therefore, we employed CellPhoneDB27 to analyze ligand-receptor interactions and identify differential cellular crosstalk patterns in PTs between patients with and without TT. We observed that fibroblasts presented the most abundant interactions in the TME, particularly with endothelial cells, cancer cells, and pericytes (Supplementary Fig. 3A). Furthermore, fibroblasts ranked first among all differential cross-talking cell types, suggesting their potential central role in orchestrating intercellular communication in RCC with TT (Supplementary Fig. 3B, C). In summary, our analyses revealed changes in the cell-type composition and cellular crosstalk patterns in RCC with TT, indicating a potential role for fibroblasts in the formation and progression of TT.

The spatial organization of RCC

The spatial location of cells within tumors has been revealed to be closely correlated with their function and state18,28. Therefore, comprehensive profiling of the spatial landscape of the RCC TME can offer valuable insights into the mechanisms underlying TT formation and progression. We performed 10X Visium spatial transcriptomics on tumor sections (4 PTs and 4 TTs) from eight RCC patients with TT. To cover a broader range of tumor regions and include patients with diverse stages, we collected 10X Visium data from 40 RCC tumor slices from two independent datasets18,19, including samples from the tumor-normal interface, tumor core and immune-positive region (Fig. 2A). To explore the spatial organization of RCC, we conducted spatial mapping of cell types in each spatial transcriptomics sample using cell2location29 based on our scRNA-seq data. In parallel, spatial domains were identified within each spatial transcriptomics sample using stLearn30, which integrates spatial location, morphology, and gene expression (Supplementary Fig. 4A). Subsequently, we integrated and clustered all slice-specific spatial domains across the 48 spatial transcriptomics samples based on their cell-type compositions, ultimately defining major spatial organizations referred to as cell niches (CNs). Our single-cell and spatial transcriptomic analyses included adjacent normal tissues, which contain proximal tubular cells that share markers with a proportion of cancer cells in RCC19,31. To ensure accurate spatial mapping of normal epithelial cells and cancer cells, we subclustered cancer cells prior to spatial mapping. We identified five subpopulations on the basis of their differentially expressed genes and enriched pathways (Supplementary Fig. 4B, C): proximal tubule like (PT-like, expressing NAT8 and ACSM2A), epithelial-to-mesenchymal transition like (EMT-like, expressing TGFBI and PLOD2), stress-like (expressing FOS and JUNB), immunoregulatory (expressing HLA-DRA and IFIT3) and cycling (expressing TOP2A and MKI67) cancer cells. These subpopulations are consistent with previously defined cancer cell states19 (Supplementary Fig. 4D).

Fig. 2: Spatial characterization of renal cell carcinoma (RCC) through integrative spatial transcriptomics analysis.
figure 2

A Schematic diagram of the spatial transcriptomics analysis workflow. All data were generated using the 10X Visium platform. Cell niche analysis was performed based on our spatial integration strategy to dissect the spatial organizations in RCC. For more details, see Supplementary Fig. 4A and Methods section. B–C UMAP projection of spatial domains (n = 835 domains) identified across 48 spatial transcriptomics samples. Each point represents a spatial domain defined via stLearn. In (B), spatial domains are colored by their assigned cell niche (CN1-CN23). One representative pie chart per CN indicates dominant cell types (>8%). In (C), each spatial domain is displayed as a pie chart showing its major cell-type composition (>8%). Pie chart colors correspond to the cell types shown in the legend (C). A complementary UMAP embedding colored by data source, sample type, and section identity is provided in Supplementary Fig. 5A to confirm effective batch correction. Detailed results are provided in Supplementary Data 5, summarizing dominant cell types and putative functions of each cell niche. D Bar plot showing the frequency of occurrence (x-axis) of each cell niche (y-axis) across 48 tumor sections. Colors in the bar and pie plots correspond to the cell niches in (B) and cell types in (C), respectively. E Network graph showing spatial associations of dominant cell types across all 23 CNs, delineating how distinct cell populations are spatially organized within the TME. Each node represents a dominant cell type, and an edge connects two cell types that were simultaneously observed within at least one CN. F Boxplot comparing the area of CN15 in PT samples from RCC patients with (n = 8) and without (n = 8) TT. P-values were determined using the two-sided unpaired t-test. Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. G Spatial mapping of cell niches (left) and dominant cell types of CN15 (right) in tumor thrombus (P33, n = 4,258 spots) and primary tumor (P30, n = 4,287 spots) sections from two RCC patients with TT. Arrows indicate the location of CN15. This panel provides a visual complement to Fig. 2F by depicting the spatial localization and cellular composition of CN15 at the tissue level. Data are representative of n  =  18 spatial transcriptomic slides. H Representative multiplex immunofluorescence images (left) and quantitative analysis (right) of CN15-like regions—defined by the spatial adjacency of fibroblasts (DCN, green) and EMT-like cancer cells (PLOD2, red)—in tumor sections from RCC patients without (n = 5) and with (n = 5) TT. DCN+ and PLOD2+ cells were annotated using QuPath, and their interface regions were manually delineated and quantified using Fiji software. The cumulative area of each interface region was then normalized to the total area of the corresponding tumor section. Box plots show the distribution of the interface area (% tissue area) between DCN+ cells and PLOD2+ cells across groups. The box represents the interquartile range (IQR, 25th–75th percentile), with the horizontal line indicating the median. Whiskers denote minimum and maximum values. This analysis provides spatial validation of the findings shown in Fig. 2F. Scale bar = 200 µm; Scale bar inset = 50 µm. P-values were determined using the two-sided Wilcoxon rank-sum test. Source data are provided as a Source Data file.

By integrating spatial domains across all spatial transcriptomics samples, we identified 23 distinct CNs within the RCC TME classified as either cancer-cell-less (CN1-11) or cancer-cell-enriched (CN12-23) CNs (Fig. 2B, C, Supplementary Fig. 4E, Supplementary Data 5). To mitigate potential confounding effects, batch effects related to data source, sample type, and section identity were sequentially corrected using Harmony, ensuring that both clustering and UMAP projection reflected true biological signaling (Supplementary Fig. 5A). The absence of batch-specific clustering supported the biological relevance of the identified CNs. The cancer-cell-less CNs were enriched primarily with cells involved in humoral immunity (CN1-4), fibroblasts (CN5-8) and epithelial cells (CN9-11). The cancer-cell-enriched CNs included those dominated by specific cancer cell subpopulations: CN17 (EMT-like), CN20 (PT-like), CN16 (cycling & EMT-like), CN18/CN22/CN23 (mixed), and those composed of cancer cells and other cell types: pericytes/endothelial cells (CN12, CN19), B cells (CN21), CD8+ T cells/macrophages (CN13, CN14), fibroblasts (CN15). Notably, the cell-type compositions of these CNs recapitulated known tissue structures, such as three structures associated with tumor-infiltrating B cells32 [tertiary lymphoid structures: CN1-2 (Supplementary Fig. 5B); stromal infiltrates: CN4-5 (Supplementary Fig. 5C); intratumor infiltrates: CN21 (Supplementary Fig. 5D)] and various vascular structures [artery: CN8 (Supplementary Fig. 5E); intratumor vasculature: CN12, CN19 (Supplementary Fig. 5F); glomerulus: CN10 (Supplementary Fig. 5G)]. Analysis of each CN occurrence across 48 tumor sections revealed that cancer-cell-enriched CNs comprised four of the top five most frequent niches (Fig. 2D). The first and fourth most prevalent CNs corresponded to intratumor vasculature (CN12, CN19), which is consistent with the typically exacerbated angiogenesis of RCC2,33. Notably, the third and fifth most frequent CNs were enriched in fibroblasts (CN7, CN15), suggesting a potentially broad role for fibroblasts in RCC. Further analysis of the cell-type spatial organization network (see Methods section) revealed that fibroblasts interact with diverse cell types to form distinct CNs (Fig. 2E): B cells/plasma cells/CD4+ T cells (CN3/CN4/CN5), macrophages (CN6), vSMCs (CN8), epithelial cells (CN9), and cancer cells (CN15). These results suggest potential functional heterogeneity within the fibroblast population in RCC.

To investigate differences in the spatial organization of primary tumors between patients with and without TT, we quantified the area of each cell niche within primary tumors based on the number of assigned spots. Spatial transcriptomic samples from the tumor-normal interface were excluded due to their limited comparability across samples, allowing us to focus specifically on tumor regions. After assigning TT status based on clinical information for our dataset and pT stage for public datasets, 16 spatial transcriptomic samples from primary tumors (8 with TT and 8 without TT) were included in this analysis (Supplementary Data 6). Among the 23 identified CNs, CN15—a niche enriched in EMT-like cancer cells, cycling cancer cells, and fibroblasts—exhibited a significant expansion in primary tumors from RCC patients with TT (Fig. 2F, G, Supplementary Data 6). To validate this observation, we performed multiplex immunofluorescence staining on RCC tissue microarrays and conducted quantification analysis of CN15-like regions, which confirmed a significant expansion in primary tumors from patients with TT (Fig. 2H). Importantly, this CN15 expansion remained robust after adjusting for dataset source and sample type (Supplementary Fig. 5H). These findings suggest fibroblast-cancer cell interactions may contribute to TT formation, in line with the known roles of CAFs, EMT, and cancer cell proliferation in tumor progression34. In summary, we constructed a spatial map of RCC by integrating 48 tumor sections and further highlighted the key role of fibroblasts in RCC and TT development.

Stromal remodeling during RCC tumorigenesis and progression

To further explore the heterogeneity within the fibroblast population, we conducted subcluster analysis of fibroblasts together with other mesenchymal cells (pericytes, vSMCs and mesangial cells). On the basis of differential expression analysis and the established cell markers23, we identified 12 mesenchymal cell subsets, including two vSMC subsets, four pericyte subsets, five fibroblast subsets and mesangial cells (Fig. 3A, B, Supplementary Fig. 6A–C, Supplementary Data 7). Perivascular cells (vSMCs and pericytes) comprised the largest proportion (82.2%), further supporting the well-characterized highly vascularized nature of RCC (Supplementary Fig. 6D). Both vSMC 1 and vSMC 2 exhibited high expression of canonical smooth muscle marker genes (MYH11, TAGLN) but expressed distinct additional smooth muscle-related genes (vSMC 1: RERGL; vSMC 2: CSRP1, S100A4). Similarly, the expression levels of pericyte-related genes varied between pericyte 1 (RGS5) and pericyte 2 (CYGB and CD248). Interestingly, S100A435, CYGB36 and CD24837 expression in perivascular cells has been reported to be associated with vascular remodeling. Differential cell-type composition analysis revealed that pericyte 2 and vSMC 2 exhibited increased proportions in tumors (PTs or TTs), while vSMC 1 was primarily present in ATs (Fig. 3C left, Supplementary Data 3). In addition, spatial mapping of mesenchymal subpopulations across the 23 defined cell niches revealed higher proportions of pericyte 1 and vSMC 1 in CNs of ATs (CN9-11) compared to pericyte 2 and vSMC 2, respectively (Supplementary Fig. 7A). Differential expression analysis showed both pericyte 2 and vSMC 2 highly expressed extracellular matrix (ECM) remodeling-related genes (e.g., COL4A1, COL4A2, LAMA4, SPARC, and ADAMTS4) and stemness genes (e.g., PRRX1) (Supplementary Fig. 7B). These results indicated the alterations in the perivascular cell state and potential vascular remodeling during RCC tumorigenesis.

Fig. 3: Stromal remodeling during RCC tumorigenesis and TT formation.
figure 3

A UMAP plot of mesenchymal cells (n = 13,965 cells) from our discovery dataset (n = 22 patients; n = 22 AT, 22 PT, and 12 TT samples), colored by mesenchymal cell subsets. For the distribution of these cells by tissue region (AT, PT and TT), clinical subgroup (with and without TT), and patient identity, see also Supplementary Fig. 6A, B. B Heatmap showing representative marker genes for each mesenchymal cell subset identified in the discovery dataset. C Systematic evaluation of alterations in the proportions of mesenchymal cell subsets in our discovery dataset, based on our comparison strategy (refer to Supplementary Fig. 2A). For more details, see the legend for Fig. 1D. Full results are provided in Supplementary Data 3. D Schematic diagram of the scRNA-seq workflow used for the validation dataset, which applied a negative selection strategy to enrich for mesenchymal and epithelial cells. E ssGSEA analysis showing the enrichment scores of FAP+ fibroblast (upper) and CYSLTR2+ fibroblast (lower) signatures—derived from our discovery dataset —in ATs (right; n = 18 and 54, respectively) and PTs (left; n = 99 and 431, respectively) from RCC patients with and without TT in the TCGA-KIRC bulk RNA-seq dataset. P-values were determined using the two-sided Wilcoxon rank-sum test. Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. F Box plots showing the proportions of FAP+ fibroblasts (upper) and CYSLTR2+ fibroblasts (lower) in PTs from patients with and without TT, based on scRNA-seq data from both our discovery dataset (n = 8 and n = 5, respectively; PT samples with <50 mesenchymal cells were excluded) and the Yu et al25. dataset (n = 19, without TT only). Groups were defined as in Fig. 1F, and P-values were calculated using the two-sided unpaired t-test. Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. G Stacked bar plots showing the proportions of mesenchymal cell subsets in PTs from each patient in our validation dataset (n = 6). We highlighted FAP+ fibroblasts and CYSLTR2+ fibroblasts in this plot. Patient IDs were colored by TT status (blue: with TT; red: without TT). H Representative multiplex immunofluorescence images (left) and quantification (right) of FAP+ fibroblasts—defined as DCN+FAP+ double-positive cells—in tumor sections from RCC patients without (n = 8) and with (n = 8) TT. DCN (green) marks fibroblasts, and FAP (red) labels the specific fibroblast subset. FAP+ fibroblasts were annotated and quantified using QuPath. Box plots show the distribution of the proportion of DCN+FAP+ cells across groups. The box represents the interquartile range (IQR, 25th–75th percentile), with the horizontal line indicating the median. Whiskers denote minimum and maximum values. This analysis provides orthogonal validation for Fig. 3C. Scale bar = 200 µm. Scale bar inset = 50 µm. P-values were calculated using the two-sided Wilcoxon rank-sum test. I Violin plot showing the cell2location-inferred proportions of each mesenchymal cell subset within CN15, based on spatial mapping of 48 spatial transcriptomics samples using reference signatures estimated from our discovery dataset. J Representative multiplex immunofluorescence images (left) and quantification (right) of CN15-like regions—refined based on the spatial adjacency of FAP+ fibroblasts (DCN+FAP+, green & red) and EMT-like cancer cells (PLOD2+, white)—in tumor sections from RCC patients without (n = 5) and with (n = 5) TT. DCN+FAP+ double-positive cells and PLOD2+ cells were annotated using QuPath. Their interface regions were manually delineated and quantified using Fiji software, followed by normalization to the total area of the tumor section. Box plots show the distribution of the interface area (% tissue area) of DCN+FAP+ cells and PLOD2+ cells across groups. The box represents the interquartile range (IQR, 25th-75th percentile), with the horizontal line indicating the median. Whiskers denote minimum and maximum values. This analysis provides spatial validation for Figs. 2H and 3I. Scale bar = 200 μm; Scale bar inset = 50 μm. P-values were determined using the two-sided Wilcoxon rank-sum test. Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. Source data are provided as a Source Data file.

For fibroblasts, we identified five subsets. Among them, FBLN5+ fibroblasts, MFAP5+ fibroblasts, and FAP+ fibroblasts specifically express established fibroblast markers reported in various tissues, such as FBLN538, SFRP139, MFAP540, SFRP241, FAP42, and THBS243,44 (Fig. 3B, Supplementary Fig. 7C). In contrast, ADH1B+ fibroblasts and CYSLTR2+ fibroblasts exhibited lower marker specificity, and co-expressed markers characteristic of both pericytes and fibroblasts (Supplementary Fig. 7C–E), suggesting a possible transitional state (to be discussed in the next section). Differential cell-type composition analysis revealed enrichment of FBLN5+ fibroblasts in ATs and CYSLTR2+ fibroblasts in PTs (Fig. 3C left, Supplementary Data 3). Furthermore, we observed that the proportions of ADH1B+ fibroblasts, FBLN5+ fibroblasts, and MFAP5+ fibroblasts gradually decreased from ATs to PTs to TTs, while FAP+ fibroblasts and CYSLTR2+ fibroblasts were the most abundant fibroblasts in PTs and TTs (Supplementary Fig. 8A). Owing to the potential bias of single-cell dissociation leading to a low capture rate of epithelial and mesenchymal cells (8.5% and 5.1%), we performed scRNA-seq with a negative selection strategy (CD45- CD31- cells) to enrich mesenchymal cells and epithelial cells in additional six RCC patients (3 with TT and 3 without TT) for validation (Fig. 3D, Supplementary Fig. 6E). We confirmed similar results in our validation dataset, suggesting a change in fibroblast states or subtypes during RCC tumorigenesis (Supplementary Fig. 8B).

To further determine which cell subpopulations are cancer-associated fibroblasts, we adopted multiple methods. First, in the TCGA-KIRC dataset, we calculated the enrichment scores of each fibroblast subset in ATs and PTs using ssGSEA. We found that FAP+ fibroblasts and CYSLTR2+ fibroblasts were significantly enriched in PTs, while other fibroblast subsets were enriched in ATs (Supplementary Fig. 8C). Second, we analyzed the composition of fibroblast subsets in the tumor-normal interface and tumor core in the spatial transcriptomics data. At the interface, we observed that the fibroblasts on the tumor side were composed mainly of FAP+ fibroblasts and CYSLTR2+ fibroblasts, while the fibroblasts on the adjacent side were composed mainly of ADH1B+ fibroblasts and FBLN5+ fibroblasts (Supplementary Fig. 8D, top). In the tumor core, FAP+ fibroblasts were the most abundant fibroblast subpopulation (Supplementary Fig. 8D, bottom). In addition, we also observed that FAP+ fibroblasts and CYSLTR2+ fibroblasts were present in some cancer-cell-enriched CNs, whereas FBLN5+ fibroblasts and ADH1B+ fibroblasts were significantly enriched in epithelial-cell-enriched CNs (Supplementary Fig. 8E). Third, to further determine which fibroblast subsets are potentially associated with malignant states, we utilized the Seurat label transfer algorithm45 to map cluster labels of mesenchymal cells from a normal adult kidney dataset46 onto our discovery and validation datasets. The results revealed that most of the FAP+ fibroblasts and CYSLTR2+ fibroblasts, as well as a subset of vSMC 2 and pericyte 2, were not assigned labels of normal kidney mesenchymal cells (Supplementary Fig. 8F), suggesting remodeling of the renal interstitium during tumor evolution. Interestingly, both FAP+ fibroblasts and CYSLTR2+ fibroblasts expressed multiple markers of CAFs and myofibroblasts, such as POSTN, FN1, and various collagen genes (Supplementary Fig. 7D). In conclusion, these results highlighted that FAP+ fibroblasts and CYSLTR2+ fibroblasts are potential CAFs in RCC.

Next, we focused on alterations in the composition of mesenchymal cells in different sample groups associated with TT: PT vs. TT (in patients with TT), ATwithout_TT vs. ATwith_TT, and PTwithout_TT vs. PTwith_TT (Supplementary Fig. 2A). Similarly, no significant differences (all p > 0.05 and all FDR > 0.2) in the proportions of mesenchymal cell subsets were found between PTs and TTs (Fig. 3C left, Supplementary Data 3). However, interestingly, in the comparison between patients with and without TT, we found that FAP+ fibroblasts and CYSLTR2+ fibroblasts showed increased proportions in PTs and ATs from patients with TT (Fig. 3C right, Supplementary Data 3). Subsequently, we confirmed this finding by contrasting the ssGSEA scores of fibroblast subset signatures between patients with and without TT in the TCGA-KIRC dataset and by integrating a publicly available scRNA-seq dataset25 with our datasets (Fig. 3E, F, Supplementary Fig. 6F). In our validation dataset, we observed that the proportions of FAP+ fibroblasts and CYSLTR2+ fibroblasts in the PTs of patients with TT consistently exceeded those in the PTs of patients without TT, further confirming previous observations (Fig. 3G). In addition, we also used multiplex immunofluorescence and QuPath-based quantification to verify the enrichment of FAP+ fibroblasts in RCC patients with TT (Fig. 3H). We then performed cell-cell interaction analysis and identified differential interaction patterns in PTs between patients with and without TT. Surprisingly, FAP+ fibroblasts emerged as the cell type with not only the most abundant interactions in the TME (Supplementary Fig. 9A) but also highly activated interactions (ranked third) in RCC with TT (Supplementary Fig. 9B, C). Interestingly, spatial mapping of mesenchymal cell subsets showed that the fibroblasts in CN15, which expanded substantially in patients with TT, were almost all FAP+ fibroblasts (Fig. 3I). This finding was further confirmed by multiplex immunofluorescence combined with quantitative analysis of CN15-like regions (Fig. 3J). In summary, our analysis revealed stromal remodeling during RCC tumorigenesis and progression, and suggested the essential role of FAP+ fibroblasts and CYSLTR2+ fibroblasts in TT formation and progression.

Pericyte-fibroblast transition in RCC: Pericyte 2 to CYSLTR2+ fibroblasts to FAP+ fibroblasts

To further understand the relationship between FAP+ fibroblasts and CYSLTR2+ fibroblasts, we estimated the mesenchymal cell signatures23 and ECM score (Gene Ontology) via ssGSEA. We found that FAP+ fibroblasts presented high levels of the (myo)fibroblast signature and moderate levels of the perivascular cell signature, while CYSLTR2+ fibroblasts exhibited low levels of the (myo)fibroblast signature and high levels of perivascular cell signature (Supplementary Fig. 10A). ECM analysis revealed that FBLN5+ fibroblasts, MFAP5+ fibroblasts and FAP+ fibroblasts expressed the highest levels of glycoprotein and proteoglycan genes. Notably, FAP+ fibroblasts and CYSLTR2+ fibroblasts expressed the highest levels of collagen genes known to promote tumor invasion and metastasis47 (Supplementary Fig. 10B). Interestingly, tumor pericytes (pericyte 2 and cycling pericytes) also highly expressed collagen genes, even higher than other fibroblasts. Further analysis of the expression profiles of FAP+ fibroblasts, CYSLTR2+ fibroblasts and pericyte 2 revealed that CYSLTR2+ fibroblasts expressed a gradient of FAP+ fibroblast and pericyte 2 genes from cells with high expression of pericyte 2 genes and low FAP+ fibroblast genes to cells with low pericyte 2 genes and high FAP+ fibroblast genes (Supplementary Fig. 10C, D). In addition, FAP+ fibroblasts, CYSLTR2+ fibroblasts and pericyte 2 showed similar interaction patterns (Supplementary Fig. 10E). The documented transition between pericytes and myofibroblasts in renal fibrosis and other pathologies23,48 suggests that CYSLTR2+ fibroblasts might represent a transitional cell state between pericyte 2 and FAP+ fibroblasts.

To further investigate this hypothesis, we performed pseudotime trajectory analysis and inferred a potential trajectory from pericyte 2 to CYSLTR2+ fibroblasts, and ultimately to FAP+ fibroblasts (Fig. 4A). PseudotimeDE analysis49 revealed a gradual transition in gene expression along the inferred trajectory, with CAF-associated genes (e.g., MGP, LTBP1, COL14A1, THBS2) progressively upregulated and pericyte-associated genes (e.g., MCAM, CSPG4, CYGB, CD248) gradually downregulated. Notably, CYSLTR2+ fibroblasts were positioned at an intermediate pseudotime state, exhibiting overlapping expression profiles of both lineages and a transient upregulation of a distinct subset of genes (e.g., LRRC17, IGFBP5, AGT, CYSLTR2) (Fig. 4B). For validation, we analyzed three additional datasets, including our validation datasets and two published ccRCC scRNA-seq datasets25,50 (Supplementary Fig. 6E, F and 11A). Consistent with our initial observations, all three datasets displayed similar inferred trajectories and gradient changes in transcriptional programs (Supplementary Fig. 11B-D). With single-cell regulatory network inference (SCENIC51), we further inferred a pseudotime trajectory from pericyte 2 to FAP+ fibroblasts based on the regulon activity matrix, confirming the transition from pericyte 2 to FAP+ fibroblasts, with CYSLTR2+ fibroblasts distributed in the transitional position (Supplementary Fig. 11E, F). Moreover, the activity of FAP+ fibroblast-specific regulons (e.g., CREB3L1, STAT3 and GATA6) and pericyte 2-specific regulons (e.g., MEF2C, FOXC1) gradually changed along the trajectory. CYSLTR2+ fibroblasts exhibited intermediate activity of these regulons, further supporting their transitional state. To validate the transition orthogonally, we conducted RNA velocity analysis52 in mesenchymal cells. We observed directional velocity flow from pericyte 2 to CYSLTR2+ fibroblasts, and further to FAP+ fibroblasts, which was further supported in the validation cohort (Fig. 4C, Supplementary Fig. 11G). Intriguingly, spatial transcriptomics data unveiled a spatial gradient distribution from pericyte 2 (and CYSLTR2+ fibroblasts) to FAP+ fibroblasts at the edge of the tumor nests (Fig. 4D). Notably, we observed a potential transitional stage of cells that co-expressed PDGFRB, THBS2 and CSPG4 using RNA in situ hybridization, supporting pericyte-fibroblast transition in RCC (Fig. 4E). To summarize, our findings suggest that tumor pericytes serve as the main cell origin of FAP+ fibroblasts and that CYSLTR2+ fibroblasts represent a transitional cell state during the pericyte-fibroblast transition.

Fig. 4: Pericyte-fibroblast transition in RCC.
figure 4

A Upper: Diffusion map embedding of pericyte 2 (n = 3,274 cells), CYSLTR2+ fibroblasts (n = 529 cells) and FAP+ fibroblasts (n = 503 cells) based on the gene expression profile from our discovery dataset (n = 22 patients; n = 22 AT, 22 PT, and 12 TT samples). Lower: Slingshot-inferred pseudotime on the same embedding above. B Heatmap showing the top 200 pseudotime-associated genes identified by PseudotimeDE. Representative genes associated with cell subsets (tumor pericytes, CYSLTR2+ fibroblasts and FAP+ fibroblasts) are annotated on the right. C RNA velocity of mesenchymal cells (n = 13,965 cells) in our discovery dataset (n = 22 patients; n = 22 AT, 22 PT, and 12 TT samples), highlighting the velocity flow of pericyte 2 to CYSLTR2+ fibroblasts and subsequently to FAP+ fibroblasts. D Spatial mapping of pericyte 2, CYSLTR2+ fibroblasts, FAP+ fibroblasts in 4 representative RCC sections, showing only spots where the cell2location-inferred proportion of these cells exceeds 8% (upper left, n = 243 spots; upper right, n = 1079 spots; lower left, n = 1,866 spots; lower right, n = 3861 spots). Arrows indicate regions with potential pericyte-fibroblast transitions. E Representative RNA in situ hybridization images showing CSPG4 (pericyte, green), THBS2 (FAP+ fibroblast, white), PDGFRB (mesenchymal cell, red) and DAPI (blue) in RCC sections. Arrows indicate cells co-expressing CSPG4 and THBS2, representing a transitional state between pericytes and FAP+ fibroblasts. Data shown are representative data from 4 biologically independent replicates. Similar results were observed in all replicates. F Schematic of the computational workflow used to identify key transcription factors involved in the pericyte-fibroblast transition. G Pairwise correlation analysis (Spearman) among the expression, chromatin accessibility, and regulon activity of transcription factors in Long et al.50 dataset (red, dataset 1; n = 1 patient, 1 PT sample, 486 cells) and Yu et al.25 dataset (blue, dataset 2; n = 19 patients, 19 PT samples, 2692 cells). Gene names were labeled when all three transcription factor metrics were mutually correlated (Spearman correlation coefficients > 0.1) and showed pseudotime-dependent dynamics, as determined by PseudotimeDE (the one-sided permutation test with Benjamini–Hochberg correction; adjusted p < 0.05). For CREB3L1, p-values for expression, chromatin accessibility, and regulon activity along pseudotime were 0.0032, 0.0016, and 1.1 × 10−28 in the Long et al. dataset, and 2.8 × 10−5, 4.6 × 10−5, and 1.4 × 10−23 in the Yu et al. dataset. H Gene expression (left), regulon activity (middle), and chromatin accessibility (right) of CREB3L1 over pseudotime in Long et al.50 dataset (upper, dataset 1; n = 1 patient, 1 PT sample, 486 cells) and Yu et al.25 dataset (lower, dataset 2; n = 19 patients, 19 PT samples, 2692 cells). Error bands represent 95% confidence intervals around the fitted curve. I Pairwise correlation analysis (Spearman) among the expression, regulon activity, and binding activity of transcription factors in Long et al.50 dataset (red, dataset 1; n = 1 patient, 1 PT sample, 486 cells) and Yu et al.25 dataset (blue, dataset 2; n = 19 patients, 19 PT samples, 2692 cells). Gene names were labeled when all three transcription factor metrics were mutually correlated (Spearman correlation coefficients > 0.1) and showed pseudotime-dependent dynamics, as determined by PseudotimeDE (one-sided permutation test with Benjamini–Hochberg correction; adjusted p < 0.05). For MEF2C, p-values for expression, regulon activity, and binding activity along pseudotime were 8.1 × 10−10, 3.7 × 10−24, and 3.0 × 10−41 in the Long et al. dataset, and 2.7 × 10−11, 7.5 × 10−13, and 2.1 × 10−28 in the Yu et al. dataset. J Gene expression (left), regulon activity (middle), and binding activity (right) of MEF2C over pseudotime in Long et al.50 dataset (upper, dataset 1; n = 1 patient, 1 PT sample, 486 cells) and Yu et al.25 dataset (lower, dataset 2; n = 19 patients, 19 PT samples, 2692 cells). Error bands represent 95% confidence intervals around the fitted curve. Source data are provided as a Source Data file.

To investigate whether the pericyte-fibroblast transition differs between PTs from RCC patients with and without TT, we analyzed the pseudotime distribution along the inferred cell trajectory in our discovery dataset (Fig. 4A). Cells derived from PTs of patients with TT exhibited significantly higher pseudotime values than those from patients without TT (Supplementary Fig. 11H), suggesting a more advanced progression along the pericyte-fibroblast transition. This observation was consistent with our previous findings (Fig. 3C–H) which showed that CYSLTR2+ fibroblasts and FAP+ fibroblasts, representing transitional and terminal cell states in the pericyte-fibroblast transition, were enriched in PTs from RCC with TT. In addition, tumor pericytes from PTs of patients with TT displayed elevated expression of genes associated with myofibroblast and CAF phenotypes (Supplementary Fig. 11I), suggesting a transcriptionally primed state toward fibroblast transition. Furthermore, these pericytes showed significant enrichment of mTORC1 signaling, MYC targets, and glycolytic pathways (Supplementary Fig. 11J, Supplementary Data 8), all of which are functionally associated with the pericyte-fibroblast transition53,54,55. Notably, tumor endothelial cells in PTs from RCC with TT expressed significantly higher levels of PDGFB (Supplementary Fig. 11K), a key ligand known to promote pericyte-fibroblast transition via PDGFRβ signaling48,56,57. Collectively, these findings indicated robust activation of the pericyte-fibroblast transition in RCC with TT and highlighted its potential role in TT formation.

Regulatory mechanisms of the transition from pericyte 2 to FAP+ fibroblasts

Chromatin accessibility plays an essential role in establishing and maintaining cellular identity58. To investigate the regulatory mechanisms driving the transition from pericyte 2 to FAP+ fibroblasts, we leveraged two single-cell multomics datasets (scRNA-seq and scATAC-seq) from previous RCC studies by Yu et al.25 and Long et al.50 (Fig. 4F). The cells were initially clustered on the basis of the scRNA-seq and scATAC-seq data independently. In terms of the scRNA-seq data, mesenchymal cell-specific genes identified within our datasets were used for cell subpopulation annotation (Supplementary Fig. 12A, I). To more precisely annotate cell subsets in scATAC-seq, we considered the combined evidence of chromatin accessibility of marker genes (Supplementary Fig. 12B, J) and annotations predicted by the label transfer algorithm (Supplementary Fig. 12C, K). We found that some cell labels from the scRNA-seq datasets, such as CYSLTR2+ fibroblasts, IFI+ pericytes and cycling pericytes, did not transfer well to the scATAC-seq datasets (Supplementary Fig. 12D, L). This suggested that some transitional states and cell functions may not undergo specific chromatin regulation. Finally, we refined the annotations of mesenchymal cell subsets with high consistency between the scRNA-seq and scATAC-seq data (Supplementary Fig. 12E, M). To gain a comprehensive understanding of transcriptional regulation during the transition, we inferred a trajectory from tumor pericytes to FAP+ fibroblasts based on the gradient changes in chromatin accessibility and paired the cells between scATAC-seq and scRNA-seq data by using an optimal matching approach59 (Supplementary Fig. 12F, N). Along this pseudotime trajectory, we observed high concordance in annotations between these two modalities. Notably, CYSLTR2+ fibroblasts were located in the intermediate position of the trajectory, highlighting transitional changes in chromatin accessibility in this population (Supplementary Fig. 12G, O).

Next, to explore the potential transcription factor (TF) dynamics during the transition, we performed correlation analysis on various metrics associated with TFs, including their expression, chromatin accessibility, regulon activity (assessed via SCENIC), and binding activity (assessed via chromVAR60). Moreover, we applied PseudotimeDE to statistically assess pseudotime-dependent dynamic changes across all TF-associated metrics. We observed a gradual increase in the expression, chromatin accessibility, and regulon activity of CREB3L1, which presented the highest correlation score and robust pseudotime-dependent changes in both datasets, along the pseudotime trajectory (Fig. 4G, H). A recent pancancer scRNA-seq study (not including RCC) has indicated that increased CREB3L1 expression may be associated with the activation of a CAF subtype (FAP-expressing)61, while our results further suggested that the increased chromatin accessibility of the CREB3L1 gene may contribute to this process. Moreover, SCENIC analysis revealed that CREB3L1 regulated the expression of multiple FAP+ fibroblast genes, such as FAP, THBS2, POSTN and collagen genes (Supplementary Data 9), suggesting a critical role in establishing FAP+ fibroblast identity. However, TF binding activity analysis indicated that the motif accessibility of CREB3L1 remained constant along the pseudotime axis (Supplementary Fig. 12H, P, upper). These results indicated that CREB3L1 was potentially activated in response to chromosome architecture changes during pericyte-fibroblast transition, but its influence on further structural alterations might be limited. To explore potential chromatin regulators, we filtered TF genes displaying concordant and pseudotime-dependent changes in expression, regulon activity, and binding activity. We observed a gradual decrease in all three metrics for MEF2C, which exhibited the highest concordance and significant dynamic changes along the pseudotime trajectory (Fig. 4I, J), while chromatin accessibility of MEF2C gene remained open throughout (Supplementary Fig. 12H, P, lower). MEF2C, a pioneer factor, silences fibroblast enhancers and activates cardiac enhancers, contributing to the direct reprogramming of fibroblasts into cardiac-like myocytes62,63. We also identified MCAM and CSPG4, pericyte marker genes, as potential target genes of MEF2C (Supplementary Data 9). Together, these findings suggested downregulation of MEF2C may be associated with pericyte-fibroblast transition priming. Notably, both CREB3L1 and MEF2C exhibited similar dynamic changes in RNA expression and regulon activity during the pericyte-fibroblast transition in our discovery dataset (Supplementary Fig. 12Q), supporting the robustness of the findings obtained from the two public datasets. In summary, we speculate on potential regulatory mechanisms of the pericyte-fibroblast transition using two single-cell multi-omics datasets, suggesting that CREB3L1 and MEF2C may regulate this transition through different mechanisms.

Inhibition of NK cells is mediated by fibroblasts, especially FAP+ fibroblasts

Building on our previous observations (Fig. 1D), we observed a significant reduction in the relative proportion of NK cells in the PTs of patients with TT (Fig. 5A, Supplementary Fig. 13A, C). Among the proliferating cells, only the number of cycling NK cells significantly decreased, suggesting specific inhibition of NK cell proliferation (Fig. 5B, Supplementary Fig. 13B, C). To further investigate the altered functions of NK cells, we compared the expression patterns of NK cells in the PTs of patients with and without TT. We found that NK cells from patients with TT expressed high levels of CXCR4, whereas those from patients without TT expressed high levels of CX3CR1 (Fig. 5C, D). These two receptors have been reported to exhibit opposite expression trends during NK cell maturation, with the expression of CXCR4 decreasing and that of CX3CR1 increasing64. CXCR4 is highly expressed in immature NK cells and is involved in bone marrow homing and the residence of NK cells64. Single-cell enrichment analysis (by AUCell51) revealed the lower enrichment scores for NK cell-mediated cytotoxicity, FcγR-mediated phagocytosis and transendothelial migration in NK cells from patients with TT (Fig. 5E upper, Supplementary Data 10). These combined observations suggest impaired trafficking of NK cells into tumors and NK cell dysfunction in patients with TT.

Fig. 5: FAP+ fibroblast promotes TT formation by regulating NK cells and EMT-like cancer cells.
figure 5

A, B Density plots of UMAP embeddings of all immune cells or cycling cells in PTs from RCC patients with (n = 14) and without (n = 8) TT, based on the discovery dataset. Random downsampling was performed to maintain the same number of cells (immune cells: n = 20,123; cycling cells: n = 774) in each group. NK cells and cycling NK cells were outlined with a red dashed line to highlight changes in their abundance. C Volcano plot showing differentially expressed genes in NK cells between PTs from RCC patients with (n = 14) and without (n = 8) TT, based on the discovery dataset. P-values were obtained using the two-sided Wilcoxon rank-sum test with Bonferroni correction. D Violin and box plots showing the expression levels of CXCR4 and CX3CR1 in NK cells from PT samples of RCC patients with (n = 14) and without TT (n = 8), based on the discovery dataset. Statistical significance was determined using the two-sided Wilcoxon rank-sum test. Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. E KEGG pathway activity in NK cells from PT samples of RCC patients with (n = 14) and without (n = 8) TT, based on the discovery dataset. Pathway activity scores were calculated using AUCell at the single-cell level, and differentially activated pathways were identified using the limma package with a two-sided moderated t-test. The six pathways shown were selected from the top ten most significantly different pathways based on adjusted p-values (Benjamini–Hochberg correction). Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. The full ranked list of pathways and statistical details are provided in Supplementary Data 10. F Linear regression plot showing a significant negative correlation (two-sided one-sample t-test) between the proportions of NK cells and total fibroblasts in PT samples (n = 22, discovery dataset), based on the discovery dataset. Error bands represent 95% confidence intervals around the fitted curve. G Dot plot showing differential ligand-receptor crosstalk between fibroblasts and NK cells in PTs from RCC with (n = 14) and without (n = 8) TT, based on the discovery dataset. H Dot plot showing the cellular crosstalk (TGFB1-TGFBR1 and CXCL12-CXCR4) between NK cells and five fibroblast subsets, based on the discovery dataset (n = 22 patients; n = 22 AT, 22 PT, and 12 TT samples). CellPhoneDB was used to compute P-values using the one-sided permutation test to assess the statistical significance of ligand-receptor interactions. I qRT-PCR analysis of FAP mRNA expression in CAFs from RCC with and without TT (FAP+ CAFs from P_911 and FAP- CAFs from P_910) to determine the mRNA level of FAP. Data were presented as mean ± SD from 3 biological replicates (primary CAFs isolated from distinct tumor core regions). Statistical significance was assessed using two-sided Student’s t-test. J Cell growth ability was assessed using CCK8 assays at the specified time intervals in 786-O cell line treated with conditioned media (CM) from FAP+ and FAP- CAFs. Data were presented as mean ± SD from 3 biological replicates (primary CAFs isolated from distinct tumor core regions). Statistical analysis was performed using two-way ANOVA. K The migration ability of 786-O cell line treated with CM from different CAFs was measured using a transwell assay. Box plots show the distribution of cell counts across groups. The box represents the interquartile range (IQR, 25th–75th percentile), with the horizontal line indicating the median. Whiskers denote minimum and maximum values. Quantification was performed with n = 4 random fields. Two-sided Student’s t-test was used to assess statistical significance. Scale bar = 100 µm. L Images (left) and quantification (right) of all spheroids generated using Zsgreen-expressing 786-O cells co-cultured with FAP+ and FAP- CAFs (left). Circularity was quantified for 786-O cells (right). Spheroid circularity was quantified and normalized to spheroids without CAFs. Data were presented as mean ± SD from 3 biological replicates (primary CAFs isolated from distinct tumor core regions). Two-sided Student’s t-test. Scale bar = 300 µm. M Dot plot illustrating significant ligand-receptor interactions between FAP+ fibroblasts and cancer cell subsets (EMT-like and cycling), as inferred by CellPhoneDB, based on the discovery dataset (n = 22 patients; n = 22 AT, 22 PT, and 12 TT samples). N Spatial expression patterns of selected ligand-receptor genes (upper) and corresponding ssGSEA scores for integrin-related pathways (lower) in a representative TT sample (P33, n = 4,258 spots). Only spatial spots dominated by fibroblasts, EMT-like cancer cells, or cycling cancer cells (as inferred by cell2location) are shown. Data are representative of n  =  18 spatial transcriptomic slides. Source data are provided as a Source Data file.

We next analyzed whether changes in the proportion and function of NK cells were associated with fibroblasts. Pearson correlation analysis revealed a significant negative correlation between the relative proportion of total fibroblasts and both NK cell proportion (t(20) = –3.22, r = –0.584, p = 0.0043, 95% CI [–0.81, –0.22]; Fig. 5F) and cytotoxicity (t(20) = –1.84, r = –0.380, p = 0.081, 95% CI [–0.69, 0.05]; Supplementary Fig. 13D) in PTs. These findings suggest a potential role for fibroblasts in suppressing NK cells in patients with TT. To explore potential interactions between fibroblasts and NK cells, we performed ligand-receptor-based crosstalk analysis (CellPhoneDB) and highlighted the differential interaction pairs in PTs between patients with and without TT (Fig. 5G). Fibroblasts were found to preferentially interact with NK cells via CXCL12-CXCR4 and TGFB1-TGFBR1 in patients with TT. CXCL12-CXCR4 signaling has been reported to mediate fibroblast-induced quiescence of NK cells65, and TGFB1-TGFBR1 signaling is known to inhibit NK cell metabolism by reducing the rates of glycolysis and oxidative phosphorylation (OXPHOS), leading to NK cell dysfunction66. Consistent with these findings, we observed lower glycolysis and oxidative phosphorylation scores in NK cells from patients with TT using AUCell (Fig. 5E lower, Supplementary Data 10). To identify which fibroblast subpopulations are most closely associated with NK cell exclusion, we applied a beta-binomial regression model to evaluate the relationship between NK cell abundance and the proportions of five fibroblast subpopulations. This analysis revealed that ADH1B+ fibroblasts (z = −2.192, p = 0.028, β = −1.357, 95% CI = [−2.570, −0.1437]) and FAP+ fibroblasts (z = −1.709, p = 0.087, β = −1.254, 95% CI = [−2.691, 0.184]) were the strongest negative predictors of NK cell abundance (Supplementary Fig. 13E). Moreover, FAP+ fibroblasts showed stronger interaction with NK cells via CXCL12-CXCR4 and TGFB1-TGFBR1 (Fig. 5H). In summary, our findings revealed a significant reduction and dysfunction of NK cells in patients with TT, potentially driven by interactions with fibroblast, especially FAP+ fibroblasts.

FAP+ fibroblasts promote TT development by directly regulating cancer cells, especially EMT-like cancer cells

On the basis of our observations regarding CN15, we propose that FAP+ fibroblasts may directly regulate cancer cells to promote TT development. To determine the role of FAP+ fibroblasts in tumor progression, we isolated primary CAFs from RCC tissues (Supplementary Fig. 13F, G). We then compared the capacity of FAP+ and FAP- fibroblasts to promote tumor growth, migration and invasion in vitro. Employing CCK8 and transwell assays, we observed that FAP+ fibroblasts significantly enhanced the proliferation and migration of RCC cell lines (786-O and 769-P) relative to FAP- fibroblasts (Fig. 5I-K, Supplementary Fig. 13H–J, L–N). Furthermore, 3D spheroid invasion assays revealed that FAP+ fibroblasts exhibited a significantly higher capability to facilitate the invasion of 786-O compared with FAP- fibroblasts (Fig. 5L, Supplementary Fig. 13K). Collectively, these findings confirmed the tumor-promoting effect of FAP+ fibroblasts in RCC.

To further elucidate the underlying mechanism, we analyzed the interactions between FAP+ fibroblasts, EMT-like cancer cells and cycling cancer cells via CellPhoneDB. We observed strong interactions between FAP+ fibroblasts and both cancer cell subsets via ECM-integrin signaling (Fig. 5M). Integrins, the molecular bridges between cells and ECM, not only provide physical connections but also transmit signals from the tumor environment, mediating multiple cancer cell behaviors, such as cell migration and tumor invasion67,68. Moreover, integrins can mediate the linkage of actin cytoskeleton to ECM, which can stabilize cell-substrate adhesion during cell migration69. To confirm these inferred interactions, we further investigated whether the putative effects of ECM-integrin signaling on cancer cells were enhanced in the tumors of patients with TT because of the enrichment of FAP+ fibroblasts. To this end, we employed AUCell analysis on EMT-like and cycling cancer cells in both the discovery and validation datasets and compared the enrichment scores among different sample groups. As expected, EMT-like cancer cells from patients with TT exhibited significantly higher enrichment scores of integrin-related pathways in both datasets (e.g., ECM receptor interaction, regulation of actin cytoskeleton, substrate dependent cell migration, focal adhesion and apical junction), confirming enhanced ECM-integrin signaling in RCC with TT (Supplementary Fig. 13O). Notably, TTs consistently exhibited significant activation of substrate dependent cell migration compared with PTs from patients with TT in both datasets, suggesting that ECM-guided cancer cell migration may be associated with TT development. Although similar trends were observed in cycling cancer cells from the discovery dataset, statistical significance was not achieved in the validation dataset (Supplementary Fig. 13P). This may be attributed to the limited cell number or sample size for cycling cancer cells. In addition to ECM-integrin interactions, we also observed interactions mediated by CXCL12-CXCR4 and MMP2-ITGAV/ITGB3 between FAP+ fibroblasts and cancer cells. These interactions have been previously linked to the regulation of integrin expression and the promotion of cancer cell migration and invasion70,71. To further verify the relationship between cell-cell interactions and their potential effects, we focused on two representative spatial transcriptomics samples from RCC patients with TT, including one TT and one PT sample, both of which contained CN15. We found that several ligand-receptor gene pairs, including COL15A1-ITGA3, COL5A2-ITGB1, FBN1-ITGA5, MMP2-ITGAV/ITGB3, and CXCL12-CXCR4, exhibited spatial co-expression within CN15 and its adjacent regions, especially at the invasion front and at the interface between FAP+ fibroblasts and cancer cells (Fig. 5N upper, Supplementary Fig. 13Q, R). Notably, these regions also showed elevated ssGSEA scores for integrin-related pathways (Fig. 5N lower and Supplementary Fig. 13R), highlighting the potential synergistic effects of these interactions in promoting cancer cell migration and invasion. In summary, our analysis provided supporting evidence that FAP+ fibroblasts may directly modulate cancer cells, particularly EMT-like cancer cells, via multiple interactions (ECM-integrins, MMP2-ITGAV/ITGB3 and CXCL12-CXCR4). This modulation may enhance cancer cell migration and invasion and potentially contribute to TT formation.

FAP+ fibroblasts enrichment predicts poor prognosis and anti-VEGF resistance in RCC

To investigate the broader clinical value of FAP+ fibroblasts, we analyzed the associations between the FAP+ fibroblast signature and multiple cancer prognostic factors (clinical stage, T stage, pathological grade and overall survival) in the TCGA-KIRC dataset. We observed that advanced or high-grade RCC patients (stage III/IV, T3/4 or Grade 4) exhibited significantly higher enrichment of FAP+ fibroblasts (Fig. 6A), suggesting a potential role for FAP+ fibroblasts in RCC progression, such as tumor invasion and metastasis. Furthermore, high signature scores for both FAP+ fibroblasts and their precursors (CYSLTR2+ fibroblasts) were associated with worse overall survival (Fig. 6B). We then investigated whether FAP+ fibroblast enrichment might affect the response to therapy in advanced/metastatic RCC. We analyzed bulk RNA-seq data from the Javelin 101 cohort72 of patients treated with avelumab + axitinib (anti-PDL1+ anti-VEGF, n = 354) or sunitinib (anti-VEGF, n = 372), and from CheckMate cohorts73 of patients treated with nivolumab (anti-PD1, n = 181) or everolimus (anti-mTOR, n = 130). Strikingly, a high FAP+ fibroblast signature score was significantly associated with poor progression-free survival (PFS) in both arms of the Javelin 101 cohort (Fig. 6D) but not in either arm of CheckMate (Fig. 6E). This suggests a potential role for FAP+ fibroblasts in predicting the efficacy of anti-VEGF-based therapy in RCC. Interestingly, FAP+ fibroblasts consistently localized in close proximity to intra-tumoral vasculature (CN12, pericyte 2 and tumor EC 1) (Fig. 6F). We confirmed this finding via multiplex immunofluorescence (Fig. 6G). This suggested a potential mechanism by which FAP+ fibroblasts influence the efficacy of anti-VEGF-based therapy, although further functional experiments would be needed to elucidate the underlying mechanisms. In addition, in the CheckMate cohort (OS information unavailable for the Javelin 101 cohort), patients with high ssGSEA scores for the FAP+ fibroblasts signature had worse OS (Fig. 6C), which aligns with the results in the TCGA-KIRC dataset. To evaluate whether the prognostic association of the FAP+ fibroblast signature was independent of clinical stage and pathological grade, we performed multivariable Cox proportional hazards analysis in the TCGA-KIRC cohort. In this analysis, the FAP+ fibroblast signature score was not significantly associated with overall survival after adjusting for both clinical stage and pathological grade (Supplementary Fig. 14). In contrast, both higher clinical stage and pathological grade remained independently associated with worse prognosis. These findings suggested that FAP+ fibroblasts may reflect an aggressive TME rather than serve as an independent prognostic biomarker.

Fig. 6: FAP+ fibroblast enrichment predicts poor prognosis and anti-VEGF Resistance in RCC.
figure 6

A ssGSEA score of FAP+ fibroblast signature for the indicated clinical stage (Stage I–IV; n = 269, 57, 125 and 83 patients, respectively), T stage (T1–T4; n = 275, 69, 182 and 11 patients, respectively) and pathological grading (G1–G4; n = 14, 230, 207 and 78 patients, respectively) in the TCGA-KIRC dataset. P-values were determined by the two-sided Wilcoxon rank-sum test and adjusted for multiple comparisons using the Benjamini-Hochberg method. Box plots display median, upper and lower quartiles, with whiskers indicating maximum and minimum data points within 1.5 × interquartile range. B Kaplan–Meier analysis for overall survival in TCGA-KIRC dataset based on FAP+ fibroblast (left; high, n = 241; low, n = 292) and CYSLTR2+ fibroblast (right; high, n = 199; low, n = 334) signature scores using optimal cutoff method. P-values were determined by the two-sided log-rank test. C Kaplan–Meier analysis for overall survival in the nivolumab arm (left; high, n = 132; low, n = 49) and everolimus arm (right; high, n = 63; low, n = 67) of Checkmate cohorts73 based on FAP+ fibroblast signature score using optimal cutoff method. P-values were determined by the two-sided log-rank test. D Kaplan–Meier analysis for progression-free survival in the sunitinib arm (left; high, n = 204; low, n = 168) and avelumab + axitinib arm (right; high, n = 73; low, n = 281) of Javelin 101 cohort72 based on FAP+ fibroblast signature score using optimal cutoff method. P-values were determined by the two-sided log-rank test. E Kaplan–Meier analysis for progression-free survival in the nivolumab arm (left; high, n = 38; low, n = 143) and everolimus arm (right; high, n = 73; low, n = 57) of Checkmate cohorts73 based on FAP+ fibroblast signature score using optimal cutoff method. P-values were determined by the two-sided log-rank test. F Left: spatial mapping of cell niches in 2 representative samples (n = 1778 and 4972 spots, respectively) to show the close proximity of fibroblast-enriched cell niches (CN6) and intra-tumoral vasculature (CN12). Right: spatial mapping of tumor EC 1 and pericyte 2, FAP+ fibroblast in representative samples, showing only spots where the cell2location-inferred proportion of these cells exceeds 8% (n = 561 and 1285 spots, respectively). Data are representative of n  =  4 spatial transcriptomic slides. G Representative images of immunofluorescence staining of DCN (fibroblast, green), CSPG4 (pericyte, red), FAP (FAP+ fibroblast, yellow) and DAPI (blue) on a RCC section. Data shown are representative of three biologically independent replicates. Similar results were observed in all replicates.

Discussion

Tumor thrombectomy, the most common treatment for RCC with TT, remains one of the most challenging and complex urologic oncologic surgeries with high perioperative morbidity and mortality. Despite the promise of neoadjuvant therapy to downstage TT and thus reduce surgical risk, a significant portion of patients experience minimal benefit or even TT progression after receiving preoperative targeted therapy. This highlights the urgent need for a systematic investigation of the mechanisms underlying TT formation and progression to develop more effective neoadjuvant therapies. In the present study, we employed scRNA-seq and spatial transcriptomics to delineate the cellular and spatial landscape of RCC. Our findings revealed significant TME remodeling in RCC with TT, highlighting the central role of FAP+ fibroblasts in orchestrating TT formation and progression. These FAP+ fibroblasts may represent a promising therapeutic target for neoadjuvant therapy to achieve TT downstaging. Our study not only provides insights into the mechanism of TT development but also establishes a comprehensive cell and spatial atlas of RCC, serving as a valuable resource for future cancer research.

By analyzing the differences in cell-type compositions and cellular crosstalk patterns between various sample groups, we aimed to explore potential drivers of TT formation and progression. Our findings revealed similar cell-type compositions within the TME between PTs and TTs, aligning with prior bulk sample-based studies13,14,15. This similarity suggests that PTs from RCC with TT may have already acquired the ability to invade veins. Intriguingly, we observed significant differences in cell-type composition between RCC patients with and without TT, echoing the findings of a previous bulk transcriptomic study14. Notably, PTs from patients with TT exhibited a significant enrichment of FAP+ fibroblasts compared with those without TT, and these fibroblasts also displayed heightened interactions with other cell types within the TME. Moreover, we determined that FAP+ fibroblasts serve as cancer-associated fibroblasts in RCC by analyzing their distribution in the normal kidney, adjacent tissue, interface, and tumor core. FAP+ fibroblasts express high levels of genes that promote tumor progression, including POSTN74,75, THBS243,44, and CTHRC176. In addition, we found that FAP+ fibroblasts were enriched in RCC with higher clinical stage and pathological grade, further supporting their association with a more aggressive tumor phenotype. These findings collectively suggest that the enrichment of FAP+ fibroblasts in PTs from RCC patients with TT may be a key driver of TT formation and progression. These fibroblasts could potentially push the tumor into the vein through a “rear-driving” model, as proposed in a recent study on TT progression pattern77.

In parallel, we integrated 48 spatial transcriptomes from multiregional tumor sections to construct a spatial atlas of RCC and identified 23 cell niches, which were defined as spatially organized multicellular communities. The results of cell niche analysis provided insights into the TME and spatial organization of RCC. For example, cancer cells with diverse malignant potentials in RCC were found to establish spatially unique cellular neighborhoods. Specifically, less aggressive PT-like cancer cells preferentially interact with CD8+ T cells in CN14, suggesting higher immunogenicity to elicit stronger anti-tumor immunity. In contrast, more aggressive EMT-like cancer cells tend to cooperate with FAP+ fibroblasts to establish CN15, indicating a role in promoting immune evasion and tumor progression. Notably, we observed that fibroblast-enriched cell niches were broadly distributed across multiple RCC regions, and fibroblasts interacted with diverse cell types to establish various spatial organizations. Interestingly, these fibroblast-associated cell niches aligned with recently reported CAF-associated cellular neighborhoods78. Specifically, CN1 and CN2, analogous to the s4-CAF neighborhood, enriched in fibroblasts, CD4+ T cells, and B cells, were located within tertiary lymphoid structures or lymphoid aggregates. CN6, resembling the s3-CAF neighborhood and characterized by abundant fibroblasts and macrophages, was associated with an immunosuppressive microenvironment and poor response to immune checkpoint blockade. CN7, similar to the s2-CAF neighborhood and predominantly comprised FAP+ fibroblasts, was located in the tumor stroma and contributed to the formation of stromal barriers. CN15, corresponding to the s1-CAF neighborhood, enriched in FAP+ fibroblasts and aggressive cancer cells, was linked to tumor invasion and the establishment of immune barriers. In RCC, approximately 82% of patients exhibit intratumoral fibrosis (ITF), which is significantly correlated with adverse prognostic factors, including Fuhrman nuclear grade, intratumoral necrosis, and lymphovascular invasion79. These findings suggest that fibroblasts may serve as a scaffold in the tumorigenesis and progression of RCC and are widely involved in TME formation and remodeling. Notably, CN15, consisting of EMT-like cancer cells, cycling cancer cells, and FAP+ fibroblasts, was highly expanded in RCC with TT, suggesting that the enhanced interaction between FAP+ fibroblasts and cancer cells may be associated with TT development. As TT status in publicly available datasets was inferred based on pT stage (with pT1-pT2 considered non-TT), we cannot fully exclude the possibility that the observed CN15 expansion reflects general features of advanced RCC rather than alterations specific to TT formation. Nonetheless, given that TT represents a locally invasive manifestation of tumor progression, such spatial remodeling may constitute a foundational TME feature that facilitates TT formation. Furthermore, our analyses revealed that FAP+ fibroblasts potentially promote the migration and invasion of EMT-like cancer cells via the ECM-integrin, MMP2-ITGAV/ITGB3, and CXCL12-CXCR4 axes in RCC with TT. Interestingly, we observed increased activity of the apical junction pathway in EMT-like cancer cells from RCC with TT, suggesting that these cells may have undergone a partial or hybrid EMT state. These interactions and the observed cancer cell state have been reported to promote collective cancer invasion80, which exhibits phenotypes similar to TTs. Collectively, these findings suggest that collective cancer invasion may be a key mode of invasion within TTs and highlight the pivotal role of FAP+ fibroblasts in TT development, potentially through direct regulation of EMT-like cancer cells.

We also observed a significant reduction and dysfunction of NK cells in primary tumors from RCC with TT compared with those without TT. Furthermore, we proposed that FAP+ fibroblasts could suppress NK cell infiltration via the CXCL12-CXCR4 and TGFB1-TGFBR1 axes. In the context of breast cancer metastasis, activated hepatic stellate cells disrupt breast cancer dormancy in the liver by inhibiting the proliferation of NK cells through CXCL12-CXCR4 signaling65. Moreover, CAF-derived TGFβ has been shown to downregulate the expression of activating receptors on NK cells, thereby impairing their cytotoxicity81,82. We further observed that a reduction in NK cell abundance was associated with poor overall survival, which is consistent with the findings of a previous study in RCC83. Thus, our results suggest that FAP+ fibroblasts may mediate tumor immune evasion by hindering NK cell activity during TT development, which aligns with the previously established importance of FAP+ fibroblasts in TT development.

The spatial location and origin of CAFs serve as pivotal drivers of diverse functions and phenotypes. Various cell types within the TME, such as mesenchymal stem cells, epithelial cells, and endothelial cells, have been proposed as potential precursors of CAFs84. Here, we identified tumor pericytes as an essential source of FAP+ fibroblasts in RCC via multiple orthogonal methods, which is consistent with pericyte-myofibroblast transition observed in kidney fibrosis23,85,86. Importantly, FAP+ fibroblasts were spatially contiguous to intratumoral vasculature (CN12), showing a spatial gradient distribution from tumor pericytes to FAP+ fibroblasts. Disassociated pericytes from the tumor vasculature have been reported to gradually differentiate into CAFs and further promote tumor invasion and metastasis48. These findings suggest the crucial role of pericyte-fibroblast transition in TT development. Notably, enrichment of the FAP+ fibroblast signature was associated with worse PFS following anti-VEGF-based therapies (sunitinib and avelumab + axitinib), but not after anti-PD1 (nivolumab) or anti-mTOR (everolimus) treatment. Although additional multivariable Cox regression analyses are needed to determine the independence of this association, these findings suggested that FAP+ fibroblasts may specifically impair the efficacy of anti-VEGF-based therapy. Previous studies have reported molecular alterations in RCC under anti-VEGF-based therapy that are functionally linked to CAF activation and therapeutic resistance. In RCC patients receiving immune checkpoint inhibitors in combination with tyrosine kinase inhibitors (IO/TKI), Galectin-1 expression was significantly upregulated in non-responders and associated with poor PFS and increased CAF infiltration87. Galectin-1, produced by cancer cells, has been shown to activate CAFs and induce immunosuppression88, suggesting a potential role in IO/TKI resistance. Moreover, Soupir et al.89 identified significant upregulation of YES1 and enrichment of EMT pathway in RCC patients treated with IO/TKI. YES-associated protein 1 (YAP1), a key downstream effector of YES1 and the Hippo pathway, was linked to both EMT and CAF activation, and was a known linchpin in therapeutic resistance for many cancers90. Collectively, these findings supported a critical role for CAF activation in mediating resistance to anti-VEGF-based therapies. Mechanistically, sunitinib therapy has been reported to induce vessel wall thickening via the accumulation of perivascular FAP+ CAFs and collagen deposition, potentially hindering drug delivery and contributing to therapeutic resistance in RCC91. Similar mechanisms of CAF-driven drug resistance have been reported in other studies92,93. In addition, a recent study94 showed that CAF-derived CXCL3 activates CXCR2 on cancer cells, triggering the activation of ERK1/2 signaling pathway and promoting both tumor progression and sunitinib resistance in RCC. These findings collectively suggested that CAFs may mediate anti-VEGF therapeutic resistance through distinct mechanisms. Building upon these findings, our results further suggested that patients with high FAP+ fibroblast abundance might be more susceptible to pericyte-fibroblast transition in RCC, and anti-VEGF therapy might exacerbate this transition, potentially leading to the accumulation of perivascular FAP+ fibroblasts and impaired drug delivery. By integrating the scRNA-seq and scATAC-seq datasets, we further revealed that CREB3L1 and MEF2C serve as key TFs that regulate pericyte-fibroblast transition. Specifically, downregulation of MEF2C may initiate chromatin accessibility remodeling, which may contribute to activate CREB3L1 expression and further upregulates FAP+ fibroblast-specific genes. PDGF-BB-PDGFRβ signaling was previously reported to drive pericyte differentiation toward fibroblasts48, and PDGF-BB could inhibit MEF2C expression by inducing miR-448 expression in vSMC95. Future studies are warranted to further elucidate the precise mechanisms by which these TFs contribute to this complex cellular transformation.

Our study also has several limitations. Despite the use of multiple validation datasets to mitigate the risk of false positives, the small number of patients may still result in the underdetection of subtle TME alterations. Moreover, clinical heterogeneity poses a challenge to our study given the modest patient numbers. While a small number of patients with non-clear cell RCC or metastatic RCC were included, the majority of samples were derived from patients with localized ccRCC, which likely drove the principal findings. The inclusion of these patients reflects the clinical heterogeneity associated with TT, although it also introduces potential variability. Future studies are warranted to validate and further elucidate TT-associated mechanisms within specific RCC subtypes and clinical stages. Notably, in this study, most patients without TT were classified as pT1, with only one pT3a case, whereas all patients with TT presented with advanced-stage tumors (≥pT3a). As a result, our analysis primarily reflects baseline TME remodeling that is not specific to TT but may represent prerequisite conditions for TT formation. Although we restricted tumor burden (tumor size <10 cm) to reduce variation, the lack of direct control for tumor stage limits our ability to fully distinguish TT-specific TME features from general progression-associated alterations. As a complementary strategy, a stage-matched comparison between pT3a tumors with and without TT could delineate the directional determinants of renal vein invasion in RCC. Future studies incorporating both strategies will be essential for providing broader mechanistic insight into TT formation in RCC. In addition, the limited capture area (6.5 × 6.5 mm) and recommended section thickness (10 μm) for the current 10X Visium platform only allow the detection of the local spatial landscape of tumors. To comprehensively capture spatial architectures within RCC, we collected and integrated spatial transcriptomics data from multiregional tumor sections across multiple RCC patients and revealed a comprehensive spatial atlas of RCC. However, the identification of differential spatial architectures across sample groups requires multilayer sections and wider capture areas, which can facilitate the reconstruction of the spatial landscape of tumors in each individual. Furthermore, mRNA expression is well known to not always strictly reflect protein expression, and spatial transcriptomics cannot detect the content and distribution of extracellular proteins, such as extracellular matrix and secreted cytokines. Thus, spatial proteomics will be essential for better understanding FAP+ fibroblast-mediated ECM remodeling and cellular crosstalk. In addition to vascular invasion, RCC can exhibit local progression or disseminate to distant organs, such as the perinephric fat, adrenal glands or lungs. Importantly, not all RCC patients with TT develop distant metastases. This observation is supported by previous multi-region sequencing studies, which demonstrate that TT and distant metastases may originate from distinct subclones and reflect separable evolutionary trajectories13,96,97. Notably, the relationship between TT and metastasis appears to be highly heterogeneous across RCC patients. Although our study reveals fundamental TME remodeling associated with TT formation, the specific molecular drivers of directed vascular invasion and their complex association with metastatic competence in RCC remain incompletely understood. Future studies that include RCC patients with distinct patterns of disease progression will be critical for elucidating the heterogeneous mechanisms underlying TT formation and their potential links to metastatic evolution. Finally, additional experimental validation is needed to confirm the causality among FAP+ fibroblasts, EMT-like cancer cells, and NK cells, the transcriptional regulation of MEF2C and CREB3L1 in pericyte-fibroblast transition, as well as FAP+ fibroblast-mediated anti-VEGF resistance.

In conclusion, our study presents a cellular and spatial atlas of RCC and proposes that FAP+ fibroblasts may drive TT development through their interaction with EMT-like cancer cells and NK cells. Additionally, we revealed pericyte-fibroblast transition as a potential mechanism for FAP+ fibroblast enrichment and anti-VEGF resistance. In the future, approaches to inhibit pericyte-fibroblast transition and the interaction between FAP+ fibroblasts and EMT-like cancer cells or NK cells hold considerable clinical promise in downstaging TT and attenuating anti-VEGF resistance.

Methods

Human specimens

Human patient samples were collected with written informed consent and ethics approval by the ethics committee of the First Medical Center of PLA General Hospital (S2019-349-01). The informed consent included approval to publish de-identified individual participant metadata, including indirect identifiers such as age, sex (self-reported), and clinical information. No participant compensation was provided. Patient metadata, including age at diagnosis, sex, with/without TT, AJCC TNM stage, histological grade, histologic subtype, sampling site, and follow-up data, are provided as Supplementary Data 1. Sex and gender were not considered in the study design, as our research focused on general features of TT-associated TME remodeling.

Cell lines

The 786-O cell line was purchased from Pricella (#CL-0010), and the 769-P cell line was from our laboratory. Two cell lines were authenticated by STR (short tandem repeat) profiling (performed by Pricella). The mycoplasma contamination of two cell lines was tested using the mycoplasma detection kit (Vazyme, #D101-01), and the results were negative. These two cell lines were cultured and maintained in accordance with the American Type Culture Collection’s (ATCC) guidelines. Specifically, both two cell lines were grown in RPMI 1640 medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin.

Primary CAF culture

CAFs were isolated from RCC tissues. Fresh RCC tissue samples were washed 2–3 times with phosphate buffered saline (PBS; Pricella, #PB180327) and finely minced into small pieces (<1 mm3). Thirty minced small tissues were seeded onto 100 mm culture plates and allowed to adhere and proliferate. The culture medium consisted of Minimum Essential Medium α (MEMα; Gibco, #C12571500BT) supplemented with 10% fetal bovine serum (FBS; NEWZERUM, #FBS-CE500), 1% penicillin-streptomycin (Gibco, #15140-122), and 5 ng/mL fibroblast growth factor (FGF; MCE, #P7004). The medium was refreshed every three days to maintain optimal cell growth conditions. Fibroblasts that extended from the tissues were enzymatically detached using Trypsin-EDTA (Gibco, #25200-056) and transferred to new culture dishes for subculturing. Cells were maintained in complete medium at 37 °C in a humidified incubator with 5% CO2. All CAFs used for in vitro study were within seven passages. All primary cells were tested for mycoplasma using the mycoplasma detection kit (Vazyme, #D101-01), and the results were negative.

Sample processing

Immediately following surgical resection, tissue samples were processed for scRNA-seq. Each sample was first washed with cold phosphate-buffered saline (PBS) to remove residual blood and debris. Then, the tissue was mechanically dissociated into small pieces (approximately 1 mm³) on ice. Enzymatic dissociation was achieved by incubating the tissue fragments with a pre-warmed (37 °C) dissociation solution containing 625 μg/mL Liberase TM (Roche, 05401127001) and 50 μg/mL DNase I (Sigma, 9003-98-9) for 15 min with gentle rotation. Dissociation was stopped by adding complete DMEM medium. The resulting cell suspension was then passed sequentially through 70-μm and 40-μm cell strainers to remove tissue debris and obtain a single-cell suspension. The cell suspension was centrifuged at 500 g for 5 min at 4 °C to pellet the cells. Following red blood cell lysis, the cells were washed twice with PBS and finally resuspended at a final concentration of approximately 5 × 10⁶ cells/mL.

Single-cell suspensions were then incubated with a cocktail of fluorophore-conjugated antibodies for 30 min at 4 °C. The antibody panel included anti-CD326-PE-Cyanine7 (BioLegend, 369816), anti-CD45-BV510 (BioLegend, 368526), and anti-CD31-FITC (BioLegend, 303104). DAPI (TONBO Biosciences, 13-0868) was added just before flow cytometry analysis to exclude dead cells. The stained cells were analyzed and sorted using BD FACSAria™ Fusion (BD Biosciences) in BD FACSDiva software. Cells were initially gated based on FSC-A/SSC-A and FSC-A/FSC-H scatters to exclude debris and doublets, respectively. Viable single cells were then identified by negative DAPI staining. For the discovery datasets, viable single cells from 22 patients were used to generate a single-cell mRNA sequencing library. For validation datasets, single cells from an additional 6 patients were further sorted by gating on CD45-CD31- to enrich mesenchymal cells and CD326+ to enrich epithelial cells. Subsequently, the enriched mesenchymal and epithelial cells were mixed in a 1:1 ratio and used to prepare a single-cell mRNA sequencing library.

Single-cell library preparation and sequencing

The scRNA-seq libraries were generated using the BD Rhapsody™ scRNA-seq platform (BD Biosciences) according to the manufacturer’s instructions. Briefly, two to four samples (including adjacent tissue, primary tumor tissue, and, in some cases, tumor thrombus tissue from the same patients) were tagged with unique barcodes using the BD Single-Cell Multiplexing Kit (BD Biosciences). These tagged samples were then loaded onto a single BD Rhapsody™ cartridge for simultaneous processing. Following a 20-minute incubation at room temperature, cell lysis and capture of polyadenylated mRNA molecules occurred using barcoded magnetic beads. These beads were then retrieved magnetically from the microwells and pooled into a single tube for reverse transcription. Unique molecular identifiers (UMIs) and cell label information were added to each cDNA molecule during cDNA synthesis. Whole transcriptome amplification (WTA) and sample-tag sequencing libraries were generated following the manufacturer’s recommended workflow. The libraries were sequenced on the NovaSeq platform (Illumina) with a 150-bp paired-end run targeting a sequencing depth of 50,000 reads/cell. Paired-end FASTQ files were processed via the BD Rhapsody WTA Local bioinformatics pipeline with default parameters, as outlined in BD Rhapsody™ Sequence Analysis Pipeline User’s Guide (Bioinformatics Guides - BD Biosciences). The RSEC_MolsPerCell.csv files, single-cell expression matrices, were used for downstream bioinformatics analysis.

Spatial transcriptomics

The spatial transcriptomics was performed on paraffin sections of primary tumor tissues and tumor thrombus tissues. Selected samples were evaluated for size, shape and overall status to ensure optimal coverage of the 6.5 × 6.5 mm2 capture areas on Visium slides (10× Genomics). Additionally, samples were further screened based on RNA integrity number (RIN) values obtained using the Agilent 2100 Bioanalyzer. The spatial transcriptomics assay, Visium spatial gene expression slides, and reagents were utilized following the manufacturer’s instructions (10× Genomics). Libraries were prepared using TruSeq Illumina libraries and sequenced on the NovaSeq platform (Illumina) with a minimum sequencing depth of 50,000 read pairs per spatial spot. Images and sequencing reads were processed via 10x Genomics SpaceRanger (v 1.3.0).

Multiplex immunofluorescence

RCC primary tumor and tumor thrombus tissue samples were fixed in 4% formalin for 6–72 h and embedded in paraffin. The 4 μm paraffin sections were deparaffinized in xylene and rehydrated through a graded alcohol series. Antigen retrieval was performed by heating sections in citrate buffer (pH = 6) (ZSGB-BIO, ZLI-9064) in a microwave for 20 min at 95 °C, followed by a 30-min cool-down at room temperature. Multiplex fluorescence labeling was achieved using the NEON 9-color Allround Discovery Kit (NEON, NEFP950). Briefly, endogenous peroxidase activity was quenched with peroxidase blocker (ZSGB-BIO, PV-6000) for 15 minutes. Sections were then washed three times with PBS before blocking with 10% goat serum (ZSGB-BIO, ZLI-9056) to minimize nonspecific antibody binding. Primary antibodies were incubated for 2 h at 37 °C or overnight at 4 °C. Following washes, HRP-conjugated secondary antibody was incubated for 25 min at room temperature. TSA-dendron fluorophore was used to combine the secondary antibody. Finally, primary and secondary antibodies were thoroughly removed by heating the slides in elution buffer (Abcracker®, ABCFR5L) for 20 seconds at 95 °C using a microwave. This process allowed sequential labeling of each antigen with distinct fluorophores. Multiplex antibody panels included: DCN (Proteintech, 14667-1-AP, 1:100); FAP (Cell Signaling Technology, 66562, 1:100); PLOD2 (Proteintech, 66342-1-Ig, 1:100); NG2 (Cell Signaling Technology, 43916, 1:100). After all the antibodies were detected, the slides were imaged using a Zeiss LSM880 confocal laser scanning microscopy platform. QuPath (v0.6.0) was used to quantify the proportion of DCN+ and FAP+ cells. In brief, nuclei were segmented using DAPI, maximum and minimum nuclear areas were set to 400 μm3 and 10 μm3, respectively, and the threshold was set to 25. With the assistance of two experienced pathologists, the cells on the image were classified into DCN+ cells, FAP+ cells and PLOD+ cells, at least 15 cells of each type were selected. Finally, the proportions of DCN+ cells and FAP+ cells were calculated. Fiji software was used to quantify the cumulative area of CN15-like regions, defined by the spatial adjacency of fibroblasts and EMT-like cancer cells. Fibroblasts and EMT-like cancer cells were annotated using QuPath. With the assistance of two experienced pathologists, the CN15-like regions defined above, were manually delineated to quantify the cumulative area in each section by using Fiji software. The data is normalized based on the total area of the slices, in order to eliminate the interference caused by the slice sizes.

Cells were fixed in 4% formaldehyde for 10 min followed by permeabilization with 0.5% Triton X-100 for 5 min, then incubated for 30 min in PBS-T containing 5% goat serum, followed with the following antibodies: FAP (Cell Signaling Technology, 66562, 1:100); α-SMA (Invitrogen, 14-9760-82, 1:400); DCN (Proteintech, 14667-1-AP, 1:100) overnight at 4 °C. The following day, plates were washed three times in PBS and incubated with the appropriate secondary antibodies: goat-anti-rabbit-Alexa Flour 488 (Abcam, ab150077, 1:500); goat-anti-mouse-Alexa Flour 647 (Abcam, ab150115, 1:500). Secondaries were incubated for 1 h at room temperature. Slides were then washed and incubated for 5 min in DAPI (Sigma-Aldrich, #D9542). After all the antibodies were detected, the plates were imaged using a Zeiss LSM 880 confocal laser scanning microscop.

RNA in situ hybridization

RNA in situ hybridization assays for the detection of DCN, FAP, CSPG4, THBS2, and PDGFRB expression in 4 μm paraffin sections of RCC primary tumor and tumor thrombus tissues were performed using the RNAscope Multiplex Fluorescent Assay (Advanced Cell Diagnostics, 323100) and RNAscope 4-Plex Ancillary Kit (Advanced Cell Diagnostics, 323120) according to the manufacturer’s instructions. Briefly, slides were warmed at 60 °C for 1 h before they were deparaffinized in xylene and rehydrated through a graded alcohol series. Tissue pretreatment involved incubation with RNAscope Hydrogen Peroxide (Advanced Cell Diagnostics, 322381) for 10 min at room temperature, followed by RNAscope Target Retrieval Reagent (Advanced Cell Diagnostics, 322000) for 15 min at 98 °C and RNAscope Protease Plus (Advanced Cell Diagnostics, 322381) for 30 min at 40 °C. Specific target probes were then hybridized to the tissue sections for 2 h at 40 °C. The sense probes included: RNAscope ISH probe-Hs-CSPG4 (#422041), RNAscope ISH probe-Hs-THBS2 (#555471), RNAscope ISH probe-Hs-FAP (#411971), RNAscope ISH probe-Hs-DCN (#589521), and RNAscope ISH probe-Hs-PDGFRB (#548991). A series of amplification steps were subsequently performed using the reagents and protocols provided in the RNAscope 4-Plex Ancillary Kit. Finally, DAPI (ZSGB-BIO, ZLI-9600) staining was used for nuclei visualization. Stained slides were imaged using a Zeiss LSM 880 confocal laser scanning microscopy platform.

RNA extraction and quantitative reverse transcription-polymerase chain reaction (qRT-PCR)

Total RNA was isolated from CAFs using TRI reagent (Sigma-Aldrich, #P9424). cDNA was synthesized from 1 μg of total RNA using HiScript III All-in-one RT SuperMix Perfect for qPCR (Vazyme, #R333-01). qRT-PCR was performed using Taq Pro Universal SYBR qPCR Master Mix (Vazyme, #Q712-03) and monitored in real-time using a CFX Connect Real-Time PCR Detection System (Bio-Rad Laboratories). Relative gene expression levels were calculated using the 2 − ΔΔCt method, and β-actin was used for normalization. The primer sequences were used as follows: FAP: forward, 5′-ATGAGCTTCCTCGTCCAATTCA-3′; reverse, 5′-AGACCACCAGAGAGCATATTTTG −3′, ACTB: forward, 5′-TGGCACCCAGCACAATGAA-3′; reverse: 5′-CTAAGTCATAGTCCGCCTAGAAGCA-3′; DCN: forward, 5′- ATGAAGGCCACTATCATCCTCC −3′; reverse, 5′- GTCGCGGTCATCAGGAACTT −3′; ACTA2: forward, 5′-AAAAGACAGCTACGTGGGTGA −3′; reverse, 5′-GCCATGTTCTATCGGGTACTTC −3′. All measurements were performed in triplicate.

3D spheroid invasion assays

Zsgreen-expressing RCC cell line (786-O) were mixed with primary CAFs at a ratio of 5:1 (786-O to CAFs). A total of 4500 cells were re-suspended in 100 µl complete medium and seeded into ultra-low-attachment round-bottom 96-well plates (Corning, #7007). Plates were centrifuged at 300 g for 3 min to concentrate the cell suspension before being incubated at 37 °C. 3 days later, spheroids were formed and 50 µl medium was removed from each well, followed by the addition of 50 µl Cultrex Basement Membrane Extract-Type 3 (Culture BME-3; R&D System, #3632-005-02). The plates were then centrifuged at 300 g for 3 min at 4 °C, then incubated for 60 min at 37 °C. Subsequently, 100 μl of completed medium was added to each well. The time point used to quantify invasion was optimized for individual cell lines and ranged from 1 to 5 days. Spheroids were imaged with Nikon microscope (Eclipse TiE). Circularity of spheroids was analyzed using Fiji software.

Collection of cell culture medium

2 × 106 CAFs were seeded on 60 mm plates, When the cell density reached 95%, removed the medium, cells were washed once with PBS, and 4 mL of serum-free MEMα (culture medium; CM) was added. After 48 h of incubation at 37 °C, the CM was collected and passed through a 0.45 μm membrane syringe filter to remove any cells and cell debris. The CM was stored at −80 °C until further use.

Cell proliferation assay

786-O cells and 769-P cells were plated in a 96-well plate at a density of 1500 cells/well and 3000 cells/well, respectively, and added 100 µl medium, consisting of 75 µl of completed medium and 25 µl of CM, then the plate was incubated at 37 °C overnight. Detection was started the next day and the cell proliferation was detected at indicated time points by using CCK-8 kit (Beijing Aoqing biotechnology, #AQ-308). The supernatant was carefully aspirated, and 100 µl CCK-8 reagent was added, and then the plate was incubated at 37 °C for 1 h in the dark. At last, the absorbance at 450 nm was measured using microplate reader (TECAN SPARK).

Transwell assay

The 786-O cells and 769-P cells were re-suspended in serum-free medium and the density was adjusted to 5 × 105 cells/mL and 6 × 105 cells/mL, respectively. 50 µl cell suspension and 450 µl CM were introduced into the upper chambers, while the lower cross-pore compartment contained a solution with 10% FBS. After the cells migrated for 24 h, cells were fixed and stained with crystal violet. Migrated cells were counted under an inverted light microscope. The number of migrated cells was quantified by counting the number of cells from 4 random fields at ×100 magnification.

scRNA-seq data processing

To construct a high-resolution single-cell atlas of RCC, we processed the scRNA-seq data from 28 patients (22 from the discovery dataset and 6 from the validation dataset) using a standardized three-step pipeline: (1) cell-type annotation, (2) removal of low-quality cells and doublets, and (3) discrimination of normal and malignant epithelial cells. All the processing steps were performed in Seurat (v4.2.1)45.

Step 1: Cell annotation. To ensure the accuracy of cell annotation, we performed initial quality control separately in each patient’s dataset. We discarded cells (1) with a mitochondrial content of more than 20%, (2) with less than 200 or more than 5000 genes and (3) with less than 1000 or more than 20000 UMIs. We then log-normalized the raw count matrix (scale.factor = 10000) and identified the highly variable genes (HVGs) using variance-stabilizing transformation (vst). The top 3000 HVGs were used for dimensionality reduction via principal component analysis (PCA). We subsequently corrected principal components by using the Harmony (v0.1.0) algorithm98 to remove the tissue-level batch effect. We constructed a shared nearest neighbor (SNN) graph (k.param = 20) based on the first 50 corrected principal components and clustered cells (resolution = 1.5) using the Louvain algorithm. We then leveraged classical cell markers to merge cell clusters with similar expression patterns and assigned major cell type labels. Step 2: Identification of low-quality cells and doublets. To identify low-quality cells and doublets at high resolution, we performed cell sub-clustering within each cell type for each patient’s dataset. Parameters were adjusted based on the specific cell type and cell count. Cell clusters expressing implausible combinations of cell-type-specific markers were classified as doublets. We further identified low-quality cells by their significantly high mitochondrial gene expression and abnormally low UMI count within cell clusters. These cells were excluded from further analysis. In this process, we also confirmed and refined the cell annotations based on the cell-type-specific markers. Step 3: Discrimination of normal and malignant epithelial cells. In the first two steps, we initially annotated normal epithelial cells and cancer cells according to the expression of marker genes. Next, we randomly sampled 50 cells from each non-epithelial cell type (to accelerate computation) as a reference group, and then inferred the copy number variation (CNV) of epithelial cells using inferCNV21 (Trinity CTAT Project. https://github.com/broadinstitute/inferCNV, v1.6.0). Accounting for inter-patient genomic variability, the CNV inference was performed separately in each patient. Based on the inferred CNV profiles, we used a hierarchical clustering algorithm to divide the epithelial cells into four cell clusters. By comprehensively considering the marker-based cell annotations and the known CNVs in RCC (e.g., 5q gain and 3p loss), we merged the epithelial cell clusters and refined the final annotations of normal epithelial and cancer cells. Finally, we integrated all patients’ scRNA-seq datasets by sequentially removing patient-level and tissue-level batch effects using the Harmony algorithm, and visualized all cells using UMAP.

In addition, four publicly available scRNA-seq datasets20,25,46,50 were processed using the same pipeline and annotated based on a consistent set of canonical cell markers.

Gene specificity scores for cell annotation validation

We ranked all genes in each cell type according to the gene specificity scores using the sortGenes function of the genesorteR (v0.4.3)22 R package setting binarizeMethod to ‘adaptiveMedian’. The gene specificity scores were calculated based on the specificity and expression level of genes in each cell type. To evaluate the accuracy of our cell annotations in our discovery dataset, we calculated Pearson correlation coefficients between our gene specificity scores and those from two published datasets (Bi et al.24 and Kuppe et al.23), both of which provided author-defined cell-type labels, allowing for cross-dataset comparison. As the Bi et al. dataset comprises systemically treated RCC patients, it was used exclusively for benchmarking cell-type annotations and excluded from subsequent validation analyses. In addition, the Kuppe et al. dataset, which focused on renal fibrosis, enabled a more detailed evaluation of mesenchymal cell annotations.

Differential cell-type composition analysis

We categorized all cells into four broad cell types: immune cell, endothelial cell, epithelial cell, and mesenchymal cell. Moreover, we divided our scRNA-seq data (discovery dataset) into distinct sample groups according to sampling sites and TT status. Within each sample group, we calculated the proportion of each cell type relative to its corresponding broad cell type. To assess alterations in cell-type composition (Supplementary Fig. 2A), we performed analyses at two levels: (i) within individual patients, by comparing different tissue regions, and (ii) between clinical groups (patients with vs. without TT), by comparing the same tissue regions. For intra-patient comparisons across tissue regions (AT, PT, and TT) in RCC patients with TT, we used two-sided paired t-test after excluding two patients (P9 and P10) due to missing TT data resulting from low cell viability. For inter-patient comparisons between RCC patients with and without TT, we used two-sided unpaired t-test to compare cell-type proportions in the same tissue region (either AT or PT). Fold change (FC) was defined as:

$${{FC}}_{{{\rm{c}}}{{{\rm{omparison}}}}_{{{\rm{A}}}\; {{\rm{vs}}}\; {{\rm{B}}}}}^{{{\rm{cell}}}\; {{\rm{type}}}}=\frac{{{\rm{mean}}}({{{\bf{P}}}}_{{{\rm{A}}}}^{{{\rm{cell}}}\; {{\rm{type}}}})-{{\rm{mean}}}({{{\bf{P}}}}_{{{\rm{B}}}}^{{{\rm{cell}}}\; {{\rm{type}}}})}{{{\rm{mean}}}({{{\bf{P}}}}_{{{\rm{A}}}}^{{{\rm{cell}}}\; {{\rm{type}}}})+{{\rm{mean}}}({{{\bf{P}}}}_{{{\rm{B}}}}^{{{\rm{cell}}}\; {{\rm{type}}}})}$$
(1)
$${{{\bf{P}}}}_{{{\rm{group}}}}^{{{\rm{cell}}}\; {{\rm{type}}}}=({p}_{1}^{{{\rm{cell}}}\; {{\rm{type}}}},{p}_{2}^{{{\rm{cell}}}\; {{\rm{type}}}},...,{p}_{{{\rm{n}}}}^{{{\rm{cell}}}\; {{\rm{type}}}})$$
(2)

where pi represents the proportions of the given cell type in the i-th sample of specific group.

To validate the observed reduction in NK cells and enrichment of fibroblasts in PTs from RCC patients with TT, we analyzed two published scRNA-seq datasets (Young et al.20 and Yu et al.25). However, due to the absence of fibroblasts in the Young et al. dataset, fibroblast subpopulations could not be resolved. To further confirm the enrichment of FAP+ fibroblasts and CYSLTR2+ fibroblasts, we used both our validation dataset and Yu et al. dataset for independent validation.

Receptor-ligand interaction inference and differential cellular crosstalk analysis

Based on our discovery dataset, we used CellPhoneDB (v2.1.7)27 to infer receptor-ligand interaction pairs between diverse cell types within the TME. As for differential cellular crosstalk analysis, we separately inferred cell-cell interactions in PTs from patients with and without TT. To constructed a differential interaction graph, we identified significantly activated interaction pairs in patients with TT (p < 0.05 in patients with TT but p > 0.05 in patients without TT). To accurately reflect the cell-cell interactions within TME, we excluded the cell types present in normal kidney from this graph, such as normal renal epithelial cells and glomerular capillaries ECs. In this graph, nodes represent cell types, edges denote existing activated interaction between specific two cell types in patients with TT, and edge weights are determined by the number of activated interactions. We calculated the weighted degree of each node to evaluate the overall activation levels of each cell in patients with TT.

Integration of cancer cells

To overcome substantial patient heterogeneity in cancer cells in our discovery dataset, we excluded patients with fewer than 50 cancer cells and then identified the top 3000 HVGs within cancer cells of each remaining patient. Genes identified as HVGs in at least half of the patients were subsequently included in further analyses, such as PCA for dimensionality reduction, Harmony-based batch correction, and cell clustering.

Cell niche analysis

Spatial mapping of cell types and identification of spatial domains were independently performed for each spatial transcriptomics sample. First, we used cell2location (v0.1)29 to calculate cell-type compositions for each spot. Reference signatures of major cell types were estimated using a negative binomial regression model and our discovery dataset (raw count matrix of scRNA-seq data). The reference signature matrix was further used to estimate the cell-type abundances in each spot based on a hierarchical Bayesian model. Cell-type proportions per spot were calculated by normalizing the inferred cell-type abundances. Second, spatial domains within each tumor section were identified by clustering spots using the stSME module of stLearn (v0.4.11)30, which integrated spatial location, morphology, and gene expression of spots. Specifically, features were extracted from H&E-stained histological images using the ResNet50 model, and the SME_normalize function was applied to adjust gene expression matrix with the parameter setting of use_data = “raw” and weights = “weights_matrix_all”. Subsequently, we performed PCA analysis on the SME-normalized data, followed by Louvain clustering with the top 50 principal components. Third, to identify shared spatial organizations across different tumor sections with similar cell-type compositions, cell-type proportions for each spatial domain were computed by averaging the cell-type abundances of constituent spots. We integrated all spatial domains across 48 spatial transcriptomics samples using the Harmony algorithm, with cell-type proportions as input. Batch effects from data source, sample type, and section identity were sequentially corrected (theta = 2, max_iter = 5 for each step), followed by spatial domain clustering using the Louvain algorithm. Finally, clusters were merged based on similarity of cell-type compositions and revealed 23 general spatial domain clusters, termed “cell niches” (CNs). Dominant cell types within each CN were defined as those with an average proportion exceeding 8%. To investigate the spatial organization of cell types within RCC tumors, we constructed a cell-type spatial organization network. In this network, each node represents a dominant cell type identified across the 23 CNs, and undirected edges were added between pairs of dominant cell types co-occurring within the same CN. This network revealed spatial associations among dominant cell types and delineated how distinct cell populations are spatially organized within the TME of RCC.

To identify TT-associated differences in the spatial organization of primary tumors, we compared the area of each CN between patients with and without TT. CN area was defined as the number of spots assigned to each niche per sample. To ensure biological comparability, only tumor regions were included in this analysis. Therefore, spatial transcriptomic samples from the tumor-normal interface were excluded due to limited comparability across samples. TT status was defined based on known clinical annotation for our spatial transcriptomic dataset or inferred from T stage for publicly available datasets (T1-T2 considered non-TT). Samples with ambiguous TT status (e.g., T3a) were excluded. Sample details are in Supplementary Data 6. Statistical comparisons were performed using two-sided unpaired t-test, with P < 0.05 considered significant.

The spatial transcriptomics data used for this analysis included 8 samples generated in this study (4 PTs and 4 TTs from RCC patients with TT) and 40 tumor sections from two publicly available datasets18,19 (11 tumor-normal interface samples, 5 tumor core regions and 24 immune-positive regions from RCC patients across diverse stages). Unless otherwise specified, all analysis steps described above were performed based on these 48 spatial transcriptomics samples.

Trajectory inference

To ensure precise trajectory inference, we reanalyzed the raw count data of target cells (pericyte 2, CYSLTR2+ fibroblast and FAP+ fibroblast), including log-normalization, identification of HVGs, dimensionality reduction using PCA and batch correction using Harmony. This reanalysis facilitated the extraction of features associated with pericyte-fibroblast transition, thereby improving the performance of trajectory inference. Subsequently, we used the corrected principal components as input to generate the diffusion maps using the destiny (v3.12.0) R package (https://github.com/theislab/destiny). We performed trajectory inference and pseudotime analysis based on the diffusion map projection using Slingshot (v2.6.0)99 R package. Finally, we used the PseudotimeDE (v1.0.0)49 R package to analyze the pseudotime-dependent genes along the trajectory, and displayed the top 50 significant genes (adjusted p < 0.05) in a heatmap. To validate the consistency of the pericyte-fibroblast transition across datasets, we independently performed trajectory inference on multiple scRNA-seq datasets, including our discovery and validation dataset, as well as those from Yu et al.25, and Long et al.50. Each dataset contained tumor pericytes, CYSLTR2+ fibroblasts, and FAP+ fibroblasts.

Single-cell regulatory network inference and clustering (SCENIC) analysis

To explore key transcription factors (TFs) and their corresponding regulon activities, we applied the pySCENIC (v0.10.4)51 pipeline with default settings to the discovery dataset, Yu et al. dataset25, and Long et al. dataset50. pySCENIC infers relationships between known TFs and potential target genes using the GRNBoost2 and RcisTarget algorithms to obtain TF regulons and evaluates the activity of each regulon using the AUCell score. Using the regulon activity matrix, we filtered the top 500 most variable regulons (selection.method = “mvp”) and performed PCA dimensionality reduction. We then used Harmony to correct the PCs and performed diffusion map and slingshot-based trajectory inference. Finally, we also used the PseudotimeDE49 R package to identify pseudotime-dependent regulons. The relative trajectory analysis was performed on the discovery dataset to confirm pericyte-fibroblast transition.

RNA velocity

We used velocyto (v0.17.17)52 to annotate each patient’s transcripts as spliced and unspliced reads and generated loom files, which were subsequently merged via loompy. We further applied the scVelo (v0.2.4)100 Python package to pre-process velocity data and learn the full transcriptional dynamics of splicing kinetics by dynamical model. Finally, we projected the velocity information onto the previously generated UMAP to visualize the velocity flow between cell populations in our discovery and validation datasets.

scATAC-seq data processing

Here, we mainly used Signac (v1.8.0)101 R package to process scATAC-seq data (Yu et al. dataset25 and Long et al. dataset50). First, we performed initial data quality control to remove low quality cells followed criteria: (1) 1000 <peak region fragments <20000, (2) reads in peaks > 15%, (3) blacklist ratio <0.05, (4) nucleosome signal <4, (5) TSS enrichment > 3. We then performed latent semantic indexing (LSI) analysis, including term frequency-inverse document frequency (TF-IDF) normalization, feature selection (top 99% peaks) and dimensionality reduction using singular value decomposition (SVD). We selected second to 30th dimensionality, followed by LSI component correction using Harmony to remove patient-level batch effect. We used the corrected LSI components as input to generate SNN graph and then clustered cells using Leiden algorithm. To identify the identity of cells in scATAC-seq data, we created gene activity score matrix by counting the number of fragments mapping to the genes and their 2 kb upstream region, and applied Seurat label transfer algorithm to assign the cell annotations of the paired scRNA-seq data to the cells in scATAC-seq. In brief, we integrated paired scRNA-seq gene expression and scATAC-seq gene activity to identify a set of anchors using FindTransferAnchors function, and then transferred cell labels from scRNA-seq to scATAC-seq using TransferData function. Finally, combined with the gene activity of canonical cell type marker gene activity and the predicted cell labels, we merged and annotated the cell clusters. Due to limited cell-type resolution of peaks in the raw cellranger-atac output, we identified mesenchymal-cell-specific peaks and refined the annotations of mesenchymal cell subsets through an iterative two-step process. First, we generated pseudo-bulk profiles for mesenchymal cell subpopulations by aggregating their fragments, followed by peak calling using MACS2 (v2.2.7.1)102. Second, we performed dimensionality reduction and cell clustering based on the newly identified cell-subset-specific peaks, including LSI analysis, Harmony-based batch correction and Leiden clustering. Third, we created the gene activity score matrix and transferred the cell labels of mesenchymal cell subsets from the paired scRNA-seq to refine the cell annotations. Similar to the scRNA-seq analysis, we identified and removed clusters exhibiting implausible combination of cell-type-specific marker gene activity, considering them as potential doublets.

To match the mesenchymal cells between scATAC-seq and scRNA-seq, we integrated the gene activity score matrix from scATAC-seq and the gene expression matrix of scRNA-seq using Seurat co-embedding workflow and paired the cells in this space using an optimal matching approach59. This assigned a unique gene expression profile for each cell in the scATAC-seq dataset, facilitating the investigation of the relationship between transcriptional regulation and chromatin regulation. Moreover, to infer the pericyte-fibroblast transition based on the gradient changes of chromatin accessibility, we used LSI components from the scATAC-seq dataset as input to perform diffusion map projection and slingshot-based trajectory inference. We then evaluated the TF binding activity using the chromVAR (v1.20.0)60 R package based on motifs from the human_pwms_v2 list, and the TF regulon activity using the pySCENIC python package. To investigate potential regulatory mechanisms, we performed pairwise Spearman correlation analysis on various TF-related metrics, including expression and chromatin accessibility of TF genes, TF regulon activity, and TF binding activity. In parallel, we applied the PseudotimeDE R package to evaluate the pseudotime-dependent dynamics of each TF-associated metric. Transcription factors were filtered based on two independent sets of criteria: (1) gene expression, chromatin accessibility, and regulon activity for each TF were mutually correlated (all Spearman correlation coefficients > 0.1) and exhibited pseudotime-dependent dynamics (PseudotimeDE-adjusted p < 0.05); (2) gene expression, regulon activity, and TF binding activity met the same criteria.

Differential expression analysis and gene set enrichment analysis

To investigate cell-type-specific genes and differential expression profiles across various sample groups, we used the log-normalized expression matrix as input to identify differentially expressed genes using the two-sided Wilcoxon rank-sum test with Bonferroni correction (adjusted p < 0.05 and logFC > 0.5). Cell signature scoring of all bulk RNA-seq datasets was performed using single sample gene set enrichment analysis (ssGSEA). To validate the results of scRNA-seq, we compared cell signature scores between patients with and without TT using two-sided Wilcoxon rank-sum test. Moreover, we performed the paired two-sided Wilcoxon rank-sum test to compare various fibroblast subset signature scores between adjacent tissues and primary tumors. In spatial transcriptomics data, ssGSEA was used with Hallmark, KEGG, and GO Biological Process gene sets from MSigDB to evaluate functional pathway enrichment within each spot. For scRNA-seq data, we applied the AUCell (v1.12.0) algorithm to assess the activity of specific pathways at single-cell resolution. To identify pathways and biological processes differentially regulated between RCC with and without TT, we performed gene set enrichment analysis (GSEA), and additionally used the limma package to compare AUCell-based pathway activities at the single-cell level.

Cell signature analysis in TCGA, Javelin 101 cohort and Checkmate cohort

To obtain total fibroblast and NK cell signatures, we performed pairwise differential expression analyses based on our discovery scRNA-seq dataset. For each target cell type (fibroblasts or NK cells), we compared its transcriptome with those of all other major cell types (including T cells, B cells, macrophages, etc.). Genes that were consistently upregulated in the target cell type across all pairwise comparisons (adjusted P < 0.05 and logFC > 0.5) were retained as candidate signature genes. We then manually filtered for genes with minimal expression in other major cell types based on FeaturePlot results to achieve fibroblast or NK cell signature (Supplementary Data 11). For the fibroblast subsets signatures, we performed similar pairwise differential analyses and the selection of candidate genes among all 16 mesenchymal cell subsets. Due to the potential transitional cell state of CYSLTR2+ fibroblast and ADH1B+ fibroblast, we also incorporated genes co-expressed within various cell subsets of the same lineage as their signature genes. For example, highly expressed genes in both CYSLTR2+ fibroblast and FAP+ fibroblast (or pericyte 2), such as POSTN and THBS2, were included as its candidate signature genes. We further manually screened the genes specific to the mesenchymal cells to obtain final fibroblast subset signatures (Supplementary Data 11).

In TCGA-KIRC dataset, we performed pairwise two-sided Wilcoxon rank-sum test to compare the FAP+ fibroblast signature score across multiple cancer prognostic factors (clinical stage, T stage and pathological grading). The association between FAP+ fibroblast signature score and progression-free survival (PFS) and overall survival (OS) was evaluated via Kaplan-Meier analysis using the survminer and survival R packages. For the signature score of FAP+ fibroblast, patient cohorts were grouped into high and low groups by optimal cutoff values using surv_cutpoint function from survminer. We used the two-sided log-rank test to detect significance of differences in PFS and OS between two groups.

Beta-binomial regression

To evaluate the association between different fibroblast subpopulations and NK cell abundance in RCC, we applied a beta-binomial regression model using the glmmTMB (v1.1.11) R package based on our discovery dataset. In this model, NK cell counts were treated as the number of successes out of total immune cells per sample, and the normalized proportions of five fibroblast subpopulations were included as covariates. This modeling framework accounts for the discrete nature of cell counts and potential overdispersion across samples. Regression coefficients reflected the strength and direction of association between each fibroblast subset and NK cell abundance.

Statistical analysis

All statistical analyses were conducted utilizing R software (version 4.0.5) and Python (version 3.9.12). Detailed descriptions of the statistical tests applied for each specific analysis are delineated in the Fig. legends and the Methods.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.