MultiGATE: integrative analysis and regulatory inference in spatial multi-omics data via graph representation learning

Miao, Jishuai; Li, Jinzhao; Xin, Jingxue; Tu, Jiajuan; Ge, Muyang; Qi, Ji; Zhou, Xiaocheng; Zhu, Ying; Yang, Can; Lin, Zhixiang

doi:10.1038/s41467-025-63418-x

Download PDF

Article
Open access
Published: 24 October 2025

MultiGATE: integrative analysis and regulatory inference in spatial multi-omics data via graph representation learning

Jishuai MiaoÂ ORCID: orcid.org/0009-0005-6345-8590¹^Â na1,
Jinzhao Li¹^Â na1,
Jingxue Xin¹,
Jiajuan Tu²,
Muyang Ge¹,
Ji Qi¹,
Xiaocheng ZhouÂ ORCID: orcid.org/0000-0001-5635-9395¹,
Ying ZhuÂ ORCID: orcid.org/0000-0002-6594-3734³,
Can YangÂ ORCID: orcid.org/0000-0002-4407-3055^4,5 &
â€¦
Zhixiang LinÂ ORCID: orcid.org/0000-0001-9301-947X^1,6Â

Nature Communications volumeÂ 16, ArticleÂ number:Â 9403 (2025) Cite this article

2143 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

New spatial multi-omics technologies, which jointly profile transcriptome and epigenome/protein markers for the same tissue section, expand the frontiers of spatial techniques. Here, we introduce MultiGATE, which utilizes a two-level graph attention auto-encoder to integrate the multi-modality and spatial information in spatial multi-omics data. The key feature of MultiGATE is that it simultaneously performs embedding of the spatial pixels and infers the cross-modality regulatory relationship, which allows deeper data integration and provides insights on transcriptional regulation. We evaluate the performance of MultiGATE on spatial multi-omics datasets obtained from different tissues and platforms. Through effectively integrating spatial multi-omics data, MultiGATE both enhances the extraction of latent embeddings of the pixels and boosts the inference of transcriptional regulation for cross-modality genomic features.

Introduction

In recent years, the advent of spatial transcriptomic technologies has revolutionized our understanding of gene expression patterns within complex tissues and organs. These techniques profile gene expression levels in their native spatial context, offering rich insights into the organization and heterogeneity of biological tissues. New spatial multi-omics technologies, which jointly profile spatial transcriptome and epigenome/protein markers for the same tissue section, have expanded the frontiers of spatial techniques: spatial ATACâ€“RNAâ€“seq and spatial CUT&Tag-RNA-seq¹ jointly profile spatial transcriptome and epigenome; Slide-tags² jointly profiles open chromatin, RNA, and T-cell receptor sequences in single nuclei, together with the spatial barcode of the nuclei; Spatial Protein and Transcriptome Sequencing (SPOTS)³ is a technology that jointly profiles RNA and protein markers. Compared to single-omics technologies, spatial multi-omics provides additional molecular modality for more comprehensive profiling of the tissue and allows a deeper understanding of the regulatory relationship across different molecular modalities in the central dogma of molecular biology, within the native spatial context of the tissue.

Methods have been developed for spatial transcriptomic data, including SpaGCN⁴, STAGATE⁵, and SpatialPCA⁶. These methods, utilizing either graph neural networks or spatially-aware matrix factorization, have achieved precise spatial clustering of the pixels, thereby facilitating our understanding of spatial gene expression patterns. However, there is a pressing need to utilize the information embedded within additional omics layers in spatial multi-omics data. Methods have been developed for single-cell multi-omics data^{7,8,9,10,11,12}. Among them, Seurat WNN⁷ provides an unsupervised framework to learn the relative importance of each molecular modality in each cell, enabling integrative analysis of single-cell multi-omics data. However, it was not designed for spatial multi-omics and does not account for the spatial information, which is crucial in the study of spatial data. MOFA+⁸ is designed for the integration of single-cell multi-modal data, using a linear factor model that decomposes the input matrices into the product of low-rank matrices (weight matrices and low-dimensional representation matrices); this linear factor model does not consider spatial information and the cross-modality regulatory relationship. totalVI⁹ is designed for CITE-seq data (RNAâ€‰+â€‰surface protein); it uses the variational autoencoder (VAE) framework to model gene expression raw counts with negative-binomial (NB) distribution and the protein counts as an NB mixture of foreground and background signal; the VAE framework in totalVI does not incorporate spatial information in the spatial multi-omics data. MultiVI¹⁰ is based on a conditional VAE and models each modality using a specific distribution: NB distribution for gene expression raw counts and Bernoulli distribution for chromatin accessibility; this conditional VAE does not consider spatial information and the cross-modality regulatory relationship. JSNMF¹¹ employs a joint semi-orthogonal nonnegative matrix factorization (NMF) model to learn separate latent factor matrices for each modality and then fuses them via a consensus cell-cell similarity graph, and the NMF model does not incorporate the spatial information. In contrast, a more recent method, SpatialGlue¹³, performs an integrated analysis of spatial multi-omics data using a graph convolution network to obtain embeddings for each molecular modality separately, and then employs attention mechanisms to integrate information from different modalities. However, SpatialGlue uses principal components as the input and does not model the cross-modality regulation in the embedding of pixels. To fully utilize the information in spatial multi-omics data and realize its potential in uncovering the regulatory relationship across different molecular modalities, it is crucial to model the connection of the cross-modality features.

In response to the critical need for integrative tools in spatial multi-omics, we have developed MultiGATE, and its core is a two-level graph attention autoencoder designed to (i) infer cross-modality regulatory relationships (including cis-regulation, trans-regulation and proteinâ€“gene interactions) and (ii) embed spatial pixels into a latent space for clustering/spatial domain identification and visualization. By modeling regulatory links (such as peakâ€“gene associations, proteinâ€“gene interactions, and enzymeâ€“metabolite associations) directly into its graph attention mechanism, MultiGATE learns more informative low-dimensional representations of each spatial pixel/spot by deeper integrating different modalities. In turn, these refined embeddings sharpen the attention scores between cross-modality features, yielding more accurate inference of cross-modality regulation. The analysis of the cross-modality regulatory relationship enabled by MultiGATE provides unique insight in studying transcriptional regulation in the native tissue context powered by spatial multi-omics data (Detailed methodological innovations of MultiGATE are described in Supplementary Note S1). We demonstrate the superior performance of MultiGATE through various spatial multi-omics datasets generated from different tissues and technologies. Our results highlight the ability of MultiGATE to effectively capture the regulatory relationships across different molecular modalities while providing enhanced latent embeddings of the pixels for accurate spatial clustering.

Results

Method overview

The core of MultiGATE is a two-level graph attention autoencoder that simultaneously embeds the pixels/spots in a low-dimensional space and models the cross-modality feature regulatory relationships (e.g., peakâ€“gene, proteinâ€“gene, and metaboliteâ€“gene). At the first level, a cross-modality attention mechanism is employed to model the cross-modality regulatory relationship; at the second level, a within-modality attention mechanism is utilized to incorporate spatial information, which encourages the embedding of neighboring pixels to be similar. To further improve the cross-modality data integration, MultiGATE also incorporates a Contrastive Language-Image Pretraining (CLIP) loss, which encourages the alignment of the embeddings from different modalities.

The holistic structure of MultiGATE enables the aggregation and alignment of spatial information and molecular features from each modality. MultiGATE effectively extracts the low-dimensional representations of the pixels for clustering/spatial domain identification and data visualization, and simultaneously unravels the regulatory relationships of the cross-modality features for cis-regulation, trans-regulation, and protein-gene interactions (Fig. 1).

MultiGATE unveils the functional landscape of the adult human hippocampus

We first employed MultiGATE to analyze the Spatial ATACâ€“RNAâ€“seq dataset¹ generated from the adult human brain hippocampus, where chromatin accessibility and gene expression are simultaneously profiled at different spatial pixels. We manually annotated the hippocampus layers and white matter (Fig. 2a) based on guidance from previous studies^14,15,16 and marker genes. These annotations served as the ground truth for evaluating the performance of different methods. We calculated the Adjusted Rand Index (ARI) for each method and compared their clustering accuracy. The results were obtained using pixels that underwent preprocessing in SpatialGlue¹³, which filters out low-quality pixels, and we removed the pixels that were filtered out by SpatialGlue when computing the ARI for each method to ensure a fair comparison. Of all the methods evaluated, MultiGATE achieved the highest accuracy in detecting the layer structure within the human hippocampus (ARI: MultiGATE 0.60, SpatialGlue 0.36, Seurat WNN 0.23, MOFA+ 0.10, and MultiVI 0.14; Fig. 2b and Supplementary Fig. 1A) and superior performance in other quantitative metrics (Supplementary Fig. 2A). Upon visually inspecting the clustering results in Fig. 2b (A unified color scheme is applied across methods using label matching, as detailed in Supplementary Note S2), MultiGATE provided a clearer deciphering of the molecular layer (ML) and the choroid plexus compared to SpatialGlue. On the other hand, Seurat WNN does not incorporate the spatial information, resulting in a clustering outcome that lacks spatial organization.

**Fig. 2: MultiGATE enables precise detection of spatial domains and authentic integrative regulatory inference in the human hippocampus.**

MultiGATE simultaneously models the peakâ€“gene association in embedding the pixels in a low-dimensional space, distinguishing itself from SpatialGlue¹³ and Seurat WNN⁷, which rely on principal components as input and do not model peakâ€“gene associations. By incorporating a graph-based representation of regulatory interactions, MultiGATE employs a Bayesian-like approach that combines prior knowledge of genomic distance with the observed spatial multi-omics data to estimate cross-modality attention, allowing for modeling spatial information and the multi-omics genomic features in the attention mechanism simultaneously (â€œMethodsâ€).

To assess whether the attention mechanism in MultiGATE can accurately capture the cis-regulatory interactions, we compared the attention score estimated by MultiGATE with external eQTL data¹⁷, which has been widely used to identify genetic variants that influence gene regulation¹⁸. For the human hippocampus data, the attention scores estimated by MultiGATE decrease as genomic distance increases, whereas the peakâ€“gene pairs supported by eQTL data from the human hippocampus demonstrate higher attention scores, compared to those that are not supported (Fig. 2c). This suggests that MultiGATE effectively captures genuine cis-regulatory interactions. Next, we used the human hippocampus eQTL data¹⁷ as the benchmark, and compared the performance for identifying peakâ€“gene association between MultiGATE and other methods (Fig. 2d): MultiGATE achieved the best AUROC score (0.703), compared to Cicero¹⁹ (AUROCâ€‰=â€‰0.530), Spearman correlation (AUROCâ€‰=â€‰0.515), and LASSO regression (AUROCâ€‰=â€‰0.501). To further showcase the biological relevance of the peakâ€“gene interactions identified by MultiGATE, we examined genes with well-established functional importance in the hippocampus, including CA12 and PRKD3. CA12 is a carbonic anhydrase in hippocampal neurons, regulates pH, and supports neuronal excitability^20,21. PRKD3 is associated with synaptic function and Alzheimerâ€™s disease^22,23. For each of these genes, MultiGATE successfully identified peakâ€“gene associations that are supported by hippocampus-specific eQTLs (Fig. 2e), whereas non-significant associations are generally not supported by eQTL signals. Moreover, MultiGATE can be extended to detect long-range interactions between enhancers and promoters: across eight distance bins (0â€“150â€‰kb, 150â€“300â€‰kb, â€¦, >â€‰1.25â€‰Mb), MultiGATE can distinguish the brain-specific enhancer-promoter contacts from those in other tissues across 24 diverse human tissues (Supplementary Note S3). In addition, MultiGATE has been extended to capture trans-regulatory interactions by incorporating TF binding priors and learning TFâ€“peakâ€“gene attention scores. Using ChIP-seq data for SOX2 in brain tissue as ground truth, our method outperformed motif-only, TF binding potential, and cosine similarity baselines in predicting SOX2 binding, achieving an AUC of 0.8669 and AUPR of 0.4906 (Supplementary Note S4). In summary, MultiGATE enables regulatory inference and enhances our understanding of cis-regulatory and trans-regulatory interactions by effectively mining multi-omics spatial data to precisely infer regulatory networks.

Finally, to explore the biological relevance of the identified clusters, we examined differentially expressed genes (DEGs) for each cluster identified by MultiGATE in Supplementary Fig. 3B. For instance, PPFIA2 displayed differential expression and showed high expression levels in the granule cell layer, aligning with previous findings²⁴. Known molecular markers of the hippocampus, such as SLC1A2, were specifically expressed in the ML¹⁴ identified by MultiGATE. Furthermore, PLP1 encodes a major component of myelin protein in the central nervous system²⁵. And it is highly expressed in the stratum lacunosum-moleculare region identified by MultiGATE. Gene Ontology (GO) analysis of the DEGs from this region revealed significant enrichment for myelination (Supplementary Fig. 3C), aligning with the known function of PLP1. These findings underscore the biological relevance and accuracy of the clusters identified by MultiGATE.

MultiGATE reveals a layered pattern that mimics the annotated structure in the mouse brain while enabling regulatory inference

We further employed MultiGATE to analyze the spatial ATACâ€“RNAâ€“seq dataset¹ generated from mouse postnatal day 22 (P22) brains. We utilized a P56 coronal annotation provided by the Allen Mouse Brain Atlas as a reference in Fig. 3a.

**Fig. 3: MultiGATE enhances structural identification in the P22 mouse brain.**

We aligned the clustering results at the profiled region within the whole tissue slide in Fig. 3b to enhance clarity. To quantitatively evaluate clustering results, we computed the Intraclass Correlation Coefficient (ICC)²⁶. The ICC quantifies the consistency of observations within each cluster. Higher ICC values reflect greater intra-cluster homogeneity for a given modality, indicating improved clustering accuracy²⁷. MultiGATE achieves the highest combined ICC score among all methods compared, outperforming SpatialGlue, Seurat WNN, MOFA+, and MultiVI (Supplementary Figs. 2C, 1B, and 4A). Both MultiGATE and SpatialGlue exhibit the ability to detect the genu of corpus callosum (ccg), lateral ventricle (VL), caudoputamen (CP), olfactory limb (aco), and lateral preoptic area, as well as accurately identifying the six layers present in the cortex. MultiGATE more accurately discriminates the outermost cortical layer (Cluster 5, shown in red). In contrast, Cluster 5 in SpatialGlue shows no cluster-enriched markers (Supplementary Note S5), indicating a heterogeneous mixture of multiple cell types. More specifically, Cluster 5 in SpatialGlue contains pixels in the LS nucleus region¹, which are molecularly different from the other pixels in Cluster 5 (Fig. 3b and Supplementary Note S5). This highlights the enhanced precision and reliability offered by MultiGATE in cortical layer analysis. In contrast, Seurat WNN does not consider the spatial information and the clustering result does not have a clear spatial organization of the regions (Fig. 3b).

To validate MultiGATEâ€™s attention mechanism in inferring peakâ€“gene regulation, we utilized two validation datasets: EnhancerAtlas²⁸ and EGAS²⁹ for enhancer-gene regulation. In Fig. 3c, d, the attention scores estimated by MultiGATE decrease as genomic distances increase; Moreover, higher attention scores were observed for the peakâ€“gene pairs where the peak overlaps with an enhancer that regulates the corresponding gene. These results are attributed to the adaptive learning of attention, which incorporates prior weights that depend on genomic distance and the observed multi-omics data. Next, we compared the performance on identifying peakâ€“gene regulatory pairs between MultiGATE and other methods. MultiGATE outperformed Cicero¹⁹, Spearman correlation and lasso regression, as indicated by the highest AUROC scores in Fig. 3c, d. We next demonstrate the regulatory relationship for the gene Xrcc5. The gene Xrcc5 is known for its involvement in DNA repair mechanisms in the mouse brain³⁰. MultiGATE identified a peak that regulates Xrcc5 (indicated by the red curve) and this peak is about 90â€‰kb away from the gene body of Xrcc5; This peak also overlaps with an enhancer that regulates Xrcc5 in the Enhancer Atlas database; The other peaks near Xrcc5 are not predicted to regulate Xrcc5 in MultiGATE and neither of these peaks is supported in the Enhancer Atlas dataset (Fig. 3e). This observation and the overall distribution of attention scores in Fig. 3c, d demonstrate the superior performance of MultiGATE in regulatory inference.

To demonstrate that MultiGATE effectively combines the information in multiple modalities, we also inspected the clustering result for each molecular modality separately⁵. There are substantial differences between the clustering outcomes of the two individual modalities: spatial clusters generated by ATAC-seq provide more informative patterns regarding contextual layering and internal structure (Supplementary Fig. 3D); RNA-seq also contributes valuable information, especially the identification of the outermost layer (Supplementary Fig. 3D). MultiGATE integrates both modalities to produce more accurate and comprehensive clustering results. By incorporating multi-modal information, MultiGATE capitalizes on the strength of ATAC-seq and RNA-seq, leading to enhanced performance in clustering analysis (Fig. 3b).

Furthermore, we have successfully identified marker genes associated with the spatial clusters identified by MultiGATE (Supplementary Note S6), including marker genes for the regions ccg, CP, and VL, as shown in Fig. 3f. Pde10a is highly expressed in CP, which is part of the striatum dorsal region, consistent with the previous findings³¹. Hmgb2 was shown to be expressed in neural stem/progenitor cells, which are highly active in the VL zone and contribute to the balance between neurogenesis and gliogenesis³², consistent with the observed high expression of Hmgb2 in the VL region in the spatial multi-omics data (Fig. 3f). Top2a is highly expressed in the VL region as shown in Fig. 3f, which supports previous findings that Top2a is associated with mitosis and cell cycle regulation and is highly expressed in the VL region³³. Nr4a2 is highly expressed in the EPd region (Fig. 3f), consistent with previous findings that Nr4a2-expressing cells migrate to form the dorsal endopiriform nucleus³⁴. Fth1 is highly expressed in the corpus callosum, particularly in oligodendrocytes, where it plays a critical role in neuroprotection and the maintenance of axonal health³⁵, which aligns well with our observation that Fth1 is highly expressed in the CP region.

These expression patterns and the above regulatory inference derived from spatial analysis contribute to a more profound understanding of the diverse regions within the brain.

MultiGATE reveals spatial patterns in diverse spatial multi-omics technologies

SPOTS³ is a spatial multi-omics technology that jointly profiles whole transcriptome and protein markers while preserving tissue architecture. We first analyzed a SPOTS dataset³ generated from murine spleen, which consists of distinct cellular populations organized into well-defined structures such as germinal centers (GCs), marginal zones (MZs), and red pulp (The bright-field image is shown in Fig. 4a). The clustering results of MultiGATE, SpatialGlue, and Seurat WNN are shown in Fig. 4b. MultiGATE achieves the highest combined ICC score among the five methods, outperforming SpatialGlue, Seurat WNN, MOFA+, and totalVI (Supplementary Figs. 2D, 7C, and 8B). MultiGATE identified five clusters: T cell, B cell, and three distinct macrophage subtypes. Spatially, the T cells (purple) are located at the center, surrounded by B cells (green), and further encircled by successive layers of White Pulp Macrophages (WPM), Marginal Zone Macrophages (MZM), and Red Pulp Macrophages (RPM) (Supplementary Note S7 and Supplementary Fig. 5A, B). This anatomically consistent layering reflects spleen structure and highlights MultiGATEâ€™s ability to distinguish spatially organized immune populations.

**Fig. 4: MultiGATE discerns clear patterns in the SPOTS dataset.**

Notably, MultiGATE demonstrates a more precise clustering of T cells and B cells compared to SpatialGlue. We evaluated this by comparing the ADT expression of the CD3 protein, a canonical T cell marker, across T cell and B cell clusters identified by each method (Fig. 4c). Since CD3 is not expressed in B cells, a method that better separates these populations should yield a greater difference in CD3 expression between the two clusters. Indeed, the clustering of MultiGATE shows significantly greater separation in CD3 expression (Wilcoxon Pâ€‰=â€‰1â€‰Ã—â€‰10^âˆ’6, rank-biserial râ€‰=â€‰âˆ’0.953, and Cliffâ€™s Î´â€‰=â€‰0.953) than SpatialGlue (Wilcoxon Pâ€‰=â€‰1â€‰Ã—â€‰10^âˆ’6, rank-biserial râ€‰=â€‰âˆ’0.866, and Cliffâ€™s Î´â€‰=â€‰0.866; Supplementary Note S8). Additionally, within the T cell cluster itself, CD3 expression is significantly higher in the clustering produced by MultiGATE than SpatialGlue (Wilcoxon Pâ€‰=â€‰1.2â€‰Ã—â€‰10^âˆ’5; Supplementary Note S8), suggesting more precise identification of T cells. To visually compare clustering performance, we examined three annotated regions in the bright-field image (Fig. 4a). The T cell clusters (purple pixels) identified by MultiGATE show stronger concordance with CD3 protein expression than those identified by SpatialGlue (Fig. 4d). Furthermore, SpatialGlue identifies a significantly higher number of B cells (outliers in Fig. 4c) with elevated CD3 expressionâ€“likely T cells misclassified as B cells. In regions 2 and 3 (Fig. 4d), the highlighted pixels show high CD3 expression and low CD19 expression, indicating that these pixels should be categorized as T cells, which MultiGATE correctly identifies. In summary, these results suggest that MultiGATE achieves a more precise separation of T cells and B cells. Its spatial clustering result also aligns well with the protein-based deconvolution (Supplementary Note S9). In Fig. 4e, the T cell cluster shows strong CD3, CD4, and CD8 expression, consistent with known T cell profiles: CD8 for cytotoxic T lymphocytes³⁶, and CD3/CD4 crucial for anti-tumor responses³⁷. Additionally, CD169-Siglec is highly expressed in the MZM cluster (Fig. 4e), aligning with its known function as an I-type lectin that recognizes gangliosides and captures viruses³⁸.

We also implemented Seurat WNN on this dataset (Fig. 4b). Even at the lowest resolution of 0.1 for Louvain clustering, Seurat WNN identified seven clusters, roughly capturing the T cell (Cluster 4) and B cell (Cluster 3) populations. However, because it does not consider the spatial information, its overall clustering result lacks the layered anatomical pattern in murine spleen-particularly for the macrophage clusters. For example, Cluster 5 likely corresponds to MZM, yet its pixels are scattered and mixed with other clusters, lacking clear layered pattern. In contrast, MultiGATE recovers the expected spatial pattern of macrophage subtypes relative to GCs: WPM appear closest, followed by MZM, and RPM farthest (Supplementary Fig. 5B and Supplementary Note S7). Beyond spatial clustering, MultiGATE identifies regulatory associations by linking proteins to related gene sets via cross-modality attention. Applied to CD3, it assigned higher attention scores to CD3-related genes than to B cell-related, macrophage-related, or random controls, confirming its ability to recover meaningful gene-protein relationships (Supplementary Note S10).

Next, we employed MultiGATE to analyze a Slide-tags dataset² generated from metastatic melanoma samples, where open chromatin and RNA expression were profiled in single nuclei, along with spatial barcodes providing the spatial location of the nuclei. This dataset primarily consisted of two tumor clusters, as annotated in the original study² (Supplementary Fig. 6A). Consistent with these annotations, MultiGATE, along with SpatialGlue, and Seurat WNN accurately divided the tumor cells into two clusters (Supplementary Fig. 6B and D). Further analysis of DEGs identified by MultiGATE between tumor Cluster 1 and Cluster 2 revealed distinct expression patterns (Supplementary Fig. 6C and E). Cluster 1 was characterized by a mesenchymal-like state, with elevated expression and chromatin accessibility of TNC, a marker of invasion and metastasis, alongside PLCB4 and CPEB2, which support YAP signalingâ€“a key driver of epithelial-mesenchymal transition and cellular plasticity in melanoma^{39,40,41,42,43}. Cluster 1 also showed downregulation of MHC genes (CTSB, HSP90AA1, HSP90AB1, and LGMN; Supplementary Fig. 6E), suggesting immune evasion and potential resistance to immunotherapy, a trait often linked to aggressive tumor behavior⁴¹. In contrast, Cluster 2 displayed a melanocytic-like state with high expression of DCT and APOE, both implicated in treatment resistance via maintenance of melanocyte identity and ferroptosis evasion^44,45. This melanocytic state was further supported by the significant enrichment of Cluster 2 upregulated genes (in Supplementary Fig. 6E) in pigmentation (GO:0043473; Hypergeometric Pâ€‰=â€‰1.38â€‰Ã—â€‰10^âˆ’4) and in MITF-regulated pathways (R-HSA-9730414; Hypergeometric Pâ€‰=â€‰6.47â€‰Ã—â€‰10^âˆ’6), where MITF is a master regulator of melanocyte identity and drives melanocyte development. Additionally, Cluster 2 showed a relative upregulation of MHC genes within the KEGG antigen processing and presentation pathway, with 7 out of 27 genes showing a fold change greater than 1.2 (Hypergeometric test, Pâ€‰=â€‰4.63â€‰Ã—â€‰10^âˆ’6), indicating enhanced antigen presentation. This increased immune visibility likely promotes greater interaction with cytotoxic T cells, contributing to a more proliferative and therapy-sensitive phenotype in Cluster 2^46,47. Together, these findings reveal that heterogeneity in melanoma identified by MultiGATE reflects distinct cell states with divergent therapeutic vulnerabilities.

To further demonstrate MultiGATEâ€™s versatility across measurement technologies and tissue types, we performed three additional evaluations: (1) integration of spatial transcriptomics and metabolomics (Supplementary Fig. 7 and Supplementary Note S11), showing that MultiGATE can jointly analyze modalities without genomic distance priors; (2) analysis of spatial RNAâ€‰+â€‰protein data in human breast cancer (Supplementary Fig. 8 and Supplementary Fig. 2B); and (3) a simulated breast cancer-patterned spatial ATACâ€‰+â€‰RNA dataset (Supplementary Fig. 9 and Supplementary Note S12), confirming its applicability to tissues lacking stereotypical structure. In all cases, MultiGATE outperformed competing methods, accurately recovering spatial domains.

Discussion

Spatial multi-omics analysis has emerged as a powerful approach for understanding the interplay between genes, chromatin accessibility, protein, and spatial organization in biological systems. In this study, we propose MultiGATE, a two-level graph attention framework that leverages the graph attention mechanism to integrate spatial multi-omics data.

Compared to existing methods, MultiGATE provides a unique advantage in simultaneously embedding the pixels and inferring the cross-modality regulatory relationship, which is not available in alternative approaches such as SpatialGlue and Seurat WNN. The attention scores incorporate genomic distances as prior knowledge to model cross-modality regulatory interactions, providing new insights into transcriptional regulation. Furthermore, the latent representations learned by MultiGATE enable accurate spatial clustering and enhanced visualization.

In conclusion, MultiGATE is a significant approach in the analysis of spatial multi-omics data. Its integration of multi-omics and spatial information, generation of high-quality embeddings, and inference of regulatory interactions make it a valuable tool for unraveling gene regulation and spatial organization. We anticipate that MultiGATE will facilitate discoveries and advancements across various fields of biological research.

Beyond modeling cis-regulation, MultiGATE can also capture trans-regulatory interactions by incorporating TF binding priors and learning TFâ€“peakâ€“gene attention scores. While initial results are promising, large-scale validation remains challenging due to the limited availability of high-quality ChIP-seq datasets in brain tissues. Future efforts will aim to broaden trans-regulatory validation as more tissue-specific datasets become accessible.

Currently, MultiGATE does not incorporate histology images into pixel embeddings, as only bright-field images were provided in the spatial ATACâ€“RNAâ€“seq datasets¹. Previous studies have demonstrated that combining histology images with gene expression data improves cell segmentation^48,49 and enables high-resolution gene expression^50,51. As a future direction, subsequent iterations of MultiGATE could integrate an additional pre-trained encoder designed to process histology images.

Methods

The framework of MultiGATE

MultiGATE integrates spatial multi-omics data and the key feature of MultiGATE is that it provides both the cross-modality connection of the features, which reveals the regulatory relationship across modalities (for example, the peakâ€“gene association in spatial ATACâ€“RNAâ€“seq data), and the latent embeddings of the spatial spots, which can be used for downstream analysis including clustering and visualization of the spatial spots. These are achieved through a two-level graph attention auto-encoder that seamlessly integrates the cross-modality features and the spatial proximity of the spots. The first-level auto-encoder employs the feature connectivity graph to integrate the features of different modalities in spatial multi-omics data. It effectively combines the multi-modality information and learns attention scores, which capture the cross-modality connection of the features, facilitating inference of the regulatory relationships. The second-level auto-encoders use the neighboring graph of the spots to capture the spatial information of the spots in each modality, respectively, which encourages spatial smoothness for the enhancement of the latent embeddings of the spots. The latent embeddings and cross-modality regulatory scores provided by MultiGATE play a vital role in facilitating subsequent integration and downstream analysis for spatial multi-omics data, thereby enabling a more comprehensive and insightful exploration of the data.

The two-level graph attention auto-encoder

Let ${{{\bf{X}}}}_{1}\in {{\mathbb{R}}}^{N\times p}$ be the preprocessed data matrix from one modality with N spots and p features, ${{{\bf{X}}}}_{2}\in {{\mathbb{R}}}^{N\times q}$ be the preprocessed data matrix from another modality with N spots and q features.

The first-level auto-encoder in MultiGATE, referred to as the cross-modality graph attention auto-encoder, employs the feature connectivity graph as a prior to integrate cross-modality features of different modalities in spatial multi-omics data. The feature connectivity graph is an adjacency matrix denoted by A, where each entry A_ij represents the connection between feature i and feature j. The features in the adjacency matrix are the union of two modalities in spatial multi-omics data. For spatial ATACâ€“RNAâ€“seq data¹, features are the union of genes and chromatin peaks. The value of A_ij lies in the range of $\left(1,e\right]$ and is determined by the base pair distance between peak i and gene j, with the value decreasing as the base pair distance increases, while being restricted within 150â€‰kb. For SPOTS data³, features encompass the union of genes and proteins. In this scenario, A_ijâ€‰=â€‰1 if and only if the gene i encodes the protein j or the subunits of protein j. In the feature connectivity graph, self-loops are added for each feature: A_iiâ€‰=â€‰1, for all i. The encoder in the first-level cross-modality auto-encoder generates the integrated feature matrix by aggregating information from the features that are connected across modalities through the feature connectivity graph A. More specifically, we transpose X₁ and X₂, concatenate ${{{\bf{X}}}}_{1}^{T}$ and ${{{\bf{X}}}}_{2}^{T}$ along the dimension of the features and obtain the merged data matrix ${{{\bf{X}}}}^{T}\in {{\mathbb{R}}}^{(p+q)\times N}$. Let ${{{\bf{X}}}}_{(f)}^{T}$ be the data vector for feature f, âˆ€ f âˆˆ {1, 2, â‹¯ â€‰, p + q}. The encoder integrates the cross-modality features as follows:

$${\bar{{{\bf{X}}}}}_{(f)}^{T}={\sigma }_{1}\left({\sum}_{g\in {{{\bf{A}}}}_{f}}{{{\rm{att}}}}_{gf}{{{\bf{X}}}}_{(g)}^{T}+{{{\rm{att}}}}_{ff}{{{\bf{X}}}}_{(f)}^{T}\right),$$

(1)

where ${\bar{{{\bf{X}}}}}_{(f)}^{T}$ is the integrated data for feature f, Ïƒ₁ is the ReLU activation function, A_f represents the set of features that are connected to feature f in the feature connectivity graph A, att_gf is the attention score between feature g and feature f, and att_ff is the self-attention for feature f. These attention scores are learned adaptively based on a self-attention mechanism⁵². For cross-modality attentions, we first calculate the attention coefficient from feature f to feature g as follows:

$${e}_{fg}={{\rm{Sigmoid}}}\left({{{\bf{v}}}}_{1}^{T}\left({{{\bf{X}}}}_{(f)}^{T}\right)+{{{\bf{v}}}}_{2}^{T}\left({{{\bf{X}}}}_{(g)}^{T}\right)\right),$$

(2)

where v₁ and v₂ are trainable parameters, and the Sigmoid activation function is used. Next, the attention scores are computed with a Bayesian-like approach:

$${{{\rm{att}}}}_{fg}=\frac{{A}_{fg}\exp ({e}_{fg})}{{\sum }_{h\in {{{\bf{A}}}}_{f}}{A}_{fh}\exp ({e}_{fh})},$$

(3)

where

$${A}_{fg}=\exp \left({\left(\frac{{{{\rm{dist}}}}_{fg}+{{\rm{bp}}}\_{{\rm{width}}}}{{{\rm{bp}}}\_{{\rm{width}}}}\right)}^{-0.75}\right).$$

(4)

dist_fg is the genomic distance between features f and g. bp_width is the bandwidth in the kernel set as 400 (A sensitivity analysis in Supplementary Note S13). A_fg represents the prior knowledge that quantifies the connection between features f and g based on their genomic distance, and its value decays when the genomic distance increases; The attention coefficient e_fg represents the connection between the features derived from the observed data. In addition to the cross-modality attention att_fg, we also incorporate a self-attention att_ff in computing ${\bar{{{\bf{X}}}}}_{(f)}^{T}$, to preserve the information in feature f itself. att_ff is computed the same as att_fg, with dist_ff set to 0 in computing A_ff. In summary, the first-level auto-encoder integrates the cross-modality features, and the learned attention scores att_fg reflect the strength of the connections between the cross-modality features and can be used for inferring their regulatory relationships.

Following the acquisition of integrated features ${\bar{{{\bf{X}}}}}^{T}$, we proceed to the second-level auto-encoders in MultiGATE, which are the within-modality graph attention auto-encoders. The second-level auto-encoders employ the spatial neighborhood graph to aggregate information from neighboring spots in spatial multi-omics data. The spatial neighborhood graph captures the similarity of neighboring spots using an adjacency matrix denoted as S, where S_ijâ€‰=â€‰1 if spot j is an immediate spatial neighbor of spot i (and S_iiâ€‰=â€‰1 for all i). Neighborhoods are defined based on direct spatial adjacency^4,53. The spatial RNA+protein³ technique uses the 10Ã— Visium platform, each spot has six neighbors, following a hexagonal grid layout⁵³. For Spatial ATACâ€“RNAâ€“seq data, neighborhood size reflects pixel resolution: in the 50â€‰Î¼m human hippocampus dataset, each pixel has four adjacent neighbors (up/down/left/right), while in the 20â€‰Î¼m P22 mouse brain dataset, each pixel has eight neighbors in a 3â€‰Ã—â€‰3 grid. The average number of neighbors per spot across datasets is visualized in Supplementary Note S14. The spatial neighborhood graph allows us to effectively represent the spatial relationships between spots and incorporate their similarity in the auto-encoder. The output from the first-level cross-modality encoder, ${\bar{{{\bf{X}}}}}^{T}$, is first transposed to $\bar{{{\bf{X}}}}$ and then split to ${\bar{{{\bf{X}}}}}_{1}$ and ${\bar{{{\bf{X}}}}}_{2}$ according to the modality. ${\bar{{{\bf{X}}}}}_{1}$ and ${\bar{{{\bf{X}}}}}_{2}$ are the input for the second-level auto-encoders. For the k-th modality, we utilize an auto-encoder with the structure described below to capture the information from neighboring spots, where k âˆˆ {1, 2}. Let ${\bar{{{\bf{x}}}}}_{ki}$ be the integrated feature vector for modality k of spot i, âˆ€ i âˆˆ {1, 2, â‹¯ , N}. Let L be the number of layers in the encoder. The output of the encoder in the l-th layer (l âˆˆ {1, 2, â‹¯ , L âˆ’ 1}) is calculated as follows:

$${\bar{{{\bf{x}}}}}_{ki}^{(l)}={\sigma }_{2}\left({\sum}_{j\in {{{\bf{S}}}}_{i}}{{{\rm{att}}}}_{ij}^{(l)}\left({{{\bf{W}}}}_{k}^{l}{\bar{{{\bf{x}}}}}_{kj}^{(l-1)}\right)\right),$$

(5)

where ${{{\bf{W}}}}_{k}^{l}$ is the trainable projection matrix in layer l of modality k, Ïƒ₂ is the ELU activation function, S_i is the set of neighbors for spot i in the spatial neighborhood graph S (including spot i itself) and ${{{\rm{att}}}}_{ij}^{(l)}$ is the spatial attention score between spot i and spot j in the l-th graph attention layer. To obtain ${{{\rm{att}}}}_{ij}^{(l)}$, we first calculate the attention coefficient from spot i to neighboring spot j in the l-th encoder as follows:

$${e}_{ij}^{(l)}={{\rm{Sigmoid}}}\left({{{\bf{v}}}}_{{{\rm{self}}}}^{T}\left({{{\bf{W}}}}_{k}^{l}{\bar{{{\bf{x}}}}}_{ki}^{(l-1)}\right)+{{{\bf{v}}}}_{{{\rm{nei}}}}^{T}\left({{{\bf{W}}}}_{k}^{l}{\bar{{{\bf{x}}}}}_{kj}^{(l-1)}\right)\right),$$

(6)

where v_self and v_nei are trainable vectors. The attention score ${{{\rm{att}}}}_{ij}^{(l)}$ is then computed as softmax

$${{{\rm{att}}}}_{ij}^{(l)}=\frac{\exp ({e}_{ij}^{(l)})}{{\sum }_{d\in {{{\bf{S}}}}_{i}}\exp ({e}_{id}^{(l)})}.$$

(7)

Our proposed attention scores differ from the standard graph attention auto-encoder⁵² in the following two important aspects. Firstly, the calculation of cross-modality attention scores incorporates prior information on the connection of cross-modality features: it considers the prior knowledge that a peak is less likely to regulate a gene when the genomic distance is larger. Secondly, the trainable projection matrices in the cross-modality attention coefficients in equation (2) are fixed as identity matrices: we set ${{{\bf{v}}}}_{1}^{T}({{{\bf{X}}}}_{(f)}^{T})$ instead of ${{{\bf{v}}}}_{1}^{T}({{{\bf{W}}}}_{1}{{{\bf{X}}}}_{(f)}^{T})$. This decision is based on the intrinsic strong connection of the cross-modality features: regulatory regions/peaks regulate gene expression, and gene expression directly affects protein level. Performing an early projection could potentially disrupt this information, hence, we avoid including the projection matrices.

The last encoder layer (i.e., L-th layer) does not employ the attention layer, and its output is calculated as follows:

$${\bar{{{\bf{x}}}}}_{ki}^{(L)}={\sigma }_{2}\left({\sum}_{j\in {{{\bf{S}}}}_{i}}{{{\bf{W}}}}_{k}^{L}{\bar{{{\bf{x}}}}}_{kj}^{(L-1)}\right),$$

(8)

where the output ${\bar{{{\bf{x}}}}}_{ki}^{(L)}$ is the latent embedding learned for the k-th modality.

After the input passes through the cross-modality encoder and the within-modality encoder, we obtain the latent embeddings ${\bar{{{\bf{X}}}}}^{(L)}$. Next, these latent embeddings undergo reconstruction first through the within-modality decoder and then through the cross-modality decoder, where both decoders are symmetric to their corresponding encoders. More specifically, the within-modality decoder takes the latent embeddings ${\bar{{{\bf{X}}}}}^{(L)}$ as the input (i.e., ${\tilde{{{\bf{x}}}}}_{ki}^{(L)}={\bar{{{\bf{x}}}}}_{ki}^{(L)}$), its output in the (l âˆ’ 1)-th layer (l âˆˆ {L, â‹¯ , 2}) is calculated as follows:

$${\tilde{{{\bf{x}}}}}_{ki}^{(l-1)}={\sigma }_{2}\left({\sum}_{j\in {{{\bf{S}}}}_{i}}{\hat{{{\rm{att}}}}}_{ij}^{(l-1)}\left({\hat{{{\bf{W}}}}}_{k}^{l}{\tilde{{{\bf{x}}}}}_{kj}^{(l)}\right)\right).$$

(9)

Symmetric to the within-modality encoder, the output in the outermost within-modality decoder layer is calculated as follows:

$${\tilde{{{\bf{x}}}}}_{ki}^{(0)}={\sigma }_{2}\left({\sum}_{j\in {{{\bf{S}}}}_{i}}{\hat{{{\bf{W}}}}}_{k}^{1}{\tilde{{{\bf{x}}}}}_{kj}^{(1)}\right).$$

(10)

MultiGATE sets the parameters in the within-modality decoder to be the same as those in the within-modality encoder to avoid overfitting: ${\hat{{{\bf{W}}}}}_{k}^{l}={({{{\bf{W}}}}_{k}^{l})}^{T}$, for lâ€‰=â€‰1, â‹¯ â€‰, L and ${\hat{{{\rm{att}}}}}_{ij}^{(l)}={{{\rm{att}}}}_{ij}^{(l)}$, for lâ€‰=â€‰1, â‹¯ , L âˆ’ 1. We concatenate the output of the within-modality decoder for both modalities and denote it as $\tilde{{{\bf{X}}}}$. Next, the cross-modality decoder takes $\tilde{{{\bf{X}}}}$ as the input, and outputs the reconstructed data:

$${\hat{{{\bf{X}}}}}_{(f)}^{T}={\sigma }_{2}\left({\sum}_{g\in {{{\bf{A}}}}_{f}}{\hat{{{\rm{att}}}}}_{gf}{\tilde{{{\bf{X}}}}}_{(g)}^{T}+{\hat{{{\rm{att}}}}}_{ff}{\tilde{{{\bf{X}}}}}_{(f)}^{T}\right),$$

(11)

where ${\hat{{{\bf{X}}}}}_{(f)}^{T}$ is the reconstructed data for feature f. MultiGATE sets the cross-modality decoder to be symmetric to the cross-modality encoder to avoid overfitting: ${\hat{{{\rm{att}}}}}_{gf}={{{\rm{att}}}}_{gf}$ and ${\hat{{{\rm{att}}}}}_{ff}={{{\rm{att}}}}_{ff}$.

The overall architecture of MultiGATE

In all datasets, the cross-modality mechanism consists of one GAT (Graph Attention Network) layer with two learnable vectors to compute attention scores between cross-modality features. In the cross-modality attention mechanism, we integrate the information of each gene with its neighboring peaks (or proteins) according to attention scores, and each peak (or protein) with its neighboring genes. The cross-modality encoder then outputs two enhanced data matricesâ€“one of size (Peaksâ€‰Ã—â€‰Spots) and one of size (Genesâ€‰Ã—â€‰Spots)â€“which integrate information from the other modality and are passed on to the within-modality attention step.

For the within-modality attention autoencoders, each encoder has two hidden layers (dimensions 512 and 30), and the decoder is symmetric to the encoder. A sensitivity analysis for a latent dimension of 30 is presented in Supplementary Note S15.

Loss function and training details

There are two components in the loss function. The first component is the reconstruction loss for the modalities.

$${\sum }_{i=1}^{N}| | {{{\bf{X}}}}_{1i}-{\hat{{{\bf{X}}}}}_{1i}| {| }_{2}+{\sum }_{i=1}^{N}| | {{{\bf{X}}}}_{2i}-{\hat{{{\bf{X}}}}}_{2i}| {| }_{2}.$$

(12)

The second component is the CLIP loss⁵⁴, which encourages the latent embeddings for different modalities, ${\bar{{{\bf{X}}}}}_{1}^{(L)}$ and ${\bar{{{\bf{X}}}}}_{2}^{(L)}$ to be similar and further facilitates cross-modality integration, in addition to the cross-modality graph attention. The CLIP loss is calculated as follows:

$$\begin{array}{l}-\frac{1}{2N}{\sum }_{i=1}^{N}\log \frac{\exp ({{\rm{sim}}}({{{\bf{f}}}}_{i},{{{\bf{g}}}}_{i})\exp ({{\rm{temp}}}))}{{\sum }_{j=1}^{N}\exp ({{\rm{sim}}}({{{\bf{f}}}}_{i},{{{\bf{g}}}}_{j})\exp ({{\rm{temp}}}))}+\\ -\frac{1}{2N}\mathop{\sum }_{i=1}^{N}\log \frac{\exp ({{\rm{sim}}}({{{\bf{g}}}}_{i},{{{\bf{f}}}}_{i})\exp ({{\rm{temp}}}))}{\mathop{\sum }_{j=1}^{N}\exp ({{\rm{sim}}}({{{\bf{g}}}}_{i},{{{\bf{f}}}}_{j})\exp ({{\rm{temp}}}))},\end{array}$$

(13)

where the function $\it sim$ represents cosine similarity, ${{{\bf{f}}}}_{i}={{{\bf{M}}}}_{1}^{T}{\bar{{{\bf{x}}}}}_{1i}^{(L)}$, ${{{\bf{g}}}}_{i}={{{\bf{M}}}}_{2}^{T}{\bar{{{\bf{x}}}}}_{2i}^{(L)}$, M₁ and M₂ are trainable projection matrices for modality 1 and modality 2, respectively. The embeddings f_i and g_i, âˆ€ i âˆˆ {1, 2, â‹¯ â€‰, N} serve as the final output latent embeddings generated by MultiGATE, which are subsequently utilized for downstream analysis. temp is the temperature parameter in CLIP loss, which is set to 1 by default.

MultiGATE employs an Adam optimizer to minimize the loss with an initial learning rate of 1e-4. The weight decay is set as 1e-4. There are two kinds of activation functions: Ïƒ₁ is ReLU, while Ïƒ₂ is ELU. The default number of iterations is set to 1000. For details on runtime performance and memory usage across datasets, please refer to Supplementary Notes S16 and S17.

MultiGATE attention-based cis-regulatory inference and validation

We use the following procedure to get peakâ€“gene regulatory pairs from MultiGATEâ€™s cross-modality attention scores and evaluate the peakâ€“gene regulatory pairs:

1.
Cross-modality attention computation

In the cross-modality autoencoder, for each peak f and gene g, we first compute an unnormalized attention coefficient
$${e}_{fg}={{\rm{Sigmoid}}}\left({{{\bf{v}}}}_{1}^{T}{{{\bf{X}}}}_{(f)}^{T}+{{{\bf{v}}}}_{2}^{T}{{{\bf{X}}}}_{(g)}^{T}\right),$$
where ${{{\bf{X}}}}_{(f)}^{T}$ and ${{{\bf{X}}}}_{(g)}^{T}$ are the observed data for peak f and gene g, and v₁, v₂ are learned vectors.

We then incorporate a genomic distance prior
$${A}_{fg}=\exp \left({\left(\frac{{{{\rm{dist}}}}_{fg}+{{\rm{bp}}}\_{{\rm{width}}}}{{{\rm{bp}}}\_{{\rm{width}}}}\right)}^{-0.75}\right)\quad ({{\rm{bp}}}\_{{\rm{width}}}=400\,{{\rm{bp}}}),$$
and normalize e_fg to obtain the cross-modality attention score
$${{{\rm{att}}}}_{fg}=\frac{{A}_{fg}\,\exp ({e}_{fg})}{{\sum }_{h\in {{{\mathcal{N}}}}_{f}}{A}_{fh}\,\exp ({e}_{fh})},$$
where ${{{\mathcal{N}}}}_{f}$ is the set of features connected to peak f in the feature connectivity graph.
2.
Rescaling and thresholding

First, we pooled all inter-feature attention scores (i.e., excluding self-attention) and performed a linear transformation to map them onto the interval [0, 1]. The resulting distribution was bimodal (Supplementary Note S18). We fitted a two-component Gaussian mixture model to these normalized scores and defined our threshold Î¸ as the intersection point of the two Gaussian densities-i.e., the value at which the posterior probability of belonging to either component is equal (Supplementary Note S18). This data-driven cutoff separates low, noise-driven attention scores from high, biologically meaningful ones. Empirically, we obtained
$${\theta }_{{{\rm{Human}}}\,{{\rm{hippocampus}}}}=0.204,\quad {\theta }_{{{\rm{Mouse}}}\,{{\rm{P}}}22}=0.141.$$
These thresholds were then used to call peakâ€“gene links.

To validate the peakâ€“gene regulatory pairs identified in the human hippocampus data, we used the human hippocampus eQTL¹⁷ data. To validate the peakâ€“gene regulatory pairs identified in the mouse brain data, we used the EnhancerAtlas²⁸ and EGAS²⁹ data obtained from the mouse cortex and striatum: The EnhancerAtlas obtains regulatory information by predicting consensus enhancers based on high-throughput datasets, including histone modification, CAGE, GRO-seq, transcription factor binding, and DHS; The EGAS dataset utilizes single-cell sequencing data and a non-parametric permutation-based procedure to predict the genome-wide enhancer-gene associations. However, mouse eQTL data was not employed due to the absence of the corresponding tissues of interest. During validation, we restricted our analysis to genes present in these external datasets and evaluated whether their linked peaks were supported by independent evidence.

A peakâ€“gene pair is considered to be supported by EnhancerAtlas or EGAS if the peak overlaps with a genomic region that regulates the corresponding gene. A peakâ€“gene pair is considered to be supported by eQTL if there is an eQTL locus within the peak and the locus is associated with the corresponding gene.

Data description

We utilize MultiGATE to analyze diverse types of spatial multi-omics data, including spatial ATACâ€“RNAâ€“seq, spatial snRNA-ATAC-seq, spatial protein-RNA-seq, and spatial RNAâ€‰+â€‰metabolomics. In the context of spatial ATACâ€“RNAâ€“seq, Spatial Epigenome-Transcriptome Co-profiling¹ generates data from the juvenile mouse brain as well as the adult human brain. This technique allows for the simultaneous profiling of epigenetic and transcriptomic characteristics, providing insights into the regulatory relationships between the genome and gene expression within a spatial context. For spatial snRNA-ATAC-seq, Slide-tags² generates multi-omics measurements capturing open chromatin and RNA within the same cells derived from metastatic melanoma. This approach enables the investigation of both chromatin accessibility and gene expression profiles within individual cells, facilitating a deeper understanding of the molecular characteristics of melanoma cells. In the case of spatial protein-RNA-seq, SPOTS³ generates high-throughput data that enables simultaneous profiling of spatial transcriptomics and protein expression within the mouse spleen. By integrating transcriptomic and proteomic information, SPOTS offers a comprehensive view of the spatially resolved molecular landscape of the spleen, shedding light on the interplay between gene expression and protein abundance. For spatial RNA and metabolomics data, the SMA⁵⁵ protocol enabled the generation of mouse brain spatial RNAâ€‰+â€‰metabolomics profiles by integrating spatial transcriptomics (SRT) and MALDI-MSI on the same tissue section. For the breast cancer spatial RNAâ€‰+â€‰protein dataset, the Visium CytAssist Spatial Gene and Protein Expression assay⁵⁶ was used to simultaneously profile gene and protein expression with spatial resolution in formalin-fixed, paraffin-embedded tissue samples, generating the human breast cancer dataset.

Data preprocessing

For the scRNA-seq or snRNA-seq data, we first log-transform and normalize the raw gene expression by library size using Scanpy⁵⁷. Next, we select the top 3000 highly variable genes as input. For the scATAC-seq or snATAC-seq data, we first select the top 50,000 highly variable peaks and then run TF-IDF on the chromatin accessibility matrix. Next, we log-transform and normalize the processed chromatin accessibility matrix by sequencing depth. For the protein data, we apply CLR (centered log-ratio transformation) to the protein matrix using muon⁵⁸.

Clustering

We employ two clustering strategies for the latent embeddings obtained from MultiGATE. The first strategy involves averaging the latent embeddings for each spot and subsequently applying Louvain clustering⁵⁹, as utilized in Slide-tags tumor dataset or mclust⁶⁰, to the averaged embeddings in SPOTS spleen. The second strategy involves employing Seurat WNN (FindMultiModalNeighbors function)⁷ to construct a weighted graph based on the latent embeddings, and then we apply Louvain clustering using this graph in the mouse brain as well as the adult human brain from spatial ATACâ€“RNAâ€“seq analyses. In general, both clustering methods yield similar results, with the WNN approach providing more refined details. A sensitivity analysis of the Louvain clustering resolution is presented in Supplementary Note S19.

Identifying differentially expressed genes

Wilcoxon test is employed to discover DEGs for each identified spatial domain (Benjaminiâ€“Hochberg adjustment).

Gene Ontology enrichment analysis

For the Spatial ATACâ€“RNAâ€“seq dataset¹ generated from the adult human brain hippocampus, we conducted the gene set enrichment analysis implemented in the GSEAPY package⁶¹ to discover the enriched GO terms for spatially variable genes in the detected domain with adjusted P valueâ€‰ <â€‰0.01.

Parameter settings for comparison methods

For all comparing methods, we used default hyperparameters as specified in their official documentation: SpatialGlue (https://spatialglue-tutorials.readthedocs.io/en/latest/), Seurat WNN (https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis). Full parameter settings are detailed in Supplementary Note S20.

Quantitative metrics calculation

We employed multiple clustering evaluation metrics to quantitatively evaluate the performance of spatial clustering results across different methods.

Datasets with ground truth labels

For the adult human hippocampus dataset (spatial ATACâ€“RNAâ€“seq), the mouse brain dataset (spatial transcriptomicsâ€‰+â€‰metabolomics), the breast cancer-patterned spatial ATACâ€‰+â€‰RNA dataset (Supplementary Fig. 9), which have ground truth spatial domain annotations, we computed the following metrics to assess clustering agreement: Rand Index (RI), ARI, Adjusted Mutual Information, Normalized Mutual Information, Homogeneity, Completeness, V-measure, Fowlkes-Mallows Index.

Datasets without ground truth labels

For the P22 mouse brain dataset (spatial ATACâ€“RNAâ€“seq), the mouse spleen dataset (spatial RNAâ€‰+â€‰protein), and human breast cancer (spatial RNAâ€‰+â€‰protein), where no ground truth labels are available, we adopted the ICC²⁷ to evaluate clustering quality (Supplementary Note S21).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All datasets analyzed in this paper have already been published and are available on public websites. All data utilized in the paper are accessible for open download. The Spatial epigenome-transcriptome, including the human hippocampus and mouse brain¹ is provided by the UCSC cell browser (https://brain-spatial-omics.cells.ucsc.edu/). The Slide-tags dataset is collected from the Single Cell Portal (https://singlecell.broadinstitute.org/single_cell/study/SCP2176). The mouse spleen data (spatial RNAâ€‰+â€‰Protein) generated by SPOTS can be found at (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198353). The mouse brain spatial RNAâ€‰+â€‰metabolomics dataset can be found at (https://data.mendeley.com/datasets/w7nw4km7xd/1). The human breast cancer data (spatial RNAâ€‰+â€‰Protein) can be found at (https://www.10xgenomics.com/datasets/gene-and-protein-expression-library-of-human-breast-cancer-cytassist-ffpe-2-standard). The processed datasets used in this study have been deposited to Figshare and can be accessed at: https://doi.org/10.6084/m9.figshare.27978765.Â Source data are provided with this paper.

Code availability

The code used to develop the model, perform the analyses, and generate the results of this study has been deposited in the GitHub repository cuhklinlab/MultiGATE (https://github.com/cuhklinlab/MultiGATE) under the Apache License 2.0. The specific version of the code associated with this publication (v0.1.0) has been archived on Zenodo and is accessible via 10.5281/zenodo.16310792⁶².

References

Zhang, D. et al. Spatial epigenomeâ€“transcriptome co-profiling of mammalian tissues. Nature 616, 113â€“122 (2023).
ArticleÂ ADSÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Russell, A. J. C. et al. Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. Nature 625, 101â€“109 (2024).
ArticleÂ ADSÂ CASÂ PubMedÂ Google ScholarÂ
Ben-Chetrit, N. et al. Integration of whole transcriptome spatial profiling with protein markers. Nat. Biotechnol. 41, 788â€“793 (2023).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342â€“1351 (2021).
ArticleÂ PubMedÂ Google ScholarÂ
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
ArticleÂ ADSÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Shang, L. & Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat. Commun. 13, 7203 (2022).
ArticleÂ ADSÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573â€“3587 (2021).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 21, 1â€“17 (2020).
ArticleÂ Google ScholarÂ
Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272â€“282 (2021).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Ashuach, T. et al. MultiVI: deep generative model for the integration of multimodal data. Nat. Methods 20, 1222â€“1231 (2023).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Ma, Y., Sun, Z., Zeng, P., Zhang, W. & Lin, Z. JSNMF enables effective and accurate integrative analysis of single-cell multiomics data. Brief. Bioinform. 23, bbac105 (2022).
ArticleÂ PubMedÂ Google ScholarÂ
Zhang, W. & Lin, Z. iPoLNGâ€“An unsupervised model for the integrative analysis of single-cell multiomics data. Front. Genet. 14, 998504 (2023).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Long, Y. et al. Deciphering spatial domains from spatial multi-omics with spatialglue. Nat. Methods 21, 1658â€“1667 (2024).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Amaral, D. G., Scharfman, H. E. & Lavenex, P. The dentate gyrus: fundamental neuroanatomical organization (dentate gyrus for dummies). Prog. Brain Res. 163, 3â€“790 (2007).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Boldrini, M. et al. Antidepressants increase neural progenitor cells in the human hippocampus. Neuropsychopharmacology 34, 2376â€“2389 (2009).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Cappaert, N. L., Van Strien, N. M. & Witter, M. P. Chapter 20: hippocampal formation. In The Rat Nervous System (Fourth Edition) (ed Paxinos, G.) 511â€“573 (Academic Press, San Diego, 2015).
Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204â€“213 (2017).
ArticleÂ ADSÂ Google ScholarÂ
Gilad, Y., Rifkin, S. A. & Pritchard, J. K. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet. 24, 408â€“415 (2008).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858â€“871.e8 (2018).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Ciccone, L., Cerri, C., Nencetti, S. & Orlandini, E. Carbonic anhydrase inhibitors and epilepsy: State of the art and future perspectives. Molecules 26, 6380 (2021).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Lemon, N., Canepa, E., Ilies, M. A. & Fossati, S. Carbonic anhydrases as potential targets against neurovascular unit dysfunction in Alzheimerâ€™s disease and stroke. Front. Aging Neurosci. 13, 772278 (2021).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Esteban-Martos, A. et al. A functional pipeline of genome-wide association data leads to midostaurin as a repurposed drug for Alzheimerâ€™s disease. Int. J. Mol. Sci. 24, 12079 (2023).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Jorfi, M., Maaser-Hecker, A. & Tanzi, R. E. The neuroimmune axis of alzheimerâ€™s disease. Genome Med. 15, 6 (2023).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Karpf, J. et al. Dentate gyrus astrocytes exhibit layer-specific molecular, morphological and physiological features. Nat. Neurosci. 25, 1626â€“1638 (2022).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Inoue, K. PLP1-related inherited dysmyelinating disorders: Pelizaeus-Merzbacher disease and spastic paraplegia type 2. Neurogenetics 6, 1â€“16 (2005).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Koo, T. K. & Li, M. Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 15, 155â€“163 (2016).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Coleman, K. et al. Resolving tissue complexity by multimodal spatial omics modeling with MISO. Nat. Methods 22, 530â€“538 (2025).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Gao, T. & Qian, J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 48, D58â€“D64 (2020).
CASÂ PubMedÂ Google ScholarÂ
Xie, F. et al. Robust enhancer-gene regulation identified by single-cell transcriptomes and epigenomes. Cell Genom.Â 3, 100342 (2023).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Zhang, L.-Y. et al. Effects of expression level of DNA repair-related genes involved in the NHEJ pathway on radiation-induced cognitive impairment. J. Radiat. Res. 54, 235â€“242 (2013).
ArticleÂ ADSÂ CASÂ PubMedÂ Google ScholarÂ
Siuciak, J. A. et al. Genetic deletion of the striatum-enriched phosphodiesterase PDE10A: evidence for altered striatal function. Neuropharmacology 51, 374â€“385 (2006).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Bronstein, R., Kyle, J., Abraham, A. B. & Tsirka, S. E. Neurogenic to gliogenic fate transition perturbed by loss of HMGB2. Front. Mol. Neurosci. 10, 153 (2017).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Cebrian-Silla, A. et al. Single-cell analysis of the ventricular-subventricular zone reveals signatures of dorsal and ventral adult neurogenesis. eLife 10, e67436 (2021).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Watson, C. & Puelles, L. Developmental gene expression in the mouse clarifies the organization of the claustrum and related endopiriform nuclei. J. Comp. Neurol. 525, 1499â€“1508 (2017).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Mukherjee, C. et al. Oligodendrocytes provide antioxidant defense function for neurons by secreting ferritin heavy chain. Cell Metab. 32, 259â€“272 (2020).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Wu, Z. et al. CD3+ CD4-CD8-(double-negative) T cells in inflammation, immune disorders and cancer. Front. Immunol. 13, 816005 (2022).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Seung, E. et al. A trispecific antibody targeting HER2 and T cells inhibits breast cancer growth via CD4 cells. Nature 603, 328â€“334 (2022).
ArticleÂ ADSÂ CASÂ PubMedÂ Google ScholarÂ
Sewald, X. et al. Retroviruses use CD169-mediated trans-infection of permissive lymphocytes to establish infection. Science 350, 563â€“567 (2015).
ArticleÂ ADSÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Zhang, X. et al. The Hippo pathway oncoprotein YAP promotes melanoma cell invasion and spontaneous metastasis. Oncogene 39, 5267â€“5281 (2020).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Savorani, C. et al. A dual role of YAP in driving TGFÎ²-mediated endothelial-to-mesenchymal transition. J. Cell Sci. 134, jcs251371 (2021).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Fukunaga-Kalabis, M. et al. Tenascin-C promotes melanoma progression by maintaining the ABCB5-positive side population. Oncogene 29, 6115â€“6124 (2010).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Phan, H. T., Kim, N. H., Wei, W., Tall, G. G. & Smrcka, A. V. Uveal melanomaâ€“associated mutations in PLCÎ²4 are constitutively activating and promote melanocyte proliferation and tumorigenesis. Sci. Signal. 14, eabj4243 (2021).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Zeng, P. et al. CPEB2 enhances cell growth and angiogenesis by upregulating ARPC5 mRNA stability in multiple myeloma. J. Orthop. Surg. Res. 18, 384 (2023).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Pak, B. J. et al. Radiation resistance of human melanoma analysed by retroviral insertional mutagenesis reveals a possible role for dopachrome tautomerase. Oncogene 23, 30â€“38 (2004).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
More, S. et al. Secreted Apoe rewires melanoma cell state vulnerability to ferroptosis. Sci. Adv. 10, eadp6164 (2024).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Rodig, S. J. et al. MHC proteins confer differential sensitivity to CTLA-4 and PD-1 blockade in untreated metastatic melanoma. Sci. Transl. Med. 10, eaar3342 (2018).
ArticleÂ PubMedÂ Google ScholarÂ
Johnson, D. B. et al. Melanoma-specific MHC-II expression represents a tumour-autonomous phenotype and predicts response to anti-PD-1/PD-L1 therapy. Nat. Commun. 7, 10582 (2016).
ArticleÂ ADSÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Chen, H., Li, D. & Bar-Joseph, Z. SCS: cell segmentation for high-resolution spatial transcriptomics. Nat. Methods 20, 1237â€“1243 (2023).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Pang, M., Roy, T. K., Wu, X. & Tan, K. CelloType: a unified model for segmentation and classification of tissue images. Nat. Methods 22, 348â€“357 (2025).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
He, B. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat. Biomed. Eng. 4, 827â€“834 (2020).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Xie, R. et al. Spatially resolved gene expression prediction from histology images via bi-modal contrastive learning. Adv. Neural Inf. Process. Syst. 36, 70626â€“70637 (2023).
Google ScholarÂ
Salehi, A. & Davulcu, H. Graph attention auto-encoders. In 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), (eds Alamaniotis, M. & Pan, S.) 989â€“996. (IEEE Computer Society, 2020).
Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39, 1375â€“1384 (2021).
ArticleÂ CASÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748â€“8763 (PMLR, 2021).
Vicari, M. et al. Spatial multimodal analysis of transcriptomes and metabolomes in tissues. Nat. Biotechnol. 42, 1046â€“1050 (2024).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
10x Genomics. Visium CytAssist gene and protein expression library of human breast cancer, IF, 6.5 mm (FFPE) dataset. 10x Genomics [https://www.10xgenomics.com/resources/datasets/gene-and-protein-expression-library-of-human-breast-cancer-cytassist-ffpe-2-standard] (2023).
Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Bredikhin, D., Kats, I. & Stegle, O. MUON: multimodal omics analysis framework. Genome Biol. 23, 42 (2022).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech.Â 2008, P10008 (2008).
ArticleÂ Google ScholarÂ
Scrucca, L., Fop, M., Murphy, T. B. & Raftery, A. E. mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J. 8, 289 (2016).
ArticleÂ PubMedÂ PubMed CentralÂ Google ScholarÂ
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
ArticleÂ CASÂ PubMedÂ Google ScholarÂ
Miao, J. & Li, J. cuhklinlab/MultiGATE: v0.1.0. Zenodo. https://doi.org/10.5281/zenodo.16310792 (2025).

Download references

Acknowledgements

We thank Prof. Hongyu Zhao from Yale University for his valuable and constructive suggestions on the manuscript. This work is supported by the Chinese University of Hong Kong startup grant (4930181 to Z.L.), the Chinese University of Hong Kong Science Facultyâ€™s Collaborative Research Impact Matching Scheme (CRIMS 4620033 to Z.L.), the Chinese University of Hong Kong direct grants (4053540 to Z.L., 4053586 to Z.L.), and Research Grants Council, University Grants Committee (GRF 14301120 to Z.L., GRF 14300923 to Z.L.). This work is supported in part by the Innovation and Technology Commission (ITCPD/17-9 to C.Y.); Hong Kong Research Grants Council Grants (16301419 to C.Y., 16308120 to C.Y., 16307221 to C.Y., 16307322 to C.Y., 16302823 to C.Y., 16309424 to C.Y.); The Hong Kong University of Science and Technology Startup Grants (R9405 to C.Y.) and the Big Data Institute (Z0428 to C.Y.). This work is supported in part by the National Key Research and Development Project of China (Grant No. 2023YFF1204802 to Y.Z.), National Science and Technology Innovation 2030 Major Program (Grant No. 2021ZD0200100 to Y.Z.). This work is supported in part by the Open Research Fund of Key Laboratory of Mathematical Sciences (Central China Normal University), P. R. China (MPL2025ORG013 to J.T.).

Author information

These authors contributed equally: Jishuai Miao, Jinzhao Li.

Authors and Affiliations

Department of Statistics and Data Science, The Chinese University of Hong Kong, Hong Kong SAR, China
Jishuai Miao,Â Jinzhao Li,Â Jingxue Xin,Â Muyang Ge,Â Ji Qi,Â Xiaocheng ZhouÂ &Â Zhixiang Lin
School of Science, Hubei University of Technology, Wuhan, China
Jiajuan Tu
State Key Laboratory of Brain Function and Disorders, MOE Frontiers Center for Brain Science, Institutes of Brain Science and Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
Ying Zhu
Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Can Yang
State Key Laboratory of Nervous System Disorders, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Can Yang
CUHK Shenzhen Research Institute, Shenzhen, China
Zhixiang Lin

Authors

Jishuai Miao
View author publications
Search author on:PubMedÂ Google Scholar
Jinzhao Li
View author publications
Search author on:PubMedÂ Google Scholar
Jingxue Xin
View author publications
Search author on:PubMedÂ Google Scholar
Jiajuan Tu
View author publications
Search author on:PubMedÂ Google Scholar
Muyang Ge
View author publications
Search author on:PubMedÂ Google Scholar
Ji Qi
View author publications
Search author on:PubMedÂ Google Scholar
Xiaocheng Zhou
View author publications
Search author on:PubMedÂ Google Scholar
Ying Zhu
View author publications
Search author on:PubMedÂ Google Scholar
Can Yang
View author publications
Search author on:PubMedÂ Google Scholar
Zhixiang Lin
View author publications
Search author on:PubMedÂ Google Scholar

Contributions

Z.L. and C.Y. supervised this study. J.L., J.M., and Z.L. proposed and developed MultiGATEâ€™s computational method. J.M., J.L., and J.X. conducted data analysis. C.Y., Y.Z., J.T., M.G., J.Q., and X.Z. provided advice and guided data analysis. J.M., J.X., and J.L. developed and released MultiGATE software package. J.M., J.L., J.X., Y.Z., C.Y., and Z.L. drafted the manuscript. J.M., C.Y., Y.Z., J.T., M.G., J.Q., and X.Z. revised the manuscript.

Corresponding authors

Correspondence to Can Yang or Zhixiang Lin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Yuzhou Chang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisherâ€™s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Miao, J., Li, J., Xin, J. et al. MultiGATE: integrative analysis and regulatory inference in spatial multi-omics data via graph representation learning. Nat Commun 16, 9403 (2025). https://doi.org/10.1038/s41467-025-63418-x

Download citation

Received: 13 December 2024
Accepted: 15 August 2025
Published: 24 October 2025
DOI: https://doi.org/10.1038/s41467-025-63418-x