Abstract
New spatial multi-omics technologies, which jointly profile transcriptome and epigenome/protein markers for the same tissue section, expand the frontiers of spatial techniques. Here, we introduce MultiGATE, which utilizes a two-level graph attention auto-encoder to integrate the multi-modality and spatial information in spatial multi-omics data. The key feature of MultiGATE is that it simultaneously performs embedding of the spatial pixels and infers the cross-modality regulatory relationship, which allows deeper data integration and provides insights on transcriptional regulation. We evaluate the performance of MultiGATE on spatial multi-omics datasets obtained from different tissues and platforms. Through effectively integrating spatial multi-omics data, MultiGATE both enhances the extraction of latent embeddings of the pixels and boosts the inference of transcriptional regulation for cross-modality genomic features.
Introduction
In recent years, the advent of spatial transcriptomic technologies has revolutionized our understanding of gene expression patterns within complex tissues and organs. These techniques profile gene expression levels in their native spatial context, offering rich insights into the organization and heterogeneity of biological tissues. New spatial multi-omics technologies, which jointly profile spatial transcriptome and epigenome/protein markers for the same tissue section, have expanded the frontiers of spatial techniques: spatial ATACâRNAâseq and spatial CUT&Tag-RNA-seq1 jointly profile spatial transcriptome and epigenome; Slide-tags2 jointly profiles open chromatin, RNA, and T-cell receptor sequences in single nuclei, together with the spatial barcode of the nuclei; Spatial Protein and Transcriptome Sequencing (SPOTS)3 is a technology that jointly profiles RNA and protein markers. Compared to single-omics technologies, spatial multi-omics provides additional molecular modality for more comprehensive profiling of the tissue and allows a deeper understanding of the regulatory relationship across different molecular modalities in the central dogma of molecular biology, within the native spatial context of the tissue.
Methods have been developed for spatial transcriptomic data, including SpaGCN4, STAGATE5, and SpatialPCA6. These methods, utilizing either graph neural networks or spatially-aware matrix factorization, have achieved precise spatial clustering of the pixels, thereby facilitating our understanding of spatial gene expression patterns. However, there is a pressing need to utilize the information embedded within additional omics layers in spatial multi-omics data. Methods have been developed for single-cell multi-omics data7,8,9,10,11,12. Among them, Seurat WNN7 provides an unsupervised framework to learn the relative importance of each molecular modality in each cell, enabling integrative analysis of single-cell multi-omics data. However, it was not designed for spatial multi-omics and does not account for the spatial information, which is crucial in the study of spatial data. MOFA+8 is designed for the integration of single-cell multi-modal data, using a linear factor model that decomposes the input matrices into the product of low-rank matrices (weight matrices and low-dimensional representation matrices); this linear factor model does not consider spatial information and the cross-modality regulatory relationship. totalVI9 is designed for CITE-seq data (RNAâ+âsurface protein); it uses the variational autoencoder (VAE) framework to model gene expression raw counts with negative-binomial (NB) distribution and the protein counts as an NB mixture of foreground and background signal; the VAE framework in totalVI does not incorporate spatial information in the spatial multi-omics data. MultiVI10 is based on a conditional VAE and models each modality using a specific distribution: NB distribution for gene expression raw counts and Bernoulli distribution for chromatin accessibility; this conditional VAE does not consider spatial information and the cross-modality regulatory relationship. JSNMF11 employs a joint semi-orthogonal nonnegative matrix factorization (NMF) model to learn separate latent factor matrices for each modality and then fuses them via a consensus cell-cell similarity graph, and the NMF model does not incorporate the spatial information. In contrast, a more recent method, SpatialGlue13, performs an integrated analysis of spatial multi-omics data using a graph convolution network to obtain embeddings for each molecular modality separately, and then employs attention mechanisms to integrate information from different modalities. However, SpatialGlue uses principal components as the input and does not model the cross-modality regulation in the embedding of pixels. To fully utilize the information in spatial multi-omics data and realize its potential in uncovering the regulatory relationship across different molecular modalities, it is crucial to model the connection of the cross-modality features.
In response to the critical need for integrative tools in spatial multi-omics, we have developed MultiGATE, and its core is a two-level graph attention autoencoder designed to (i) infer cross-modality regulatory relationships (including cis-regulation, trans-regulation and proteinâgene interactions) and (ii) embed spatial pixels into a latent space for clustering/spatial domain identification and visualization. By modeling regulatory links (such as peakâgene associations, proteinâgene interactions, and enzymeâmetabolite associations) directly into its graph attention mechanism, MultiGATE learns more informative low-dimensional representations of each spatial pixel/spot by deeper integrating different modalities. In turn, these refined embeddings sharpen the attention scores between cross-modality features, yielding more accurate inference of cross-modality regulation. The analysis of the cross-modality regulatory relationship enabled by MultiGATE provides unique insight in studying transcriptional regulation in the native tissue context powered by spatial multi-omics data (Detailed methodological innovations of MultiGATE are described in Supplementary Note S1). We demonstrate the superior performance of MultiGATE through various spatial multi-omics datasets generated from different tissues and technologies. Our results highlight the ability of MultiGATE to effectively capture the regulatory relationships across different molecular modalities while providing enhanced latent embeddings of the pixels for accurate spatial clustering.
Results
Method overview
The core of MultiGATE is a two-level graph attention autoencoder that simultaneously embeds the pixels/spots in a low-dimensional space and models the cross-modality feature regulatory relationships (e.g., peakâgene, proteinâgene, and metaboliteâgene). At the first level, a cross-modality attention mechanism is employed to model the cross-modality regulatory relationship; at the second level, a within-modality attention mechanism is utilized to incorporate spatial information, which encourages the embedding of neighboring pixels to be similar. To further improve the cross-modality data integration, MultiGATE also incorporates a Contrastive Language-Image Pretraining (CLIP) loss, which encourages the alignment of the embeddings from different modalities.
The holistic structure of MultiGATE enables the aggregation and alignment of spatial information and molecular features from each modality. MultiGATE effectively extracts the low-dimensional representations of the pixels for clustering/spatial domain identification and data visualization, and simultaneously unravels the regulatory relationships of the cross-modality features for cis-regulation, trans-regulation, and protein-gene interactions (Fig. 1).
MultiGATE is a two-level graph attention auto-encoder designed for spatial multi-omics analysis. It extracts the latent embeddings of the pixels/spots in spatial multi-omics data, while simultaneously incorporating the regulatory relationship of the cross-modality features through the cross-modality attention mechanism and the spatial relationship of the pixels/spots through the within-modality attention mechanism. In addition to reconstruction loss, a CLIP contrastive loss aligns embeddings across modalities. MultiGATE yields (i) latent representations of pixels for clustering and visualization and (ii) cross-modality attention scores for cross-modality regulatory inference.
MultiGATE unveils the functional landscape of the adult human hippocampus
We first employed MultiGATE to analyze the Spatial ATACâRNAâseq dataset1 generated from the adult human brain hippocampus, where chromatin accessibility and gene expression are simultaneously profiled at different spatial pixels. We manually annotated the hippocampus layers and white matter (Fig. 2a) based on guidance from previous studies14,15,16 and marker genes. These annotations served as the ground truth for evaluating the performance of different methods. We calculated the Adjusted Rand Index (ARI) for each method and compared their clustering accuracy. The results were obtained using pixels that underwent preprocessing in SpatialGlue13, which filters out low-quality pixels, and we removed the pixels that were filtered out by SpatialGlue when computing the ARI for each method to ensure a fair comparison. Of all the methods evaluated, MultiGATE achieved the highest accuracy in detecting the layer structure within the human hippocampus (ARI: MultiGATE 0.60, SpatialGlue 0.36, Seurat WNN 0.23, MOFA+ 0.10, and MultiVI 0.14; Fig. 2b and Supplementary Fig. 1A) and superior performance in other quantitative metrics (Supplementary Fig. 2A). Upon visually inspecting the clustering results in Fig. 2b (A unified color scheme is applied across methods using label matching, as detailed in Supplementary Note S2), MultiGATE provided a clearer deciphering of the molecular layer (ML) and the choroid plexus compared to SpatialGlue. On the other hand, Seurat WNN does not incorporate the spatial information, resulting in a clustering outcome that lacks spatial organization.
a Bright-field image and manually annotated segmentation of hippocampus layers and white matter (WM) in the human hippocampus. b Spatial clustering of hippocampal regions using MultiGATE, SpatialGlue, and Seurat WNN. Clustering performance is assessed using the Adjusted Rand Index (ARI), with higher values indicating greater clustering accuracy. c Box plots representing attention scores for peakâgene pairs across different genomic distances, grouped based on whether they are supported by expression quantitative trait loci (eQTL) evidence. The box plots indicate the medians (centerlines), means (triangles), first and third quartiles (bounds of boxes), and 1.5âÃâinterquartile range (whiskers). Sample sizes per bin (False/True): 0â25âkb (621/222), 25â50âkb (479/88), 50â75âkb (461/78), 75â100âkb (469/44), 100â125âkb (446/30), 125â150âkb (405/29). d Receiver operating characteristic (ROC) curves comparing the performance of MultiGATE and other methods in predicting eQTL-associated regulatory interactions. e Visualization of MultiGATE-predicted cis-regulatory interactions for the target genes CA12 and PRKD3 along with eQTL evidence. Source data are provided as a Source Data file.
MultiGATE simultaneously models the peakâgene association in embedding the pixels in a low-dimensional space, distinguishing itself from SpatialGlue13 and Seurat WNN7, which rely on principal components as input and do not model peakâgene associations. By incorporating a graph-based representation of regulatory interactions, MultiGATE employs a Bayesian-like approach that combines prior knowledge of genomic distance with the observed spatial multi-omics data to estimate cross-modality attention, allowing for modeling spatial information and the multi-omics genomic features in the attention mechanism simultaneously (âMethodsâ).
To assess whether the attention mechanism in MultiGATE can accurately capture the cis-regulatory interactions, we compared the attention score estimated by MultiGATE with external eQTL data17, which has been widely used to identify genetic variants that influence gene regulation18. For the human hippocampus data, the attention scores estimated by MultiGATE decrease as genomic distance increases, whereas the peakâgene pairs supported by eQTL data from the human hippocampus demonstrate higher attention scores, compared to those that are not supported (Fig. 2c). This suggests that MultiGATE effectively captures genuine cis-regulatory interactions. Next, we used the human hippocampus eQTL data17 as the benchmark, and compared the performance for identifying peakâgene association between MultiGATE and other methods (Fig. 2d): MultiGATE achieved the best AUROC score (0.703), compared to Cicero19 (AUROCâ=â0.530), Spearman correlation (AUROCâ=â0.515), and LASSO regression (AUROCâ=â0.501). To further showcase the biological relevance of the peakâgene interactions identified by MultiGATE, we examined genes with well-established functional importance in the hippocampus, including CA12 and PRKD3. CA12 is a carbonic anhydrase in hippocampal neurons, regulates pH, and supports neuronal excitability20,21. PRKD3 is associated with synaptic function and Alzheimerâs disease22,23. For each of these genes, MultiGATE successfully identified peakâgene associations that are supported by hippocampus-specific eQTLs (Fig. 2e), whereas non-significant associations are generally not supported by eQTL signals. Moreover, MultiGATE can be extended to detect long-range interactions between enhancers and promoters: across eight distance bins (0â150âkb, 150â300âkb, â¦, >â1.25âMb), MultiGATE can distinguish the brain-specific enhancer-promoter contacts from those in other tissues across 24 diverse human tissues (Supplementary Note S3). In addition, MultiGATE has been extended to capture trans-regulatory interactions by incorporating TF binding priors and learning TFâpeakâgene attention scores. Using ChIP-seq data for SOX2 in brain tissue as ground truth, our method outperformed motif-only, TF binding potential, and cosine similarity baselines in predicting SOX2 binding, achieving an AUC of 0.8669 and AUPR of 0.4906 (Supplementary Note S4). In summary, MultiGATE enables regulatory inference and enhances our understanding of cis-regulatory and trans-regulatory interactions by effectively mining multi-omics spatial data to precisely infer regulatory networks.
Finally, to explore the biological relevance of the identified clusters, we examined differentially expressed genes (DEGs) for each cluster identified by MultiGATE in Supplementary Fig. 3B. For instance, PPFIA2 displayed differential expression and showed high expression levels in the granule cell layer, aligning with previous findings24. Known molecular markers of the hippocampus, such as SLC1A2, were specifically expressed in the ML14 identified by MultiGATE. Furthermore, PLP1 encodes a major component of myelin protein in the central nervous system25. And it is highly expressed in the stratum lacunosum-moleculare region identified by MultiGATE. Gene Ontology (GO) analysis of the DEGs from this region revealed significant enrichment for myelination (Supplementary Fig. 3C), aligning with the known function of PLP1. These findings underscore the biological relevance and accuracy of the clusters identified by MultiGATE.
MultiGATE reveals a layered pattern that mimics the annotated structure in the mouse brain while enabling regulatory inference
We further employed MultiGATE to analyze the spatial ATACâRNAâseq dataset1 generated from mouse postnatal day 22 (P22) brains. We utilized a P56 coronal annotation provided by the Allen Mouse Brain Atlas as a reference in Fig. 3a.
a Annotated coronal section of a P56 mouse brain from the Allen Mouse Brain Atlas. b Spatial clustering of brain regions in a P22 mouse brain using MultiGATE, SpatialGlue, and Seurat WNN, displaying distinct regional segmentation by each method. c Box plots showing attention scores for peakâgene pairs across genomic distances, grouped by whether they are supported by EnhancerAtlas, a multi-species database linking enhancers to target genes. The ROC curves compare the predictive performance of MultiGATE and other methods in identifying enhancer-gene interactions supported by EnhancerAtlas. The box plots indicate the medians (centerlines), means (triangles), first and third quartiles (bounds of boxes), and 1.5âÃâinterquartile range (whiskers). Sample sizes per bin (False/True): 0â25âkb (1714/236), 25â50âkb (1735/161), 50â75âkb (1509/167), 75â100âkb (1344/132), 100â125âkb (1265/101), 125â150âkb (1075/93). d Similar analysis as in (c) for peakâgene pairs evaluated by EGAS, an enhancer-gene association study employing a permutation-based approach to minimize false positives, validated through chromatin conformation and enhancer perturbation experiments. The box plots indicate the medians (centerlines), means (triangles), first and third quartiles (bounds of boxes), and 1.5âÃâinterquartile range (whiskers). Sample sizes per bin (False/True): 0â25âkb (380/39), 25â50âkb (342/35), 50â75âkb (324/21), 75â100âkb (304/17), 100â125 kb (276/6), 125â150âkb (223/2). e Visualization of cis-regulatory interactions predicted by MultiGATE for the target gene Xrcc5, supported by evidence from EnhancerAtlas. f Spatial expression patterns of differentially expressed genes (DEGs) in specific brain clusters of the P22 mouse brain. Source data are provided as a Source Data file.
We aligned the clustering results at the profiled region within the whole tissue slide in Fig. 3b to enhance clarity. To quantitatively evaluate clustering results, we computed the Intraclass Correlation Coefficient (ICC)26. The ICC quantifies the consistency of observations within each cluster. Higher ICC values reflect greater intra-cluster homogeneity for a given modality, indicating improved clustering accuracy27. MultiGATE achieves the highest combined ICC score among all methods compared, outperforming SpatialGlue, Seurat WNN, MOFA+, and MultiVI (Supplementary Figs. 2C, 1B, and 4A). Both MultiGATE and SpatialGlue exhibit the ability to detect the genu of corpus callosum (ccg), lateral ventricle (VL), caudoputamen (CP), olfactory limb (aco), and lateral preoptic area, as well as accurately identifying the six layers present in the cortex. MultiGATE more accurately discriminates the outermost cortical layer (Cluster 5, shown in red). In contrast, Cluster 5 in SpatialGlue shows no cluster-enriched markers (Supplementary Note S5), indicating a heterogeneous mixture of multiple cell types. More specifically, Cluster 5 in SpatialGlue contains pixels in the LS nucleus region1, which are molecularly different from the other pixels in Cluster 5 (Fig. 3b and Supplementary Note S5). This highlights the enhanced precision and reliability offered by MultiGATE in cortical layer analysis. In contrast, Seurat WNN does not consider the spatial information and the clustering result does not have a clear spatial organization of the regions (Fig. 3b).
To validate MultiGATEâs attention mechanism in inferring peakâgene regulation, we utilized two validation datasets: EnhancerAtlas28 and EGAS29 for enhancer-gene regulation. In Fig. 3c, d, the attention scores estimated by MultiGATE decrease as genomic distances increase; Moreover, higher attention scores were observed for the peakâgene pairs where the peak overlaps with an enhancer that regulates the corresponding gene. These results are attributed to the adaptive learning of attention, which incorporates prior weights that depend on genomic distance and the observed multi-omics data. Next, we compared the performance on identifying peakâgene regulatory pairs between MultiGATE and other methods. MultiGATE outperformed Cicero19, Spearman correlation and lasso regression, as indicated by the highest AUROC scores in Fig. 3c, d. We next demonstrate the regulatory relationship for the gene Xrcc5. The gene Xrcc5 is known for its involvement in DNA repair mechanisms in the mouse brain30. MultiGATE identified a peak that regulates Xrcc5 (indicated by the red curve) and this peak is about 90âkb away from the gene body of Xrcc5; This peak also overlaps with an enhancer that regulates Xrcc5 in the Enhancer Atlas database; The other peaks near Xrcc5 are not predicted to regulate Xrcc5 in MultiGATE and neither of these peaks is supported in the Enhancer Atlas dataset (Fig. 3e). This observation and the overall distribution of attention scores in Fig. 3c, d demonstrate the superior performance of MultiGATE in regulatory inference.
To demonstrate that MultiGATE effectively combines the information in multiple modalities, we also inspected the clustering result for each molecular modality separately5. There are substantial differences between the clustering outcomes of the two individual modalities: spatial clusters generated by ATAC-seq provide more informative patterns regarding contextual layering and internal structure (Supplementary Fig. 3D); RNA-seq also contributes valuable information, especially the identification of the outermost layer (Supplementary Fig. 3D). MultiGATE integrates both modalities to produce more accurate and comprehensive clustering results. By incorporating multi-modal information, MultiGATE capitalizes on the strength of ATAC-seq and RNA-seq, leading to enhanced performance in clustering analysis (Fig. 3b).
Furthermore, we have successfully identified marker genes associated with the spatial clusters identified by MultiGATE (Supplementary Note S6), including marker genes for the regions ccg, CP, and VL, as shown in Fig. 3f. Pde10a is highly expressed in CP, which is part of the striatum dorsal region, consistent with the previous findings31. Hmgb2 was shown to be expressed in neural stem/progenitor cells, which are highly active in the VL zone and contribute to the balance between neurogenesis and gliogenesis32, consistent with the observed high expression of Hmgb2 in the VL region in the spatial multi-omics data (Fig. 3f). Top2a is highly expressed in the VL region as shown in Fig. 3f, which supports previous findings that Top2a is associated with mitosis and cell cycle regulation and is highly expressed in the VL region33. Nr4a2 is highly expressed in the EPd region (Fig. 3f), consistent with previous findings that Nr4a2-expressing cells migrate to form the dorsal endopiriform nucleus34. Fth1 is highly expressed in the corpus callosum, particularly in oligodendrocytes, where it plays a critical role in neuroprotection and the maintenance of axonal health35, which aligns well with our observation that Fth1 is highly expressed in the CP region.
These expression patterns and the above regulatory inference derived from spatial analysis contribute to a more profound understanding of the diverse regions within the brain.
MultiGATE reveals spatial patterns in diverse spatial multi-omics technologies
SPOTS3 is a spatial multi-omics technology that jointly profiles whole transcriptome and protein markers while preserving tissue architecture. We first analyzed a SPOTS dataset3 generated from murine spleen, which consists of distinct cellular populations organized into well-defined structures such as germinal centers (GCs), marginal zones (MZs), and red pulp (The bright-field image is shown in Fig. 4a). The clustering results of MultiGATE, SpatialGlue, and Seurat WNN are shown in Fig. 4b. MultiGATE achieves the highest combined ICC score among the five methods, outperforming SpatialGlue, Seurat WNN, MOFA+, and totalVI (Supplementary Figs. 2D, 7C, and 8B). MultiGATE identified five clusters: T cell, B cell, and three distinct macrophage subtypes. Spatially, the T cells (purple) are located at the center, surrounded by B cells (green), and further encircled by successive layers of White Pulp Macrophages (WPM), Marginal Zone Macrophages (MZM), and Red Pulp Macrophages (RPM) (Supplementary Note S7 and Supplementary Fig. 5A, B). This anatomically consistent layering reflects spleen structure and highlights MultiGATEâs ability to distinguish spatially organized immune populations.
a Histology image of the mouse spleen replicate 1 sample, highlighting regions corresponding to magnified comparison region in (d). b Spatial clustering of immune cells in the mouse spleen as identified by MultiGATE, SpatialGlue, and Seurat WNN. c Box plots showing CD3 expression levels in T cells and B cells across clusters identified by MultiGATE and SpatialGlue. Box plots display the median (centerlines), interquartile range (box bounds), and 1.5âÃâinterquartile range (whiskers). n (MultiGATE/SpatialGlue): T cells 259/360; B cells 579/705. Statistical significance was assessed using two-sided MannâWhitneyâWilcoxon tests. Exact p values: MultiGATE vs. SpatialGlue in B cell cluster: Pâ=â2.81âÃâ10â1; MultiGATE vs. SpatialGlue in T cell cluster: Pâ=â1.22âÃâ10â5. No correction for multiple comparisons was applied. Significance: ns: Pâ>â0.05; *Pââ¤â0.05; **Pââ¤â0.01; ***Pââ¤â0.001; ****Pââ¤â0.0001. d Magnified comparison of CD3 and CD19 expression in regions identified by MultiGATE and SpatialGlue. e ADT (Antibody-Derived Tags) signatures associated with each spatial cluster, revealing distinct immune cell identities. Source data are provided as a Source Data file.
Notably, MultiGATE demonstrates a more precise clustering of T cells and B cells compared to SpatialGlue. We evaluated this by comparing the ADT expression of the CD3 protein, a canonical T cell marker, across T cell and B cell clusters identified by each method (Fig. 4c). Since CD3 is not expressed in B cells, a method that better separates these populations should yield a greater difference in CD3 expression between the two clusters. Indeed, the clustering of MultiGATE shows significantly greater separation in CD3 expression (Wilcoxon Pâ=â1âÃâ10â6, rank-biserial râ=ââ0.953, and Cliffâs δâ=â0.953) than SpatialGlue (Wilcoxon Pâ=â1âÃâ10â6, rank-biserial râ=ââ0.866, and Cliffâs δâ=â0.866; Supplementary Note S8). Additionally, within the T cell cluster itself, CD3 expression is significantly higher in the clustering produced by MultiGATE than SpatialGlue (Wilcoxon Pâ=â1.2âÃâ10â5; Supplementary Note S8), suggesting more precise identification of T cells. To visually compare clustering performance, we examined three annotated regions in the bright-field image (Fig. 4a). The T cell clusters (purple pixels) identified by MultiGATE show stronger concordance with CD3 protein expression than those identified by SpatialGlue (Fig. 4d). Furthermore, SpatialGlue identifies a significantly higher number of B cells (outliers in Fig. 4c) with elevated CD3 expressionâlikely T cells misclassified as B cells. In regions 2 and 3 (Fig. 4d), the highlighted pixels show high CD3 expression and low CD19 expression, indicating that these pixels should be categorized as T cells, which MultiGATE correctly identifies. In summary, these results suggest that MultiGATE achieves a more precise separation of T cells and B cells. Its spatial clustering result also aligns well with the protein-based deconvolution (Supplementary Note S9). In Fig. 4e, the T cell cluster shows strong CD3, CD4, and CD8 expression, consistent with known T cell profiles: CD8 for cytotoxic T lymphocytes36, and CD3/CD4 crucial for anti-tumor responses37. Additionally, CD169-Siglec is highly expressed in the MZM cluster (Fig. 4e), aligning with its known function as an I-type lectin that recognizes gangliosides and captures viruses38.
We also implemented Seurat WNN on this dataset (Fig. 4b). Even at the lowest resolution of 0.1 for Louvain clustering, Seurat WNN identified seven clusters, roughly capturing the T cell (Cluster 4) and B cell (Cluster 3) populations. However, because it does not consider the spatial information, its overall clustering result lacks the layered anatomical pattern in murine spleen-particularly for the macrophage clusters. For example, Cluster 5 likely corresponds to MZM, yet its pixels are scattered and mixed with other clusters, lacking clear layered pattern. In contrast, MultiGATE recovers the expected spatial pattern of macrophage subtypes relative to GCs: WPM appear closest, followed by MZM, and RPM farthest (Supplementary Fig. 5B and Supplementary Note S7). Beyond spatial clustering, MultiGATE identifies regulatory associations by linking proteins to related gene sets via cross-modality attention. Applied to CD3, it assigned higher attention scores to CD3-related genes than to B cell-related, macrophage-related, or random controls, confirming its ability to recover meaningful gene-protein relationships (Supplementary Note S10).
Next, we employed MultiGATE to analyze a Slide-tags dataset2 generated from metastatic melanoma samples, where open chromatin and RNA expression were profiled in single nuclei, along with spatial barcodes providing the spatial location of the nuclei. This dataset primarily consisted of two tumor clusters, as annotated in the original study2 (Supplementary Fig. 6A). Consistent with these annotations, MultiGATE, along with SpatialGlue, and Seurat WNN accurately divided the tumor cells into two clusters (Supplementary Fig. 6B and D). Further analysis of DEGs identified by MultiGATE between tumor Cluster 1 and Cluster 2 revealed distinct expression patterns (Supplementary Fig. 6C and E). Cluster 1 was characterized by a mesenchymal-like state, with elevated expression and chromatin accessibility of TNC, a marker of invasion and metastasis, alongside PLCB4 and CPEB2, which support YAP signalingâa key driver of epithelial-mesenchymal transition and cellular plasticity in melanoma39,40,41,42,43. Cluster 1 also showed downregulation of MHC genes (CTSB, HSP90AA1, HSP90AB1, and LGMN; Supplementary Fig. 6E), suggesting immune evasion and potential resistance to immunotherapy, a trait often linked to aggressive tumor behavior41. In contrast, Cluster 2 displayed a melanocytic-like state with high expression of DCT and APOE, both implicated in treatment resistance via maintenance of melanocyte identity and ferroptosis evasion44,45. This melanocytic state was further supported by the significant enrichment of Cluster 2 upregulated genes (in Supplementary Fig. 6E) in pigmentation (GO:0043473; Hypergeometric Pâ=â1.38âÃâ10â4) and in MITF-regulated pathways (R-HSA-9730414; Hypergeometric Pâ=â6.47âÃâ10â6), where MITF is a master regulator of melanocyte identity and drives melanocyte development. Additionally, Cluster 2 showed a relative upregulation of MHC genes within the KEGG antigen processing and presentation pathway, with 7 out of 27 genes showing a fold change greater than 1.2 (Hypergeometric test, Pâ=â4.63âÃâ10â6), indicating enhanced antigen presentation. This increased immune visibility likely promotes greater interaction with cytotoxic T cells, contributing to a more proliferative and therapy-sensitive phenotype in Cluster 246,47. Together, these findings reveal that heterogeneity in melanoma identified by MultiGATE reflects distinct cell states with divergent therapeutic vulnerabilities.
To further demonstrate MultiGATEâs versatility across measurement technologies and tissue types, we performed three additional evaluations: (1) integration of spatial transcriptomics and metabolomics (Supplementary Fig. 7 and Supplementary Note S11), showing that MultiGATE can jointly analyze modalities without genomic distance priors; (2) analysis of spatial RNAâ+âprotein data in human breast cancer (Supplementary Fig. 8 and Supplementary Fig. 2B); and (3) a simulated breast cancer-patterned spatial ATACâ+âRNA dataset (Supplementary Fig. 9 and Supplementary Note S12), confirming its applicability to tissues lacking stereotypical structure. In all cases, MultiGATE outperformed competing methods, accurately recovering spatial domains.
Discussion
Spatial multi-omics analysis has emerged as a powerful approach for understanding the interplay between genes, chromatin accessibility, protein, and spatial organization in biological systems. In this study, we propose MultiGATE, a two-level graph attention framework that leverages the graph attention mechanism to integrate spatial multi-omics data.
Compared to existing methods, MultiGATE provides a unique advantage in simultaneously embedding the pixels and inferring the cross-modality regulatory relationship, which is not available in alternative approaches such as SpatialGlue and Seurat WNN. The attention scores incorporate genomic distances as prior knowledge to model cross-modality regulatory interactions, providing new insights into transcriptional regulation. Furthermore, the latent representations learned by MultiGATE enable accurate spatial clustering and enhanced visualization.
In conclusion, MultiGATE is a significant approach in the analysis of spatial multi-omics data. Its integration of multi-omics and spatial information, generation of high-quality embeddings, and inference of regulatory interactions make it a valuable tool for unraveling gene regulation and spatial organization. We anticipate that MultiGATE will facilitate discoveries and advancements across various fields of biological research.
Beyond modeling cis-regulation, MultiGATE can also capture trans-regulatory interactions by incorporating TF binding priors and learning TFâpeakâgene attention scores. While initial results are promising, large-scale validation remains challenging due to the limited availability of high-quality ChIP-seq datasets in brain tissues. Future efforts will aim to broaden trans-regulatory validation as more tissue-specific datasets become accessible.
Currently, MultiGATE does not incorporate histology images into pixel embeddings, as only bright-field images were provided in the spatial ATACâRNAâseq datasets1. Previous studies have demonstrated that combining histology images with gene expression data improves cell segmentation48,49 and enables high-resolution gene expression50,51. As a future direction, subsequent iterations of MultiGATE could integrate an additional pre-trained encoder designed to process histology images.
Methods
The framework of MultiGATE
MultiGATE integrates spatial multi-omics data and the key feature of MultiGATE is that it provides both the cross-modality connection of the features, which reveals the regulatory relationship across modalities (for example, the peakâgene association in spatial ATACâRNAâseq data), and the latent embeddings of the spatial spots, which can be used for downstream analysis including clustering and visualization of the spatial spots. These are achieved through a two-level graph attention auto-encoder that seamlessly integrates the cross-modality features and the spatial proximity of the spots. The first-level auto-encoder employs the feature connectivity graph to integrate the features of different modalities in spatial multi-omics data. It effectively combines the multi-modality information and learns attention scores, which capture the cross-modality connection of the features, facilitating inference of the regulatory relationships. The second-level auto-encoders use the neighboring graph of the spots to capture the spatial information of the spots in each modality, respectively, which encourages spatial smoothness for the enhancement of the latent embeddings of the spots. The latent embeddings and cross-modality regulatory scores provided by MultiGATE play a vital role in facilitating subsequent integration and downstream analysis for spatial multi-omics data, thereby enabling a more comprehensive and insightful exploration of the data.
The two-level graph attention auto-encoder
Let \({{{\bf{X}}}}_{1}\in {{\mathbb{R}}}^{N\times p}\) be the preprocessed data matrix from one modality with N spots and p features, \({{{\bf{X}}}}_{2}\in {{\mathbb{R}}}^{N\times q}\) be the preprocessed data matrix from another modality with N spots and q features.
The first-level auto-encoder in MultiGATE, referred to as the cross-modality graph attention auto-encoder, employs the feature connectivity graph as a prior to integrate cross-modality features of different modalities in spatial multi-omics data. The feature connectivity graph is an adjacency matrix denoted by A, where each entry Aij represents the connection between feature i and feature j. The features in the adjacency matrix are the union of two modalities in spatial multi-omics data. For spatial ATACâRNAâseq data1, features are the union of genes and chromatin peaks. The value of Aij lies in the range of \(\left(1,e\right]\) and is determined by the base pair distance between peak i and gene j, with the value decreasing as the base pair distance increases, while being restricted within 150âkb. For SPOTS data3, features encompass the union of genes and proteins. In this scenario, Aijâ=â1 if and only if the gene i encodes the protein j or the subunits of protein j. In the feature connectivity graph, self-loops are added for each feature: Aiiâ=â1, for all i. The encoder in the first-level cross-modality auto-encoder generates the integrated feature matrix by aggregating information from the features that are connected across modalities through the feature connectivity graph A. More specifically, we transpose X1 and X2, concatenate \({{{\bf{X}}}}_{1}^{T}\) and \({{{\bf{X}}}}_{2}^{T}\) along the dimension of the features and obtain the merged data matrix \({{{\bf{X}}}}^{T}\in {{\mathbb{R}}}^{(p+q)\times N}\). Let \({{{\bf{X}}}}_{(f)}^{T}\) be the data vector for feature f, â f â {1, 2, ⯠â, p + q}. The encoder integrates the cross-modality features as follows:
where \({\bar{{{\bf{X}}}}}_{(f)}^{T}\) is the integrated data for feature f, Ï1 is the ReLU activation function, Af represents the set of features that are connected to feature f in the feature connectivity graph A, attgf is the attention score between feature g and feature f, and attff is the self-attention for feature f. These attention scores are learned adaptively based on a self-attention mechanism52. For cross-modality attentions, we first calculate the attention coefficient from feature f to feature g as follows:
where v1 and v2 are trainable parameters, and the Sigmoid activation function is used. Next, the attention scores are computed with a Bayesian-like approach:
where
distfg is the genomic distance between features f and g. bp_width is the bandwidth in the kernel set as 400 (A sensitivity analysis in Supplementary Note S13). Afg represents the prior knowledge that quantifies the connection between features f and g based on their genomic distance, and its value decays when the genomic distance increases; The attention coefficient efg represents the connection between the features derived from the observed data. In addition to the cross-modality attention attfg, we also incorporate a self-attention attff in computing \({\bar{{{\bf{X}}}}}_{(f)}^{T}\), to preserve the information in feature f itself. attff is computed the same as attfg, with distff set to 0 in computing Aff. In summary, the first-level auto-encoder integrates the cross-modality features, and the learned attention scores attfg reflect the strength of the connections between the cross-modality features and can be used for inferring their regulatory relationships.
Following the acquisition of integrated features \({\bar{{{\bf{X}}}}}^{T}\), we proceed to the second-level auto-encoders in MultiGATE, which are the within-modality graph attention auto-encoders. The second-level auto-encoders employ the spatial neighborhood graph to aggregate information from neighboring spots in spatial multi-omics data. The spatial neighborhood graph captures the similarity of neighboring spots using an adjacency matrix denoted as S, where Sijâ=â1 if spot j is an immediate spatial neighbor of spot i (and Siiâ=â1 for all i). Neighborhoods are defined based on direct spatial adjacency4,53. The spatial RNA+protein3 technique uses the 10à Visium platform, each spot has six neighbors, following a hexagonal grid layout53. For Spatial ATACâRNAâseq data, neighborhood size reflects pixel resolution: in the 50âμm human hippocampus dataset, each pixel has four adjacent neighbors (up/down/left/right), while in the 20âμm P22 mouse brain dataset, each pixel has eight neighbors in a 3âÃâ3 grid. The average number of neighbors per spot across datasets is visualized in Supplementary Note S14. The spatial neighborhood graph allows us to effectively represent the spatial relationships between spots and incorporate their similarity in the auto-encoder. The output from the first-level cross-modality encoder, \({\bar{{{\bf{X}}}}}^{T}\), is first transposed to \(\bar{{{\bf{X}}}}\) and then split to \({\bar{{{\bf{X}}}}}_{1}\) and \({\bar{{{\bf{X}}}}}_{2}\) according to the modality. \({\bar{{{\bf{X}}}}}_{1}\) and \({\bar{{{\bf{X}}}}}_{2}\) are the input for the second-level auto-encoders. For the k-th modality, we utilize an auto-encoder with the structure described below to capture the information from neighboring spots, where k â {1, 2}. Let \({\bar{{{\bf{x}}}}}_{ki}\) be the integrated feature vector for modality k of spot i, â i â {1, 2, ⯠, N}. Let L be the number of layers in the encoder. The output of the encoder in the l-th layer (l â {1, 2, ⯠, L â 1}) is calculated as follows:
where \({{{\bf{W}}}}_{k}^{l}\) is the trainable projection matrix in layer l of modality k, Ï2 is the ELU activation function, Si is the set of neighbors for spot i in the spatial neighborhood graph S (including spot i itself) and \({{{\rm{att}}}}_{ij}^{(l)}\) is the spatial attention score between spot i and spot j in the l-th graph attention layer. To obtain \({{{\rm{att}}}}_{ij}^{(l)}\), we first calculate the attention coefficient from spot i to neighboring spot j in the l-th encoder as follows:
where vself and vnei are trainable vectors. The attention score \({{{\rm{att}}}}_{ij}^{(l)}\) is then computed as softmax
Our proposed attention scores differ from the standard graph attention auto-encoder52 in the following two important aspects. Firstly, the calculation of cross-modality attention scores incorporates prior information on the connection of cross-modality features: it considers the prior knowledge that a peak is less likely to regulate a gene when the genomic distance is larger. Secondly, the trainable projection matrices in the cross-modality attention coefficients in equation (2) are fixed as identity matrices: we set \({{{\bf{v}}}}_{1}^{T}({{{\bf{X}}}}_{(f)}^{T})\) instead of \({{{\bf{v}}}}_{1}^{T}({{{\bf{W}}}}_{1}{{{\bf{X}}}}_{(f)}^{T})\). This decision is based on the intrinsic strong connection of the cross-modality features: regulatory regions/peaks regulate gene expression, and gene expression directly affects protein level. Performing an early projection could potentially disrupt this information, hence, we avoid including the projection matrices.
The last encoder layer (i.e., L-th layer) does not employ the attention layer, and its output is calculated as follows:
where the output \({\bar{{{\bf{x}}}}}_{ki}^{(L)}\) is the latent embedding learned for the k-th modality.
After the input passes through the cross-modality encoder and the within-modality encoder, we obtain the latent embeddings \({\bar{{{\bf{X}}}}}^{(L)}\). Next, these latent embeddings undergo reconstruction first through the within-modality decoder and then through the cross-modality decoder, where both decoders are symmetric to their corresponding encoders. More specifically, the within-modality decoder takes the latent embeddings \({\bar{{{\bf{X}}}}}^{(L)}\) as the input (i.e., \({\tilde{{{\bf{x}}}}}_{ki}^{(L)}={\bar{{{\bf{x}}}}}_{ki}^{(L)}\)), its output in the (l â 1)-th layer (l â {L, ⯠, 2}) is calculated as follows:
Symmetric to the within-modality encoder, the output in the outermost within-modality decoder layer is calculated as follows:
MultiGATE sets the parameters in the within-modality decoder to be the same as those in the within-modality encoder to avoid overfitting: \({\hat{{{\bf{W}}}}}_{k}^{l}={({{{\bf{W}}}}_{k}^{l})}^{T}\), for lâ=â1, ⯠â, L and \({\hat{{{\rm{att}}}}}_{ij}^{(l)}={{{\rm{att}}}}_{ij}^{(l)}\), for lâ=â1, ⯠, L â 1. We concatenate the output of the within-modality decoder for both modalities and denote it as \(\tilde{{{\bf{X}}}}\). Next, the cross-modality decoder takes \(\tilde{{{\bf{X}}}}\) as the input, and outputs the reconstructed data:
where \({\hat{{{\bf{X}}}}}_{(f)}^{T}\) is the reconstructed data for feature f. MultiGATE sets the cross-modality decoder to be symmetric to the cross-modality encoder to avoid overfitting: \({\hat{{{\rm{att}}}}}_{gf}={{{\rm{att}}}}_{gf}\) and \({\hat{{{\rm{att}}}}}_{ff}={{{\rm{att}}}}_{ff}\).
The overall architecture of MultiGATE
In all datasets, the cross-modality mechanism consists of one GAT (Graph Attention Network) layer with two learnable vectors to compute attention scores between cross-modality features. In the cross-modality attention mechanism, we integrate the information of each gene with its neighboring peaks (or proteins) according to attention scores, and each peak (or protein) with its neighboring genes. The cross-modality encoder then outputs two enhanced data matricesâone of size (PeaksâÃâSpots) and one of size (GenesâÃâSpots)âwhich integrate information from the other modality and are passed on to the within-modality attention step.
For the within-modality attention autoencoders, each encoder has two hidden layers (dimensions 512 and 30), and the decoder is symmetric to the encoder. A sensitivity analysis for a latent dimension of 30 is presented in Supplementary Note S15.
Loss function and training details
There are two components in the loss function. The first component is the reconstruction loss for the modalities.
The second component is the CLIP loss54, which encourages the latent embeddings for different modalities, \({\bar{{{\bf{X}}}}}_{1}^{(L)}\) and \({\bar{{{\bf{X}}}}}_{2}^{(L)}\) to be similar and further facilitates cross-modality integration, in addition to the cross-modality graph attention. The CLIP loss is calculated as follows:
where the function \(\it sim\) represents cosine similarity, \({{{\bf{f}}}}_{i}={{{\bf{M}}}}_{1}^{T}{\bar{{{\bf{x}}}}}_{1i}^{(L)}\), \({{{\bf{g}}}}_{i}={{{\bf{M}}}}_{2}^{T}{\bar{{{\bf{x}}}}}_{2i}^{(L)}\), M1 and M2 are trainable projection matrices for modality 1 and modality 2, respectively. The embeddings fi and gi, â i â {1, 2, ⯠â, N} serve as the final output latent embeddings generated by MultiGATE, which are subsequently utilized for downstream analysis. temp is the temperature parameter in CLIP loss, which is set to 1 by default.
MultiGATE employs an Adam optimizer to minimize the loss with an initial learning rate of 1e-4. The weight decay is set as 1e-4. There are two kinds of activation functions: Ï1 is ReLU, while Ï2 is ELU. The default number of iterations is set to 1000. For details on runtime performance and memory usage across datasets, please refer to Supplementary Notes S16 and S17.
MultiGATE attention-based cis-regulatory inference and validation
We use the following procedure to get peakâgene regulatory pairs from MultiGATEâs cross-modality attention scores and evaluate the peakâgene regulatory pairs:
-
1.
Cross-modality attention computation
In the cross-modality autoencoder, for each peak f and gene g, we first compute an unnormalized attention coefficient
$${e}_{fg}={{\rm{Sigmoid}}}\left({{{\bf{v}}}}_{1}^{T}{{{\bf{X}}}}_{(f)}^{T}+{{{\bf{v}}}}_{2}^{T}{{{\bf{X}}}}_{(g)}^{T}\right),$$where \({{{\bf{X}}}}_{(f)}^{T}\) and \({{{\bf{X}}}}_{(g)}^{T}\) are the observed data for peak f and gene g, and v1, v2 are learned vectors.
We then incorporate a genomic distance prior
$${A}_{fg}=\exp \left({\left(\frac{{{{\rm{dist}}}}_{fg}+{{\rm{bp}}}\_{{\rm{width}}}}{{{\rm{bp}}}\_{{\rm{width}}}}\right)}^{-0.75}\right)\quad ({{\rm{bp}}}\_{{\rm{width}}}=400\,{{\rm{bp}}}),$$and normalize efg to obtain the cross-modality attention score
$${{{\rm{att}}}}_{fg}=\frac{{A}_{fg}\,\exp ({e}_{fg})}{{\sum }_{h\in {{{\mathcal{N}}}}_{f}}{A}_{fh}\,\exp ({e}_{fh})},$$where \({{{\mathcal{N}}}}_{f}\) is the set of features connected to peak f in the feature connectivity graph.
-
2.
Rescaling and thresholding
First, we pooled all inter-feature attention scores (i.e., excluding self-attention) and performed a linear transformation to map them onto the interval [0, 1]. The resulting distribution was bimodal (Supplementary Note S18). We fitted a two-component Gaussian mixture model to these normalized scores and defined our threshold θ as the intersection point of the two Gaussian densities-i.e., the value at which the posterior probability of belonging to either component is equal (Supplementary Note S18). This data-driven cutoff separates low, noise-driven attention scores from high, biologically meaningful ones. Empirically, we obtained
$${\theta }_{{{\rm{Human}}}\,{{\rm{hippocampus}}}}=0.204,\quad {\theta }_{{{\rm{Mouse}}}\,{{\rm{P}}}22}=0.141.$$These thresholds were then used to call peakâgene links.
To validate the peakâgene regulatory pairs identified in the human hippocampus data, we used the human hippocampus eQTL17 data. To validate the peakâgene regulatory pairs identified in the mouse brain data, we used the EnhancerAtlas28 and EGAS29 data obtained from the mouse cortex and striatum: The EnhancerAtlas obtains regulatory information by predicting consensus enhancers based on high-throughput datasets, including histone modification, CAGE, GRO-seq, transcription factor binding, and DHS; The EGAS dataset utilizes single-cell sequencing data and a non-parametric permutation-based procedure to predict the genome-wide enhancer-gene associations. However, mouse eQTL data was not employed due to the absence of the corresponding tissues of interest. During validation, we restricted our analysis to genes present in these external datasets and evaluated whether their linked peaks were supported by independent evidence.
A peakâgene pair is considered to be supported by EnhancerAtlas or EGAS if the peak overlaps with a genomic region that regulates the corresponding gene. A peakâgene pair is considered to be supported by eQTL if there is an eQTL locus within the peak and the locus is associated with the corresponding gene.
Data description
We utilize MultiGATE to analyze diverse types of spatial multi-omics data, including spatial ATACâRNAâseq, spatial snRNA-ATAC-seq, spatial protein-RNA-seq, and spatial RNAâ+âmetabolomics. In the context of spatial ATACâRNAâseq, Spatial Epigenome-Transcriptome Co-profiling1 generates data from the juvenile mouse brain as well as the adult human brain. This technique allows for the simultaneous profiling of epigenetic and transcriptomic characteristics, providing insights into the regulatory relationships between the genome and gene expression within a spatial context. For spatial snRNA-ATAC-seq, Slide-tags2 generates multi-omics measurements capturing open chromatin and RNA within the same cells derived from metastatic melanoma. This approach enables the investigation of both chromatin accessibility and gene expression profiles within individual cells, facilitating a deeper understanding of the molecular characteristics of melanoma cells. In the case of spatial protein-RNA-seq, SPOTS3 generates high-throughput data that enables simultaneous profiling of spatial transcriptomics and protein expression within the mouse spleen. By integrating transcriptomic and proteomic information, SPOTS offers a comprehensive view of the spatially resolved molecular landscape of the spleen, shedding light on the interplay between gene expression and protein abundance. For spatial RNA and metabolomics data, the SMA55 protocol enabled the generation of mouse brain spatial RNAâ+âmetabolomics profiles by integrating spatial transcriptomics (SRT) and MALDI-MSI on the same tissue section. For the breast cancer spatial RNAâ+âprotein dataset, the Visium CytAssist Spatial Gene and Protein Expression assay56 was used to simultaneously profile gene and protein expression with spatial resolution in formalin-fixed, paraffin-embedded tissue samples, generating the human breast cancer dataset.
Data preprocessing
For the scRNA-seq or snRNA-seq data, we first log-transform and normalize the raw gene expression by library size using Scanpy57. Next, we select the top 3000 highly variable genes as input. For the scATAC-seq or snATAC-seq data, we first select the top 50,000 highly variable peaks and then run TF-IDF on the chromatin accessibility matrix. Next, we log-transform and normalize the processed chromatin accessibility matrix by sequencing depth. For the protein data, we apply CLR (centered log-ratio transformation) to the protein matrix using muon58.
Clustering
We employ two clustering strategies for the latent embeddings obtained from MultiGATE. The first strategy involves averaging the latent embeddings for each spot and subsequently applying Louvain clustering59, as utilized in Slide-tags tumor dataset or mclust60, to the averaged embeddings in SPOTS spleen. The second strategy involves employing Seurat WNN (FindMultiModalNeighbors function)7 to construct a weighted graph based on the latent embeddings, and then we apply Louvain clustering using this graph in the mouse brain as well as the adult human brain from spatial ATACâRNAâseq analyses. In general, both clustering methods yield similar results, with the WNN approach providing more refined details. A sensitivity analysis of the Louvain clustering resolution is presented in Supplementary Note S19.
Identifying differentially expressed genes
Wilcoxon test is employed to discover DEGs for each identified spatial domain (BenjaminiâHochberg adjustment).
Gene Ontology enrichment analysis
For the Spatial ATACâRNAâseq dataset1 generated from the adult human brain hippocampus, we conducted the gene set enrichment analysis implemented in the GSEAPY package61 to discover the enriched GO terms for spatially variable genes in the detected domain with adjusted P valueâ <â0.01.
Parameter settings for comparison methods
For all comparing methods, we used default hyperparameters as specified in their official documentation: SpatialGlue (https://spatialglue-tutorials.readthedocs.io/en/latest/), Seurat WNN (https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis). Full parameter settings are detailed in Supplementary Note S20.
Quantitative metrics calculation
We employed multiple clustering evaluation metrics to quantitatively evaluate the performance of spatial clustering results across different methods.
Datasets with ground truth labels
For the adult human hippocampus dataset (spatial ATACâRNAâseq), the mouse brain dataset (spatial transcriptomicsâ+âmetabolomics), the breast cancer-patterned spatial ATACâ+âRNA dataset (Supplementary Fig. 9), which have ground truth spatial domain annotations, we computed the following metrics to assess clustering agreement: Rand Index (RI), ARI, Adjusted Mutual Information, Normalized Mutual Information, Homogeneity, Completeness, V-measure, Fowlkes-Mallows Index.
Datasets without ground truth labels
For the P22 mouse brain dataset (spatial ATACâRNAâseq), the mouse spleen dataset (spatial RNAâ+âprotein), and human breast cancer (spatial RNAâ+âprotein), where no ground truth labels are available, we adopted the ICC27 to evaluate clustering quality (Supplementary Note S21).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All datasets analyzed in this paper have already been published and are available on public websites. All data utilized in the paper are accessible for open download. The Spatial epigenome-transcriptome, including the human hippocampus and mouse brain1 is provided by the UCSC cell browser (https://brain-spatial-omics.cells.ucsc.edu/). The Slide-tags dataset is collected from the Single Cell Portal (https://singlecell.broadinstitute.org/single_cell/study/SCP2176). The mouse spleen data (spatial RNAâ+âProtein) generated by SPOTS can be found at (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198353). The mouse brain spatial RNAâ+âmetabolomics dataset can be found at (https://data.mendeley.com/datasets/w7nw4km7xd/1). The human breast cancer data (spatial RNAâ+âProtein) can be found at (https://www.10xgenomics.com/datasets/gene-and-protein-expression-library-of-human-breast-cancer-cytassist-ffpe-2-standard). The processed datasets used in this study have been deposited to Figshare and can be accessed at: https://doi.org/10.6084/m9.figshare.27978765. Source data are provided with this paper.
Code availability
The code used to develop the model, perform the analyses, and generate the results of this study has been deposited in the GitHub repository cuhklinlab/MultiGATE (https://github.com/cuhklinlab/MultiGATE) under the Apache License 2.0. The specific version of the code associated with this publication (v0.1.0) has been archived on Zenodo and is accessible via 10.5281/zenodo.1631079262.
References
Zhang, D. et al. Spatial epigenomeâtranscriptome co-profiling of mammalian tissues. Nature 616, 113â122 (2023).
Russell, A. J. C. et al. Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. Nature 625, 101â109 (2024).
Ben-Chetrit, N. et al. Integration of whole transcriptome spatial profiling with protein markers. Nat. Biotechnol. 41, 788â793 (2023).
Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342â1351 (2021).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Shang, L. & Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat. Commun. 13, 7203 (2022).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573â3587 (2021).
Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 21, 1â17 (2020).
Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272â282 (2021).
Ashuach, T. et al. MultiVI: deep generative model for the integration of multimodal data. Nat. Methods 20, 1222â1231 (2023).
Ma, Y., Sun, Z., Zeng, P., Zhang, W. & Lin, Z. JSNMF enables effective and accurate integrative analysis of single-cell multiomics data. Brief. Bioinform. 23, bbac105 (2022).
Zhang, W. & Lin, Z. iPoLNGâAn unsupervised model for the integrative analysis of single-cell multiomics data. Front. Genet. 14, 998504 (2023).
Long, Y. et al. Deciphering spatial domains from spatial multi-omics with spatialglue. Nat. Methods 21, 1658â1667 (2024).
Amaral, D. G., Scharfman, H. E. & Lavenex, P. The dentate gyrus: fundamental neuroanatomical organization (dentate gyrus for dummies). Prog. Brain Res. 163, 3â790 (2007).
Boldrini, M. et al. Antidepressants increase neural progenitor cells in the human hippocampus. Neuropsychopharmacology 34, 2376â2389 (2009).
Cappaert, N. L., Van Strien, N. M. & Witter, M. P. Chapter 20: hippocampal formation. In The Rat Nervous System (Fourth Edition) (ed Paxinos, G.) 511â573 (Academic Press, San Diego, 2015).
Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204â213 (2017).
Gilad, Y., Rifkin, S. A. & Pritchard, J. K. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet. 24, 408â415 (2008).
Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858â871.e8 (2018).
Ciccone, L., Cerri, C., Nencetti, S. & Orlandini, E. Carbonic anhydrase inhibitors and epilepsy: State of the art and future perspectives. Molecules 26, 6380 (2021).
Lemon, N., Canepa, E., Ilies, M. A. & Fossati, S. Carbonic anhydrases as potential targets against neurovascular unit dysfunction in Alzheimerâs disease and stroke. Front. Aging Neurosci. 13, 772278 (2021).
Esteban-Martos, A. et al. A functional pipeline of genome-wide association data leads to midostaurin as a repurposed drug for Alzheimerâs disease. Int. J. Mol. Sci. 24, 12079 (2023).
Jorfi, M., Maaser-Hecker, A. & Tanzi, R. E. The neuroimmune axis of alzheimerâs disease. Genome Med. 15, 6 (2023).
Karpf, J. et al. Dentate gyrus astrocytes exhibit layer-specific molecular, morphological and physiological features. Nat. Neurosci. 25, 1626â1638 (2022).
Inoue, K. PLP1-related inherited dysmyelinating disorders: Pelizaeus-Merzbacher disease and spastic paraplegia type 2. Neurogenetics 6, 1â16 (2005).
Koo, T. K. & Li, M. Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 15, 155â163 (2016).
Coleman, K. et al. Resolving tissue complexity by multimodal spatial omics modeling with MISO. Nat. Methods 22, 530â538 (2025).
Gao, T. & Qian, J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 48, D58âD64 (2020).
Xie, F. et al. Robust enhancer-gene regulation identified by single-cell transcriptomes and epigenomes. Cell Genom. 3, 100342 (2023).
Zhang, L.-Y. et al. Effects of expression level of DNA repair-related genes involved in the NHEJ pathway on radiation-induced cognitive impairment. J. Radiat. Res. 54, 235â242 (2013).
Siuciak, J. A. et al. Genetic deletion of the striatum-enriched phosphodiesterase PDE10A: evidence for altered striatal function. Neuropharmacology 51, 374â385 (2006).
Bronstein, R., Kyle, J., Abraham, A. B. & Tsirka, S. E. Neurogenic to gliogenic fate transition perturbed by loss of HMGB2. Front. Mol. Neurosci. 10, 153 (2017).
Cebrian-Silla, A. et al. Single-cell analysis of the ventricular-subventricular zone reveals signatures of dorsal and ventral adult neurogenesis. eLife 10, e67436 (2021).
Watson, C. & Puelles, L. Developmental gene expression in the mouse clarifies the organization of the claustrum and related endopiriform nuclei. J. Comp. Neurol. 525, 1499â1508 (2017).
Mukherjee, C. et al. Oligodendrocytes provide antioxidant defense function for neurons by secreting ferritin heavy chain. Cell Metab. 32, 259â272 (2020).
Wu, Z. et al. CD3+ CD4-CD8-(double-negative) T cells in inflammation, immune disorders and cancer. Front. Immunol. 13, 816005 (2022).
Seung, E. et al. A trispecific antibody targeting HER2 and T cells inhibits breast cancer growth via CD4 cells. Nature 603, 328â334 (2022).
Sewald, X. et al. Retroviruses use CD169-mediated trans-infection of permissive lymphocytes to establish infection. Science 350, 563â567 (2015).
Zhang, X. et al. The Hippo pathway oncoprotein YAP promotes melanoma cell invasion and spontaneous metastasis. Oncogene 39, 5267â5281 (2020).
Savorani, C. et al. A dual role of YAP in driving TGFβ-mediated endothelial-to-mesenchymal transition. J. Cell Sci. 134, jcs251371 (2021).
Fukunaga-Kalabis, M. et al. Tenascin-C promotes melanoma progression by maintaining the ABCB5-positive side population. Oncogene 29, 6115â6124 (2010).
Phan, H. T., Kim, N. H., Wei, W., Tall, G. G. & Smrcka, A. V. Uveal melanomaâassociated mutations in PLCβ4 are constitutively activating and promote melanocyte proliferation and tumorigenesis. Sci. Signal. 14, eabj4243 (2021).
Zeng, P. et al. CPEB2 enhances cell growth and angiogenesis by upregulating ARPC5 mRNA stability in multiple myeloma. J. Orthop. Surg. Res. 18, 384 (2023).
Pak, B. J. et al. Radiation resistance of human melanoma analysed by retroviral insertional mutagenesis reveals a possible role for dopachrome tautomerase. Oncogene 23, 30â38 (2004).
More, S. et al. Secreted Apoe rewires melanoma cell state vulnerability to ferroptosis. Sci. Adv. 10, eadp6164 (2024).
Rodig, S. J. et al. MHC proteins confer differential sensitivity to CTLA-4 and PD-1 blockade in untreated metastatic melanoma. Sci. Transl. Med. 10, eaar3342 (2018).
Johnson, D. B. et al. Melanoma-specific MHC-II expression represents a tumour-autonomous phenotype and predicts response to anti-PD-1/PD-L1 therapy. Nat. Commun. 7, 10582 (2016).
Chen, H., Li, D. & Bar-Joseph, Z. SCS: cell segmentation for high-resolution spatial transcriptomics. Nat. Methods 20, 1237â1243 (2023).
Pang, M., Roy, T. K., Wu, X. & Tan, K. CelloType: a unified model for segmentation and classification of tissue images. Nat. Methods 22, 348â357 (2025).
He, B. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat. Biomed. Eng. 4, 827â834 (2020).
Xie, R. et al. Spatially resolved gene expression prediction from histology images via bi-modal contrastive learning. Adv. Neural Inf. Process. Syst. 36, 70626â70637 (2023).
Salehi, A. & Davulcu, H. Graph attention auto-encoders. In 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), (eds Alamaniotis, M. & Pan, S.) 989â996. (IEEE Computer Society, 2020).
Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39, 1375â1384 (2021).
Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748â8763 (PMLR, 2021).
Vicari, M. et al. Spatial multimodal analysis of transcriptomes and metabolomes in tissues. Nat. Biotechnol. 42, 1046â1050 (2024).
10x Genomics. Visium CytAssist gene and protein expression library of human breast cancer, IF, 6.5 mm (FFPE) dataset. 10x Genomics [https://www.10xgenomics.com/resources/datasets/gene-and-protein-expression-library-of-human-breast-cancer-cytassist-ffpe-2-standard] (2023).
Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Bredikhin, D., Kats, I. & Stegle, O. MUON: multimodal omics analysis framework. Genome Biol. 23, 42 (2022).
Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
Scrucca, L., Fop, M., Murphy, T. B. & Raftery, A. E. mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J. 8, 289 (2016).
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Miao, J. & Li, J. cuhklinlab/MultiGATE: v0.1.0. Zenodo. https://doi.org/10.5281/zenodo.16310792 (2025).
Acknowledgements
We thank Prof. Hongyu Zhao from Yale University for his valuable and constructive suggestions on the manuscript. This work is supported by the Chinese University of Hong Kong startup grant (4930181 to Z.L.), the Chinese University of Hong Kong Science Facultyâs Collaborative Research Impact Matching Scheme (CRIMS 4620033 to Z.L.), the Chinese University of Hong Kong direct grants (4053540 to Z.L., 4053586 to Z.L.), and Research Grants Council, University Grants Committee (GRF 14301120 to Z.L., GRF 14300923 to Z.L.). This work is supported in part by the Innovation and Technology Commission (ITCPD/17-9 to C.Y.); Hong Kong Research Grants Council Grants (16301419 to C.Y., 16308120 to C.Y., 16307221 to C.Y., 16307322 to C.Y., 16302823 to C.Y., 16309424 to C.Y.); The Hong Kong University of Science and Technology Startup Grants (R9405 to C.Y.) and the Big Data Institute (Z0428 to C.Y.). This work is supported in part by the National Key Research and Development Project of China (Grant No. 2023YFF1204802 to Y.Z.), National Science and Technology Innovation 2030 Major Program (Grant No. 2021ZD0200100 to Y.Z.). This work is supported in part by the Open Research Fund of Key Laboratory of Mathematical Sciences (Central China Normal University), P. R. China (MPL2025ORG013 to J.T.).
Author information
Authors and Affiliations
Contributions
Z.L. and C.Y. supervised this study. J.L., J.M., and Z.L. proposed and developed MultiGATEâs computational method. J.M., J.L., and J.X. conducted data analysis. C.Y., Y.Z., J.T., M.G., J.Q., and X.Z. provided advice and guided data analysis. J.M., J.X., and J.L. developed and released MultiGATE software package. J.M., J.L., J.X., Y.Z., C.Y., and Z.L. drafted the manuscript. J.M., C.Y., Y.Z., J.T., M.G., J.Q., and X.Z. revised the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Yuzhou Chang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisherâs note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Miao, J., Li, J., Xin, J. et al. MultiGATE: integrative analysis and regulatory inference in spatial multi-omics data via graph representation learning. Nat Commun 16, 9403 (2025). https://doi.org/10.1038/s41467-025-63418-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-025-63418-x



