Single-cell RNA-seq analysis of human endometrial epithelial organoids to characterize cellular heterogeneity and explore transcriptional programs linked to epithelial plasticity, proliferation, and tissue regeneration.
This project reconstructs and modernizes my original Master’s thesis into a fully reproducible R + Seurat pipeline, implemented using best practices in single-cell transcriptomics and high-dimensional data analysis.
Dataset scale: ~118,672 cells × 27,254 genes (sparse matrix)
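A rough back-of-the-envelope estimate shows why the matrix has to stay sparse at this scale. The dimensions are the ones quoted above; the 5% nonzero fraction is purely an illustrative assumption, not a measured property of this dataset.

```r
# Dimensions quoted above (the cell count is approximate).
n_cells <- 118672
n_genes <- 27254
n_entries <- n_cells * n_genes

# Dense storage: one 8-byte double per entry (about 25.9 GB).
dense_gb <- n_entries * 8 / 1e9

# Compressed sparse column storage (as in Matrix::dgCMatrix, genes x cells):
# an 8-byte value plus a 4-byte row index per nonzero entry, and one
# 4-byte column pointer per cell (plus one).
# ASSUMPTION: 5% of entries are nonzero, chosen only for illustration.
nonzero_fraction <- 0.05
sparse_gb <- (n_entries * nonzero_fraction * 12 + (n_cells + 1) * 4) / 1e9

cat(sprintf("dense: %.1f GB, sparse (assumed 5%% nonzero): %.1f GB\n",
            dense_gb, sparse_gb))
```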
This project demonstrates how clinically relevant biological questions can be translated into scalable, reproducible computational workflows.
The human endometrium regenerates cyclically, suggesting the presence of resident epithelial stem/progenitor populations.
Understanding these programs is relevant for:
- Fertility & implantation biology
- Endometrial regeneration
- Reproductive disorders
- Early tumorigenic mechanisms
- Translational precision medicine
This project evaluates whether canonical epithelial markers and proliferation-associated genes show convergent expression patterns across epithelial subpopulations in the UMAP embedding.
- Human endometrial epithelial organoids
- Timepoints: Day0, Day2, Day6
- Hormonal conditions: Control, Estrogen (E), Estrogen + Progesterone (E+P)
- Pathway inhibition: NOTCH (DBZ), WNT (XAV)
- Single-cell RNA sequencing
- Dimensionality reduction (PCA, UMAP)
- Graph-based clustering (Seurat)
- Differential expression analysis
- Marker-driven biological interpretation
Pipeline implemented entirely in R (Seurat 5):
- Import .h5ad
- Convert to .h5seurat
- Subset control condition
- Quality control inspection
- Log-normalization (scale factor 10,000)
- Highly variable gene selection (VST)
- Scaling
- PCA
- Graph-based clustering
- UMAP embedding
- Differential expression (FindAllMarkers, Wilcoxon test)
- Marker visualization (FeaturePlot, DotPlot, Heatmap)
- Export tables, figures, and session metadata
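The steps above, from import through UMAP, can be sketched roughly as follows. This is a minimal illustration rather than the contents of organoids-script.R: the function name, the metadata column `condition`, its level `Control`, the number of PCs, the clustering resolution, and the output file name are all assumptions to adapt to the actual dataset.

```r
# Hypothetical wrapper around the listed steps; requires Seurat and SeuratDisk.
run_control_pipeline <- function(h5ad_path,
                                 condition_col = "condition",  # ASSUMPTION
                                 control_level = "Control",    # ASSUMPTION
                                 dims = 1:30,                  # ASSUMPTION
                                 resolution = 0.8,             # ASSUMPTION
                                 out_dir = file.path("outputs", "objects")) {
  # Import: convert .h5ad to .h5seurat (written next to the input), then load.
  SeuratDisk::Convert(h5ad_path, dest = "h5seurat", overwrite = TRUE)
  obj <- SeuratDisk::LoadH5Seurat(sub("\\.h5ad$", ".h5seurat", h5ad_path))

  # Subset the control condition using the cell-level metadata table.
  meta <- obj[[]]
  stopifnot(condition_col %in% colnames(meta))
  keep <- rownames(meta)[which(meta[[condition_col]] == control_level)]
  obj <- subset(obj, cells = keep)

  # Quality control inspection: summarize the metadata columns.
  print(summary(obj[[]]))

  # Log-normalization, variable gene selection (VST), scaling, PCA.
  obj <- Seurat::NormalizeData(obj, normalization.method = "LogNormalize",
                               scale.factor = 10000)
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst",
                                      nfeatures = 2000)  # ASSUMPTION: 2000 genes
  obj <- Seurat::ScaleData(obj)
  obj <- Seurat::RunPCA(obj, npcs = max(dims))

  # Graph-based clustering and UMAP embedding on the same PCs.
  obj <- Seurat::FindNeighbors(obj, dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj <- Seurat::RunUMAP(obj, dims = dims)

  # Save the processed object (file name is hypothetical).
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(obj, file.path(out_dir, "organoids_control.rds"))
  obj
}
```

With the raw file in place, a call might look like `obj <- run_control_pipeline(file.path("data", "raw", "endometrium_organoid.h5ad"))`.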
Cluster-specific markers were identified using:
- FindAllMarkers()
- Wilcoxon rank-sum test
- Positive markers only
- min.pct = 0.3
All results are programmatically exported as .csv files for traceability.
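As a sketch of how these settings map onto a Seurat call, the helper below runs marker detection and writes the table to disk, and a second base-R helper picks the strongest markers per cluster for review. The function names, the output file name, and the choice to rank by `avg_log2FC` are assumptions made for illustration.

```r
# Hypothetical helper: marker detection with the settings listed above.
find_and_export_markers <- function(obj,
                                    out_dir = file.path("outputs", "tables")) {
  markers <- Seurat::FindAllMarkers(
    obj,
    test.use = "wilcox",  # Wilcoxon rank-sum test
    only.pos = TRUE,      # positive markers only
    min.pct  = 0.3
  )
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # File name is hypothetical.
  write.csv(markers, file.path(out_dir, "cluster_markers.csv"),
            row.names = FALSE)
  invisible(markers)
}

# Hypothetical helper: top n markers per cluster, ranked by avg_log2FC.
# Expects the `cluster` and `avg_log2FC` columns of a FindAllMarkers() table.
top_markers <- function(markers, n = 10) {
  by_cluster <- split(markers, markers$cluster)
  top <- lapply(by_cluster, function(df) {
    head(df[order(-df$avg_log2FC), , drop = FALSE], n)
  })
  do.call(rbind, top)
}
```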
Top markers were manually reviewed using:
- UniProt
- Human Protein Atlas
- GeneCards
- Selected peer-reviewed literature
Genes were grouped into high-level themes (proliferation, differentiation, ciliation, secretion, immune modulation, tissue remodeling).
This interpretative step is intentionally separated from the computational pipeline to preserve reproducibility.
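One way to keep that separation explicit is to hold the curated gene-to-theme assignments in their own table and join them to the marker results afterwards. The sketch below assumes a marker table with a `gene` column; the two assignments shown are illustrative placeholders taken from the themes named above, not the full curated list.

```r
# Illustrative curated table; in practice this would live in its own file,
# maintained by hand and kept apart from the computational outputs.
theme_map <- data.frame(
  gene  = c("S100P", "NEDD9"),
  theme = c("proliferation", "tissue remodeling"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: left-join themes onto a marker table by gene symbol.
# Genes without a curated theme are kept and get NA.
annotate_markers <- function(markers, theme_map) {
  merge(markers, theme_map, by = "gene", all.x = TRUE)
}
```

Because the join is a left join, re-running the pipeline never drops uncurated genes; they simply appear with a missing theme until someone reviews them.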
Twelve epithelial subpopulations were identified by graph-based clustering and visualized on the UMAP embedding.
Functional programs include:
- Proliferation
- Differentiation
- Implantation-related activity
- Immune signaling
- Tissue repair
Canonical marker genes:
- CDH2 enriched in angiogenesis/tissue repair clusters
- EPCAM broadly expressed, higher in implantation and cell-cycle clusters
- SUSD2 and PDGFRB minimally expressed (consistent with epithelial model)
- VIM broadly expressed (epithelial–mesenchymal plasticity)
Proliferation-associated genes:
- S100P enriched in proliferative clusters
- CDH1 broadly expressed across epithelial states
- NEDD9 enriched in tissue remodeling clusters
Overlap between canonical epithelial markers (CDH2, EPCAM) and proliferation genes (CDH1, NEDD9) suggests a potential association between epithelial plasticity and proliferative programs.
Findings are hypothesis-generating and require functional in vivo validation.
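The marker patterns described above can be inspected with standard Seurat plots. The sketch below is one possible way to draw them for the genes named in this section; the function name, output file names, and figure sizes are assumptions, and it expects a clustered Seurat object with a `umap` reduction.

```r
# Hypothetical helper: dot plot and UMAP feature plots for the genes above.
plot_marker_panels <- function(obj,
                               out_dir = file.path("outputs", "figures")) {
  genes <- c("CDH2", "EPCAM", "SUSD2", "PDGFRB", "VIM",
             "S100P", "CDH1", "NEDD9")
  genes <- intersect(genes, rownames(obj))  # skip genes absent from the object
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  # Per-cluster summary: fraction of expressing cells and average expression.
  dot <- Seurat::DotPlot(obj, features = genes) + Seurat::RotatedAxis()
  ggplot2::ggsave(file.path(out_dir, "marker_dotplot.png"), dot,
                  width = 8, height = 5)

  # Per-cell expression on the UMAP embedding, one panel per gene.
  feat <- Seurat::FeaturePlot(obj, features = genes, reduction = "umap",
                              ncol = 4)
  ggplot2::ggsave(file.path(out_dir, "marker_featureplot.png"), feat,
                  width = 16, height = 8)

  invisible(list(dot = dot, feature = feat))
}
```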
endometrium-singlecell-pipeline/
│
├── .gitignore
├── README.md
│
├── data/
│ └── raw/ # External dataset (not versioned)
│
├── src/
│ ├── organoids-script.R
│ └── 00_fix_windows_packages.R
│
├── outputs/
│ ├── figures/ # Generated plots
│ ├── tables/ # DEG results + sessionInfo.txt
│ └── objects/ # Saved Seurat .rds objects
Validated on:
- R 4.3.3 (Windows UCRT)
- Seurat 5.2.1
- SeuratDisk 0.0.0.9021
- ggplot2 3.5.2 (pinned for stability)
00_fix_windows_packages.R pins ggplot2 to 3.5.2 to prevent known Seurat plotting issues on Windows (R 4.3.3).
It is only needed on Windows and can be skipped on other systems.
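For readers who want to reproduce the pin by hand, a version pin of this kind could look roughly like the sketch below. It is not the contents of 00_fix_windows_packages.R; the function name is hypothetical, and it assumes the remotes package is installed.

```r
# Hypothetical helper: install the pinned ggplot2 version if it is not
# already the one installed. Returns TRUE if an install was triggered.
pin_ggplot2 <- function(target = "3.5.2") {
  installed <- tryCatch(
    as.character(utils::packageVersion("ggplot2")),
    error = function(e) NA_character_  # ggplot2 not installed at all
  )
  if (identical(installed, target)) {
    return(invisible(FALSE))
  }
  remotes::install_version("ggplot2", version = target, upgrade = "never")
  invisible(TRUE)
}

# Only apply the pin on Windows, where the plotting issue was observed.
if (.Platform$OS.type == "windows") {
  pin_ggplot2()
}
```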
Full session metadata available in:
outputs/tables/sessionInfo.txt
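A file like this can be produced with base R alone. The helper below is a minimal sketch (the function name is hypothetical) that captures the printed output of `sessionInfo()` and writes it to the path above.

```r
# Hypothetical helper: write the current session metadata to a text file.
write_session_info <- function(
    path = file.path("outputs", "tables", "sessionInfo.txt")) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}
```

Calling it at the end of a run, after all packages have been loaded, records the versions that were actually used.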
Raw data files are not included in this repository due to size restrictions.
Please download the original .h5ad file from the publication source and place it in:
data/raw/
Expected filename: endometrium_organoid.h5ad
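A small guard at the top of the analysis script can catch a missing or misnamed input early. This is a sketch with a hypothetical function name; the default path follows the location and filename given above.

```r
# Hypothetical helper: fail early with a clear message if the raw file
# is missing; otherwise return its normalized path.
check_raw_input <- function(
    path = file.path("data", "raw", "endometrium_organoid.h5ad")) {
  if (!file.exists(path)) {
    stop("Raw dataset not found at '", path,
         "'. Download the .h5ad file and place it in data/raw/.",
         call. = FALSE)
  }
  invisible(normalizePath(path))
}
```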
- Single-cell RNA-seq analysis
- Graph-based clustering
- Differential gene expression
- High-dimensional sparse matrix handling
- Cross-format data conversion (.h5ad → .h5seurat)
- Reproducible research engineering
- Dependency/version control
- Biological interpretation of transcriptomic data
Jimena Taciana Garcia, MD
Bioinformatics | Data Science | Reproductive Genomics