Official Repository for Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment
PathSearch is an accurate and scalable system for multimodal pathology retrieval. It features an attentive mosaic mechanism that boosts slide-to-slide retrieval accuracy, and leverages slide-report alignment to improve semantic understanding of the slide and enable multimodal retrieval.
PathSearch demonstrates higher slide-to-slide retrieval accuracy and faster slide encoding & matching speed than existing frameworks, making it suitable for real-world clinical applications.
⚠️ Note: The code has been verified for training and inference. If you find any files missing, please open an issue. We will continue to ensure that the code behaves the same as in our experiments.
To preprocess WSIs in a unified style, EasyMIL Toolbox is highly recommended.
To process .kfb, .sdpc format slides in Python, please use the ASlide library.
You will need the following libraries to reproduce or deploy PathSearch (tested on Python 3.9.19):
- torch 2.4.0
- timm 0.9.8 (switch to the modified version 0.5.4 for CTransPath/CHIEF, provided in EasyMIL)
- einops 0.8.0
- numpy 1.25.1
- scipy 1.13.1
- scikit-learn 1.6.1
- pandas
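Assuming the listed versions are available on PyPI, the environment can be set up roughly as follows (adapt the torch install to your CUDA setup; the modified timm 0.5.4 for CTransPath/CHIEF comes from EasyMIL, not PyPI):

```shell
# Sketch of environment setup with the versions listed above;
# adjust the torch command for your CUDA version if needed.
pip install torch==2.4.0 timm==0.9.8 einops==0.8.0 \
    numpy==1.25.1 scipy==1.13.1 scikit-learn==1.6.1 pandas
```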
The complete experimental environment will be included in the requirements.txt file. However, not all libraries listed there are required by PathSearch. Installation time varies across devices but normally takes no more than 15 minutes.
You can download the TCGA data and corresponding labels from the NIH Genomic Data Commons, of which the detailed list is provided in PathSearch/dataset/TCGA_file_list.txt.
The Camelyon16 and Camelyon17 datasets are available on the Grand Challenge and Camelyon17 platforms.
The DHMC-LUAD dataset can be obtained from the Department of Pathology and Laboratory Medicine at Dartmouth–Hitchcock Medical Center via registration and request (link). You can also prepare your own datasets as long as you have the whole slide images available.
You may continuously add different types of samples to your search archive, building your own diagnostic library.
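As an illustration of the incremental-archive idea, the sketch below implements a minimal embedding archive with cosine-similarity retrieval. This is not the PathSearch attentive-mosaic pipeline; all class and variable names are hypothetical, and the 768-dimensional random vectors merely stand in for real slide embeddings.

```python
# Minimal illustrative sketch: an incrementally growable slide archive
# queried by cosine similarity. NOT the actual PathSearch mechanism.
import numpy as np


class SlideArchive:
    def __init__(self, dim):
        self.embeddings = np.empty((0, dim), dtype=np.float32)
        self.slide_ids = []

    def add(self, slide_id, embedding):
        # L2-normalize so the dot product equals cosine similarity.
        emb = np.asarray(embedding, dtype=np.float32)
        emb = emb / np.linalg.norm(emb)
        self.embeddings = np.vstack([self.embeddings, emb[None, :]])
        self.slide_ids.append(slide_id)

    def query(self, embedding, top_k=5):
        q = np.asarray(embedding, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = self.embeddings @ q
        order = np.argsort(-scores)[:top_k]
        return [(self.slide_ids[i], float(scores[i])) for i in order]


# Usage: grow the archive sample by sample, then retrieve nearest slides.
rng = np.random.default_rng(0)
archive = SlideArchive(dim=768)
for i in range(10):
    archive.add(f"TCGA-slide-{i:02d}", rng.standard_normal(768))
hits = archive.query(rng.standard_normal(768), top_k=3)
```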
⚠️ Note: You will need to use EasyMIL for tiling and feature extraction of these slides. Please visit EasyMIL's official page for more information about its usage. Kindly note that there is already a demo dataset provided in this repo for some quick tests.
Clone the repository by running:
git clone [email protected]:Dootmaan/PathSearch.git
Then navigate into the project directory:
cd PathSearch
We provide a demo dataset containing 30 TCGA slides for quick testing and verification. The demo dataset is located in demo_dataset/ and includes pre-extracted CONCH v1.5 features in .pt format. The demo right now outputs the index of candidate WSIs and does not include the thumbnail visualization of the retrieved samples.
Run the demo retrieval:
# Run on CPU (default)
bash shells/test_demo.sh
This will output retrieval results to demo_retrieval_results.csv. The demo has been verified, and demo_retrieval_results.csv has already been generated in the directory, which can be used for reproducibility verification.
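The output CSV can be inspected with pandas. The snippet below uses made-up column names and rows purely to show the pattern; the actual schema of demo_retrieval_results.csv may differ.

```python
# Hypothetical example of inspecting retrieval results with pandas.
# The column names and rows here are invented for illustration only.
import io
import pandas as pd

sample_csv = io.StringIO(
    "query_slide,rank,candidate_index\n"
    "demo_slide_00,1,17\n"
    "demo_slide_00,2,4\n"
    "demo_slide_01,1,9\n"
)

# In practice, replace sample_csv with "demo_retrieval_results.csv".
results = pd.read_csv(sample_csv)
top_hits = results[results["rank"] == 1]
```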
Generally speaking, you can directly use the released weights for the attentive mosaic generator and the report encoder in the PathSearch framework.
These weights can be found on Zenodo.
To train PathSearch with the TCGA data pairs, simply run:
bash shell/train_pathsearch.sh
to train the model from scratch with the default hyperparameters.
This repository provides four ready-to-run scripts for the four public datasets used in the study, three of which are external. Simply run:
bash shell/test.sh
to test the model on these datasets. Be sure to specify the path to your archive.
Note: During testing, cache files will be automatically generated to speed up future runs. You may need to refresh these cache files manually after modifying the pipeline.
We used CONCH for generating patch-level embeddings via EasyMIL. We have partially borrowed code from CLIP and TransMIL to construct PathSearch; therefore, PathSearch will also be released under the GPLv3 license upon publication.
We sincerely thank these teams for their dedicated efforts in advancing this field. We also would like to thank the authors from the PathologySearchComparison project for the PyTorch reproduction of existing methods.
If you find this work helpful in your research, please consider citing:
@misc{wang2025accuratescalablemultimodalpathology,
title={Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment},
author={Hongyi Wang and Zhengjie Zhu and Jiabo Ma and Fang Wang and Yue Shi and Bo Luo and Jili Wang and Qiuyu Cai and Xiuming Zhang and Yen-Wei Chen and Lanfen Lin and Hao Chen},
year={2025},
eprint={2510.23224},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.23224},
}