ELM-Research/ECG-Language-Models

A Training and Evaluation Framework for ECG-Language Models (ELMs)

[Figure: overview of our pipeline.]

News

  • [February 23, 2026] We have officially moved from willxxy/ECG-Bench to ELM-Research/ELM. There are major updates to the documentation and the flow of the code. Please read the documentation and feel free to post any issues!

Overview

A research framework for finetuning and evaluating ECG-language models (ELMs). Supports multiple architectures, training objectives, and data representations with distributed training out of the box. Prepare datasets with ecg_preprocess before use. Additionally, if you want to pretrain an ECG encoder, please view ecg_nn.

We hope to continuously update the repository to support more features, ELMs, and datasets. Please feel free to contribute to the repository! If you have any questions or find any bugs, please do not hesitate to reach out to wjhan{@}andrew{dot}cmu{dot}edu or submit an issue with the corresponding details.

Status: Beta.

Setup

We use PyTorch 2.9 with CUDA 12.8 and primarily develop on H100 GPUs.

git clone https://github.com/ELM-Research/ELM.git
cd ELM && uv sync

For BPE symbolic representation with ECG-Byte, compile the Rust tokenizer:

cd src/dataloaders/data_representation/bpe
maturin develop --release

If Rust is not installed: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain=1.82.0 -y

ECG Datasets

First, preprocess the ECGs using the ecg_preprocess repository. The data folder should have the following structure:

data
├── csn
│   ├── preprocessed_1250
│   ├── preprocessed_500
│   └── preprocessed_2500
├── cpsc
│   └── ...
├── ptb_xl
│   └── ...
├── mimic_iv
│   └── ...
└── code15
    └── ...
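As a quick sanity check, a small script along these lines can verify that the expected preprocessed folders exist. The dataset names come from the tree above; the three sample lengths (500, 1250, 2500) are assumed from the csn example and may differ in your setup:

```python
from pathlib import Path

# Dataset folders from the layout above; sample lengths assumed from csn.
DATASETS = ["csn", "cpsc", "ptb_xl", "mimic_iv", "code15"]
SAMPLE_LENGTHS = [500, 1250, 2500]  # preprocessed_<length> folders

def missing_folders(root="data"):
    """Return the list of expected preprocessed folders that do not exist."""
    root = Path(root)
    missing = []
    for ds in DATASETS:
        for n in SAMPLE_LENGTHS:
            folder = root / ds / f"preprocessed_{n}"
            if not folder.is_dir():
                missing.append(str(folder))
    return missing

if __name__ == "__main__":
    for folder in missing_folders():
        print("missing:", folder)
```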

We support the following datasets in a unified way through HuggingFace datasets. Each dataset includes ecg_path, the path to the corresponding .npy file in the data folder, as well as the conversational data (text).

--data Link
ecg-qa-ptbxl-250-2500 willxxy/ecg-qa-ptbxl-250-2500
ecg-qa-mimic-iv-ecg-250-2500 willxxy/ecg-qa-mimic-iv-ecg-250-2500
pretrain-mimic-250-2500 willxxy/pretrain-mimic-250-2500
ecg-grounding-250-2500 willxxy/ecg-grounding-250-2500
ecg-instruct-pulse-250-2500 willxxy/ecg-instruct-pulse-250-2500
ecg-bench-pulse-250-2500 willxxy/ecg-bench-pulse-250-2500
ecg-instruct-45k-250-2500 willxxy/ecg-instruct-45k-250-2500

Note that datasets can be mixed by passing multiple values to --data, like so:

--data ecg-qa-ptbxl-250-2500 ecg-qa-mimic-iv-ecg-250-2500
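Since each record pairs an ecg_path with conversational text, a minimal loading loop looks roughly like the following. The record below is illustrative (the lead count of 12 and length of 2500 are assumptions; see the HuggingFace dataset cards for the exact fields and shapes):

```python
import numpy as np
import tempfile, os

# Illustrative record mimicking the described fields: an ecg_path pointing
# at a .npy file in the data folder, plus conversational text.
tmp = tempfile.mkdtemp()
ecg_file = os.path.join(tmp, "sample.npy")
# Hypothetical 12-lead x 2500-sample signal standing in for a real file.
np.save(ecg_file, np.zeros((12, 2500), dtype=np.float32))

record = {
    "ecg_path": ecg_file,
    "text": "Q: Does this ECG show atrial fibrillation? A: ...",
}

ecg = np.load(record["ecg_path"])
print(ecg.shape)  # (12, 2500)
```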

We also release synthetic classification datasets on Hugging Face for signal-type identification, where the model predicts whether the input signal is ECG, noise, or flatline. Dataset names follow the format ecg-comp-ecg-noise-flatline-20000-250-2500; in this example, the dataset contains 20,000 instances per class (ECG, noise, and flatline) across the training and test splits. We also provide binary classification variants, such as ecg-comp-noise-flatline-30000-250-2500, which indicates a binary task over the noise and flatline classes with 30,000 instances per class across the train and test splits. For additional datasets and task details, see HF_DATASETS in src/configs/constants.py and src/dataloaders/system_prompts/.
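The naming convention above can be unpacked with a small helper. This is only a sketch of the stated convention (ecg-comp-, then class names, then instances per class, then the fixed 250-2500 suffix), not a utility shipped with the repository:

```python
def parse_comp_dataset_name(name):
    """Parse names like 'ecg-comp-ecg-noise-flatline-20000-250-2500'.

    Assumes the convention described above: an 'ecg-comp-' prefix,
    hyphen-separated class names, instances per class, and a fixed
    '250-2500' suffix.
    """
    parts = name.split("-")
    assert parts[:2] == ["ecg", "comp"] and parts[-2:] == ["250", "2500"]
    classes = parts[2:-3]          # class names between prefix and counts
    per_class = int(parts[-3])     # instances per class
    return classes, per_class

# Ternary task: ECG vs. noise vs. flatline, 20,000 instances per class.
print(parse_comp_dataset_name("ecg-comp-ecg-noise-flatline-20000-250-2500"))
# Binary task: noise vs. flatline, 30,000 instances per class.
print(parse_comp_dataset_name("ecg-comp-noise-flatline-30000-250-2500"))
```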

ECG Representations

--data_representation Description
signal Raw ECG matrix $X \in \mathbb{R}^{C \times L}$ (leads × samples)
symbolic BPE-tokenized symbolic sequence $X \in V^m$ via ECG-Byte compression
stacked_signal Synthetic three-channel version of signal, denoted $X \in \mathbb{R}^{C \times L \times 3}$, by stacking signal three times along the color dimension
rgb Derived from signal via plotting and is represented as a tensor $X \in \mathbb{R}^{H \times W \times C′}$, where H and W denote the image height and width, respectively, and C′ is the number of color channels
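The shape transformations in the table can be sketched with NumPy. The concrete numbers here (12 leads, 2500 samples, a 224x224 plot) are illustrative assumptions, and the rgb entry only shows the resulting tensor shape rather than actual plotting:

```python
import numpy as np

C, L = 12, 2500  # leads x samples (illustrative)
signal = np.random.randn(C, L).astype(np.float32)

# stacked_signal: repeat the raw matrix three times along a new
# trailing "color" dimension -> (C, L, 3).
stacked = np.repeat(signal[..., np.newaxis], 3, axis=-1)

# rgb: in the framework this comes from plotting the signal; here a
# dummy image only illustrates the (H, W, C') tensor shape.
H, W, C_prime = 224, 224, 3
rgb = np.zeros((H, W, C_prime), dtype=np.uint8)

print(signal.shape, stacked.shape, rgb.shape)
```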

LLMs

We utilize the following pretrained LLMs from HuggingFace.

LLM --llm
Llama 3.2 llama-3.2-3b-instruct
Llama 3.2 llama-3.2-1b-instruct
Gemma 2 gemma-2-2b-it
Qwen 2.5 qwen2.5-7b-instruct
Qwen 2.5 qwen2.5-1.5b-instruct

Encoders

ECG Encoders

We utilize the following ECG-specific encoders.

ECG Encoders --encoder --data_representation
MERL merl signal
MLAE mlae signal
MTAE mtae signal
ST-MEM st_mem signal

Vision Encoders

We utilize the following pretrained vision encoders from HuggingFace.

Vision Encoders --encoder --data_representation
SigLIP 2 siglip2-so400m-patch16-naflex rgb, stacked_signal
ViT vit-base-patch16-224-in21k rgb, stacked_signal
CLIP clip-vit-base-patch32 rgb, stacked_signal

ELMs

We implement several ELMs and describe how to train each variant.

LLaVA

We implement a LLaVA-like architecture that connects the encoder to the LLM with a projection layer.

uv run src/main_trainer.py \
  --data pretrain-mimic-250-2500 \
  --data_representation $DATA_REPRESENTATION \
  --llm qwen2.5-1.5b-instruct \
  --encoder $ECG_ENCODER or $VISION_ENCODER \
  --elm llava

For multi-GPU training, launch the same script with torchrun as follows; this applies to all ELMs.

CUDA_VISIBLE_DEVICES=0,1,2,3 \
uv run torchrun --standalone --nproc_per_node=4 \
  src/main_trainer.py \
  --data pretrain-mimic-250-2500 \
  --data_representation $DATA_REPRESENTATION \
  --llm qwen2.5-1.5b-instruct \
  --encoder $ECG_ENCODER or $VISION_ENCODER \
  --elm llava \
  --distributed

For ECG encoders, you will need to pretrain your own encoder using ecg_nn. We plan to release pretrained encoders soon! To load a pretrained encoder during ELM training, run the following:

uv run src/main_trainer.py \
  --data pretrain-mimic-250-2500 \
  --data_representation signal \
  --llm qwen2.5-1.5b-instruct \
  --encoder $ECG_ENCODER \
  --elm llava \
  --encoder_ckpt $ENCODER_CHECKPOINT.pt

To update the encoder during ELM training, pass --update_encoder:

uv run src/main_trainer.py \
  --data pretrain-mimic-250-2500 \
  --data_representation $DATA_REPRESENTATION \
  --llm qwen2.5-1.5b-instruct \
  --encoder $ECG_ENCODER or $VISION_ENCODER \
  --elm llava \
  --update_encoder

Encoder-free

We implement an encoder-free ELM, similar to Fuyu-8B.

uv run src/main_trainer.py \
  --data pretrain-mimic-250-2500 \
  --data_representation signal \
  --llm qwen2.5-1.5b-instruct \
  --elm fuyu

ECG-Byte

We implement ECG-Byte and provide a trained BPE tokenizer (src/dataloaders/data_representation/bpe/ecg_byte_tokenizer_10000.pkl). You can also train your own BPE tokenizer in ecg_preprocess; however, we find ECG-Byte's tokenizer to generalize well across datasets. To train an ELM with ECG-Byte, run the following:

uv run src/main_trainer.py \
  --data pretrain-mimic-250-2500 \
  --data_representation symbolic \
  --llm qwen2.5-1.5b-instruct \
  --ecg_tokenizer src/dataloaders/data_representation/bpe/ecg_byte_tokenizer_10000.pkl \
  --elm ecg_byte

Evaluate

To evaluate your model, run main_evaluator.py and specify your trained ELM checkpoint via --elm_ckpt:

uv run src/main_evaluator.py \
  --data ecg-qa-mimic-iv-ecg-250-2500 \
  --data_representation signal \
  --llm qwen2.5-1.5b-instruct \
  --encoder merl \
  --elm llava \
  --encoder_ckpt $ENCODER_CHECKPOINT.pt \
  --elm_ckpt $PATH_TO_ELM_CKPT.pt

Chat

To chat with your model, prepare a sample .npy ECG file and a trained ELM checkpoint, then run the following:

CUDA_VISIBLE_DEVICES=0 uv run src/main_chat.py \
--llm qwen2.5-0.5b-instruct \
--elm patch_elf \
--system_prompt src/dataloaders/system_prompts/system_prompt.txt \
--peft \
--elm_ckpt $ELM_CHECKPOINT.pt \
--num_encoder_tokens 100 \
--data_representation signal

After running the script, load an ECG by typing the following in the first turn:

============================================================
  ELM Chat Interface
============================================================

Commands:
  /ecg <path>   Load an ECG signal (.npy file)
  /clear        Clear conversation history
  /quit         Exit

You: /ecg $PATH_TO_SAMPLE.npy

After this turn, you can ask questions over subsequent turns, and all answers will be conditioned on the loaded ECG. Loading additional ECGs into one conversation is not currently supported.

Key Flags

Flag Description
--torch_compile torch.compile the model
--data_subset Use dataset fraction for quick runs
--augment_ecg / --augment_rgb Enable augmentations
--perturb noise, zeros, or only_text
--optimizer adam, adamw, muon

Research

We list the research that has been conducted using this repository. Please feel free to add your own research here!

Contributions

We welcome contributions to the repository! Please feel free to open an issue or pull request for any bugs or features you would like to add. We are always looking for new ECG datasets to benchmark our methods on. If you have any recommendations, please let us know! Also, a good place to start is by looking at the TODO section.

For most processes, we provide a --dev flag to run at a smaller scale with added verbosity for debugging. Feel free to add this flag when needed!

Contributors

We thank the following people for their contributions to the repository:

Acknowledgements

This work is done in collaboration with the Mario Lemieux Center for Heart Rhythm Care at Allegheny General Hospital.

We thank Chaojing Duan, Michael A. Rosenberg, Emerson Liu, Ding Zhao, Hyoeun Kang, Wenhao Ding, Haohong Lin, Shiqi Liu, Xiaoyu (Simon) Song, Tony Chen, Atharva Mhaskar, Zhepeng Cen, Yihang Yao, and Dylan Leong for their helpful discussions, feedback, and support in developing the initial ECG-Bench, which grew into the current ELM repository.

We thank the authors of ECG-Byte, MERL, ST-MEM, ECG-QA, ECG-Chat, PULSE, and GEM for their code and publicly released datasets.

Lastly, we thank HuggingFace for providing the APIs for the models.

License

MIT, except st_mem.py, mlae.py, and mtae.py, which are CC BY-NC 4.0.