This library offers comprehensive support for widely used WHAR (Wearable Human Activity Recognition) datasets, including:
- automated downloading from original sources and data extraction
- parsing into a unified, standardized data format
- configurable pre-processing (e.g., resampling, windowing) and post-processing (e.g., normalization)
- dataset splitting for common evaluation protocols such as LOSO and K-Fold cross-validation
- built-in caching and multi-processing for improved performance
- seamless integration with PyTorch and TensorFlow
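
To make the windowing, splitting, and normalization steps listed above concrete, here is a minimal NumPy-only sketch. It does not use or reproduce the library's implementation: the function names (`sliding_windows`, `loso_splits`, `standardize`), the synthetic data, and the choice of z-score normalization are all assumptions made for illustration. Resampling, caching, and multi-processing are left out.

```python
import numpy as np


def sliding_windows(signal: np.ndarray, window_size: int, stride: int) -> np.ndarray:
    """Cut a (time, channels) recording into fixed-length, possibly overlapping windows."""
    starts = range(0, signal.shape[0] - window_size + 1, stride)
    return np.stack([signal[s : s + window_size] for s in starts])


def loso_splits(subject_ids: np.ndarray):
    """Yield (held_out_subject, train_indices, test_indices), one split per subject."""
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def standardize(windows: np.ndarray, train_indices: np.ndarray) -> np.ndarray:
    """Z-score each channel using statistics from the training windows only."""
    mean = windows[train_indices].mean(axis=(0, 1), keepdims=True)
    std = windows[train_indices].std(axis=(0, 1), keepdims=True)
    return (windows - mean) / (std + 1e-8)


# Synthetic stand-in data: 3 subjects, each with 200 samples of 3-axis acceleration.
rng = np.random.default_rng(0)
all_windows, all_subjects = [], []
for subject in range(3):
    recording = rng.normal(loc=subject, scale=1.0, size=(200, 3))
    subject_windows = sliding_windows(recording, window_size=50, stride=25)
    all_windows.append(subject_windows)
    all_subjects.append(np.full(len(subject_windows), subject))

windows = np.concatenate(all_windows)       # shape: (num_windows, 50, 3)
subject_ids = np.concatenate(all_subjects)  # shape: (num_windows,)

# Leave-one-subject-out: every subject is the test set exactly once.
splits = list(loso_splits(subject_ids))
held_out, train_idx, test_idx = splits[0]

# Normalization statistics come from the training subjects of this split only.
normalized = standardize(windows, train_idx)
```

Computing the normalization statistics from the training windows alone keeps information about the held-out subject out of the model's inputs. This is presumably why the post-processing pipeline in the quick-start example receives `split.train_indices`.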

The library currently includes out-of-the-box support for 33 datasets (listed below). Additional WHAR datasets can be integrated by defining a custom configuration with an associated parser and registering it with the framework.

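The general idea of pairing a configuration with a parser and registering it can be sketched in plain Python. Everything below is hypothetical and is not the library's API: `HypotheticalDatasetConfig`, `register_dataset`, `get_config`, the config fields, and the record layout are invented solely to illustrate the pattern. Consult the repository for the actual configuration class and registration mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (subject_id, activity_label, samples).
Record = Tuple[int, str, List[List[float]]]
# Hypothetical parser signature: raw data directory in, records out.
Parser = Callable[[str], List[Record]]


@dataclass(frozen=True)
class HypotheticalDatasetConfig:
    """Illustrative config: dataset metadata plus the parser that reads it."""
    dataset_id: str
    sampling_freq_hz: int
    window_seconds: float
    parser: Parser


_REGISTRY: Dict[str, HypotheticalDatasetConfig] = {}


def register_dataset(cfg: HypotheticalDatasetConfig) -> None:
    """Make a config discoverable by its ID; refuse duplicate IDs."""
    if cfg.dataset_id in _REGISTRY:
        raise ValueError(f"dataset {cfg.dataset_id!r} is already registered")
    _REGISTRY[cfg.dataset_id] = cfg


def get_config(dataset_id: str) -> HypotheticalDatasetConfig:
    """Look up a previously registered config."""
    return _REGISTRY[dataset_id]


def parse_my_dataset(raw_dir: str) -> List[Record]:
    """Placeholder parser; a real one would read the files under raw_dir."""
    return [(1, "walking", [[0.0, 0.1, 9.8]])]


register_dataset(
    HypotheticalDatasetConfig(
        dataset_id="my_dataset",
        sampling_freq_hz=50,
        window_seconds=2.0,
        parser=parse_my_dataset,
    )
)

cfg = get_config("my_dataset")
records = cfg.parser("/path/to/raw")
```

Keeping the parser inside the config means the rest of a pipeline only needs a dataset ID to find both the metadata and the code that reads the raw files.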
This library does not host any datasets. To use a dataset, please visit its original website and make sure you understand and agree to the dataset’s terms and conditions.

pip install "git+https://github.com/teco-kit/whar-datasets.git"

from whar_datasets import (
    Loader,
    LOSOSplitter,
    PostProcessingPipeline,
    PreProcessingPipeline,
    TorchAdapter,
    WHARDatasetID,
    get_dataset_cfg,
)
# create cfg for WISDM dataset
cfg = get_dataset_cfg(WHARDatasetID.WISDM)
# create and run pre-processing pipeline
pre_pipeline = PreProcessingPipeline(cfg)
activity_df, session_df, window_df = pre_pipeline.run()
# create LOSO splits
splitter = LOSOSplitter(cfg)
splits = splitter.get_splits(session_df, window_df)
split = splits[0]
# create and run post-processing pipeline for the specific split
post_pipeline = PostProcessingPipeline(cfg, pre_pipeline, window_df, split.train_indices)
samples = post_pipeline.run()
# create dataloaders for the specific split
loader = Loader(session_df, window_df, post_pipeline.samples_dir, samples)
adapter = TorchAdapter(cfg, loader, split)
dataloaders = adapter.get_dataloaders(batch_size=64)

| Supported | Name | Year | Paper | Citations |
|---|---|---|---|---|
| ✅ | WISDM | 2010 | Activity Recognition using Cell Phone Accelerometers | 3862 |
| ✅ | UCI-HAR | 2013 | A Public Domain Dataset for Human Activity Recognition using Smartphones | 3372 |
| ✅ | UTD-MHAD | 2015 | UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor | 997 |
| ✅ | HAPT | 2016 | Transition-aware human activity recognition using smartphones | 939 |
| ✅ | USC-HAD | 2012 | USC-HAD: A Daily Activity Dataset for Ubiquitous Activity Recognition Using Wearable Sensors | 753 |
| ✅ | UniMiB-SHAR | 2017 | UniMiB SHAR: a dataset for human activity recognition using acceleration data from smartphones | 712 |
| ✅ | MotionSense | 2019 | Mobile Sensor Data Anonymization | 345 |
| ✅ | RealLifeHAR | 2020 | A Public Domain Dataset for Real-Life Human Activity Recognition Using Smartphone Sensors | 208 |
| ✅ | WISDM-19-PHONE | 2019 | WISDM: Smartphone and Smartwatch Activity and Biometrics Dataset | 198 |
| ✅ | WISDM-19-WATCH | 2019 | WISDM: Smartphone and Smartwatch Activity and Biometrics Dataset | 198 |
| ✅ | KU-HAR | 2021 | KU-HAR: An open dataset for heterogeneous human activity recognition | 187 |
| ✅ | Hang-Time | 2023 | Hang-time HAR: A benchmark dataset for basketball activity recognition using wrist-worn inertial sensors | 52 |
| ✅ | CAPTURE-24 | 2024 | CAPTURE-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition | 45 |

| Supported | Name | Year | Paper | Citations |
|---|---|---|---|---|
| ✅ | PAMAP2 | 2012 | Introducing a New Benchmarked Dataset for Activity Monitoring | 1758 |
| ✅ | OPPORTUNITY | 2010 | Collecting complex activity datasets in highly rich networked sensor environments | 1024 |
| ✅ | HHAR | 2015 | Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition | 1019 |
| ✅ | MHEALTH | 2014 | mHealthDroid: A Novel Framework for Agile Development of Mobile Health Applications | 887 |
| ✅ | DSADS | 2010 | Comparative study on classifying human activities with miniature inertial and magnetic sensors | 780 |
| ✅ | SAD | 2014 | Fusion of Smartphone Motion Sensors for Physical Activity Recognition | 752 |
| ✅ | Daphnet | 2009 | Ambulatory monitoring of freezing of gait in Parkinson’s disease | 652 |
| ✅ | RealWorld | 2016 | On-body Localization of Wearable Devices: An Investigation of Position-Aware Activity Recognition | 482 |
| ✅ | UP-Fall | 2019 | UP-fall detection dataset: A multimodal approach | 462 |
| ✅ | UMAFall | 2017 | UMAFall: A multisensor dataset for the research on automatic fall detection | 243 |
| ✅ | REALDISP | 2014 | Dealing with the Effects of Sensor Displacement in Wearable Activity Recognition | 216 |
| ✅ | HuGaDB | 2018 | HuGaDB: Human Gait Database for Activity Recognition from Wearable Inertial Sensor Networks | 154 |
| ✅ | HARTH | 2021 | HARTH: A Human Activity Recognition Dataset for Machine Learning | 132 |
| ✅ | w-HAR | 2020 | w-HAR: An Activity Recognition Dataset and Framework Using Low-Power Wearable Devices | 100 |
| ✅ | WEAR | 2024 | WEAR: An outdoor sports dataset for wearable and egocentric activity recognition | 66 |
| ✅ | HAR70+ | 2021 | A machine learning classifier for detection of physical activity types and postures during free-living | 55 |
| ✅ | UCA-EHAR | 2022 | UCA-EHAR: A Dataset for Human Activity Recognition with Embedded AI on Smart Glasses | 35 |
| ✅ | GOTOV | 2022 | A recurrent neural network architecture to model physical activity energy expenditure in older people | 33 |

If you use the WHAR Datasets library in your research, please cite our paper:

@inproceedings{burzer2025whar,
  title={WHAR Datasets: An Open Source Library for Wearable Human Activity Recognition},
  author={Burzer, Maximilian and King, Tobias and Riedel, Till and Beigl, Michael and R{\"o}ddiger, Tobias},
  booktitle={Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing},
  pages={1315--1322},
  year={2025}
}