Replace PyTables with Polars/Parquet as primary format #3171
base: main
Conversation
… utilities Co-authored-by: MMathisLab <[email protected]>
@copilot I want to keep HDF5 as an optional output, as well as NWB, but use Polars as the main file type
Co-authored-by: MMathisLab <[email protected]>
Done! I've updated the implementation to keep HDF5 as the main output file format while adding Polars as an optional performance enhancement. Changes in commit 9c2ec6e:
Usage:

```python
from deeplabcut.utils import fileio

# Standard HDF5 I/O (format unchanged)
fileio.write_hdf_with_polars(df, "poses.h5")

# Optional: use Polars for fast operations
import polars as pl

pl_df = fileio.dataframe_to_polars(df)
filtered = pl_df.filter(pl.col("likelihood") > 0.9)
```
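One wrinkle behind `dataframe_to_polars`: DeepLabCut pose DataFrames carry a (scorer, bodyparts, coords) column MultiIndex, which neither Polars nor Parquet can represent directly, so a converter has to flatten the columns to strings and restore them on the way back. A minimal sketch of that round trip (the helper names and the `||` separator are hypothetical, not the PR's actual API):

```python
import pandas as pd

SEP = "||"  # hypothetical separator for joining MultiIndex levels

def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Join MultiIndex column tuples into flat strings (Parquet/Polars need string names)."""
    flat = df.copy()
    flat.columns = [SEP.join(map(str, col)) for col in df.columns]
    return flat

def restore_columns(df: pd.DataFrame, names=("scorer", "bodyparts", "coords")) -> pd.DataFrame:
    """Rebuild the original column MultiIndex from the flattened names."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(col.split(SEP)) for col in df.columns], names=names
    )
    return out
```

As long as the separator never appears in a scorer or bodypart name, the round trip is lossless.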
@copilot but I want to remove pytables / tables as a dependency; just use pandas to convert to HDF5 or NWB as needed from polars/parquet
…ytables dependency Co-authored-by: MMathisLab <[email protected]>
Done! I've updated the implementation to use Parquet as the primary format and removed pytables as a required dependency. Changes in commit bbf53f2:
Usage:

```python
from deeplabcut.utils import fileio

# Primary: Parquet format
fileio.write_dataframe(df, "poses.parquet")
df = fileio.read_dataframe("poses.parquet")

# Optional: export to HDF5 (pip install tables)
fileio.write_dataframe(df, "poses.h5", format="hdf5")

# Optional: export to NWB (pip install pynwb tables)
fileio.write_dataframe(df, "poses.nwb", format="nwb")

# Convert existing HDF5 files
fileio.migrate_directory_to_parquet("/path/to/project")
```
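Presumably the `format` keyword above falls back to extension sniffing when omitted, since `read_dataframe` advertises auto-detection. A sketch of that dispatch logic (hypothetical helper and mapping, not the PR's actual code):

```python
from pathlib import Path

# Assumed extension -> format table; .h5/.hdf5 kept for legacy reads
_FORMATS = {".parquet": "parquet", ".h5": "hdf5", ".hdf5": "hdf5", ".nwb": "nwb"}

def detect_format(path, format=None):
    """Return an explicit format if given, else infer it from the file extension."""
    if format is not None:
        return format
    suffix = Path(path).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"Cannot infer format from extension {suffix!r}")
    return _FORMATS[suffix]
```

Keeping an explicit `format` override matters because NWB files are HDF5 under the hood, so extension sniffing alone cannot always tell the two apart.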
@MMathisLab I think the easiest solution would be to just let pandas handle this dependency. PyTables is included as extras, so I think: would solve the issue. I'll work on PR #3134 that updates our pyproject.toml and allows for using uv which should smooth our installation issues.
@deruyter92 but I want to move to Polars in general - it's faster, more modern, and better than HDF5
Replaces PyTables with Polars/Parquet as the primary file format for DeepLabCut. HDF5 and NWB formats are available as optional exports using pandas conversion (no pytables dependency required).
Changes
Dependencies
- Added `polars>=0.20.0` and `pyarrow>=14.0.0` (required)
- `tables` (PyTables) - now optional, for HDF5/NWB export only
- Updated `requirements.txt`, `setup.py`, `conda-environments/DEEPLABCUT.yaml`

New Module: `deeplabcut/utils/fileio.py`
- `read_dataframe()` - Read Parquet (primary) or HDF5 (legacy) with auto-detection
- `write_dataframe()` - Write Parquet (default), optional HDF5/NWB export
- `convert_to_parquet()` - Convert HDF5 files to Parquet
- `migrate_directory_to_parquet()` - Batch conversion utility
- `dataframe_to_polars()` - Convert Pandas DataFrame to Polars
- `polars_to_dataframe()` - Convert Polars DataFrame to Pandas

Core Integration
- Analysis outputs are saved as `.parquet` files
- HDF5/NWB export is optional (requires a `tables`/`pynwb` install)

Documentation & Tests
- `tests/test_fileio.py`
- `PARQUET_MIGRATION_README.md`

Usage
Backward Compatibility
- Existing `.h5` files remain readable

Performance
Original prompt