SlothDB is an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. Built from scratch as a DuckDB alternative. Up to 5x faster on real workloads (138 ms vs 540 ms on a 5-query warm JOIN batch; 5.43x peak on Avro SUM; 16-query suite median 1.70x). Built-in readers for Parquet, CSV, JSON, Avro, Arrow, Excel, and SQLite.
```shell
pip install slothdb
python -c "import slothdb; slothdb.demo()"
```

Generates a 100,000-row CSV, runs three queries, and prints the side-by-side comparison with DuckDB shown above. No files to find, no setup.
```python
import slothdb

db = slothdb.connect()
df = db.sql("SELECT region, SUM(revenue) FROM 'sales.parquet' GROUP BY region").fetchdf()
```

No server. No import step. No CREATE TABLE. Point SQL at files on disk.
- Nested aggregates work everywhere. `ROUND(AVG(x))`, `AVG(x) + 1`, `SUM(x) / COUNT(*)`, `CAST(SUM(y) AS DOUBLE)`, and similar shapes that wrap an aggregate inside a scalar function or arithmetic used to throw "Function execution for: AVG". Fixed.
- `ORDER BY` an aggregate alias works. `SELECT region, COUNT(*) AS cnt ... ORDER BY cnt DESC` no longer silently sorts by column 0.
- Arithmetic type promotion fixed. `AVG(x) + 1` no longer drops the `+ 1`, and `AVG(x) / COUNT(*)` no longer returns `inf`.
- 408 unit tests, 131,537 assertions, green on Windows, Linux, and macOS.
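These are plain SQL shapes, so their intended semantics are easy to check against any engine. A quick illustration using Python's stdlib `sqlite3` as a stand-in (not SlothDB itself), showing what nested aggregates and aggregate-alias ordering should return:

```python
import sqlite3

# Stand-in engine for illustration: stdlib sqlite3, not SlothDB.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, x REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 20.0), ("west", 5.0)],
)

# Aggregate wrapped in a scalar function / arithmetic:
# AVG(x) = 35/3 ≈ 11.667, so ROUND(AVG(x)) = 12.0 and AVG(x) + 1 ≈ 12.667.
row = con.execute("SELECT ROUND(AVG(x)), AVG(x) + 1 FROM sales").fetchone()
print(row)

# ORDER BY an aggregate alias must sort by the aggregate, not by column 0:
# 'east' has 2 rows, 'west' has 1, so east comes first under cnt DESC
# even though 'west' > 'east' lexicographically.
rows = con.execute(
    "SELECT region, COUNT(*) AS cnt FROM sales GROUP BY region ORDER BY cnt DESC"
).fetchall()
print(rows)  # [('east', 2), ('west', 1)]
```

The alias-ordering case is the one that silently corrupted results before the fix: sorting by column 0 would have put `west` first.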
Same embedded model as DuckDB and SQLite. You link it into your process and point SQL at files. Different defaults:
- 7 file formats built in - Parquet, CSV, JSON, Avro, Arrow, SQLite, Excel. DuckDB needs extensions for Avro and SQLite.
- Faster than DuckDB on real workloads. 5-query warm JOIN batch: 138 ms vs 540 ms (3.9x). Peak speedups: 5.43x on Avro SUM, 5.08x on CSV COUNT(*), 2.83x on Parquet COUNT(*). Median across the 16-query suite: 1.70x. Full numbers on GitHub.
- Stable C ABI. Numeric error codes don't shift between releases. Bindings built against 0.1.x keep working.
- ~1-4 MB single binary, fully self-contained.
```python
import slothdb

# In-memory
db = slothdb.connect()

# Query files directly
db.sql("SELECT * FROM 'data.csv' WHERE score > 90").show()
db.sql("SELECT COUNT(*) FROM 'logs.parquet'").show()
db.sql("SELECT * FROM read_json('events.json') LIMIT 5").show()
db.sql("SELECT * FROM sqlite_scan('app.db', 'users')").show()

# Persistent database
db = slothdb.connect("analytics.slothdb")

# DataFrame integration
df = db.sql("SELECT region, SUM(revenue) FROM 'sales.csv' GROUP BY region").fetchdf()
```

- No multi-writer transactions (single writer, crash-safe checkpoint).
- No distributed execution. Single-node embedded engine.
- No secondary indexes. Scan-based execution; zone-map pruning helps on sorted data, but no B-tree / hash index for point lookups.
- Window-function coverage is partial. Plain `OVER` / `PARTITION BY` works; `ROWS BETWEEN ...` frames and cumulative `SUM(...) OVER (ORDER BY ...)` shapes have known gaps.
- Authenticated S3 is not implemented. `s3://` URLs work for anonymous public-bucket reads only.
- Some SQL corners will still surprise you. Open an issue with a repro.
- 0.2.x, about a year old. Treat as beta.
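The zone-map pruning mentioned above (in lieu of secondary indexes) is simple to picture: keep a min/max per column per data block, and skip any block whose range cannot satisfy the predicate. A minimal sketch of the idea, not SlothDB's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Zone:
    lo: int          # min value of the column within this block
    hi: int          # max value of the column within this block
    rows: list       # the block's values

def scan_gt(zones, threshold):
    """Evaluate `value > threshold`, skipping blocks whose max rules them out."""
    out = []
    for z in zones:
        if z.hi <= threshold:   # whole block cannot match: prune without reading rows
            continue
        out.extend(v for v in z.rows if v > threshold)
    return out

# On sorted data, ranges don't overlap, so most blocks prune away:
zones = [Zone(0, 9, list(range(10))), Zone(10, 19, list(range(10, 20)))]
print(scan_gt(zones, 14))  # [15, 16, 17, 18, 19] — first block never touched
```

This is also why the limitation reads "helps on sorted data": on randomly ordered data every block's min/max range tends to straddle the threshold, so nothing prunes and a point lookup degrades to a full scan.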
| Format  | Query           | SlothDB | DuckDB | Speedup |
|---------|-----------------|---------|--------|---------|
| Parquet | COUNT(*)        | 12 ms   | 34 ms  | 2.83×   |
| CSV     | COUNT(*)        | 33 ms   | 170 ms | 5.08×   |
| CSV     | GROUP BY region | 100 ms  | 191 ms | 1.91×   |
| JSON    | SUM(revenue)    | 242 ms  | 314 ms | 1.30×   |
| Avro    | SUM(revenue)    | 140 ms  | 760 ms | 5.43×   |
| Avro    | GROUP BY region | 170 ms  | 800 ms | 4.71×   |
1M-row dataset, warm cache, 5-run median. Full 15-query table + methodology →
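The "warm cache, 5-run median" methodology is easy to reproduce with a small harness. A sketch using stdlib `sqlite3` as a stand-in engine (the table name and query here are illustrative; to measure SlothDB itself you would swap in `slothdb.connect()` and point the query at a real data file):

```python
import sqlite3
import statistics
import time

def median_ms(run_query, runs=5, warmup=1):
    """Warm the cache, then return the median wall time (ms) over `runs` executions."""
    for _ in range(warmup):
        run_query()                      # warm-up runs are discarded
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_query()
        times.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(times)

# Stand-in engine and workload for demonstration purposes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])

ms = median_ms(lambda: con.execute("SELECT COUNT(*) FROM t").fetchone())
print(f"COUNT(*): {ms:.2f} ms (5-run median, warm)")
```

The median (rather than mean) keeps one slow outlier run, e.g. a page-cache miss, from skewing the reported number.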
There's a Discord: discord.gg/XJWyGmX5G. Bug reports, install help, weird query plans, "is this slower than it should be", feature ideas - any of it. The maintainer reads everything. GitHub issues are still the canonical tracker; the server is for the questions that come before you file one.
- Source: https://github.com/SouravRoy-ETL/slothdb
- Discord: https://discord.gg/XJWyGmX5G
- Changelog: https://github.com/SouravRoy-ETL/slothdb/blob/main/CHANGELOG.md
- Issues: https://github.com/SouravRoy-ETL/slothdb/issues
- SQL reference: https://github.com/SouravRoy-ETL/slothdb/blob/main/docs/DOCUMENTATION.md
