A versatile Python script for easily importing CSV, JSON, and PDF data into CouchDB.
- CSV Support: Includes column validation, a progress indicator, and a fix for the common "first row skip" bug.
- JSON & JSONL Support: Handles several structures: the CouchDB standard `{"docs": [...]}`, simple lists, and JSON Lines format.
- PDF Support: Extracts text content using `PyPDF2` and automatically attaches the original PDF file as a CouchDB attachment.
- Fault Tolerant: Automatically wraps simple data types into objects and includes network timeouts to prevent hanging.
- Auto-DB Creation: Automatically creates the target database if it doesn't already exist.
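
To illustrate the JSON handling and the wrapping of simple data types listed above, here is a minimal sketch. It is not taken from the script itself: the function names `normalize_docs` and `wrap` and the `"value"` key used for wrapped items are assumptions made for this example.

```python
import json


def wrap(item):
    """Wrap a non-object value so it can be stored as a CouchDB document."""
    # The "value" key is an assumption for this sketch; the script may use another name.
    return item if isinstance(item, dict) else {"value": item}


def normalize_docs(text):
    """Return a list of dict documents from JSON or JSON Lines text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON value: treat the input as JSON Lines,
        # parsing one value per non-empty line.
        data = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        if isinstance(data, dict) and isinstance(data.get("docs"), list):
            # CouchDB standard bulk structure: {"docs": [...]}
            data = data["docs"]
        elif not isinstance(data, list):
            # A single object or scalar becomes a one-element list.
            data = [data]
    return [wrap(item) for item in data]
```

Trying a full parse first and falling back to line-by-line parsing keeps the sketch short, at the cost of parsing the input twice when it is JSON Lines.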
To install, clone the repository and install the dependencies:

    git clone https://github.com/wanjus/couchdb-importer.git
    cd couchdb-importer
    pip install -r requirements.txt

The script is controlled via the command line:

    python CouchDB-Importer.py --file <FILEPATH> --db <DB_NAME> [OPTIONS]

- `--file`: Path to the source file (`.csv`, `.json`, or `.pdf`). (Required)
- `--db`: Name of the target database. (Required)
- `--url`: CouchDB URL (Default: `http://localhost:5984`).
- `--user`: CouchDB username.
- `--password`: CouchDB password.
- `--id-column`: Column or field name to be used as the document ID (`_id`).

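
The effect of `--id-column` on CSV input can be pictured with a short sketch. This is an illustration under assumptions, not the script's actual code: `rows_to_docs` is a hypothetical helper, and the error raised for a missing column is a choice made for this example.

```python
import csv
import io


def rows_to_docs(csv_text, id_column=None):
    """Turn CSV text into a list of documents, optionally setting `_id`."""
    # DictReader consumes the header line itself, so no data row is skipped.
    reader = csv.DictReader(io.StringIO(csv_text))
    if id_column is not None and id_column not in (reader.fieldnames or []):
        raise ValueError(f"column {id_column!r} not found in CSV header")
    docs = []
    for row in reader:
        doc = dict(row)
        if id_column is not None:
            # CouchDB document IDs are strings; csv already yields strings.
            doc["_id"] = doc[id_column]
        docs.append(doc)
    return docs


sample = "customer_id,name\nc1,Ada\nc2,Bob\n"
docs = rows_to_docs(sample, id_column="customer_id")
```

When no ID column is given, the documents carry no `_id` field, and CouchDB assigns one on insert.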
Examples:

    python CouchDB-Importer.py --file test.json --db test --user admin --password admin
    python CouchDB-Importer.py --file test.csv --db test --user admin --password admin
    python CouchDB-Importer.py --file document.pdf --db docs --user admin --password admin
    python CouchDB-Importer.py --file data.csv --db my_db --id-column customer_id

Developed by Team Wanju & Gemini AI.

This project is released under the MIT License. See LICENSE for details.
