A high-performance Node.js API that converts CSV files to JSON, transforms the data, loads it into PostgreSQL, and generates age distribution reports.
- Custom CSV Parser - Built from scratch without external CSV libraries, handles complex scenarios (quoted fields, escaped quotes, empty fields)
- Streaming Processing - Memory-efficient handling of large files (50,000+ records) using Node.js streams
- Batch Insertion - Optimized PostgreSQL batch inserts (1000 records per batch) with transaction safety
- Nested JSON Transformation - Converts dot-separated headers (e.g.,
address.city) into nested JSON objects - Data Validation - Robust validation for mandatory fields and data types
- Age Distribution Report - Automatic generation of age-group statistics
git clone https://github.com/Jia2005/Csv-Converter-Api.git
cd Csv-Converter-Apinpm installCreate a PostgreSQL database:
CREATE DATABASE kelp_db;The application will automatically create the required users table on startup.
Create a .env file in the root directory:
DB_USER=your_postgres_username
DB_HOST=localhost
DB_DATABASE=kelp_db
DB_PASSWORD=your_password
DB_PORT=5432
APP_PORT=3000
CSV_FILE_PATH=data/user_samples.csvnpm startThe server will start on http://localhost:3000
Visit or send a GET request to:
http://localhost:3000/api/convert
This will:
- Parse the CSV file from the configured path
- Transform data to JSON with nested structures
- Insert records into PostgreSQL
- Print age distribution report to console
http://localhost:3000/api/distribution
http://localhost:3000/api/health
The project includes comprehensive test coverage (22 tests total):
npm run test:allCSV Parser Tests (6 tests)
npm run test:parserTests for: simple parsing, quoted fields, empty fields, escaped quotes, whitespace handling
Integration Tests (7 tests)
npm run test:integrationTests the complete CSV → JSON → Database pipeline with real data
Edge Case Tests (9 tests)
npm run test:edgeTests validation logic: missing fields, invalid ages, float values, negative numbers
data/user_samples.csv- Sample data for normal operation (7 records)data/edge_cases.csv- Edge case scenarios for validation testing (9 records)
CREATE TABLE public.users (
id serial4 NOT NULL,
name varchar NOT NULL,
age int4 NOT NULL,
address jsonb NULL,
additional_info jsonb NULL,
CONSTRAINT users_pkey PRIMARY KEY (id)
);Field Mapping:
name= firstName + lastName (space-separated)age= Must be a non-negative integeraddress= Alladdress.*fields as JSONBadditional_info= All remaining fields as JSONB
Input CSV:
name.firstName,name.lastName,age,address.line1,address.city,address.state,gender
Rohit,Prasad,35,A-563 Rakshak Society,Pune,Maharashtra,maleOutput Database Record:
{
"id": 1,
"name": "Rohit Prasad",
"age": 35,
"address": {
"line1": "A-563 Rakshak Society",
"city": "Pune",
"state": "Maharashtra"
},
"additional_info": {
"gender": "male"
}
}Console Report:
==================================================
AGE DISTRIBUTION REPORT
==================================================
┌─────────┬────────────────┬─────────────────┐
│ (index) │ Age-Group │ % Distribution │
├─────────┼────────────────┼─────────────────┤
│ 0 │ '< 20' │ '14.29%' │
│ 1 │ '20 to 40' │ '42.86%' │
│ 2 │ '40 to 60' │ '28.57%' │
│ 3 │ '> 60' │ '14.29%' │
└─────────┴────────────────┴─────────────────┘
Total Users: 7
==================================================
- Name: Both
firstNameandlastNameare mandatory - Age: Must be a non-negative integer (floats and negative values rejected)
- CSV Format: First line must be headers, comma-delimited
- Records: Invalid records are logged and skipped (processing continues)
npm start # Start the server
npm run dev # Start with nodemon (auto-reload)
npm test # Run parser and integration tests
npm run test:all # Run all tests (parser + integration + edge cases)
npm run test:parser # Run CSV parser unit tests
npm run test:integration # Run integration tests
npm run test:edge # Run edge case validation testsKey design decisions are documented in Assumptions.md
- Runtime: Node.js with Express.js
- Database: PostgreSQL with
pgdriver - Configuration: dotenv for environment management
- Performance: Streaming API + Batch inserts (1000 records/batch)
Jia Harisinghani - GitHub