
Hi 👋, I'm Subhodeep Pal

Data Engineer | Platform Architect | Open Source Enthusiast

🚀 About Me

I am a Data Engineer focused on building robust, scalable Lakehouse Architectures. I specialize in deploying distributed compute engines on Kubernetes, optimizing Spark workloads, and automating complex data pipelines.

🔭 Current Project: Building a production-grade Data Lakehouse on Kubernetes featuring:
- Compute: Apache Spark 4.0.1 (Spark Connect) with custom Docker images.
- Storage: Delta Lake 4.0 on MinIO (S3) with Unity Catalog OSS.
- Serving: StarRocks for sub-second BI query performance.
- Orchestration: Apache Airflow with complex DAG dependencies.
🌱 Currently Learning: Advanced Observability (Loki/Promtail) and Bare Metal K8s networking.
👯 Looking to Collaborate: Open source projects related to Data Engineering, Spark, or Cloud Infrastructure.
💬 Ask Me About:
- Apache Spark (Optimization, K8s Deployment)
- Kubernetes (Operators, Helm, Networking)
- Delta Lake & Data Strategy
- CI/CD for Data

🛠️ The Tech Stack

Big Data & Distributed Systems

Infrastructure & DevOps

Languages

📈 GitHub Stats

"I automate everything I can, and I optimize what I can't."
