I am a Data Engineer focused on building robust, scalable Lakehouse Architectures. I specialize in deploying distributed compute engines on Kubernetes, optimizing Spark workloads, and automating complex data pipelines.
- 🔭 Current Project: Building a production-grade Data Lakehouse on Kubernetes featuring:
  - Compute: Apache Spark 4.0.1 (Spark Connect) with custom Docker images.
  - Storage: Delta Lake 4.0 on MinIO (S3) with Unity Catalog OSS.
  - Serving: StarRocks for sub-second BI query performance.
  - Orchestration: Apache Airflow with complex DAG dependencies.
- 🌱 Currently Learning: Advanced Observability (Loki/Promtail) and Bare Metal K8s networking.
- 👯 Looking to Collaborate: Open source projects related to Data Engineering, Spark, or Cloud Infrastructure.
- 💬 Ask Me About:
  - Apache Spark (Optimization, K8s Deployment)
  - Kubernetes (Operators, Helm, Networking)
  - Delta Lake & Data Strategy
  - CI/CD for Data

"I automate everything I can, and I optimize what I can't."



