We are looking for a hands-on Data Architect to design, build, and optimize a cutting-edge data lakehouse solution from the ground up. This role requires deep technical expertise in big data architectures, data modeling, ETL/ELT pipelines, cloud and on-prem solutions, and real-time analytics. As part of a fast-paced startup, you'll be directly involved in coding, implementing, and scaling the platform: not just designing, but building alongside the team. You'll take ownership of data strategies, governance, and architecture to enable high-performance analytics, AI, and business intelligence.
Key Responsibilities

Hands-On Architecture & Development:
* Build and deploy a scalable, open-source data lakehouse integrating structured, semi-structured, and unstructured data.
* Design real-time and batch data processing pipelines using open-source frameworks (Apache Spark, Flink, Trino, Iceberg, Delta Lake, etc.).
* Develop cost-effective, high-performance data storage strategies (columnar formats: Parquet, ORC).
* Implement best practices for data security, governance, access control, and compliance (GDPR, CCPA, etc.).
* Ensure seamless data integration across cloud and on-prem environments.

Data Engineering & ETL Pipelines:
* Develop high-performance ETL/ELT pipelines to ingest data from diverse sources (APIs, databases, IoT, logs).
* Optimize query performance using indexing, caching, materialized views, and distributed computing.
* Implement metadata-driven, schema evolution-friendly data ingestion strategies.
* Ensure data quality, lineage, and observability across the entire pipeline.

Collaboration & Execution:
* Work hands-on with Data Engineers, BI Analysts, and ML Engineers to build an integrated platform.
* Define data cataloging and self-service analytics capabilities for internal users.
* Drive technical decision-making while balancing speed and scalability.
* Stay ahead of emerging trends in data architecture, ensuring best practices are implemented.
Requirements:
* 8+ years of hands-on experience in data architecture, big data, or cloud data engineering.
* Expertise in open-source data lakehouse technologies (Apache Iceberg, Delta Lake, Hudi, Presto/Trino, DuckDB, etc.).
* Deep hands-on experience with distributed computing frameworks (Apache Spark, Flink, Kafka, etc.).
* Strong coding skills in SQL and Python, along with solid data modeling techniques.
* Experience in designing and deploying scalable ETL/ELT pipelines (Apache Airflow, Dagster).
* Proven track record working with cloud platforms (AWS, GCP, Azure) and hybrid on-prem architectures.
* Strong knowledge of BI tools, analytical querying, and text search (Elasticsearch, OpenSearch).
* Proactive, execution-driven mindset: able to build, iterate, and scale quickly.
This position is open to all candidates.