Were looking for a Solutions Engineer with deep experience in Big Data technologies, real-time data pipelines, and scalable infrastructuresomeone whos been delivering critical systems under pressure, and knows what it takes to bring complex data architectures to life. This isnt just about checking boxes on tech stacksits about solving real-world data problems, collaborating with smart people, and building robust, future-proof solutions.
In this role, youll partner closely with engineering, product, and customers to design and deliver high-impact systems that move, transform, and serve data at scale. Youll help customers architect pipelines that are not only performant and cost-efficient but also easy to operate and evolve.
We want someone whos comfortable switching hats between low-level debugging, high-level architecture, and communicating clearly with stakeholders of all technical levels.
Key Responsibilities:
Build distributed data pipelines using technologies like Kafka, Spark (batch & streaming), Python, Trino, Airflow, and S3-compatible data lakesdesigned for scale, modularity, and seamless integration across real-time and batch workloads.
Design, deploy, and troubleshoot hybrid cloud/on-prem environments using Terraform, Docker, Kubernetes, and CI/CD automation tools.
Implement event-driven and serverless workflows with precise control over latency, throughput, and fault tolerance trade-offs.
Create technical guides, architecture docs, and demo pipelines to support onboarding, evangelize best practices, and accelerate adoption across engineering, product, and customer-facing teams.
Integrate data validation, observability tools, and governance directly into the pipeline lifecycle.
Own end-to-end platform lifecycle: ingestion → transformation → storage (Parquet/ORC on S3) → compute layer (Trino/Spark).
Benchmark and tune storage backends (S3/NFS/SMB) and compute layers for throughput, latency, and scalability using production datasets.
Work cross-functionally with R&D to push performance limits across interactive, streaming, and ML-ready analytics workloads.
Operate and debug object storebacked data lake infrastructure, enabling schema-on-read access, high-throughput ingestion, advanced searching strategies, and performance tuning for large-scale workloads.
Requirements: 24 years in software / solution or infrastructure engineering, with 24 years focused on building / maintaining large-scale data pipelines / storage & database solutions.
Proficiency in Trino, Spark (Structured Streaming & batch) and solid working knowledge of Apache Kafka.
Coding background in Python (must-have); familiarity with Bash and scripting tools is a plus.
Deep understanding of data storage architectures including SQL, NoSQL, and HDFS.
Solid grasp of DevOps practices, including containerization (Docker), orchestration (Kubernetes), and infrastructure provisioning (Terraform).
Experience with distributed systems, stream processing, and event-driven architecture.
Hands-on familiarity with benchmarking and performance profiling for storage systems, databases, and analytics engines.
Excellent communication skillsyoull be expected to explain your thinking clearly, guide customer conversations, and collaborate across engineering and product teams.
This position is open to all candidates.