It starts with you - an engineer driven to build modern, real-time data platforms that help teams move faster with trust. You care about great service, performance, and cost. You'll architect and ship a top-of-the-line open streaming data lake/lakehouse and data stack, turning massive volumes of threat signals into intuitive, self-serve data with fast retrieval for humans and AI agents - powering a unified foundation for AI-driven, mission-critical workflows across cloud and on-prem.
If you want to make a meaningful impact, join our mission and build best-in-class data systems that move the world forward - this role is for you.
Responsibilities:
Build self-serve platform surfaces (APIs, specs, CLI/UI) for streaming and batch pipelines with correctness guarantees, safe replay/backfills, and CDC.
Run the open data lake/lakehouse across cloud and on-prem; enable schema evolution and time travel; tune partitioning and compaction to balance latency, freshness, and cost.
Provide serving and storage layers spanning real-time OLAP, OLTP, document engines, and vector databases.
Own the data layer for AI - trusted datasets for training and inference, feature and embedding storage, RAG-ready collections, and foundational building blocks that accelerate AI development across the organization.
Enable AI-native capabilities - support agentic pipelines, self-tuning processes, and secure sandboxing for model experimentation and deployment.
Make catalog, lineage, observability, and governance first-class - with clear ownership, freshness SLAs, and access controls.
Improve performance and cost by tuning runtimes and I/O, profiling bottlenecks, planning capacity, and keeping spend predictable.
Ship paved-road tooling - shared libraries, templates, CI/CD, IaC, and runbooks - while collaborating across AI, ML, Data Science, Engineering, Product, and DevOps. Own architecture, documentation, and operations end-to-end.
Requirements:
6+ years in software engineering, data engineering, platform engineering, or distributed systems, with hands-on experience building and operating data infrastructure at scale.
Streaming & ingestion - Technologies like Flink, Structured Streaming, Kafka, Debezium, Spark, dbt, Airflow/Dagster
Open data lake/lakehouse - Table formats like Iceberg, Delta, or Hudi; columnar formats; partitioning, compaction, schema evolution, time-travel
Serving & retrieval - OLAP engines like ClickHouse or Trino; vector databases like Milvus, Qdrant, or LanceDB; low-latency stores like Redis, ScyllaDB, or DynamoDB
Databases - OLTP systems like Postgres or MySQL; document/search engines like MongoDB or Elasticsearch; serialization with Avro/Protobuf; warehouse patterns
Platform & infra - Kubernetes, AWS, Terraform or similar IaC, CI/CD, observability, incident response
Performance & cost - JVM tuning, query optimization, capacity planning, compute/storage cost modeling
Engineering craft - Java/Scala/Python, testing, secure coding, AI coding tools like Cursor, Claude Code, or Copilot
This position is open to all candidates.