As a Senior Software Engineer specializing in Python and the Data Ecosystem, you'll be a core contributor owning and evolving critical parts of ClickHouse's data engineering ecosystem. This role sits at the intersection of high-performance database engineering and developer experience. You'll craft tools that enable Data Engineers and Data Scientists to harness ClickHouse's speed and scale in the frameworks they already use.
We're looking for someone who has lived the Data Engineer or Data Scientist experience firsthand. The data practitioner's world is shifting rapidly: databases are no longer just query targets; they are becoming active participants in AI-powered workflows, serving as vector stores for RAG pipelines, backends for LLM-powered agents, and real-time feature stores for ML inference. You understand these workflows not from the outside, but because you've operated within them. You don't just build integrations; you bring product-level insight into what we should build and why.
You'll own the full lifecycle of key Python integrations, driving architecture, performance, and feature direction across:
Orchestration Platforms: Apache Airflow, Dagster, Prefect.
Transformation Tools: dbt, SQLMesh.
AI & LLM Ecosystem: LangChain, LlamaIndex, n8n, and broader AI tooling: embedding pipelines, retrieval-augmented generation with ClickHouse as a vector store, ML feature stores, and LLM-powered data applications.
ClickHouse's columnar architecture and query performance make it exceptionally well-positioned in this new landscape. Your job is to make that potential real by building the robust, production-ready connectors that make ClickHouse the natural choice when data practitioners design their next-generation AI and data systems.
What you'll do
Own and evolve our Python connector and SDK ecosystem, raising the bar on performance, reliability, and API design
Build and maintain integrations with orchestration platforms (Airflow, Dagster, Prefect) and transformation tools (dbt) to enterprise-grade quality standards
Drive the AI/LLM integration strategy: design connectors and patterns that make ClickHouse a natural fit in RAG architectures, ML feature pipelines, and LLM-powered data applications
Engage actively with the open-source community: triage issues, support contributors, advocate for users, and shape the roadmap based on real-world feedback
Collaborate with Product, Cloud, and other engineering teams to align integration work with broader platform priorities
Bring a practitioner's perspective to roadmap decisions, grounding prioritization in genuine Data Engineer and Data Scientist workflows
About you
7+ years of software development experience, including hands-on time as a Data Engineer, Data Scientist, or ML Engineer.
Deep, proven experience designing, building, and maintaining production-grade Python connectors, SDKs, or integrations for at least one major platform (orchestration, BI, MLOps, or data transformation).
Hands-on experience applying AI/ML in data-engineering contexts: embedding generation, vector search, feature pipelines, or LLM-powered tooling that shipped and ran in production.
Solid experience with the Python data ecosystem: Pandas, NumPy, Pydantic, and related libraries.
Strong database fundamentals: SQL, data modeling, query optimization, and familiarity with OLAP/analytical databases.
Solid experience with concurrent Python: threading, multiprocessing, and async patterns.
Outstanding written and verbal communication; comfortable collaborating across engineering functions and with open-source communities.
Bonus points for:
Prior experience as a Data Engineer or Data Scientist in a product-facing or platform role.
Familiarity with ClickHouse or similar high-performance OLAP platforms.
Familiarity with the JVM ecosystem.
Experience deploying AI/ML models in production, including inference APIs and vector databases.
This position is open to all candidates.