We are looking for a hands-on Tech Lead to join the Core Platform team within our ML organization. Our engineering teams build the foundational systems behind global artifact storage, replication, and distribution, and increasingly power the next generation of AI/ML operations and governance. Our platform is the backbone for ML workloads: it manages model binaries, versioning, and scalable runtime environments for ML and AI applications. The role combines deep distributed systems engineering with modern ML infrastructure challenges such as high-throughput inference, safe model rollouts, and multi-cloud GPU efficiency. You will also help evolve core libraries and developer-facing tools, including logging, observability, and visibility components.
As a senior technical leader, you will influence architecture across squads, lead complex development efforts, and remain heavily hands-on.
As a Tech Lead in Core Platform, you will:
Design and evolve components for managing and distributing ML/AI models and artifacts at scale.
Extend the platform to support reliable, high-performance inference and training workflows.
Lead cross-team technical initiatives and serve as a point of reference for distributed systems and ML infrastructure design.
Write maintainable, high-quality code in performance-critical areas.
Mentor engineers and drive strong engineering practices.
Collaborate with adjacent teams to ensure seamless end-to-end ML platform behavior.
Improve the reliability, efficiency, and observability of core services.
Requirements: To be a Tech Lead in Core Platform, you need...
7+ years building large-scale backend or distributed systems.
Strong foundation in distributed systems (consistency, replication, concurrency, fault tolerance).
Proficiency in Java, Go, or similar languages.
Hands-on experience with high-performance, scalable, and reliable systems.
Ability to lead design discussions and influence technical direction across teams.
Curiosity and willingness to work with ML systems and workload patterns.
Experience with Kubernetes, container orchestration, or cloud-native infrastructure.
The ability to thrive in a collaborative, ownership-driven engineering culture.
Bonus Points:
Experience with ML model serving, vector DBs, model versioning, or GPU orchestration.
Background in secure software supply chain workflows.
Strong performance debugging and optimization skills.
This position is open to all candidates.