We are looking for an exceptional MLOps Team Lead to own, build, and scale the infrastructure and automation that powers state-of-the-art Large Language Models (LLMs) and AI systems.
This is a technical leadership role that blends hands-on engineering with strategic vision. You will define MLOps best practices, build high-performance ML infrastructure, and lead a world-class team working at the intersection of AI research and production-grade ML systems.
You will work closely with LLM Algorithm Researchers, ML Engineers, and Data Scientists to enable fast, scalable, and reliable ML workflows covering everything from distributed training to real-time inference optimization.
If you have deep technical expertise, thrive in high-scale AI environments, and want to lead the next generation of MLOps, we want to hear from you.
Requirements:
3+ years of experience in MLOps, ML infrastructure, or AI platform engineering.
2+ years of hands-on experience in ML pipeline automation, large-scale model deployment, and infrastructure scaling.
Expertise in deep learning frameworks (e.g., PyTorch, TensorFlow, JAX) and MLOps platforms (e.g., Kubeflow, MLflow, TFX).
Proven track record of building production-grade ML systems that scale to billions of predictions daily.
Deep knowledge of Kubernetes, cloud-native architectures (AWS/GCP), and infrastructure as code (Terraform, Helm, ArgoCD).
Strong software engineering skills in Python, Bash, and Go, with a focus on writing clean, maintainable, and scalable code.
Experience with observability and monitoring stacks (Prometheus, Grafana, Datadog, OpenTelemetry).
Strong background in security, compliance, and model governance for AI/ML systems.
Leadership & Execution:
Proven ability to lead high-impact engineering teams in a fast-paced AI environment.
Ability to drive technical strategy while remaining hands-on in critical areas.
Strong cross-functional collaboration skills, working closely with research and engineering teams.
Passion for automation, efficiency, and designing scalable self-service MLOps solutions.
Experience in mentoring and coaching engineers, fostering a culture of innovation and continuous learning.
It Would Be Great If You Have:
Experience working with LLMs and large-scale generative AI models in production.
Expertise in optimizing model inference latency and cost at scale.
Contributions to open-source MLOps tools or AI infrastructure projects.
This position is open to all candidates.