We are recruiting an MLOps / AI-Ops Engineer to join our Data & AI Division in Haifa.
This role is responsible for managing, optimizing, and scaling the organization's AI/ML infrastructure across both cloud and on-prem environments, ensuring high availability, performance, and cost efficiency.
Role Overview:
* Manage day-to-day AI infrastructure operations, ensuring reliability, performance, and scalability.
* Deploy, configure, maintain, and troubleshoot AI platforms and tools, including GPU orchestration, Kubernetes, MLflow/Kubeflow, and vector databases.
* Monitor resource utilization (CPU, GPU, memory, storage, network) and implement performance optimizations.
* Support CI/CD pipelines and infrastructure automation using Infrastructure as Code (IaC) tools such as Terraform and Ansible.
* Implement security best practices, manage access control, and ensure compliance with AI governance policies.
* Troubleshoot and resolve operational incidents in collaboration with AI Platform and AI Security Engineers.
* Contribute to documentation, runbooks, and knowledge sharing.
* Identify opportunities to improve AI infrastructure processes and tooling.
This position is open to all candidates.