Required: Senior DevOps Engineer - Israel
Job Description
Does an opportunity to build large-scale solutions suit you?
Do hybrid local/cloud infrastructures interest you?
Join our Engineering team
The group is part of Cloud Security Intelligence, managing one of Israel's largest Big Data environments. Responsibilities include developing innovative Intelligence Security products within this ecosystem. This role contributes to building Aegis, a cloud-native Big Data and Machine Learning platform. Aegis accelerates research-to-production workflows, allowing teams to train, deploy, and monitor ML pipelines efficiently on Linode.
Make a difference in your own way
Help create an innovative Big Data platform that improves development speed, efficiency, and service security for engineering teams. Responsibilities include designing and maintaining an ML infrastructure that supports experiment tracking, GPU training, deployment, and monitoring.
As a DevOps Engineer, you will be responsible for:
Designing and implementing infrastructure solutions using Azure, Linode, Kubernetes, Kafka, Vault, and storage systems.
Developing and provisioning infrastructure applications and monitoring tools, including OpenSearch, OpenTelemetry, Prometheus, Grafana, and Pushgateway.
Building and maintaining CI/CD pipelines using Jenkins, and building GitOps solutions with tools such as ArgoCD.
Working across all stages of the software release process, in both development and production environments.
Building and maintaining MLOps pipelines using Argo Workflows, integrated with Spark, notebooks, and inference endpoints.
Deploying and managing GPU-enabled Kubernetes nodes, and integrating ML tools that leverage GPU acceleration, such as PyTorch, TensorFlow, and CUDA.
Supporting machine learning tools such as MLflow, Feast, KServe, Triton, and TorchServe, as well as scalable model-serving solutions.
Do what you love.
Requirements: To be successful in this role you will:
Have 5+ years of proven experience as a DevOps Engineer.
Be proficient in working in Linux/Unix environments, and demonstrate solid experience in Python and shell scripting.
Have proven experience in designing and implementing solutions for Kubernetes.
Demonstrate expertise in deploying container technologies and migrating systems to cloud platforms such as Azure, AWS, or GCP.
Be responsible, self-managed, self-motivated, and able to work with little or no supervision.
Have exceptional attention to detail and excellent troubleshooting skills.
Demonstrate expertise in MLOps ecosystems and experience supporting production data science workflows with GPUs.
This position is open to all candidates.