דרושים » מחשבים ורשתות » Site Reliability Team Leader 22846

משרות על המפה
 
בדיקת קורות חיים
VIP
הפוך ללקוח VIP
רגע, משהו חסר!
נשאר לך להשלים רק עוד פרט אחד:
 
שירות זה פתוח ללקוחות VIP בלבד
AllJObs VIP
כל החברות >
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
לפני 21 שעות
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We are looking for a Site Reliability Engineering (SRE) & Production Team Leader to join our Engineering team. Someone who has a passion for observability, monitoring, automation, and high-availability systems, and who has a desire to solve complex technological challenges with a proactive approach to continuous improvement.
We use an interesting and mixed technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.
In this position, you will use your expertise in building and scaling SRE operations, and will design, implement, and operate a world-class reliability strategy.
About Us
we are a key player the network security field, striving to provide the leading SASE platform in the market. Our innovative approach, merging cloud and on-device protection, redefines how businesses connect in the era of cloud and remote work.
Key Responsibilities
Design, build, and manage our SRE framework to ensure observability, resilience, and high availability.
Develop and automate solutions for proactive monitoring, incident response, and performance optimization.
Improve and maintain our alerting and monitoring stack, leveraging tools like Datadog, Prometheus, and Grafana.
Lead post-mortem analysis and implement continuous improvement initiatives.
Collaborate with DevOps, Engineering, and Product teams to ensure smooth and efficient delivery of reliable services.
Requirements:
SRE & Production Manager with 5+ years of experience in SRE, Production Engineering, or DevOps, including 2+ years in a leadership role.
Experience with monitoring and observability tools like Datadog, Prometheus, and Grafana.
A problem solver, capable of finding creative solutions and getting things done.
Fluent with incident management, RCA processes, and operational best practices.
Experience with AWS (EKS, EC2, RDS, S3, networking configurations).
It would be great if you also have:
Experience in high-scale distributed systems.
Background in security and compliance for cloud infrastructure.
Understanding of cost optimization and resource management in cloud environments.
Familiarity with machine learning or predictive analytics for proactive reliability management.
Proficiency in Python, Go, or Bash for automation and scripting.
This position is open to all candidates.
 
Hide
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8259881
סגור
שירות זה פתוח ללקוחות VIP בלבד
משרות דומות שיכולות לעניין אותך
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
1 ימים
חברה חסויה
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We are looking for a Site Reliability Engineer (SRE) to join our Engineering team. Someone who has a passion for observability, monitoring, automation, and high-availability systems, and who has a desire to solve complex technological challenges with a proactive approach to continuous improvement.
We use an interesting and mixed technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.
In this position, you will use your expertise in building and scaling SRE operations, and will design, implement, and operate a world-class reliability strategy.
About Us
we are a key player the network security field, striving to provide the leading SASE platform in the market. Our innovative approach, merging cloud and on-device protection, redefines how businesses connect in the era of cloud and remote work.
Key Responsibilities
Develop and maintain our monitoring, alerting, and logging systems, ensuring high visibility into production environments.
Implement automation to improve system reliability, scalability, and efficiency.
Troubleshoot and resolve production incidents, leading root cause analyses and implementing permanent fixes.
Collaborate with software engineers and DevOps teams to enhance application performance and resilience.
Continuously improve operational processes, focusing on reducing toil and improving reliability.
Requirements:
3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
Hands-on experience with monitoring and observability tools like Datadog, Prometheus, and Grafana.
Strong understanding of Linux systems, networking, and cloud-native architectures.
Experience with Kubernetes, Terraform, and CI/CD pipelines.
A problem solver, capable of finding creative solutions and getting things done.
Fluent with incident management, RCA processes, and operational best practices.
It would be great if you also have:
Experience in high-scale distributed systems.
Background in security and compliance for cloud infrastructure.
Familiarity with AWS (EKS, EC2, RDS, S3, networking configurations).
Proficiency in Python, Go, or Bash for automation and scripting.
Understanding of cost optimization and resource management in cloud environments.
Familiarity with machine learning or predictive analytics for proactive reliability management.
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8258448
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
2 ימים
חברה חסויה
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Site Reliability Engineer to join our DevOps team. You will ensure the reliability, performance, and scalability of our back-office solutions, which serve as the foundation for the entire purchasing process. This role will lead the development of SRE capabilities, meeting SLI/SLO/SLA targets, and establishing effective monitoring systems. You will enhance our Software Development Lifecycle by integrating reliability and scalability, working with cross-functional teams, and supporting production environments. Additionally, you will implement incident management processes and conduct post-mortem analyses to drive continuous improvement. If you have a strong engineering and automation background and are passionate about the E-commerce field, then we would love to hear from you.
Roles and Responsibilities:
Develop and implement SRE capabilities to enhance the reliability, availability, and performance of Admin solutions.
Design and maintain proactive monitoring and alerting systems for deep visibility into critical business flows, beyond simple statuses, to identify functional issues.
Drive improvements in the Software Development Lifecycle (SDLC) for reliability and scalability from design to deployment.
Collaborate with development and operations teams to troubleshoot production incidents affecting the purchase flow through root cause analysis.
Lead SRE initiatives to boost system resilience and operational efficiency.
Implement best practices for incident management and conduct blameless post-mortems, contributing to capacity planning and performance testing to ensure scalability.
Requirements:
5+ years of experience as a Site Reliability/DevOps Engineer
Deep understanding of E-commerce flows, specifically with back-office operations and order processing - must
Experience as an Automation/Software Engineer with a strong understanding of software development principles and in building, testing, and deploying distributed systems - must
Experience in designing, implementing, and utilizing monitoring and observability platforms such as DataDog, NewRelic, Prometheus/Grafana, or ELK stack - must
Proficiency in scripting and automation using languages such as Python, Java, etc. - must
Ability to create dashboards, alerts, and insightful queries - must
Experience with AWS services to build and operate scalable and resilient applications (e.g., EC2, ECS/EKS, RDS, S3, Lambda, CloudWatch) - plus
Experience in automating infrastructure provisioning, application deployments, and repetitive operational tasks - plus
Proactive approach with excellent problem-solving skills
Strong collaborator, with an ability to work with cross-functional teams
Proficient in English.
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8255386
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
1 ימים
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
Cyberint, a market leader in External Risk Management, empowers global organizations to detect, respond, and remediate external threats efficiently. Now part of our company, Cyberint continues to grow and innovate at the intersection of cybersecurity and cloud-native SaaS technologies.
Join Our Operations Team
We are seeking a proactive, experienced Site Reliability Engineer (SRE) to join our dynamic Operations team. Youll be working on a cutting-edge SaaS solution that runs on AWS (EKS-based Kubernetes infrastructure), supporting an architecture with many moving parts. If you're driven by reliability engineering, love automation, and want to make an impact on mission-critical platforms, this role is for you.
What Youll Do
As an SRE at Cyberint, you will be instrumental in ensuring the observability, stability, and scalability of our platform. You will develop automated solutions and monitoring tools to proactively detect and respond to incidents, improve system resilience, and collaborate with engineering teams across the company to embed operational excellence into our product lifecycle.
Additionally, you will help evolve our AI-driven operational and monitoring tooling, including our on-call assistant bot, which leverages AI technologies to streamline incident resolution, automate repetitive tasks, and support real-time decision-making for engineers.
Key Responsibilities
Design, implement, and maintain monitoring and alerting systems (e.g., Prometheus, Grafana) to detect and prevent reliability issues.
Develop tools and automation (Python, Bash, etc.) for improving infrastructure reliability and operational efficiency.
Collaborate with R&D and Product teams to embed reliability-first principles into every stage of the development process.
Participate in and improve incident response processes, including running blameless postmortems and implementing preventive measures.
Enhance our Infrastructure-as-Code (IaC) and CI/CD practices to streamline deployments and reduce risk.
Maintain and extend internal AI-driven tools, such as bots that support SRE workflows (on-call management, triaging, etc.).
Document infrastructure, playbooks, and operational procedures to facilitate onboarding and knowledge sharing.
Requirements:
3+ years of experience in an SRE, DevOps, or similar role in a SaaS/cloud-native environment.
Strong experience with Kubernetes, AWS, and cloud-based distributed systems.
Hands-on experience building or maintaining monitoring stacks such as Prometheus, Grafana, ELK, etc.
Proficiency in Python, Bash, or similar scripting languages.
Experience with Infrastructure as Code tools (Terraform, Helm, etc.).
Familiarity with CI/CD tools (e.g., GitHub Actions, Jenkins, ArgoCD).
Solid analytical and problem-solving skills with a passion for operational excellence.
Exposure to AI-based tooling (e.g., OpenAI API, LLM-based bots) to automate operations or enhance incident response processes.
Nice to Have
Experience with incident management platforms (e.g., PagerDuty).
Security-minded mindset and experience in the cybersecurity industry.
Experience with service mesh, zero-downtime deployments, or chaos engineering.
Contributions to AI-assisted SRE initiatives or platform operations & monitoring automation.
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8257631
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
Location: Tel Aviv-Yafo
Job Type: Full Time
Were looking for a Site Reliability Engineer (SRE) to enhance the reliability, performance, and scalability of our production infrastructure. This role goes beyond keeping systems runningyoull be a key player in shaping the culture of reliability, driving self-healing mechanisms, proactive alerting strategies, and automation to reduce toil and improve operational efficiency. You'll work closely with engineering teams to ensure high availability, observability, and smooth incident management processes.
Responsibilities
Ensure reliability & scalability of our production environment across multiple cloud providers.
Define and implement SRE best practicesfostering a culture of ownership, continuous improvement, and automation.
Automate everythingfrom infrastructure deployment to self-healing mechanisms that eliminate manual intervention.
Design and improve observability solutions (monitoring, logging, tracing) to enable faster detection and resolution of issues.
Optimize alerting strategies to ensure actionable, high-quality alerts while minimizing noise and fatigue.
Improve system resilience, driving chaos engineering, failover strategies, and automatic recovery processes.
Enhance incident response processes, including on-call strategies, root cause analysis, and post-mortems to drive long-term stability.
Collaborate with development teams to build reliable, scalable, and efficient architectures, ensuring seamless deployment and rollback processes.
Promote a culture of reliability, educating teams on best practices, service ownership, and production-readiness.
Requirements:
3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
Strong expertise in Kubernetes and container orchestration in production.
Hands-on experience with cloud platforms (AWS, Azure, or GCP).
Proven experience with monitoring & observability tools (Prometheus, ELK, Grafana, Coralogix, etc.).
Strong scripting/programming skills (Python, Go, Bash, or similar).
Experience with Infrastructure as Code (IaC)Terraform, Helm, or similar tools.
Track record of improving system reliability, scalability, and performance.
Experience designing and implementing self-healing mechanisms to minimize human intervention.
Ability to foster a strong reliability culture across engineering teams, leading by example.
Excellent problem-solving skills, with a proactive and ownership-driven mindset.
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8228722
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
2 ימים
Location: Tel Aviv-Yafo and Netanya
Job Type: Full Time
At our company, were reinventing DevOps to help the worlds greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit and just all-around great people. Here, if youre willing to do more, your career can take off. And since software plays a central role in everyones lives, youll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust our company to manage, accelerate, and secure their software delivery from code to production -- a concept we call liquid software. Wouldn't it be amazing if you could join us in our journey?
We are looking for a Site Reliability Engineering Manager to lead our Israel SRE team. In this role, you'll drive best practices in reliability engineering, ensuring the stability, availability, and performance of our companys SaaS services. You'll collaborate with global SRE leaders, refine processes, and foster a culture of accountability and continuous improvement.
As a Site Reliability Engineering Manager at our company you will
Lead, mentor, and develop a high-performing SRE Israel team, fostering collaboration, innovation, and accountability
Ensure SaaS reliability, performance, and availability, meeting or exceeding service-level objectives
Drive SRE best practices, including capacity planning, incident management, chaos engineering, and disaster recovery
Implement proactive monitoring, alerting, and anomaly detection aligned with SaaS standards
Collaborate with P&E and Cloud engineering teams to embed reliability into the SDLC
Oversee incident management, ensuring swift identification, escalation, and resolution
Maintain comprehensive SRE documentation, including processes, incident reports, and system architecture
Evaluate and adopt tools, technologies, and methodologies to enhance uptime and reliability.
Requirements:
3+ years of management experience leading a team of SRE, DevOps, or a similar SaaS role
Bachelors degree in Computer Science, Engineering, or related field (or equivalent experience)
Strong expertise in cloud platforms (AWS, GCP, or Azure), containers (Kubernetes, Docker), and configuration management (Terraform, Ansible)
Proficiency in Python or Go for automation and system optimization, as well as GitOps experience with SCM tools (e.g., Git, Bitbucket)
Strong leadership, communication, and collaboration skills, working across globally distributed teams
Familiarity with Agile methodologies, CI/CD pipelines, and orchestration tools (Jenkins, ArgoCD, StackStorm)
Familiarity with Chaos Engineering (e.g., Gremlin, Litmus, Chaos Toolkit)
Hands-on with alerting & observability tools (e.g., PagerDuty, OpsGenie, New Relic, Coralogix)
Strong understanding of scalability, high availability, and security best practices in cloud & Kubernetes environments.
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8255508
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
לפני 21 שעות
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
we are a global leader in cybersecurity, dedicated to protecting organizations from cyber threats. Our team is at the forefront of developing innovative cloud solutions, and we are looking for a Senior DevOps Engineer to join our Cloud Network Security group.
Key Responsibilities
As a DevOps Engineer at our company, you will design, implement, and manage CI/CD pipelines, collaborate with cross-functional teams, and ensure the high availability and reliability of our cloud-based services and solutions.
Responsibilities:
Design, implement, and manage CI/CD pipelines to automate the deployment of SaaS
Collaborate with development, QA, and operations teams to ensure smooth and reliable software releases.
Monitor system performance and troubleshoot issues to ensure high availability and reliability of our services.
Implement and manage infrastructure as code (IaC) using tools like Terraform, CloudFormation and ARM.
Optimize system performance, scalability, and security.
Develop and maintain documentation for infrastructure and deployment processes.
Requirements:
Your Knowledge & Skills:
2-4 years of experience in DevOps or a related role, working with distributed systems and SaaS applications.
Proficiency with CI/CD tools such as Gerrit, GitLab CI, GitHub
Experience with Cloud Providers like: AWS, Azure, GCP
Solid foundation in Cloud account users management & cost optimizations (FinOps principles)
Solid understanding of networking, security, and system administration.
Familiarity with logging and monitoring stacks (e.g., Elasticsearch, CloudWatch, Grafana, Prometheus).
Proficiency in scripting (Python, Bash) for automation and tooling.
Solid grasp of IaC & GitOps principles and best practices (Terraform, Helm, ArgoCD, Crossplane).
Knowledge of agile methodologies and practices
Strong knowledge of distributed systems, microservices, and orchestration technologies
Expertise in containerization and orchestration tools like Docker and Kubernetes
Mindset & Traits:
An innovative approach, with strong communication and collaboration skills
Independent, autodidact, and passionate about new DevOps challenges
Passion for automation, self-service, and continuous improvement
Comfortable working in fast-paced SaaS environments with cross-functional teams
Excellent problem-solving skills and attention to detail
Advantages:
Network Security background
Knowledge in our company's products.
Bachelors degree in Computer Science or a related technical field
Certifications in AWS, Azure, or other relevant technologies.
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8259831
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
לפני 21 שעות
חברה חסויה
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
our company's Infinity External Risk Management, otherwise known as Cyberint, continuously reduces external cyber risk by managing and mitigating an array of digital threats with a unified solution.
At Cyberint, we help organizations protect their digital presence by delivering cutting-edge Attack Surface Management (ASM) and Threat Intelligence (TI) solutions. As a member of our R&D organization, youll play a key role in ensuring the scalability, reliability, and performance of our cloud-native SaaS platform operating at scale.
Key Responsibilities
As a DevOps Engineer, you will be a core member of our DevOps & Infrastructure team, focused on building and maintaining distributed, scalable, and highly available systems in a dynamic SaaS environment. You will collaborate closely with development, QA, and support teams to enhance automation, improve CI/CD pipelines, and drive operational excellence across the board.
Key Responsibilities:
Design, build, and maintain infrastructure in a modern cloud-native SaaS ecosystem (primarily AWS).
Contribute to the scalability and reliability of distributed systems supporting high-volume data processing and real-time operations.
Develop and enhance CI/CD pipelines to support rapid and reliable deployments across multiple environments.
Implement and manage Infrastructure as Code (IaC) using Terraform for consistent, scalable infrastructure.
Operate and optimize Kubernetes (EKS) clusters to support distributed microservices architectures.
Monitor and respond to system alerts, troubleshoot issues, and contribute to incident prevention and response strategies.
Build self-service tools and automation frameworks to empower R&D teams and enhance delivery velocity.
Work cross-functionally with developers, QA, and support to ensure infrastructure meets evolving product needs.
Write and maintain scripts (Python, Bash) to automate recurring tasks and streamline operations.
Continuously identify and execute improvements in system performance, availability, and cost-efficiency.
Requirements:
Experience:
25 years of experience in DevOps, SRE, or infrastructure engineering roles, working with distributed systems and SaaS applications.
Hands-on experience with public cloud providers (AWS strongly preferred).
Production experience with tools such as Kubernetes, Terraform, CI/CD platforms (Jenkins, ArgoCD), and monitoring systems (Prometheus, Grafana).
Skills:
Solid grasp of Infrastructure as Code principles and best practices.
Strong knowledge of distributed systems, microservices, and orchestration technologies.
Proficiency in scripting (Python, Bash) for automation and tooling.
Familiarity with logging and monitoring stacks (e.g., Elasticsearch, Redis, CloudWatch, Grafana, Prometheus).
Awareness of DevOps security practices and cloud cost optimization strategies.
Mindset & Traits:
A strong sense of ownership and accountability for system health and performance.
Passion for automation, self-service, and continuous improvement.
Excellent communication and collaboration skills.
Comfortable working in fast-paced SaaS environments with cross-functional teams.
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8259928
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
18/06/2025
חברה חסויה
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a talented, self-driven and passionate Senior Infrastructure Engineer to build and maintain the cloud infrastructure for our highly available SaaS application as well as our machine learning and data engineering stack.

As a Senior Infrastructure Engineer, you will be responsible for designing, implementing, and maintaining the cloud infrastructure and DevOps processes that power our products and internal tooling. You will work closely with all data and development teams and lead the companys security and compliance vectors. You will ensure a highly reliable, scalable, and secure infrastructure that supports our rapid growth and product innovation, while maintaining observability and cost-effectiveness of our cloud resources and data.

What Youll Do

Cloud Infrastructure Management: Architect, deploy, and manage our cloud infrastructure (AWS), ensuring high availability, scalability, and security.
Software Engineering: Be a top notch SW engineer, harnessing your coding and architectural skills, as well as researching skills, for our infra stack.
Infrastructure as Code (IaC): Define and maintain infrastructure using tools like Terraform, CloudFormation, or Pulumi to manage resources efficiently and reproducibly.
Monitoring & Incident Management: Build and manage monitoring and alerting systems to ensure uptime, and respond to incidents with root cause analysis and remediation.
DevOps & Automation: Implement and maintain CI/CD pipelines to streamline development workflows and automate deployment processes across development, staging, and production environments, and across different parts of our solution. While our development teams are expected to write and maintain their own CI, you will act as a supervisor and professional authority, and maintain cross team and complex automation.
Collaboration and technical leadership: Partner with software engineers, data engineers, and machine learning teams to support their infrastructure needs and guide the evolution of our infrastructure team.
Cost Optimization: Monitor cloud spend and optimize resources to ensure cost-effective infrastructure without sacrificing performance or security.
Security & Compliance: Implement security best practices, including access control, network security, monitoring and ensuring the infrastructure is compliant with relevant industry standards (e.g., SOC2, GDPR).
דרישות:
5+ years of hands-on experience in cloud infrastructure, DevOps and platform engineering in production environments.
Expertise in managing cloud infrastructure on at least one of the major providers: AWS, GCP, Azure. Proficient in Infrastructure as Code tools such as Terraform, CloudFormation, or Pulumi.
Solid experience with Docker and Kubernetes.
Monitoring & Logging: Hands-on experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk, or equivalent).
Proficient Software engineering, architecture, as well as scripting languages such as Python, Bash, or Go. Full control of version control systems such as Git.
Strong experience with CI/CD pipelines and automation using Jenkins, CircleCI, GitHub Actions, GitLab CI, or similar.
Strong understanding of cloud networking, VPNs, VPCs, DNS, and firewalls.
Experience implementing cloud security best practices, including IAM, encryption, and key management.
Previous experience in a fast-paced startup environment, where adaptability and hands-on execution are key.
Strong communication skills and ability to work cross-functionally with different teams.
Advantages:

Experience supporting machine learning pipelines and deploying ML models to production environments.
Familiarity with data engineering tools like Apache Spark, Airflow, or similar ETL tools.
Experience with serverless technologies such as AWS Lambda, GCP Functions, or Azure Functions.
AWS Certified Security Specialty, or equivalent certifications in cloud security.
Experience and knowledge with regulatory complia המשרה מיועדת לנשים ולגברים כאחד.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8222324
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
16/06/2025
חברה חסויה
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a DevOps DevOps Engineer to take ownership of our Cloud Infrastructure and Platform Engineering strategy, enabling high-scale, cutting-edge GenAI products running across 40+ Kubernetes clusters on GCP and AWS.
This role is a hands-on engineering , requiring deep expertise in cloud-native technologies, Kubernetes at scale, and modern DevOps principles. You will work closely with engineering teams to design and implement scalable infrastructure solutions, optimize developer workflows, and ensure reliability and efficiency across our platform.
Role and Responsibilities:
Cloud & Kubernetes Expertise: Design and implement highly scalable multi-cluster Kubernetes environments across GCP & AWS.
Developer Experience & Enablement: Lead the development of self-service tools and automation that improve efficiency for R&D teams.
Incident & Reliability Engineering: Work with engineering teams to optimize cost, performance, and reliability of production infrastructure through monitoring, capacity planning, and scaling strategies.
Security & Governance: Contribute to best practices for RBAC, IAM, cloud security, and compliance while ensuring infrastructure reliability.
Automation & Infrastructure as Code: Drive adoption of GitOps workflows and Infrastructure as Code (Terraform, Helm, Crossplane) to enhance automation and consistency.
Mentorship & Team Growth: Provide technical mentorship within the platform engineering team and contribute to knowledge-sharing across R&D.
Cross-Team Collaboration: Work closely with engineering teams to align cloud infrastructure goals with business needs and reliability requirements.
Technology Assessment: Assess and advocate for new technologies that improve reliability, efficiency, and scalability within the platform.
Requirements:
Technical Expertise:
5+ years of DevOps, or SRE experience
3+ years working with public cloud platforms (AWS, GCP) at scale
Deep Kubernetes expertise, including managing large-scale, multi-cluster enterprise-grade Kubernetes environments
Experience designing and managing Custom Resource Definitions (CRDs) and custom controllers
Strong background in Infrastructure as Code (Terraform, Helm) and GitOps principles (ArgoCD, Crossplane, FluxCD, etc.)
Hands-on experience in observability & monitoring (Prometheus, Grafana, Datadog, OpenTelemetry, etc.)
Proficiency in scripting & automation (Python, Go, Bash) for infrastructure automation
Expertise in cloud networking (VPC, load balancers, service meshes) and security best practices (RBAC, IAM, security groups, network policies, etc.)
Experience with CI/CD pipelines, optimizing for performance, security, and developer velocity
Nice-to-Have:
Experience with self-hosted on-prem deployments and managed private VPC deployments (Bring Your Own Cloud models)
Advanced expertise in Helm and Crossplane for Kubernetes resource management.
Other cloud provider experience
Experience in GenAI or large-scale SaaS platforms
Familiarity with SQL/NoSQL databases and distributed systems
DevSecOps experience, with a strong understanding of security automation and compliance frameworks.
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8218726
סגור
שירות זה פתוח ללקוחות VIP בלבד
סגור
דיווח על תוכן לא הולם או מפלה
מה השם שלך?
תיאור
שליחה
סגור
v נשלח
תודה על שיתוף הפעולה
מודים לך שלקחת חלק בשיפור התוכן שלנו :)
2 ימים
Location: Tel Aviv-Yafo and Netanya
Job Type: Full Time
At our company, were reinventing DevOps to help the worlds greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit and just all-around great people. Here, if youre willing to do more, your career can take off. And since software plays a central role in everyones lives, youll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust our company to manage, accelerate, and secure their software delivery from code to production -- a concept we call liquid software. Wouldn't it be amazing if you could join us on our journey?
our company seeks a highly-skilled Senior Site Reliability Engineer to join our team! In this role, you will drive best practices, optimize operational workflows, and mentor junior engineers, fostering a culture of collaboration and innovation. This is an exciting opportunity for someone passionate about building and integrating services and systems that ensure the availability, performance, and reliability of our company SaaS environments. You will lead large-scale, cross-functional initiatives, You will work closely with P&E engineering and Cloud teams to design, build, and maintain scalable, resilient infrastructure while championing best practices for automation, monitoring, and incident response. If you're eager to make a significant impact in a fast-paced, high-growth environment, we encourage you to apply.
As a Senior Site Reliability Engineer in our company you will
Lead and groom the team towards technical solutions guided by a strong understanding of the latest and greatest technologies like Kubernetes, Helm, Terraform, and more
Advocate, build, and manage scalable and reliable services and infrastructure to support our company SaaS services
Apply SRE best practices, including incident management, performance and capacity planning, and disaster recovery flows
Drive the reliability, performance, and availability of our SaaS products, ensuring service-level objectives are met or exceeded
Design, develop, and manage large-scale systems with CI/CD in mind, to support multiple production environments and use cases
Tackle large-scale production issues and bring out-of-the-box thinking to the table
Evaluate new cloud-native technologies and vendor products to continuously improve our SaaS offering
Requirements:
5+ years of relevant DevOps or SRE experience in large-scale production environments
2+ years of infrastructure automation, configuration management, or container orchestration using Kubernetes, Docker, Terraform, and Ansible
2+ years in Python or any other advanced programming language
Strong ability to lead, design, and execute cross-organization projects
Experience in managing container and infrastructure orchestration tools (e.g. Kubernetes, Terraform)
Hands-on experience administering public clouds (AWS, GCP, or Azure)
Experience with building CI/CD pipelines for applications and microservices (Jenkins/ArgoCD)
Experience with chaos, alerting & observability tools (Gremlin, PagerDuty, Opsgenie, New Relic, Coralogix).
This position is open to all candidates.
 
Show more...
הגשת מועמדותהגש מועמדות
עדכון קורות החיים לפני שליחה
עדכון קורות החיים לפני שליחה
8255520
סגור
שירות זה פתוח ללקוחות VIP בלבד