SRE Team Leader & Escalation Manager 25515

Posted 18 hours ago
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a technically strong and AI-savvy SRE Team Lead & Escalation Manager to own production reliability, incident management, and cross-functional prioritization. This role leads our AI-driven automation strategy, drives self-healing infrastructure development, and sets a new standard for modern reliability engineering.
Key Responsibilities
Lead and mentor the SRE team; improve monitoring, alerting, and observability.
Own production incidents and escalations end-to-end - from mitigation to RCA and corrective action.
Lead the design and development of self-healing systems capable of detecting, diagnosing, and remediating incidents autonomously.
Drive automation of repetitive operational workflows using AI/ML-based solutions to reduce toil and MTTR.
Manage the cross-functional Squad handling customer and production issues; align priorities across Support, QA, R&D, and Sources.
Track key operational metrics and lead long-term reliability improvements.
Requirements:
3-5 years in SRE or Incident Management.
Mandatory: Hands-on AI experience applied to operational challenges (AIOps, anomaly detection, LLM-based automation, or auto-remediation).
Proven track record of automating workflows and reducing manual toil at scale.
Strong cloud background (AWS/Azure/GCP) and experience with Kubernetes, Docker, and CI/CD.
Proficiency with observability tools (Grafana, Prometheus, ELK) and scripting (Python, Bash).
Demonstrated leadership in high-pressure, cross-functional environments.
Advantages
Background in cybersecurity or SaaS platforms.
Experience with LLMOps, AI agents, or orchestration platforms (e.g., n8n, Temporal).
Key Attributes
Strong ownership, accountability, and composure under pressure.
Passionate about leveraging AI to automate workflows, reduce toil, and accelerate incident resolution.
Visionary about self-healing operations - able to both define the strategy and drive its implementation.
Collaborative leader with the ability to align cross-functional stakeholders.
Technically hands-on systems-level thinker with the drive to engineer scalable, long-term solutions.
This position is open to all candidates.
 
8720950
Similar jobs that may interest you
24/05/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced SRE Team Lead to drive the reliability, observability, and automation practices across our private cloud infrastructure and operations. In this role, you will lead a team of site reliability engineers, own the engineering roadmap for monitoring and automation, and act as a key liaison between development, operations, and platform teams. You bring at least 3-4 years of hands-on people management experience and a deep technical background in SRE or DevOps disciplines.



What will you do?

Leadership & Team Management

Lead, mentor, and grow a team of SREs, providing technical direction, career development guidance, and day-to-day management.

Own the team roadmap for reliability, observability, and automation initiatives - prioritizing work, removing blockers, and driving delivery.

Conduct regular 1:1s, performance reviews, and hiring processes to build and sustain a high-performing team.

Foster a culture of operational excellence, blameless post-mortems, and continuous improvement.

Act as an escalation point for complex incidents and reliability issues, leading post-incident reviews and ensuring follow-through on action items.


Automation & Infrastructure

Design, develop, and maintain automation tools to support infrastructure and operations teams at scale.

Manage pipelines and infrastructure workflows using Jenkins, Ansible, Python, and Bash.

Drive the adoption of infrastructure-as-code practices across the organization.

Collaborate with system engineers to improve scalability, performance, and fault tolerance of critical systems.


Monitoring & Observability

Build and extend monitoring and alerting systems using Grafana, the ELK (Elastic) stack, Zabbix, and custom scripts.

Implement and enforce observability best practices to ensure full visibility into systems, applications, and infrastructure.

Define and track SLIs, SLOs, and error budgets across key services.

Partner with development teams to embed observability earlier in the software development lifecycle.


Database & Platform Support

Support monitoring and infrastructure integration for databases including MongoDB and PostgreSQL.

Maintain documentation and champion knowledge sharing around automation, monitoring, and reliability practices.
Requirements:
What you need:

Experience & Leadership

3-4+ years of experience in a people management or team lead capacity within SRE, DevOps, or infrastructure engineering.

5-8+ years of overall experience in SRE, DevOps, or infrastructure automation roles.

Proven track record of building, coaching, and retaining high-performing engineering teams.

Experience owning an engineering roadmap and driving cross-functional reliability initiatives.



Technical Skills

Strong scripting skills in Python and Bash; comfortable building and maintaining production-grade automation.

Hands-on experience with infrastructure automation tools, particularly Ansible.

Solid experience with monitoring and observability platforms - ELK stack, Grafana, and Zabbix.

Good understanding of CI/CD pipelines and related tooling, including Jenkins.

Familiarity with managing and monitoring MongoDB and PostgreSQL in a production environment.

Comfortable working in Linux-based environments.

Excellent problem-solving skills and strong written and verbal communication.



Ability to support the following:

Experience with cloud providers - AWS, GCP, or Azure.

Exposure to containerization technologies such as Docker and Kubernetes.

Familiarity with infrastructure provisioning using Terraform.

Experience introducing SRE practices (SLOs, error budgets, chaos engineering) at an organizational level.

Exposure to and experience with migrating or building AI tools to improve processes.
This position is open to all candidates.
 
8662300
24/05/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced SRE Engineer to drive the reliability, observability, and automation practices across our private cloud infrastructure and operations. In this role, you will be part of a team of site reliability engineers, own the engineering roadmap for monitoring and automation, and act as a key liaison between development, operations, and platform teams. You bring 4+ years of hands-on people management experience and a deep technical background in SRE or DevOps disciplines.

What will you do?

Automation & Infrastructure
- Design, develop, and maintain automation tools to support infrastructure and operations teams at scale.
- Manage pipelines and infrastructure workflows using Jenkins, Ansible, Python, and Bash.
- Drive the adoption of infrastructure-as-code practices across the organization.
- Collaborate with system engineers to improve scalability, performance, and fault tolerance of critical systems.

Monitoring & Observability
- Build and extend monitoring and alerting systems using Grafana, the ELK (Elastic) stack, Zabbix, and custom scripts.
- Implement and enforce observability best practices to ensure full visibility into systems, applications, and infrastructure.
- Define and track SLIs, SLOs, and error budgets across key services.
- Partner with development teams to embed observability earlier in the software development lifecycle.

Database & Platform Support
- Support monitoring and infrastructure integration for databases including MongoDB and PostgreSQL.
- Maintain documentation and champion knowledge sharing around automation, monitoring, and reliability practices.
Requirements:
What you need:

4+ years of overall experience in SRE, DevOps, or infrastructure automation roles.

Strong scripting skills in Python and Bash; comfortable building and maintaining production-grade automation.

Hands-on experience with infrastructure automation tools, particularly Ansible.

Solid experience with monitoring and observability platforms - ELK stack, Grafana, and Zabbix.

Good understanding of CI/CD pipelines and related tooling, including Jenkins.

Familiarity with managing and monitoring MongoDB and PostgreSQL in a production environment.

Comfortable working in Linux-based environments.

Excellent problem-solving skills and strong written and verbal communication.


Ability to support the following:
Experience with cloud providers - AWS, GCP, or Azure.
Exposure to containerization technologies such as Docker and Kubernetes.
Familiarity with infrastructure provisioning using Terraform.
Experience introducing SRE practices (SLOs, error budgets, chaos engineering) at an organizational level.
Exposure to and experience with migrating or building AI tools to improve processes.
This position is open to all candidates.
 
8662378
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced DevOps Manager to lead and grow our DevOps function. This role combines people leadership, technical direction, and ownership of the infrastructure, tooling, automation, and operational practices that power Stampli's production environment.

You will manage DevOps Engineers, hire and onboard an additional team member, and drive the strategy, execution, and evolution of Stampli's internal DevOps platform. You will work closely with Engineering, Data, AI, Product, and Security teams to improve developer experience, enable fast and safe delivery, and keep production stable.

This role requires a strong hands-on DevOps / Platform Engineering background, combined with proven leadership capabilities. If you believe DevOps should operate as a self-service platform, love automation, and think in systems and end-to-end flows, keep reading.


What You Will Do
Lead, mentor, and manage a DevOps team, fostering ownership, excellence, collaboration, and continuous improvement.
Own the DevOps roadmap, priorities, execution, and delivery, aligned with Engineering, Data, AI, Security, and business goals.
Provide technical and architectural guidance across infrastructure, CI/CD, cloud operations, automation, observability, security, and platform engineering initiatives.
Build and evolve our internal DevOps platform, creating self-service capabilities, internal services, and golden paths that scale across teams.
Own CI/CD end-to-end, including Jenkins, GitHub, and GitHub Actions pipelines from commit to production.
Oversee and evolve our AWS stack, including ECS, EKS, Lambda, DynamoDB, Redshift, S3, DocumentDB, networking, IAM, observability, and deployment patterns.
Enable MLOps and data workflows using tools such as Airflow, MLflow, and Jupyter Notebooks.
Drive an automation-first mindset through Infrastructure-as-Code, scripting, internal tooling, and reusable components.
Lead cost optimization efforts with a FinOps mindset, including visibility, budgets, rightsizing, and workload efficiency.
Ensure security is embedded into DevOps practices, including least privilege, secrets management, vulnerability scanning, secure SDLC, and incident readiness.
Leverage AI-assisted development tools such as Cursor, GitHub Copilot, Claude Code, and ChatGPT Enterprise to improve team productivity and delivery speed.
Collaborate closely with cross-functional stakeholders to unblock delivery, improve developer experience, and maintain production stability.
Requirements:
7+ years of experience in DevOps, SRE, Platform Engineering, or Infrastructure Engineering, ideally in a SaaS production environment.
2+ years of managerial or team leadership experience, including mentoring engineers, driving execution, and owning team delivery.
Strong hands-on technical background with the ability to guide architecture, review technical decisions, and stay close to execution when needed.
Strong development background: you write code comfortably, build internal tools, and approach infrastructure work with software engineering discipline.
Proven experience with AWS services such as ECS, EKS, Lambda, DynamoDB, Redshift, S3, and DocumentDB.
Strong CI/CD experience with Jenkins, GitHub, and GitHub Actions.
Experience with Infrastructure-as-Code, automation, observability, cloud networking, IAM, and production operations.
Experience with ML/data tooling such as Airflow, MLflow, and Jupyter Notebooks - an advantage.
Hands-on experience with AI-assisted development tools such as Cursor, GitHub Copilot, Claude Code, or similar.
Demonstrated experience in cost optimization, cloud security, secure SDLC, and operational security.
A wide-angle thinker who sees the whole system, understands dependencies, and builds solutions that scale across teams.
Strong people leadership, communication, collaboration, and prioritization skills.
Strong communication skills in English. Hebrew is an advantage.
This position is open to all candidates.
 
8709491
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Senior IT SRE Engineer, you will be a key player in ensuring the reliability, scalability, and performance of our critical IT infrastructure. You will leverage SRE principles and an automation-first mindset to build and maintain resilient hybrid cloud environments. This role is ideal for a candidate who thrives in a fast-paced, innovative setting and is passionate about solving complex challenges with cutting-edge technology.
Key Responsibilities
Provision, configure, and support resilient hybrid cloud deployment architectures using an Infrastructure-as-Code framework.
Proactively collaborate with development teams to ensure new applications are production-ready, scalable, and reliable from inception.
Develop and maintain tools and frameworks to automate operational tasks, including deployment, monitoring, and recovery.
Conduct thorough root cause analysis of production issues and implement preventative measures to improve system resilience, demonstrating strong problem-solving skills.
Manage CI/CD platforms, Linux infrastructure, and contribute to capacity planning and operational runbooks.
Design and implement proactive service monitoring, alerting, and trend analysis to maintain service availability and performance SLAs.
Participate in an on-call rotation to support critical applications and services, responding to and resolving incidents efficiently.
Contribute to comprehensive documentation related to infrastructure design, deployment, and operational procedures.
Requirements:
Your Experience:
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
6+ years of DevOps engineering experience on mission-critical, enterprise-level systems in a hybrid (both cloud and on-prem) environment.
3+ years of hands-on experience with cloud environments, preferably Google Cloud Platform (GCP).
Expertise in configuration management and Infrastructure-as-Code using frameworks such as Terraform and Ansible.
Strong programming/scripting knowledge in languages like Python, Bash, or Go for infrastructure automation.
Demonstrated experience with CI/CD pipelines (e.g., GitHub, Jenkins, Artifactory) and a strong foundation in Linux/Unix administration.
Preferred Qualifications
Experience with containerization and orchestration technologies, particularly Kubernetes.
Hands-on experience with monitoring and observability tools such as Datadog, Grafana, or Prometheus.
Understanding of networking principles including firewalls, load balancers, and complex network designs.
A curious and positive mindset with a passion for applied learning and challenging existing processes for continuous improvement.
This position is open to all candidates.
 
8713868
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a passionate and detail-oriented Product Owner & Engineer to join our team. In this role, you will collaborate closely with development teams to define new features, contribute to functional requirement documentation (FRD), and ensure seamless UX design and implementation.
A core part of this role is maintaining close alignment with customers. You will actively engage with customers, gather feedback, understand real-world use cases, and translate their needs into clear product requirements. You will work closely with Support and QA to ensure our testing strategy and supportability reflect actual customer environments and workflows.
Beyond core development, you will assist in critical escalations, manage complex or high-visibility installations, and develop tools that enhance the overall user experience. You will leverage telemetry data, production insights, and direct customer feedback to continuously refine and improve our products, ensuring they deliver measurable value in real-world deployments.
Key Responsibilities
Collaborate closely with Product and Engineering teams to define, refine, and prioritize features that directly address real customer needs and business impact.
Translate customer requirements and field insights into clear, structured Functional Requirement Documents (FRDs), while actively contributing to UX discussions to ensure intuitive and seamless user experiences.
Work closely with QA and Support to align testing strategies and troubleshooting workflows with real-world customer environments, ensuring reliability, operability, and supportability at scale.
Serve as a technical and product focal point during critical customer escalations and high-visibility deployments, ensuring timely resolution and long-term improvements.
Develop tools and scripts to enhance user experience and operational efficiency.
Leverage telemetry, usage analytics, and direct customer feedback to drive data-informed decisions and continuously improve product performance and adoption.
Proactively identify risks, gaps, and cross-team dependencies, removing roadblocks to ensure successful delivery and measurable customer outcomes.
Requirements:
Proven experience in software development, technical product management, or advanced technical support within complex infrastructure environments.
Demonstrated experience with storage technologies and protocols such as NFS, S3 (object storage), SMB, and familiarity with enterprise storage architectures and distributed file systems.
Strong hands-on experience with Linux, Python, and networking concepts (TCP/IP, routing, switching, large-scale deployments).
Ability to analyze and solve complex technical challenges in scale-out Linux environments, HPC workloads, AI training infrastructures, and advanced networking architectures.
Experience collaborating across cross-functional teams - Engineering, QA, and Support - using industry-standard tools such as Jira, Slack, GitLab, Git, unit testing frameworks, and QTest.
Strong analytical skills with the ability to leverage telemetry, usage data, and customer insights to guide product decisions and prioritize effectively.
Experience working with observability and data platforms, including time-series databases (e.g., Prometheus), multi-tenant log aggregation systems, Slack and Salesforce integrations, and AI-driven automation workflows - a strong advantage.
Familiarity with scripting and automation using Python, REST APIs, OpenTelemetry (OTEL), and Bash to improve operational efficiency and supportability.
Excellent communication skills, with the ability to bridge technical depth and customer-facing clarity.
A proactive, customer-first mindset with strong ownership and accountability.
This position is open to all candidates.
 
8680897
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for people who are relentlessly curious and committed to continuous learning. AI is reshaping every function across our business, and we enable every team member, regardless of role or level, to build fluency in AI tools and concepts. Those who thrive here actively seek out new solutions, experiment thoughtfully, and apply what they learn to drive better, faster, smarter outcomes.
As a Senior Staff Software Engineer on the Agent Platform Detection team, you will be the technical visionary responsible for defining and evolving the architecture of our detection, triage, and response platform. You will lead the design and execution of backend systems and SaaS services that power alert propagation, alert lifecycle management, and response capabilities at extraordinary scale. Your technical leadership will bridge the gap between long-term architectural strategy and high-velocity product delivery, influencing how detection and response workflows are built and operated across our company's platform.
What Will You Do?
Primary responsibilities include:
Architectural Vision: Define and drive the long-term technical roadmap for the Agent Platform Detection domain, owning the architecture of backend systems that power alert ingestion, triage, lifecycle management, and response workflows across our company's SaaS platform.
System Design at Scale: Lead the design and implementation of highly available, cloud-native services that process billions of security events daily, ensuring alerts move reliably and efficiently from ingestion through investigation and action for the world's largest enterprises.
Technical Leadership & Influence: Act as a key stakeholder in cross-organizational architectural reviews, ensuring the Detection platform provides the extensibility, reliability, and observability that other product teams depend on. Drive alignment across engineering, product, and design on the technical direction of the team's feature area.
Full-Stack Depth: While backend-oriented, bring practical depth across the stack - owning and evolving frontend components in the company console built in React and TypeScript, and ensuring complex detection and response workflows are exposed in a clean, usable, and performant way.
Operational Excellence: Champion engineering best practices across the team, including advanced observability, alert health reporting, performance optimization, and the continuous improvement of runbooks, diagnostics, and incident response processes.
Mentorship & Growth: Elevate the engineering bar by mentoring Staff and Senior engineers, fostering a culture of technical excellence, accountability, and proactive problem-solving across the team.
Requirements:
Ideal candidates will have:
Extensive Backend Expertise: 12+ years of professional experience in backend development, with deep production-level mastery of Golang, Java, Python, or similar languages, and a strong track record of building and operating high-scale distributed services.
Platform Thinking: Proven experience building and evolving platforms - not just features - with a focus on API design (gRPC, REST), service boundaries, multi-tenancy, and shared infrastructure in a high-scale SaaS environment.
Full-Stack Capability: Practical frontend experience with React and TypeScript, with the ability to own production UI components and lead the frontend direction for a backend-heavy product area.
Data & Distributed Systems: Expert-level knowledge of RDBMS (PostgreSQL), query optimization, and extensive experience with high-throughput messaging systems such as Kafka and distributed caches such as Redis.
Cloud-Native Proficiency: Deep experience with AWS/GCP, Kubernetes, Docker, and modern CI/CD patterns in a hyper-scale SaaS environment.
Strategic Communication: Ability to articulate complex technical trade-offs to both technical and non-technical stakeholders, including Product Management, Directors, and VPs.
Cybersecurity Context: (Bonus) Familiar
This position is open to all candidates.
 
8713782
22/06/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
Your Career:
Own and continuously improve AWS production infrastructure for scalability, reliability, security, performance, and cost.
Run and evolve Kubernetes environments that support fast, safe product delivery.
Drive developer velocity and production safety through better CI/CD pipelines, release workflows, deployment visibility, and GitOps practices.
Improve observability and incident response - reduce alert noise and raise signal quality.
Design and ship AI-assisted operational agents that change how engineers work - triaging monitoring alerts, summarizing incidents, proposing fixes, onboarding new services, answering questions and requests. This is a core part of the role, not a side project.
Build automation and self-service tooling that removes manual work from provisioning, monitoring, incident response, and developer workflows.
Analyze operational data across incidents, alerts, deployments, infra health, and cost to find reliability gaps, inefficiencies, and automation opportunities.
Partner with engineering, security, product, and leadership to remove bottlenecks and support safe production growth.
Evaluate and introduce new tools and AI-assisted approaches, balancing innovation with reliability, cost, and operational simplicity.
Your Impact:
You'll help scale production systems, improve deployment velocity and reliability, reduce operational overhead, and build automation and AI workflows that help engineering teams move faster and operate more efficiently.
This role is a strong fit for someone who enjoys ownership, collaboration, and operational innovation.
Requirements:
Your Experience:
4+ years operating production infrastructure in AWS.
Deep hands-on experience with Kubernetes, Helm, ArgoCD, Terraform, and CI/CD.
Strong experience with observability and alerting in Datadog or comparable platforms.
Solid grounding in Linux, networking, cloud security, and reliability best practices.
Strong scripting skills in Python and Bash.
Proven ability to own platform projects end-to-end, from design through production operation and ongoing improvement.
Strong troubleshooting across distributed systems, Kubernetes, CI/CD, and live incidents.
Collaborative mindset - comfortable working across engineering, security, product, and leadership.
Comfort in a fast-paced, high-ownership environment where priorities shift but production quality doesn't.
Genuine interest in applying AI, automation, and intelligent workflows to operational work.
Key qualities
Ownership-driven - You take responsibility for the systems you build and operate, from design through production support and continuous improvement.
Collaboration - You work effectively across engineering, security, product, and leadership to align priorities and drive shared outcomes.
Developer experience focus - You are committed to reducing friction for engineering teams through thoughtful automation, self-service workflows, and reliable internal tooling.
Innovation balanced with pragmatism - You actively explore new approaches, particularly in AI-assisted operations, while weighing them against reliability, maintainability, and operational simplicity.
Security mindset - You design and build with least privilege, auditability, and production safety as foundational principles rather than afterthoughts.
Clear communication - You articulate infrastructure, reliability, cost, and security tradeoffs precisely to both technical and non-technical stakeholders.
This position is open to all candidates.
 
8705246
01/06/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Senior DevOps Engineer within the XSPM team, you will be a critical, go-to technical expert responsible for the health, performance, and evolution of our database and infrastructure systems. When production databases degrade or behave unexpectedly, you are the person who dives deep, investigating root causes hands-on, understanding the underlying mechanics of the problem, and designing lasting solutions. Your mastery of database systems makes you the authority the team relies on to diagnose complex performance issues, architect better data solutions, and ensure our infrastructure scales with confidence.

Beyond databases, you will drive our DevOps practices end-to-end - CI/CD pipelines, infrastructure automation, and operational reliability across the XSPM platform. This is a high-impact, highly visible role at the intersection of database engineering and DevOps, where your expertise directly shapes how the team delivers and operates at scale.

We're a highly collaborative, friendly, inclusive and diverse group that prizes collaboration over competition. We provide opportunities to learn new skills, mentor fellow engineers, and contribute to the direction of both the team and the products for which we're responsible. We work in a distributed, high-trust environment where you manage your own time and have the flexibility to balance your work and personal life.

What You Will Do:

Serve as the team's database expert, the first person to investigate, diagnose, and resolve complex performance problems across our production database systems (MongoDB, OpenSearch, PostgreSQL, Cassandra).

Perform deep-dive root cause analysis on database performance issues, understanding query execution internals, resource consumption patterns, cluster behavior, and system-level interactions to identify the real source of problems, not just symptoms.

Design and propose better database architectures and solutions, recommending when to re-architect data models, migrate workloads, introduce new technologies, or redesign how services interact with their data layer.

* You will make every effort within the team to ensure the data architecture is well designed.

Own capacity planning, scaling strategies, and high-availability designs for database clusters, ensuring systems are built to handle the team's growth trajectory.

Act as the bridge between development and infrastructure, advising engineers on how their application patterns impact database performance and guiding them toward sustainable solutions.

Build and maintain CI/CD pipelines, infrastructure-as-code (Terraform, Helm, Kubernetes manifests), and automated deployment workflows for the XSPM team's services.

Design and manage observability stacks, dashboards, alerting rules, and SLOs to maintain best-in-class availability for critical data pipelines and services.

Drive infrastructure automation to reduce operational toil, including automated scaling, self-healing systems, and configuration management.

Participate in on-call rotations, incident response, and post-incident reviews, driving root-cause analysis and long-term reliability improvements.

Evaluate and adopt new database technologies and infrastructure tooling that align with the team's evolving data architecture needs.
Requirements:
7+ years of experience in DevOps, SRE, DBA, or infrastructure engineering, with significant hands-on responsibility for production database systems at scale.

Expert-level knowledge of a common database such as MongoDB, OpenSearch, or PostgreSQL, including a deep understanding of its internals, performance characteristics, replication, and sharding, and the ability to diagnose and solve complex issues from first principles.
This position is open to all candidates.
 
8675475
22/06/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
Join a team of senior engineers operating in a large-scale, multi-cloud production environment supporting tens of thousands of enterprise customers worldwide. This is not a typical SRE role - you'll work at the core of a complex, high-impact system alongside experienced DevOps professionals in a fast-paced, cybersecurity-focused organization.
Your Impact:
Own and operate large-scale, global production environments across multiple cloud providers (GCP, AWS, Azure)
Actively monitor, investigate, and resolve incidents triggered by automated alerting systems (PagerDuty / Incident Response)
Drive end-to-end troubleshooting across complex, distributed systems with high context switching
Design, deploy, and improve monitoring and observability systems (e.g., Prometheus, Grafana) - not just react to alerts
Collaborate closely with internal teams (CX, CS, Engineering) to ensure system reliability and performance
Work hands-on with modern DevOps and infrastructure tools including Kubernetes, Terraform, CI/CD pipelines, and GitOps workflows
Develop and maintain automation and tooling (primarily in Python)
Gain deep understanding of system architecture and interconnected services
Contribute to a culture of operational excellence in a high-scale, high-availability environment
On-call responsibilities:
Daytime hours (12:00-20:00)
Occasional weekends and holidays (rotation-based).
Requirements:
Your experience:
5+ years of experience in SRE roles in production environments at scale
Strong hands-on experience with Kubernetes and Terraform
Strong hands-on experience with at least one major cloud platform (GCP or AWS required)
Experience building and configuring monitoring systems (e.g., Prometheus, Grafana)
Familiarity with CI/CD and GitOps tools (GitLab CI, GitHub Actions, Jenkins, Flux)
Proficiency in Python for scripting and automation
Strong troubleshooting and problem-solving skills with a passion for incident handling
Ability to work in fast-paced environments with high context switching
Highly responsive, proactive, and ownership-driven
Strong collaboration and communication skills
Curious mindset and eagerness to learn.
This position is open to all candidates.
 
8704900
21/06/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Senior IT SRE Engineer, you will be a key player in ensuring the reliability, scalability, and performance of our critical IT infrastructure. You will leverage SRE principles and an automation-first mindset to build and maintain resilient hybrid cloud environments. This role is ideal for a candidate who thrives in a fast-paced, innovative setting and is passionate about solving complex challenges with cutting-edge technology.
Key Responsibilities
Provision, configure, and support resilient hybrid cloud deployment architectures using an Infrastructure-as-Code framework.
Proactively collaborate with development teams to ensure new applications are production-ready, scalable, and reliable from inception.
Develop and maintain tools and frameworks to automate operational tasks, including deployment, monitoring, and recovery.
Conduct thorough root cause analysis of production issues and implement preventative measures to improve system resilience, demonstrating strong problem-solving skills.
Manage CI/CD platforms, Linux infrastructure, and contribute to capacity planning and operational runbooks.
Design and implement proactive service monitoring, alerting, and trend analysis to maintain service availability and performance SLAs.
Participate in an on-call rotation to support critical applications and services, responding to and resolving incidents efficiently.
Contribute to comprehensive documentation related to infrastructure design, deployment, and operational procedures.
Requirements:
Your Experience:
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
6+ years of DevOps engineering experience on mission-critical, enterprise-level systems in a hybrid (both cloud and on-prem) environment.
3+ years of hands-on experience with cloud environments, preferably Google Cloud Platform (GCP).
Expertise in configuration management and Infrastructure-as-Code using frameworks such as Terraform and Ansible.
Strong programming/scripting knowledge in languages like Python, Bash, or Go for infrastructure automation.
Demonstrated experience with CI/CD pipelines (e.g., GitHub, Jenkins, Artifactory) and a strong foundation in Linux/Unix administration.
Preferred Qualifications
Experience with containerization and orchestration technologies, particularly Kubernetes.
Hands-on experience with monitoring and observability tools such as Datadog, Grafana, or Prometheus.
Understanding of networking principles including firewalls, load balancers, and complex network designs.
A curious and positive mindset with a passion for applied learning and challenging existing processes for continuous improvement.
This position is open to all candidates.
 
8703154