Site Reliability Team Leader

24/05/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced SRE Team Lead to drive the reliability, observability, and automation practices across our private cloud infrastructure and operations. In this role, you will lead a team of site reliability engineers, own the engineering roadmap for monitoring and automation, and act as a key liaison between development, operations, and platform teams. You bring at least 3-4 years of hands-on people management experience and a deep technical background in SRE or DevOps disciplines.



What will you do?

Leadership & Team Management

Lead, mentor, and grow a team of SREs, providing technical direction, career development guidance, and day-to-day management.

Own the team roadmap for reliability, observability, and automation initiatives - prioritizing work, removing blockers, and driving delivery.

Conduct regular 1:1s, performance reviews, and hiring processes to build and sustain a high-performing team.

Foster a culture of operational excellence, blameless post-mortems, and continuous improvement.

Act as an escalation point for complex incidents and reliability issues, leading post-incident reviews and ensuring follow-through on action items.


Automation & Infrastructure

Design, develop, and maintain automation tools to support infrastructure and operations teams at scale.

Manage pipelines and infrastructure workflows using Jenkins, Ansible, Python, and Bash.

Drive the adoption of infrastructure-as-code practices across the organization.

Collaborate with system engineers to improve scalability, performance, and fault tolerance of critical systems.


Monitoring & Observability

Build and extend monitoring and alerting systems using Grafana, the ELK (Elastic) stack, Zabbix, and custom scripts.

Implement and enforce observability best practices to ensure full visibility into systems, applications, and infrastructure.

Define and track SLIs, SLOs, and error budgets across key services.

Partner with development teams to embed observability earlier in the software development lifecycle.


Database & Platform Support

Support monitoring and infrastructure integration for databases including MongoDB and PostgreSQL.

Maintain documentation and champion knowledge sharing around automation, monitoring, and reliability practices.
Requirements:
What you need:

Experience & Leadership

3-4+ years of experience in a people management or team lead capacity within SRE, DevOps, or infrastructure engineering.

5-8+ years of overall experience in SRE, DevOps, or infrastructure automation roles.

Proven track record of building, coaching, and retaining high-performing engineering teams.

Experience owning an engineering roadmap and driving cross-functional reliability initiatives.



Technical Skills

Strong scripting skills in Python and Bash; comfortable building and maintaining production-grade automation.

Hands-on experience with infrastructure automation tools, particularly Ansible.

Solid experience with monitoring and observability platforms - ELK stack, Grafana, and Zabbix.

Good understanding of CI/CD pipelines and related tooling, including Jenkins.

Familiarity with managing and monitoring MongoDB and PostgreSQL in a production environment.

Comfortable working in Linux-based environments.

Excellent problem-solving skills and strong written and verbal communication.



Ability to support the following:

Experience with cloud providers - AWS, GCP, or Azure.

Exposure to containerization technologies such as Docker and Kubernetes.

Familiarity with infrastructure provisioning using Terraform.

Experience introducing SRE practices (SLOs, error budgets, chaos engineering) at an organizational level.

Exposure to and experience with migrating or building AI tools to improve processes.
This position is open to all candidates.
 
8662300
Similar jobs that may interest you
24/05/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced SRE Engineer to drive the reliability, observability, and automation practices across our private cloud infrastructure and operations. In this role, you will be part of a team of site reliability engineers, own the engineering roadmap for monitoring and automation, and act as a key liaison between development, operations, and platform teams. You bring 4+ years of hands-on people management experience and a deep technical background in SRE or DevOps disciplines.

What will you do?

Automation & Infrastructure
- Design, develop, and maintain automation tools to support infrastructure and operations teams at scale.
- Manage pipelines and infrastructure workflows using Jenkins, Ansible, Python, and Bash.
- Drive the adoption of infrastructure-as-code practices across the organization.
- Collaborate with system engineers to improve scalability, performance, and fault tolerance of critical systems.

Monitoring & Observability
- Build and extend monitoring and alerting systems using Grafana, the ELK (Elastic) stack, Zabbix, and custom scripts.
- Implement and enforce observability best practices to ensure full visibility into systems, applications, and infrastructure.
- Define and track SLIs, SLOs, and error budgets across key services.
- Partner with development teams to embed observability earlier in the software development lifecycle.

Database & Platform Support
- Support monitoring and infrastructure integration for databases including MongoDB and PostgreSQL.
- Maintain documentation and champion knowledge sharing around automation, monitoring, and reliability practices.
Requirements:
What you need:

4+ years of overall experience in SRE, DevOps, or infrastructure automation roles.

Strong scripting skills in Python and Bash; comfortable building and maintaining production-grade automation.

Hands-on experience with infrastructure automation tools, particularly Ansible.

Solid experience with monitoring and observability platforms - ELK stack, Grafana, and Zabbix.

Good understanding of CI/CD pipelines and related tooling, including Jenkins.

Familiarity with managing and monitoring MongoDB and PostgreSQL in a production environment.

Comfortable working in Linux-based environments.

Excellent problem-solving skills and strong written and verbal communication.


Ability to support the following:
Experience with cloud providers - AWS, GCP, or Azure.
Exposure to containerization technologies such as Docker and Kubernetes.
Familiarity with infrastructure provisioning using Terraform.
Experience introducing SRE practices (SLOs, error budgets, chaos engineering) at an organizational level.
Exposure to and experience with migrating or building AI tools to improve processes.
This position is open to all candidates.
 
8662378
01/06/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
Manager, Site Reliability Engineering
Location: Tel-Aviv, Israel
We're committed to creating the best experience for business travelers, ensuring that our systems are always reliable, scalable, and efficient. As we continue to grow, we're looking for a Site Reliability Engineering (SRE) Manager to join our team at our headquarters in Tel-Aviv. In this role, you will lead a team of SREs, drive innovation in infrastructure design and automation, and ensure our systems run seamlessly at scale, serving thousands of travelers every day.
What You'll Do
Lead & Mentor the SRE Team: Guide and develop a high-performing team of SREs, fostering a culture of collaboration, reliability, and continuous improvement.
Drive Infrastructure Reliability & Automation: Collaborate with Engineering and Product teams to design and implement scalable, fault-tolerant systems. Leverage IaC tools (e.g., Terraform, CloudFormation) and microservices architectures to automate and improve infrastructure.
Incident Management: Improve incident response processes, reduce MTTR, and proactively mitigate risks. Apply resiliency patterns to ensure systems are fault-tolerant and highly available.
Define & Measure SLOs: Develop service-level objectives (SLOs) and KPIs to track and improve system reliability, using tools like New Relic or Datadog for observability.
24x7 Production Support: Ensure system availability in a 24x7 environment, applying expertise in AWS (e.g., ECS, Lambda, DynamoDB) and database management for optimal performance.
Optimize CI/CD Pipelines: Automate and streamline deployment workflows using tools like Jenkins or GitHub Actions to ensure faster and more reliable deployments.
Resource Management: Manage team resources, including capacity planning, hiring, and upskilling, to meet evolving business needs.
Requirements:
8+ years in Site Reliability Engineering, DevOps, or Infrastructure roles, with at least 3 years in a leadership position.
Proven ability to lead and mentor teams, fostering a culture of collaboration and reliability.
Hands-on experience with AWS cloud technologies, Infrastructure as Code (Terraform/CloudFormation), microservices architectures, deployment automation (Jenkins/GitHub Actions), and observability tools (New Relic/Datadog).
Strong background in designing scalable, fault-tolerant systems, improving incident response, and driving operational improvements.
Excellent interpersonal and communication skills, with the ability to work effectively across cross-functional teams.
This position is open to all candidates.
 
8675381
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a skilled and motivated DevOps engineer with deep familiarity in the streaming ecosystem to join our elite infrastructure team. If you're excited by the challenge of operating mission-critical systems at scale and optimizing the developer experience through automation and tooling, we'd love to hear from you.

What you will do:

Automate Deployment and Operation
Oversee deployment of Kafka and RabbitMQ clusters (including Confluent Cloud & CFK). Build automation pipelines to ensure repeatability and resiliency across environments.

Monitor and Support Production Systems
Own production stability of global Kafka clusters. Handle on-call rotations, incident management, troubleshooting, and scaling challenges.

Improve Infrastructure Observability
Build and maintain observability systems: dashboards, alerting pipelines, metrics collection (Prometheus, Grafana, etc.).

Optimize System Performance
Collaborate with peers on benchmarking and optimization initiatives. Work on tuning Kafka brokers, cluster configurations, and runtime parameters.

Provide Developer Support and Training (Infra-focused)
Help developers configure topics, quotas, and consumers appropriately. Train service owners to interpret monitoring data and avoid pitfalls.

Develop and Maintain Infrastructure
Contribute to building infrastructure tools and scripts (IaC, Helm charts, etc.) that make provisioning and managing clusters reliable and efficient.

Secure Infrastructure Access
Configure and maintain secure access patterns across streaming infrastructure, ensuring proper authentication and role-based access controls are enforced for both developers and services.
Requirements:
What we expect:

8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.

Deep hands-on Kafka experience, including deploying, maintaining, scaling, and monitoring clusters.

Experience with RabbitMQ.

Extensive experience with Docker, Kubernetes, Helm, and GitOps-style deployments.

Infrastructure as Code experience (Terraform, Pulumi, etc.).

Strong skills in scripting and automation (Python, Bash, etc.).

Familiarity with Confluent Cloud, Confluent for Kubernetes, and similar tools.

Solid understanding of authentication and authorization mechanisms in distributed systems.

Production support mindset - with proven troubleshooting and incident resolution history.

Collaboration and communication skills - especially with dev teams depending on platform support.

Experience with Istio Service Mesh (bonus).

Experience with GovCloud (bonus).


Bonus Qualities:

Mentorship and leadership experience in infrastructure or SRE teams.

Contributions to automation or monitoring open-source tooling.

Active participant in SRE or DevOps communities.

Conference speaker or internal tech trainer.

Technical writing about infrastructure automation or reliability.
This position is open to all candidates.
 
8695015
2 days ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Senior IT SRE Engineer, you will be a key player in ensuring the reliability, scalability, and performance of our critical IT infrastructure. You will leverage SRE principles and an automation-first mindset to build and maintain resilient hybrid cloud environments. This role is ideal for a candidate who thrives in a fast-paced, innovative setting and is passionate about solving complex challenges with cutting-edge technology.
Key Responsibilities
Provision, configure, and support resilient hybrid cloud deployment architectures using an Infrastructure-as-Code framework.
Proactively collaborate with development teams to ensure new applications are production-ready, scalable, and reliable from inception.
Develop and maintain tools and frameworks to automate operational tasks, including deployment, monitoring, and recovery.
Conduct thorough root cause analysis of production issues and implement preventative measures to improve system resilience, demonstrating strong problem-solving skills.
Manage CI/CD platforms, Linux infrastructure, and contribute to capacity planning and operational runbooks.
Design and implement proactive service monitoring, alerting, and trend analysis to maintain service availability and performance SLAs.
Participate in an on-call rotation to support critical applications and services, responding to and resolving incidents efficiently.
Contribute to comprehensive documentation related to infrastructure design, deployment, and operational procedures.
Requirements:
Your Experience:
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
6+ years of DevOps engineering experience on mission-critical, enterprise-level systems in a hybrid (both cloud and on-prem) environment.
3+ years of hands-on experience with cloud environments, preferably Google Cloud Platform (GCP).
Expertise in configuration management and Infrastructure-as-Code using frameworks such as Terraform and Ansible.
Strong programming/scripting knowledge in languages like Python, Bash, or Go for infrastructure automation.
Demonstrated experience with CI/CD pipelines (e.g., GitHub, Jenkins, Artifactory) and a strong foundation in Linux/Unix administration.
Preferred Qualifications
Experience with containerization and orchestration technologies, particularly Kubernetes.
Hands-on experience with monitoring and observability tools such as Datadog, Grafana, or Prometheus.
Understanding of networking principles including firewalls, load balancers, and complex network designs.
A curious and positive mindset with a passion for applied learning and challenging existing processes for continuous improvement.
This position is open to all candidates.
 
8703154
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
AI Infrastructure & Reliability Engineer
What this role is really about
You'll join a 3-person platform team within our Business Technology group - owning the internal infrastructure that our AI platform and its users depend on. This isn't a product engineering role, and it isn't ticket work or babysitting pipelines someone else built. You're building and operating the internal foundation that the company runs on. The work covers the full stack of platform engineering: core cloud infrastructure (AWS, Kubernetes, IaC), CI/CD pipelines, AI-driven infrastructure components, and the SRE and observability practice that keeps it all honest - metrics, alerting, incident response, and reliability standards. As our AI capabilities grow, so does the complexity underneath them, and staying ahead of that is central to the role. If you treat infrastructure as a product - reusable, automated, observable, and built to last - this is your kind of role.
Job responsibilities
DevOps & AI-Driven Infrastructure - own CI/CD, deployment processes, and release reliability. Build and operate cloud infrastructure that is automated, intelligent, and continuously self-improving - not just managed.
Design and build our Terraform repository and IaC pipeline from scratch - AI-assisted generation, drift detection, and policy enforcement built in.
Build AI-driven GitHub Actions pipelines - automated code review, risk assessment, and intelligent deployment decisions.
Manage Kubernetes workloads across AWS accounts - zero downtime, fully automated, nothing left behind.
Embed AI into the operational layer - proactive drift detection, automated remediation, and intelligent scaling toward a self-healing runtime.
Reliability & SRE - improve uptime, resilience, and incident response.
Define and enforce SLOs/SLIs, error budgets, and on-call practices.
Lead incident response, postmortems, and systemic reliability improvements.
Own AI-specific reliability: model latency SLOs, token quota monitoring, rate limit handling, fallback and retry strategies, and cost-per-request alerting.
Observability & Telemetry - increase visibility, reduce noise, improve troubleshooting.
Establish and continuously evolve the observability stack: metrics, logs, distributed tracing, and alerting tuned for both application and AI workloads.
AI / LLM Operations - bringing AI systems to production and operating them at scale, with a focus on reliability, performance, and trust.
Own the AI infrastructure layer: rate limits, quota management, latency SLOs, and fallback strategies (retries, circuit breakers).
Operate LLM APIs in production with resilience and cost attribution per team/model.
Requirements:
2-4 years of hands-on DevOps, SRE, or infrastructure engineering experience in production SaaS environments.
Strong AWS experience: multi-account architecture, cross-account IAM, serverless and event-driven services (Lambda, SQS, SNS, EventBridge), and EKS cluster management.
Proven Kubernetes experience in production, including cross-account migrations and stateful workload management.
Proficiency with Terraform - repository structure design, module architecture, and CI/CD pipeline implementation.
Hands-on experience building and maintaining GitHub Actions pipelines for end-to-end CI/CD workflows.
Working Python proficiency for scripting, internal tooling, and workflow automation.
Practical experience implementing observability stacks from scratch: metrics, logging, distributed tracing, and alerting.
Experience owning reliability practices: SLOs, incident response, and postmortem culture.
Nice to have
Hands-on experience operating LLM APIs in production: rate-limit and quota management, cost attribution per team/model, latency monitoring, and resilience patterns (retries, fallbacks, circuit breakers).
FinOps experience across cloud, AI, and observability spend.
Experience introducing self-healing or auto-remediation patterns in production.
This position is open to all candidates.
 
8659781
01/06/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Senior DevOps Engineer within the XSPM team, you will be a critical, go-to technical expert responsible for the health, performance, and evolution of our database and infrastructure systems. When production databases degrade or behave unexpectedly, you are the person who dives deep, investigating root causes hands-on, understanding the underlying mechanics of the problem, and designing lasting solutions. Your mastery of database systems makes you the authority the team relies on to diagnose complex performance issues, architect better data solutions, and ensure our infrastructure scales with confidence.

Beyond databases, you will drive our DevOps practices end-to-end - CI/CD pipelines, infrastructure automation, and operational reliability across the XSPM platform. This is a high-impact, highly visible role at the intersection of database engineering and DevOps, where your expertise directly shapes how the team delivers and operates at scale.

We're a highly collaborative, friendly, inclusive and diverse group that prizes collaboration over competition. We provide opportunities to learn new skills, mentor fellow engineers, and contribute to the direction of both the team and the products for which we're responsible. We work in a distributed, high-trust environment where you manage your own time and have the flexibility to balance your work and personal life.

What You Will Do:

Serve as the team's database expert, the first person to investigate, diagnose, and resolve complex performance problems across our production database systems (MongoDB, OpenSearch, PostgreSQL, Cassandra).

Perform deep-dive root cause analysis on database performance issues, understanding query execution internals, resource consumption patterns, cluster behavior, and system-level interactions to identify the real source of problems, not just symptoms.

Design and propose better database architectures and solutions, recommending when to re-architect data models, migrate workloads, introduce new technologies, or redesign how services interact with their data layer.

* You will put in every effort within the team to ensure the data architecture is well designed.

Own capacity planning, scaling strategies, and high-availability designs for database clusters, ensuring systems are built to handle the team's growth trajectory.

Act as the bridge between development and infrastructure, advising engineers on how their application patterns impact database performance and guiding them toward sustainable solutions.

Build and maintain CI/CD pipelines, infrastructure-as-code (Terraform, Helm, Kubernetes manifests), and automated deployment workflows for the XSPM team's services.

Design and manage observability stacks, dashboards, alerting rules, and SLOs, to maintain best-in-class availability for critical data pipelines and services.

Drive infrastructure automation to reduce operational toil, including automated scaling, self-healing systems, and configuration management.

Participate in on-call rotations, incident response, and post-incident reviews, driving root-cause analysis and long-term reliability improvements.

Evaluate and adopt new database technologies and infrastructure tooling that align with the team's evolving data architecture needs.
Requirements:
7+ years experience in DevOps, SRE, DBA, or infrastructure engineering, with significant hands-on responsibility for production database systems at scale.

Expert-level knowledge of a common database such as MongoDB, OpenSearch, or PostgreSQL, with a deep understanding of its internals, performance characteristics, replication, and sharding, and the ability to diagnose and solve complex issues from first principles.
This position is open to all candidates.
 
8675475
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Senior DevOps Engineer to join our R&D team in developing the next rising product in the health tech landscape. If you are looking for a challenging, influential position and are passionate about making an impact, this might be the role for you.

As a Senior DevOps Engineer, you'll play a key role in the design, development, testing, deployment, and monitoring of our infrastructure and products. In this position, you'll make significant contributions to our observability stack, helping build and maintain robust systems for logs, metrics, traces, and alerting.

Our ideal candidate is passionate about DevOps and observability, has strong communication skills, and thrives on constant improvement for both technology and processes. If you enjoy working on multiple projects in parallel and are a proactive team player, you'll fit right in.

This is a unique opportunity to join the core team of a fast-growing startup, where your contributions will have a direct impact on our product and success.

Responsibilities

Support and collaborate with cross-functional engineering teams using cutting-edge technologies.
Contribute to the design, implementation, and maintenance of monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki)
Secure, scale, and manage our cloud environments (AWS and GCP)
Design and implement automation solutions for both development and production
Manage and improve our CI/CD pipelines for fast and safe delivery
Lead best practices in infrastructure, observability, configuration management, and system hardening
Continuously assess and improve existing infrastructure in line with industry standards
Requirements:
BSc in Computer Science, Engineering, or equivalent experience
5+ years of experience as a DevOps Engineer or similar software engineering role
Proven experience with Docker and Kubernetes (EKS preferred)
Hands-on experience with monitoring and observability tools, including Prometheus, Grafana, Datadog, or similar.
Expertise in Terraform for AWS infrastructure-as-code deployments
Strong collaboration and interpersonal communication skills
Excellent analytical thinking and problem-solving mindset
Proficiency with relational databases
Solid knowledge of Python and Bash scripting
Experience with test automation - an advantage
This position is open to all candidates.
 
8671069
7 days ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a passionate Senior DevOps Engineer to join our DevOps Core Team. In this role, you will be responsible for the design, implementation, and maintenance of cloud-native infrastructure on AWS and Kubernetes. You will work closely with development, operations, and quality assurance teams to streamline processes, own our Infrastructure as Code practices, and help evolve our platform reliability at scale.

How Will You Make an Impact?

Design, implement, and maintain Kubernetes clusters in production environments, ensuring high availability and scalability.

Build and manage Infrastructure as Code using CloudFormation and Crossplane as our primary IaC tools.

Own and operate cloud infrastructure primarily on AWS, with working knowledge of GCP environments.

Identify and implement process improvements to increase the efficiency and reliability of the DevOps Core team.

Provide technical leadership and mentoring to team members, fostering a culture of engineering excellence.

Work closely with engineering teams to define infrastructure needs and provide DevOps support and guidance.

Research, evaluate, and integrate new technologies into our stack.

Manage, monitor, scale, and troubleshoot a distributed, highly available, customer-facing software platform.

Create and maintain technical documentation for infrastructure, processes, and runbooks.
Requirements:
Strong, hands-on Kubernetes experience: 5+ years of proven experience running and operating clusters in production at scale is a must.

Deep expertise with Infrastructure as Code - primary experience with AWS CloudFormation and Crossplane.

Comprehensive knowledge of AWS cloud services (compute, networking, storage, IAM, observability) - with 5+ years of proven, hands-on AWS experience.

Working knowledge of GCP - ability to operate, troubleshoot, and deploy in GCP environments.

Hands-on experience with ArgoCD and GitOps workflows - managing application delivery through Git as the source of truth.

Experience with CI/CD pipelines and automation tooling (Jenkins, CircleCI, or similar).

4+ years of scripting or coding experience (Python, Bash, or GoLang) for automation and tooling.

Advanced knowledge of Linux OS and networking fundamentals.
This position is open to all candidates.
 
8695424
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for a DevOps Architect to help shape the infrastructure strategy behind our Revenue AI platform. This role sits at the center of our engineering ecosystem, driving architectural direction, improving operational excellence, and enabling teams to scale with confidence. You'll work across engineering groups to identify systemic gaps, define scalable standards, and accelerate execution without becoming a delivery bottleneck.
You'll Own:
Infrastructure Strategy & Standards: Define and evolve our cloud and infrastructure architecture across Kubernetes, networking, observability, security, and data platforms. Establish clear standards and scalable best practices that enable teams to move faster with consistency and reliability.
Technical Debt & System Health Visibility: Continuously identify, prioritize, and drive resolution of cross-team technical debt, architectural gaps, and operational inefficiencies. Create organizational visibility around the most critical infrastructure challenges and opportunities.
Cross-Org Technical Leadership: Partner closely with engineering leaders and teams to influence architectural decisions, challenge assumptions, and ensure solutions are scalable, maintainable, and secure. Lead through expertise and influence, not direct ownership.
Developer Enablement & Engineering Velocity: Provide frameworks, tooling direction, and lightweight prototypes or POCs that empower teams to execute independently with higher quality and efficiency.
Critical Infrastructure Initiatives: Drive major cross-functional initiatives around reliability, scalability, security, observability, and cost optimization from identification through execution and measurable impact.
You'll Solve:
Scaling Complexity: How do we maintain simplicity, reliability, and operational clarity while supporting rapid growth and increasingly complex distributed systems?
Cross-Team Alignment: How do we create architectural consistency across independent engineering groups without slowing down innovation and execution?
Operational Excellence at Scale: How do we proactively surface and resolve systemic weaknesses before they become production issues?
Balancing Speed & Sustainability: How do we enable fast delivery today while protecting the long-term health and scalability of the platform?
AI Infrastructure Evolution: How do we build infrastructure that supports modern AI/ML workloads, GPUs, large-scale data pipelines, and future platform requirements?
You'll Impact:
Platform Reliability & Scalability: Your work will directly improve the resilience, scalability, and operational maturity of our infrastructure platform.
Engineering Efficiency: By creating better standards, tooling, and architectural guidance, you'll act as a force multiplier for engineering teams across the company.
Long-Term System Health: You'll help reduce operational friction, minimize technical debt, and ensure our infrastructure can support long-term business growth.
Execution Quality Across Teams: Your influence will elevate engineering quality, decision-making, and operational discipline throughout the organization.
Requirements:
A Deep Technical Expert: Someone with 8+ years of hands-on experience with AWS and cloud-native infrastructure at scale, including strong Kubernetes expertise and distributed systems knowledge.
An Infrastructure Architect: Someone with deep experience in Infrastructure as Code and GitOps methodologies using tools like Terraform, Crossplane, or Pulumi.
A Pragmatic Builder: A strong engineer with programming experience in Python or Go who can build tools, prototypes, and automation when needed.
A Systems Thinker: Someone who can identify patterns, uncover systemic issues, and drive improvements across complex technical environments.
An Influential Technical Leader: Someone with proven experience leading cross-team initiatives and driving alignment without direct authority.
This position is open to all candidates.
 
8665155
5 days ago
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Software Engineering Manager (SRE Team).
In this role, you will:
Lead a distributed team of 3 SRE engineers as part of the Global SRE group and report to the Director of SRE & DBOps.
Drive incident response and post-mortem processes, fostering a culture of continuous improvement.
Design, build and improve internal tools and automation software to make maintaining production services easier and safer.
Lead reliability-focused practices such as SLO (Service Level Objective) design and implementation, Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others.
Participate in the on-call rotation, providing expertise and support during critical system incidents and ensuring timely resolution.
Ensure high technical standards and overcome various technological challenges.
Be accountable for the team's output, empowering and coaching team members, setting goals, and ensuring engagement.
Embrace values of independence, natural curiosity, ownership, and continuous improvement.
Requirements:
7+ years in software development or DevOps roles
3+ years of leading software development or DevOps teams
Experience with microservices architecture
Experience troubleshooting high-load production issues
Experience working with one of the major cloud providers (AWS, GCP, Azure) at the infrastructure level
Experience with Agile methodology
Experience with DevOps tools and practices and a mindset to learn new skills
Proven ability to lead, mentor and drive people
Strong verbal and written communication skills in Hebrew and in English
Preferred:
BS/MS in Computer Science or equivalent industry experience.
Experience working on large-scale, high-traffic platforms.
Distributed monitoring experience with logging, metrics and tracing using OpenTelemetry, Prometheus and other observability platforms
Additional scripting languages: Bash, PowerShell, Python
Previous experience working as an SRE
This position is open to all candidates.
 
8700258