Site Reliability Engineering Manager

13/07/2025
This job has been marked by the employer as no longer relevant
Job location: Netanya and Tel Aviv-Yafo
Job type: Full time
Similar jobs that may interest you
Location: Tel Aviv-Yafo
Job Type: Full Time
Required Site Reliability Engineer - Infra
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer - Infra on our Infrastructure team at the TLV office, you will play a key role in ensuring the reliability, scalability, and performance of our critical systems. You will be responsible for managing and improving our core infrastructure, with a focus on automation, monitoring, and incident response. You will work with a wide range of technologies, including Kubernetes, monitoring and observability tools, configuration management systems, and core networking services.
How you'll make an impact:
As a Site Reliability Engineer, you'll bring value by:
Ensure the reliability, availability, and performance of our infrastructure services.
Manage and maintain our Kubernetes infrastructure, including KubeVirt.
Design, implement, and maintain our monitoring and observability stack (SensuGo, VictoriaMetrics, Prometheus, ELK).
Automate infrastructure provisioning, configuration, and deployment processes using Puppet and Ansible.
Manage and maintain core services such as DNS and networking.
Troubleshoot and resolve complex infrastructure issues in a timely and efficient manner.
Participate in on-call rotations and incident response.
Develop and maintain infrastructure-as-code (IaC).
Identify and implement proactive measures to prevent incidents and improve system reliability.
Collaborate with development teams to ensure smooth and reliable deployments.
Contribute to the design and implementation of new infrastructure solutions.
Drive improvements in system architecture, processes, and tools.
Mentor and coach other team members.
Requirements:
5+ years of experience in a Site Reliability Engineering, Systems Engineering, or similar role.
Deep understanding of Site Reliability Engineering principles and practices.
Extensive experience with Kubernetes, including deployment, management, and troubleshooting.
Strong experience with monitoring and observability tools such as SensuGo, Zabbix, VictoriaMetrics, Prometheus, and ELK.
Proficiency in configuration management tools such as Puppet and Ansible.
Solid understanding of Linux internals and networking.
Experience with managing and maintaining core services such as DNS and networking.
Strong programming skills in Python and/or Go.
Experience with both on-premises and cloud environments.
Experience with KubeVirt.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration skills.
Ability to work in a fast-paced, dynamic environment.
Ability to participate in on-call rotations including weekends.
Preferred Qualifications:
Experience with large-scale, distributed systems.
Experience with other cloud providers (e.g., AWS, Azure, GCP).
Contributions to open-source projects.
This position is open to all candidates.
 
8272676
3 days ago
Confidential company
Location: Tel Aviv-Yafo and Yokne'am
Job Type: Full Time
We are at the forefront of the AI revolution, delivering cutting-edge accelerated compute platforms for global impact. Our Network Insights group is seeking a talented and motivated Sr. DevOps Engineer to architect, scale, and optimize the DevOps infrastructure supporting our advanced networking simulation services. In this high-impact role, you will lay the foundations to scale a key insight product to reach 10-100 times more users, design robust CI/CD pipelines, drive automation, and ensure the reliability, scalability, and security of our cloud-based and on-prem platforms. If you are passionate about solving complex infrastructure challenges and enabling world-class software delivery, we want to hear from you.
What You'll Be Doing:
Architect and optimize CI/CD pipelines for large-scale, high-availability simulation services, ensuring fast, reliable, and secure deployments.
Drive automation across infrastructure provisioning, configuration management, and monitoring to support rapid development cycles and minimize manual intervention.
Collaborate with software engineering and product teams to design and implement scalable, cloud-native solutions that meet evolving business needs.
Promote standard processes in infrastructure as code, containerization, and cloud security, ensuring compliance and resilience across environments.
Monitor, troubleshoot, and resolve infrastructure and deployment issues, maximizing uptime and ensuring efficient performance for internal and external customers.
Evaluate and integrate new tools and technologies to continually enhance the reliability, observability, and efficiency of our DevOps ecosystem.
Participate in incident response and post-mortem processes, driving root cause analysis and systemic improvements.
Requirements:
BSc or above in Computer Science, Computer Engineering, or a related field, or equivalent experience.
5+ overall years of hands-on experience in DevOps or Site Reliability Engineering roles.
Proven expertise in designing, building, and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI, GitHub Actions, or similar).
Deep knowledge of cloud platforms (AWS, preferably), On-Prem deployment, container orchestration (Kubernetes, Docker), and infrastructure as code.
Strong scripting and automation skills (Python, Bash, or similar).
Experience with monitoring, logging, and observability tools (Prometheus, Grafana, ELK, etc.).
Proven understanding of security standard methodologies in cloud & on-prem DevOps environments.
Excellent communication and interpersonal skills, with a track record of multi-functional collaboration.
Experience supporting large-scale, high-availability production systems.
Ways to Stand Out From the Crowd:
Prior background in networking or simulation environments.
Prior experience building a new team from the ground up.
Familiarity with performance tuning and cost optimization in cloud and on-prem environments.
Experience with building CI/CD pipelines from the ground up.
This position is open to all candidates.
 
8322880
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Required Site Reliability Engineer
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer on the IT Production team in our TLV office, you'll play a vital role in building robust services and solving infrastructure challenges through automation, while working with cutting-edge technologies and pushing them to their limits on our mostly on-prem, cloud-like infrastructure.
How you'll make an impact:
As a Site Reliability Engineer, you'll bring value by:
Ensure Reliability & Scalability: Design, implement and manage highly reliable and scalable distributed systems across our on-premise, cloud and AI/ML environments. Proactively optimize performance, efficiency, resource utilization and cloud cost.
Drive Automation: Automate repetitive tasks, infrastructure provisioning, configuration and deployments using IaC and scripting languages (e.g., Python, Go, Rust).
Develop Observability & Capacity: Implement comprehensive monitoring and alerting systems to ensure system health. Collaborate on capacity planning to meet future growth.
Maintain Security & Compliance: Integrate security best practices and ensure compliance with industry standards.
Lead Incident Management: Participate in on-call rotations, lead incident responses and conduct root cause analysis to minimize downtime.
Foster Collaboration & Improvement: Work closely with development, operations and security teams to drive shared responsibility and continuous improvement in SRE practices.
Our Tech Stack:
Linux, Kubernetes, nginx, Istio, AWS, GCP, Azure, Alicloud, Fastly, Terraform, Consul, Prometheus, Loki, Grafana, Airflow, Redis, Kafka, Vector, Hadoop, Cassandra, Vertica, MySQL, HDFS, ELK.
Requirements:
To thrive in this role, you'll need:
7 years of experience as an SRE, DevOps Engineer, or System Administrator in a large distributed environment, with a focus on Linux operating systems.
Experience supporting, troubleshooting and scaling large distributed systems in production.
Deep understanding of HTTP protocol, including HTTP/1.1, HTTP/2, caching semantics, TLS and gRPC delivery.
Experience configuring and operating CDN services (e.g., Akamai, Fastly, Cloudflare, AWS CloudFront).
Deep understanding of Linux system internals and system performance tuning.
Experience with Configuration Management Tools (Puppet, Ansible, Chef, Terraform).
Experience programming in at least one of the following languages (Python, Golang, Rust, Ruby, C++, Java).
Experience with monitoring and metrics collection systems (Prometheus, Grafana, ELK).
Experience with cloud providers and platforms (AWS, Azure, GCP, Alibaba).
Experience with containerization technologies (Kubernetes, Docker).
Deep understanding of networking principles (TCP/IP, DNS, load balancing).
This position is open to all candidates.
 
8273985
05/08/2025
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Principal DevOps Engineer in our Platform Engineering team, you will lead the design and implementation of cutting-edge CI/CD pipelines and cloud architecture that powers our development environment. You'll drive initiatives to enhance developer productivity through automation, tooling, and infrastructure improvements, working with a modern tech stack including Kubernetes, Python, cloud-native and high-scale technologies.
Your Impact
Architect and implement scalable, resilient CI/CD pipelines and cloud infrastructure that supports our engineering organization's evolving needs
Design and develop internal developer tools and platforms that significantly improve developer experience and productivity
Drive the evolution of our Kubernetes-based deployment infrastructure in Google Cloud Platform, ensuring security, reliability and performance
Optimize and scale our CI/CD infrastructure including Jenkins, GitLab, TeamCity, and artifact management systems
Mentor and guide other engineers on DevOps best practices, infrastructure design, and implementation strategies
Drive adoption of infrastructure-as-code, automated testing, and deployment methodologies
Collaborate with development teams to understand their needs and implement solutions that accelerate their workflow
Establish standards and best practices for infrastructure reliability, observability, and performance.
Requirements:
7+ years of experience in DevOps, Site Reliability Engineering, or Platform Engineering roles
Extensive experience with CI/CD pipeline design and implementation in complex environments
Advanced knowledge of Kubernetes administration, deployment patterns, and ecosystem tools
Strong programming skills in Python with solid understanding of OOP principles and design patterns
Deep understanding of cloud architecture, specifically with Google Cloud Platform services
Proven track record designing and implementing developer tooling and automation
Experience managing containerized applications and services in production environments
Strong system design skills with focus on scalability, reliability, and security
Knowledge of GitOps workflows and infrastructure-as-code using tools like Terraform, Pulumi, or equivalent
Familiarity with GitLab CI administration and pipeline development
Willingness to participate in an on-call rotation during working and non-working hours
Nice-to-Have
Knowledge of observability platforms and practices (Prometheus, Grafana, distributed tracing)
Familiarity with TeamCity administration and pipeline development
Experience implementing security best practices in CI/CD pipelines
Understanding of compliance requirements in software delivery pipelines
Experience with Infrastructure as Code testing frameworks
Knowledge of software architecture patterns and microservices design.
This position is open to all candidates.
 
8290390
12/08/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
As a DevOps Architect, you are an advocate for efficiency and excellence. You will assess, advise, define, implement, and support world-class DevOps architectures and practices across complex business environments. You play a key role in shaping and guiding the adoption of modern DevOps methodologies, ensuring alignment between business goals, development processes, and operational needs. You bring leadership and strategic thinking to the table, making high-impact decisions and setting the direction for critical initiatives. With a deep understanding of software delivery, you prioritize quality, scalability, and best practices throughout the lifecycle. You have a strong affinity for data and cost optimization, with an eye for detail and a proactive approach to resource management. Your ability to influence and guide multiple cross-functional teams, align on clear objectives, and drive results in a matrixed environment positions you as a respected technical authority. Creative, analytical, and solution-oriented by nature, you thrive in high-pressure environments and are driven to continuously improve systems, processes, and outcomes.
What am I going to do?:
* Design and lead the implementation of scalable, secure, and cost-efficient DevOps architectures.
* Define and promote DevOps best practices, standards, and tooling across engineering teams.
* Collaborate closely with DevOps and development teams to ensure seamless delivery and operational excellence.
* Evaluate existing CI/CD pipelines, infrastructure, DBs and processes, identify gaps and drive improvements.
* Guide teams in adopting infrastructure as code, observability, automated testing, and deployment strategies.
* Monitor and manage infrastructure cost efficiency and performance, using data to inform technical and business decisions.
* Serve as a trusted advisor to engineering leadership, influencing decisions on architecture, operations, cost and development workflows.
* Drive initiatives that improve system reliability, deployment speed, cost, and team productivity.
Equal opportunities:
We're not about checklists. If you don't meet 100% of the requirements for this role but still feel passionate about the position and think you have the right skills and qualifications to excel at it, we want to hear from you. We prioritize diversity. We celebrate difference and embed it into every aspect of our workplace and product, as well as our community. We are proud and committed to providing equal opportunity employment to all individuals regardless of race, color, religion, sex, sexual orientation, citizenship, national origin, disability, Veteran status, or any other characteristic protected by law. In addition, we will provide accommodation to individuals with disabilities or a special need.
Requirements:
* Minimum 7 years of hands-on experience as a DevOps engineer, with at least 2 years in a senior or lead role.
* Proven expertise in designing and implementing scalable DevOps solutions across complex environments.
* Strong understanding of cloud infrastructure cost modeling, optimization strategies, and budget management.
* Experience managing and negotiating with infrastructure, tooling, and service vendors.
* Demonstrated ability to lead through influence in a matrixed organization, driving alignment and results without direct authority.
* Deep knowledge of CI/CD, infrastructure as code (e.g., Terraform, Pulumi), container orchestration (e.g., Kubernetes), and monitoring/observability tools.
* Experience leveraging Generative AI tools to improve productivity, automation, or decision-making within DevOps workflows.
* Strong communication skills with the ability to engage technical and non-technical stakeholders effectively.
* Comfortable operating in fast-paced environments and managing competing priorities with a strategic mindset.
This position is open to all candidates.
 
8299189
9 hours ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a DevOps Engineer to take ownership of our Cloud Infrastructure and Platform Engineering strategy, enabling high-scale, cutting-edge GenAI products running across 40+ Kubernetes clusters on GCP and AWS.
This is a hands-on engineering role, requiring deep expertise in cloud-native technologies, Kubernetes at scale, and modern DevOps principles. You will work closely with engineering teams to design and implement scalable infrastructure solutions, optimize developer workflows, and ensure reliability and efficiency across our platform.
Role and Responsibilities:
Cloud & Kubernetes Expertise: Design and implement highly scalable multi-cluster Kubernetes environments across GCP & AWS.
Developer Experience & Enablement: Lead the development of self-service tools and automation that improve efficiency for R&D teams.
Incident & Reliability Engineering: Work with engineering teams to optimize cost, performance, and reliability of production infrastructure through monitoring, capacity planning, and scaling strategies.
Security & Governance: Contribute to best practices for RBAC, IAM, cloud security, and compliance while ensuring infrastructure reliability.
Automation & Infrastructure as Code: Drive adoption of GitOps workflows and Infrastructure as Code (Terraform, Helm, Crossplane) to enhance automation and consistency.
Mentorship & Team Growth: Provide technical mentorship within the platform engineering team and contribute to knowledge-sharing across R&D.
Cross-Team Collaboration: Work closely with engineering teams to align cloud infrastructure goals with business needs and reliability requirements.
Requirements:
5+ years of DevOps, or SRE experience
3+ years working with public cloud platforms (AWS, GCP) at scale
Deep Kubernetes expertise, including managing large-scale, multi-cluster enterprise-grade Kubernetes environments
Experience designing and managing Custom Resource Definitions (CRDs) and custom controllers
Strong background in Infrastructure as Code (Terraform, Helm) and GitOps principles (ArgoCD, Crossplane, FluxCD, etc.)
Hands-on experience in observability & monitoring (Prometheus, Grafana, Datadog, OpenTelemetry, etc.)
Proficiency in scripting & automation (Python, Go, Bash) for infrastructure automation
Expertise in cloud networking (VPC, load balancers, service meshes) and security best practices (RBAC, IAM, security groups, network policies, etc.)
Experience with CI/CD pipelines, optimizing for performance, security, and developer velocity
Nice-to-Have:
Experience with self-hosted on-prem deployments and managed private VPC deployments (Bring Your Own Cloud models)
Advanced expertise in Helm and Crossplane for Kubernetes resource management.
Other cloud provider experience
Experience in GenAI or large-scale SaaS platforms
Familiarity with SQL/NoSQL databases and distributed systems
DevSecOps experience, with a strong understanding of security automation and compliance frameworks
This position is open to all candidates.
 
8326421
24/08/2025
Confidential company
Location: Netanya
Job Type: Full Time
PassportCard Labs is where innovation meets impact. As the technological powerhouse behind PassportCard, we design and build cutting-edge solutions that revolutionize the insurance industry. Whether it’s real-time claims payments or predictive analytics, our Labs team is at the forefront of driving change and shaping the future of insurance. Ready to innovate with us?
We are seeking a highly skilled and experienced DevOps Infrastructure Engineer with at least 2.5 years of hands-on experience working in cloud environments such as AWS and Azure. The ideal candidate will have a strong background in automating infrastructure deployments, managing cloud environments, and working with Azure DevOps pipelines to ensure smooth CI/CD processes. You will be responsible for developing, implementing, and maintaining infrastructure as code (IaC), optimizing cloud resources, and improving the scalability and reliability of our systems.
Key Responsibilities:
* Cloud Infrastructure Management: Design, implement, and manage scalable and secure cloud infrastructure in both AWS and Azure environments.
* CI/CD Pipeline Development: Build and maintain efficient Azure DevOps pipelines to automate deployments and streamline application lifecycle management.
* Automation & Scripting: Develop and maintain automation scripts using tools such as Terraform, Ansible, or CloudFormation to provision, configure, and manage cloud resources.
* Monitoring & Optimization: Implement monitoring solutions to track infrastructure performance and availability and optimize resources for cost efficiency.
* Collaboration: Work closely with development and operations teams to design and implement infrastructure solutions that support agile development and deployment processes.
* Security: Implement security best practices for cloud infrastructure, including identity management, data protection, and compliance with industry standards.
* Incident Response & Troubleshooting: Respond to incidents and troubleshoot infrastructure-related issues, ensuring high availability and performance.
* Documentation: Maintain clear and up-to-date documentation of infrastructure configurations, processes, and procedures.
Requirements:
* At least 2.5 years of experience working in a DevOps or infrastructure engineering role.
* Hands-on experience with cloud platforms (AWS and Azure) and their associated services (e.g., EC2, S3, Lambda, Virtual Machines, App Services, etc.).
* Strong knowledge and experience with Azure DevOps pipelines, including creating, configuring, and managing CI/CD pipelines.
* Proficiency in Infrastructure as Code (IaC) tools such as Terraform, Azure Resource Manager (ARM) templates, or AWS CloudFormation.
* Experience with scripting languages such as PowerShell, Bash, or Python for automating tasks and infrastructure management.
* Familiarity with containerization technologies like Docker and container orchestration platforms such as Kubernetes.
* Solid understanding of version control systems, particularly Git, and related tools for managing code and deployment pipelines.
* Excellent troubleshooting skills and the ability to resolve complex infrastructure issues.
* Strong communication and collaboration skills to work effectively across cross-functional teams.
* Knowledge of security best practices for cloud infrastructure (IAM, encryption, data protection, etc.).
This position is open to all candidates.
 
8248424
10/08/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Staff DevOps Engineer.
As a Staff DevOps Engineer, you will not be assigned to a specific R&D group, but will serve as a focal point for the DevOps engineers, helping and supporting them with any issue.
You'll lead cross-DevOps projects, push technical discussions forward, and work with each DevOps engineer as needed to solve diverse, complex, high-scale problems.
You'll support multi-region environments and build and maintain tools for automation, deployment, monitoring, and operations.
You'll troubleshoot and resolve issues in our various environments.
You'll play a key role in designing and enforcing infrastructure patterns that support zero-downtime deployments, high resilience, and compliance standards.
You'll collaborate with teams across the company to define and drive forward scalable, production-grade architecture.
You'll take part in periodic on-call duties and emergency response.
Requirements:
10+ years of experience in the industry, including 6+ years of hands-on experience in high-scale SaaS companies or zero-downtime/disaster recovery enterprise environments (e.g., banking, cybersecurity, healthcare, or large-scale cloud platform providers).
5+ years of experience in DevOps roles across a minimum of 2 different companies, with strong hands-on experience in Kubernetes and AWS. Experience with hybrid or multi-cloud architectures is a strong plus.
Experience with on-call duties to manage critical infrastructure and application issues outside business hours, ensuring high availability and reliability.
3+ years of experience with CI/CD tools such as GitLab, GitHub Actions, CircleCI, or similar.
2+ years of experience with programming languages such as Python or TypeScript. Strong Linux administration skills, including debugging and Bash scripting.
2+ years of experience with Terraform (experience with Terragrunt is a plus), as well as GitOps systems such as ArgoCD.
2+ years of experience with configuration management tools such as Ansible, Chef, or Puppet, and monitoring and alerting systems such as Datadog, Splunk, New Relic, or Grafana.
Strong understanding of networking concepts, including VPC, service meshes, routing, DNS, TLS, and firewalls.
Production-oriented mindset with a strong sense of ownership over reliability, scalability, and incident response.
This position is open to all candidates.
 
8296098
24/07/2025
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a Senior Platform Engineer, Observability to join our Observability team. This role offers the opportunity to work at the intersection of software development and platform engineering, contributing to the tools, systems, and practices that improve visibility, reliability, and operational excellence across our engineering organization.

This position is ideally suited for experienced software engineers who are passionate about building high-quality systems and are interested in expanding their expertise in observability, distributed systems, and developer experience. You will help design, build, and maintain systems that empower engineers across the company to monitor, understand, and troubleshoot their services more effectively.

Our observability team is responsible for delivering scalable and user-friendly solutions to over 150 engineers working across more than 20 teams. We're focused on enabling rapid incident detection and resolution, improving our reliability posture, and supporting a culture of continuous improvement.

What you'll be doing:
Design, build, and maintain observability tools and infrastructure that help our engineers provide actionable insights into the performance and reliability of our systems.
Collaborate with other engineers and teams to enhance the developer experience around monitoring, logging, alerting, and tracing.
Develop and evolve our internal tooling to simplify the process of instrumenting and observing services.
Partner with engineering teams to improve incident response and recovery workflows, and ensure systems meet internal SLOs/SLAs and reliability targets.
Support the migration from our legacy ELK stack to a modern observability platform using Prometheus, Mimir, Grafana, Honeycomb, Loki, Quickwit, and OpenTelemetry.
Contribute to knowledge sharing and the ongoing development of best practices in observability across the organisation.
Requirements:
What you'll need:
4+ years of professional experience as a software engineer, with a strong foundation in building and maintaining production systems.
Proficiency in one or more modern programming languages such as Python, Java, JavaScript, or Ruby.
Familiarity with Kubernetes, AWS, and infrastructure-as-code tools such as Terraform.
Experience working with observability tools and platforms (e.g. Prometheus, Grafana, ELK, Honeycomb, Loki, or similar).
A strong interest in developer experience and platform tooling, with the ability to empathise with engineering teams as internal customers.
Excellent communication skills, with the ability to collaborate effectively across teams and explain complex technical concepts clearly.
A proactive mindset focused on long-term impact, sustainable engineering practices, and continuous improvement.

Preferred Qualifications:
Experience with OpenTelemetry or distributed tracing systems.
Understanding of observability-driven development and service reliability principles (e.g. SRE, MTTR, SLIs/SLOs).
Experience optimising observability systems for cost and performance at scale.
Knowledge of microservices architectures and how to monitor and debug distributed systems.
Contributions to open-source projects in the observability or monitoring space
This position is open to all candidates.
 
8274690
6 hours ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for a DevOps Team Leader to lead our DevOps efforts as we scale. This is a hands-on leadership role requiring deep technical expertise, strong project and people management skills, and the ability to navigate complex stakeholder environments.

The ideal candidate will lead the day-to-day operations of the DevOps team, ensure operational excellence, drive platform improvements, and manage cross-functional alignment with Dev, Finance, Security, and senior management.

Responsibilities
Team Leadership & Execution

Lead, mentor, and grow a high-performing DevOps team.
Manage day-to-day operations including incident management and task prioritization.
Ensure SLAs and compliance requirements are met.
Balance proactive platform improvements with reactive issue resolution.
Platform Ownership

Oversee the design, implementation, and maintenance of a secure, scalable, multi-region AWS infrastructure.
Own CI/CD pipelines, infrastructure-as-code (IaC), observability (logs, metrics, tracing), and automation tooling.
Ensure robust disaster recovery (DR) and business continuity practices are in place and regularly tested.
Stakeholder Collaboration

Act as the main point of contact between DevOps and external/internal stakeholders: Banks, Regulators, Security teams, NOC/SOC, and Development teams.
Communicate clearly on priorities, incidents, risks, timelines, and platform status.
Represent DevOps in cross-functional planning and reviews.
Process & Standards

Define and evolve the DevOps team's SDLC, deployment standards, and incident response processes.
Drive best practices in monitoring, alerting, and reliability engineering.
Champion a culture of ownership, transparency, and continuous improvement.
Requirements:
7+ years of DevOps / SRE experience, including at least 2 years in a leadership or tech lead role.
Proven experience managing multi-region AWS production environments.
Strong skills in Terraform, Kubernetes, CI/CD (e.g., GitHub Actions), observability tools (e.g., Datadog, Prometheus, OTLP).
Hands-on experience with high-availability systems, disaster recovery, and compliance-driven environments.
Ability to balance short-term firefighting with long-term vision.
Excellent communication and stakeholder management skills.
This position is open to all candidates.
 
8326597