Streaming Infrastructure DevOps Engineer

Posted 3 hours ago
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a skilled and motivated DevOps engineer with deep familiarity with the streaming ecosystem to join our elite infrastructure team. If you're excited by the challenge of operating mission-critical systems at scale and optimizing the developer experience through automation and tooling, we'd love to hear from you.

What you'll do:
Automate Deployment and Operation
Oversee deployment of Kafka and RabbitMQ clusters (including Confluent Cloud & CFK). Build automation pipelines to ensure repeatability and resiliency across environments.

Monitor and Support Production Systems
Own production stability of global Kafka clusters. Handle on-call rotations, incident management, troubleshooting, and scaling challenges.

Improve Infrastructure Observability
Build and maintain observability systems: dashboards, alerting pipelines, metrics collection (Prometheus, Grafana, etc.).

Optimize System Performance
Collaborate with peers on benchmarking and optimization initiatives. Work on tuning Kafka brokers, cluster configurations, and runtime parameters.

Provide Developer Support and Training (Infra-focused)
Help developers configure topics, quotas, and consumers appropriately. Train service owners to interpret monitoring data and avoid pitfalls.

Develop and Maintain Infrastructure
Contribute to building infrastructure tools and scripts (IaC, Helm charts, etc.) that make provisioning and managing clusters reliable and efficient.

Secure Infrastructure Access
Configure and maintain secure access patterns across streaming infrastructure, ensuring proper authentication and role-based access controls are enforced for both developers and services.
Requirements:
8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
Deep hands-on Kafka experience, including deploying, maintaining, scaling, and monitoring clusters.
Experience with RabbitMQ.
Extensive experience with Docker, Kubernetes, Helm, and GitOps-style deployments.
Infrastructure as Code experience (Terraform, Pulumi, etc.).
Strong skills in scripting and automation (Python, Bash, etc.).
Familiarity with Confluent Cloud, Confluent for Kubernetes, and similar tools.
Solid understanding of authentication and authorization mechanisms in distributed systems.
Production support mindset - with proven troubleshooting and incident resolution history.
Collaboration and communication skills - especially with dev teams depending on platform support.
Experience with Istio Service Mesh (bonus).
Experience with GovCloud (bonus).
This position is open to all candidates.
 
8478350
Similar jobs that may interest you
21/12/2025
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a talented and motivated DevOps Engineer to join our Cloud Engineering team. You'll play a key role in developing, maintaining, and scaling our SaaS product across AWS, Azure, and GCP. This includes managing our deployment packages (Terraform, CloudFormation, Azure Bicep), ensuring seamless integrations with customer environments, and enabling secure, reliable data scanning at scale.

As part of our DevOps team, you'll not only drive automation and infrastructure management, but also participate in customer-facing installation meetings, helping customers deploy and configure our platform successfully.

Responsibilities:
Design, develop, and maintain cloud infrastructure on AWS, Azure, and GCP.
Manage Infrastructure as Code using Terraform, CloudFormation, and Azure Bicep.
Build, scale, and maintain Kubernetes clusters and containerized applications.
Implement automation for deployment, monitoring, and incident response.
Write and maintain Python and Bash scripts for automation, integrations, and tooling.
Troubleshoot networking, connectivity, and security issues (TCP/IP, UDP, VPNs).
Collaborate with engineering and product teams to optimize deployments.
Support customer onboarding by assisting with setup and deployment meetings.
Continuously improve CI/CD pipelines and operational processes.
Requirements:
3-5+ years of experience in DevOps, Cloud Engineering, or related roles.
Hands-on expertise with at least two major cloud providers (AWS, Azure, GCP; experience with all three is a plus).
Strong programming skills in Python and Bash (automation, tooling, scripts).
Proficiency in Linux systems administration.
Strong experience with Kubernetes, Docker, and container orchestration.
Deep understanding of networking fundamentals (TCP/IP, UDP, DNS, VPNs, routing, firewalls).
Experience with IaC tools: Terraform, CloudFormation, Azure Bicep.
Familiarity with CI/CD tools (GitHub Actions, GitLab CI, or similar).
Excellent problem-solving and troubleshooting skills.
Excellent communication skills, with the ability to work directly with customers.
Observability stack experience (Datadog, Prometheus, Grafana, ELK, etc.).

Nice to Have:
Experience in SaaS environments or multi-cloud deployments.
Security best practices and compliance knowledge (IAM, RBAC, data protection).
This position is open to all candidates.
 
8465449
02/12/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for a Senior SRE Engineer who combines strong infrastructure expertise with solid programming skills to help scale our platform, and who can balance operational excellence with software development.
This is an exciting opportunity to build SRE processes from the ground up - creating new reliability pipelines, monitoring frameworks, and foundational practices that will scale with our rapid growth.
You'll lead our infrastructure and reliability efforts while writing code to automate, optimize, and enhance our systems. This role requires both deep technical expertise and the ability to mentor team members as we scale.
Stack: AWS, Python, EKS, K8s, Kafka, RabbitMQ, Pulumi, PostgreSQL, Databricks, GitHub Actions
Core Responsibilities:
Design and implement scalable, reliable infrastructure solutions on AWS using Infrastructure as Code (Terraform/Pulumi).
Build and maintain sophisticated CI/CD pipelines with GitOps methodologies.
Develop custom tooling and automation scripts in Python/Go/similar languages to improve operational efficiency.
Architect and implement comprehensive observability solutions (metrics, logging, tracing, alerting).
Define and track SLIs/SLOs/Error Budgets to ensure system reliability.
Lead incident response, conduct thorough post-mortems, and drive systemic improvements.
Optimize cloud costs through data-driven analysis and architectural improvements.
Collaborate with development teams to improve application reliability and performance.
Mentor team members on SRE best practices and infrastructure design patterns.
Requirements:
5+ years of DevOps/SRE experience in production environments.
Solid programming skills in at least one language (Python, Go, Java, or similar) with ability to write production-quality code.
Strong understanding of SRE principles: reliability engineering, capacity planning, chaos engineering.
Deep expertise with Kubernetes (EKS preferred) including operators, CRDs, and advanced networking.
Proven experience implementing Infrastructure as Code at scale.
Hands-on experience with observability stacks (Prometheus, Grafana, ELK, Datadog, or similar).
Experience with distributed systems concepts and troubleshooting.
Excellent problem-solving skills with a systematic approach to debugging.
Strong communication skills and ability to work across teams.
What Sets You Apart:
You write code to solve operational problems, not just configure existing tools.
You think in systems and can identify root causes across complex architectures.
You're passionate about automation and eliminating toil.
You balance perfectionism with pragmatism to deliver reliable solutions quickly.
You stay current with cloud-native technologies and best practices.
You can translate technical concepts for various audiences.
This position is open to all candidates.
 
8439435
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a skilled DevOps Engineer to join our R&D infrastructure team and play a key role in building and scaling cloud-based platforms. In this position, you will design, implement, and maintain modern CI/CD pipelines, manage multi-cloud environments, and support microservices-based architectures. You will work closely with developers, QA, and product teams to streamline delivery processes, improve system reliability, and ensure smooth deployments. This is a hands-on role where you will directly influence the stability, scalability, and efficiency of our production systems while leveraging cutting-edge technologies across AWS and Azure.

Responsibilities
Infrastructure as Code (IaC): Develop and maintain infrastructure using tools such as Terraform and Ansible.
Cloud Infrastructure Management: Deploy, manage, and monitor applications in cloud environments (AWS and Azure).
Collaboration & Support: Work closely with developers, QA, and product teams to streamline releases and improve productivity. Provide technical support for development and operations teams during incidents and deployments.
CI/CD Pipeline Management: Design, implement, and maintain continuous integration and delivery pipelines. Automate build, test, and deployment processes to improve speed and reliability.
Requirements:
3-5 years of experience as a DevOps Engineer, SRE, or Platform Engineer
Strong problem-solving skills
Experience with microservices architecture and container orchestration (Docker and Kubernetes)
Experience with IaC tools (e.g., Terraform)
Strong knowledge of CI/CD tools such as Jenkins and GitHub
Experience with configuration management tools (e.g., Chef, Ansible, or Puppet)
Experience with GitOps (e.g., ArgoCD)
Proven scripting capabilities: PowerShell/Bash/Python
Hands-on experience with cloud platforms AWS/Azure/GCP
Strong troubleshooting skills
Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK, etc.)
Excellent collaboration and communication skills for working across development, QA, and operations teams
BSc degree in computer science, computer engineering, relevant technical discipline, or equivalent practical experience
This position is open to all candidates.
 
8423258
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a DevOps Engineer.
As a key member of our engineering team, you'll work at the intersection of development, operations, and reliability. You'll automate cloud infrastructure, ensure system performance, and maintain secure, scalable deployments in a regulated fintech environment.
Responsibilities:
Manage and enhance cloud infrastructure (AWS, GCP, Azure, or similar).
Develop, maintain, and automate CI/CD pipelines to streamline application delivery.
Implement Infrastructure as Code (e.g., Terraform, Ansible, CloudFormation) for provisioning and managing environments.
Set up and maintain monitoring, observability, and alerting systems using tools like Prometheus, Grafana, Splunk, New Relic, ELK, etc.
Define, track, and act upon SRE metrics (SLIs, SLOs, error budgets) to balance reliability and development velocity.
Participate in incident response, including root cause analysis and remediation.
Automate repetitive tasks to reduce toil and increase system resiliency and uptime.
Collaborate with developers and security teams to embed security and compliance best practices (e.g., PCI DSS, DevSecOps).
Support on-call rotation and continuously improve operational processes.
Requirements:
5-8 years of experience in DevOps, SRE, or related engineering roles.
Proven experience working with at least one cloud provider (AWS, GCP, Azure).
Proven experience with containerization and orchestration (Docker, Kubernetes, GKE).
Proficiency in CI/CD tooling (e.g., GitLab CI, Jenkins, GitHub Actions).
Hands-on experience with Infrastructure as Code tools (Terraform, Ansible, CloudFormation).
Strong command of monitoring and observability tools (Prometheus, Grafana, ELK stack, Splunk, New Relic).
Solid scripting ability in Python, Bash, or similar.
Familiarity with Linux/Unix systems, networking, and basic system administration.
Comfortable working in fast-paced, collaborative environments and able to handle operational incidents effectively.
Excellent communication skills and a mindset geared toward continuous learning and improvement.
Nice to Have:
Exposure to containerization and orchestration (Docker, Kubernetes, GKE).
Understanding of SLA/SLI/SLO frameworks, error budgets, and reliability engineering principles.
Awareness of financial compliance standards like PCI DSS.
Knowledge of DevSecOps practices (security-as-code, shifting security left).
Familiarity with incident management and on-call culture.
This position is open to all candidates.
 
8441385
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Lead DevOps Engineer, your role involves the design and development of robust, scalable, and resilient distributed systems. You'll define product specifications, leveraging your technical expertise to create optimal solutions hosted in Kubernetes on AWS Cloud. This position requires extensive collaboration with various teams throughout the software development lifecycle. You will lead design discussions and code reviews, contributing to the overall quality of engineering within the organization.

Your responsibilities also include creating and supporting reusable application components and patterns, considering both business and technology perspectives. You'll utilize developer tools and a range of AWS services for task management, source code handling, building, deployment, operations, and real-time communication. You are expected to demonstrate advanced skills in application design, implementation, and maintenance, often with minimal supervision.

Beyond technical tasks, you will mentor other engineers, sharing your knowledge and actively contributing to the enhancement of best practices and processes within and across teams.

Responsibilities:

Design, build, and maintain the scalable cloud infrastructure and CI/CD pipelines necessary to support our cutting-edge AI and optimization services.

Champion Infrastructure as Code (IaC) practices using tools like Terraform and Kubernetes to automate the deployment, scaling, and management of our production environments.

Implement robust monitoring, logging, and alerting systems to ensure the high availability, performance, and reliability of all services.

Partner with development teams to streamline the software development lifecycle, improve deployment velocity, and embed best practices for security and operational excellence.


JR314438
Requirements:
4+ years of hands-on experience in DevOps Concepts and Cloud Architecture.

4+ years of experience with AWS (familiarity with S3, SQS, DynamoDB, IAM, and KMS concepts is mandatory); similar concepts from other cloud service providers, e.g., GCP and Azure, are optional.

4+ years of experience deploying and managing CI/CD pipelines, e.g., Jenkins and/or Spinnaker.

Advanced programming experience with at least two modern languages such as Go, Java, C++, or Python, including object-oriented design.

Proven understanding of micro-services-oriented architecture and extensible REST and gRPC APIs. Experience building the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems.

Knowledge and experience in Kubernetes cluster management, ensuring that workloads (including Deployments and StatefulSets) remain reliable, available, and secure and meet performance expectations.

Experience with Kubernetes packaging technologies such as Helm, and experience administering Kubernetes ConfigMaps, Services, Deployments, and StatefulSets.

Experience monitoring production, staging, test, and development environments for multiple applications in a dynamic organization.

Good command of version control tools, including but not limited to Git.

Strong expertise in troubleshooting complex production issues. Excellent problem-solving, critical thinking, and communication skills.

Degree or equivalent relevant experience required. Experience will be evaluated based on the core competencies for the role (e.g. extracurricular leadership roles, military experience, volunteer roles, work experience, etc.).
This position is open to all candidates.
 
8431996
Posted 5 days ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a Senior DevOps Engineer with extensive experience to lead the design, development, deployment, and operation of large-scale software solutions. This role is a critical bridge between Software Engineering and Infrastructure, demanding a deep proficiency in building and operating reliable, scalable systems within a complex Big Data environment.
What you'll be doing all day:
Own Reliability and Scalability: Lead the architecture and implementation of best practices to ensure high availability, optimal performance, and horizontal scalability of our critical systems, operating within a vast Big Data landscape.
Infrastructure as Code (IaC): Develop, maintain, and evolve our infrastructure using advanced IaC tools (e.g., Terraform or Pulumi), ensuring full automation of service deployment and management across our AWS/GCP cloud environment.
Strategic Collaboration: Partner closely with application software engineering teams to design, conduct code reviews, and implement systems that are stable, secure, and performant.
Observability: Implement and manage robust monitoring, logging, and alerting solutions to enable proactive identification and deep Root Cause Analysis (RCA) of issues.
Automation & Efficiency: Identify and eliminate manual tasks ("Toil") by automating repetitive processes to continuously improve operational efficiency and system reliability.
Production Incident Response: Participate in an on-call rotation to quickly investigate, troubleshoot, and mitigate critical production incidents, driving post-mortems to prevent recurrence.
Performance Engineering: Analyze system performance, conduct performance tuning, and execute capacity planning to meet future demands.
Requirements:
Proven Experience: 5+ years of experience as a Production Engineer, DevOps Engineer, or SRE, running and managing large-scale operations on a major cloud provider (AWS or GCP).
Coding Proficiency: 5+ years of experience developing server-side applications or tooling using languages like Python, Java, Node.js, or Go.
Deep Infrastructure Knowledge: Strong understanding of Kubernetes and container orchestration, complemented by solid knowledge of Web Servers (e.g., Nginx), Load Balancers, Caching Systems (e.g., Redis/Memcached), Databases (Relational and NoSQL), and networking fundamentals.
CI/CD & GitOps: Practical experience with modern CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI) and familiarity with GitOps principles.
Communication: Excellent communication and collaboration skills to coordinate effectively across various R&D and Infrastructure groups.
Passion: Eagerness to take on complex challenges and a continuous desire to learn and implement new, cutting-edge technologies.
This position is open to all candidates.
 
8471349
Location: Tel Aviv-Yafo
Job Type: Full Time
Site Reliability Engineer, Panda team
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer on the IT Production team in our Tel Aviv office, you'll play a vital role in building robust services and solving infrastructure challenges through automation, working with cutting-edge technologies and pushing them to their limits on our mostly on-prem, cloud-like infrastructure.
As a Site Reliability Engineer, you'll bring value by:
Ensure Reliability & Scalability: Design, implement and manage highly reliable and scalable distributed systems across our on-premise, cloud and AI/ML environments. Proactively optimize performance, efficiency, resource utilization and cloud cost.
Drive Automation: Automate repetitive tasks, infrastructure provisioning, configuration and deployments using IaC and scripting languages (e.g., Python, Go, Rust).
Develop Observability & Capacity: Implement comprehensive monitoring and alerting systems to ensure system health. Collaborate on capacity planning to meet future growth.
Maintain Security & Compliance: Integrate security best practices and ensure compliance with industry standards.
Lead Incident Management: Participate in on-call rotations, lead incident responses and conduct root cause analysis to minimize downtime.
Foster Collaboration & Improvement: Work closely with development, operations and security teams to drive shared responsibility and continuous improvement in SRE practices.
Our Tech Stack:
Linux, Kubernetes, nginx, Istio, AWS, GCP, Azure, Alicloud, Fastly, Terraform, Consul, Prometheus, Loki, Grafana, Airflow, Redis, Kafka, Vector, Hadoop, Cassandra, Vertica, MySQL, HDFS, ELK.
Requirements:
4+ years of experience in software development with a proven track record of designing and developing internal tools, automation frameworks, and platform components in large-scale distributed production environments with a focus on Linux operating systems.
Deep, demonstrable expertise in one of the following programming languages (Golang, C, Rust, Python, or Java).
Experience in observability tooling development, specifically implementing custom metrics, tracing and logging within application code.
Practical understanding of the HTTP protocol (including HTTP methods, status codes and headers). Proven ability to design, implement and instrument robust internal APIs (e.g., using REST or gRPC).
Understanding of Linux operating system internals: kernel configuration, system calls, process management, memory, and I/O.
Proven ability to troubleshoot and optimize performance bottlenecks under heavy load using advanced monitoring and profiling tools for high-throughput and low-latency applications.
Bonus points if you have:
Experience as an SRE, DevOps Engineer, or System Administrator in a large distributed environment with a focus on Linux operating systems.
This position is open to all candidates.
 
8439403
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Description
For many of us there's that one podcast we never miss, and video content is part of our daily routine, whether it's professional or personal. But how many of us truly understand the effort that goes on behind the scenes? Here at our company, we know it well. That's exactly why we built an AI-powered platform that helps content creators, podcasters, marketeers, and more at major brands like Netflix, Disney, Google, and Microsoft to create high-quality content with ease.
Our company's technology streamlines the entire content creation process, turning ideas into professional-grade content with the highest production standards, without requiring expensive equipment or external services. The secret? AI-driven tools that replace traditional production roles like editing, directing, and design, automating the entire process at the click of a button.
About the Engineering Team
We're a team of smart, curious engineers building scalable, reliable systems that power content creation for millions. We work with modern web technologies, tackle real-world challenges in distributed systems, and keep things practical - no overengineering, just solid solutions. If you love solving tough problems, moving fast, and building tech that creators actually use, you'll fit right in.
On your day-to-day
We're hiring a Mid-Level DevOps Engineer to join our growing infrastructure team. You'll be working hands-on with modern AWS-native technologies, helping us scale secure and observable cloud environments, while partnering closely with developers to build and ship reliably.
Responsibilities
Manage scalable, production-grade AWS infrastructure with Terraform
Maintain Kubernetes clusters, including deployments, autoscaling, and ingress
Handle cross-service communication using Kafka, SQS, or PubSub
Set up observability and alerting using Grafana, Prometheus, and log aggregation tools
Integrate Cloudflare for traffic protection, rate limiting, and caching
Manage and improve CI/CD workflows using GitHub Actions and ArgoCD
Collaborate directly with developers on environment setup, debug sessions, and deployments
Implement IaC and GitOps patterns across services and teams.
Requirements:
What Will Make You Stand Out?
3+ years in DevOps or Platform Engineering
Deep knowledge of AWS networking, IAM, ALB/NLB, and VPC routing
Strong Terraform experience and infrastructure modularization
Kubernetes hands-on experience including Helm, scaling, and security contexts
Comfortable with observability tooling and alerting practices
Good communication skills; experience supporting developers day-to-day
Nice to Have
Exposure to Crossplane, service mesh (Istio), or Temporal
Familiarity with working in developer-facing platform teams
Experience setting up internal developer platforms or self-service infra.
This position is open to all candidates.
 
8457846
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Site Reliability Engineer
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer on the IT Production team in our TLV office, you'll play a vital role in building robust services and solving infrastructure challenges through automation, working with cutting-edge technologies and pushing them to their limits on our mostly on-prem, cloud-like infrastructure.
How you'll make an impact:
As a Site Reliability Engineer, you'll bring value by:
Ensure Reliability & Scalability: Design, implement and manage highly reliable and scalable distributed systems across our on-premise, cloud and AI/ML environments. Proactively optimize performance, efficiency, resource utilization and cloud cost.
Drive Automation: Automate repetitive tasks, infrastructure provisioning, configuration and deployments using IaC and scripting languages (e.g., Python, Go, Rust).
Develop Observability & Capacity: Implement comprehensive monitoring and alerting systems to ensure system health. Collaborate on capacity planning to meet future growth.
Maintain Security & Compliance: Integrate security best practices and ensure compliance with industry standards.
Lead Incident Management: Participate in on-call rotations, lead incident responses and conduct root cause analysis to minimize downtime.
Foster Collaboration & Improvement: Work closely with development, operations and security teams to drive shared responsibility and continuous improvement in SRE practices.
Our Tech Stack:
Linux, Kubernetes, nginx, Istio, AWS, GCP, Azure, Alicloud, Fastly, Terraform, Consul, Prometheus, Loki, Grafana, Airflow, Redis, Kafka, Vector, Hadoop, Cassandra, Vertica, MySQL, HDFS, ELK.
Requirements:
7 years of experience as an SRE, DevOps Engineer, or System Administrator in a large distributed environment with a focus on Linux operating systems.
Experience supporting, troubleshooting and scaling large distributed systems in production.
Deep understanding of HTTP protocol, including HTTP/1.1, HTTP/2, caching semantics, TLS and gRPC delivery.
Experience configuring and operating CDN services (e.g., Akamai, Fastly, Cloudflare, AWS CloudFront).
Deep understanding of Linux system internals and system performance tuning.
Experience with Configuration Management Tools (Puppet, Ansible, Chef, Terraform).
Experience programming in at least one of the following languages (Python, Golang, Rust, Ruby, C++, Java).
Experience with monitoring and metrics collection systems (Prometheus, Grafana, ELK).
Experience with cloud providers and platforms (AWS, Azure, GCP, Alibaba).
Experience with containerization technologies (Kubernetes, Docker).
Deep understanding of networking principles (TCP/IP, DNS, load balancing).
This position is open to all candidates.
 
8439391
08/12/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We're seeking a Staff Site Reliability Engineer to join our team and take ownership of the automation, reliability, and scalability of our platform. You will play a key role in refining our service architecture, CI/CD pipelines, and multi-cloud deployments, ensuring that our systems remain secure, efficient, and resilient. This role is ideal for a hands-on problem-solver who thrives in a fast-paced environment and is comfortable with ambiguity. We're looking for someone who can take ownership of a problem and drive solutions.

This is a hybrid role based in Tel Aviv, offering a chance to work closely with our founders and a team of highly talented individuals. Together, you will help shape the vision of our product and company while solving challenging infrastructure problems at scale.

What you'll be doing

Design, build, and operate Okta's global production infrastructure
Support a highly available and large scale multi-cloud environment as part of an on-call rotation
Automate workflows, deployments, and infrastructure processes
Design, implement, and optimize CI/CD pipelines for faster and more reliable delivery
Refine and maintain multi-tenant service architecture with strong security and scalability
Deploy and manage infrastructure using Kubernetes, Terraform, and other modern IaC tools
Build robust monitoring, logging, and alerting systems to ensure platform reliability
Write automation and maintenance scripts, primarily in Python and Golang
Collaborate with engineering teams to improve developer experience and delivery speed
Lead by example, responding swiftly and efficiently to production incidents, and driving team learning and process improvements
Requirements:
7+ years of experience as a DevOps or Site Reliability engineer on high-scale distributed systems in Linux environments
Extensive experience with operating customer-facing production services
Strong experience in application and infrastructure monitoring in multi-tenant setups
Hands-on expertise in CI/CD pipeline design and implementation
Deep understanding of Kubernetes, its components, and operational flows
Proficiency in Golang and/or Python scripting for automation and maintenance
Extensive experience with Terraform (or equivalent Infrastructure as Code tools)
Highly motivated, self-learning, and passionate about improving infrastructure and delivery processes
A proven track record of successful SRE engagements, working closely with engineering teams
A proven ability to operate with a high degree of autonomy, taking ownership of complex, open-ended problems and driving solutions from ambiguous requirements to a robust implementation
A strong history of leading and mentoring other engineers, elevating the technical capabilities and operational excellence of the team
Experience collaborating effectively with geographically distributed teams, navigating different time zones and communication challenges to ensure project success
This position is open to all candidates.
 
8448375