Senior Site Reliability Engineer (SRE)

Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We are seeking a skilled Site Reliability Engineer (SRE) to join our team and help build, maintain, and improve the reliability, scalability, and performance of our systems. As an SRE, you will be responsible for owning observability tools, driving incident management processes, and implementing automation to enhance our infrastructure. This role involves collaborating across teams to ensure a robust and efficient technology stack supporting mission-critical systems.

You will:
Proactively enhance system reliability, scalability, and performance through automation, monitoring, and capacity planning.
Develop and maintain observability systems, including distributed tracing, logging, and metrics platforms.
Establish and maintain organizational standards for monitoring, leveraging tools like Prometheus, Grafana, and OpenTelemetry.
Drive incident management, root cause analysis, and continuous improvement initiatives.
Partner with development teams to integrate reliability best practices into the software development lifecycle.
Manage infrastructure at scale on cloud services (AWS is an advantage) and platforms like Kubernetes or ECS.
Optimize resource utilization to reduce costs while maintaining service quality.
Requirements:
At least 5 years of experience as an SRE.
Strong experience with Observability Tools: Proficiency with OpenTelemetry, Grafana, Prometheus, and ELK stack (Elasticsearch, Logstash, Kibana).
Experience with Cloud Platforms: In-depth knowledge of AWS services, including EC2, S3, RDS, and CloudFormation/Terraform for infrastructure-as-code.
Proficiency in scripting and/or development languages like Bash or Python.
Thorough understanding of CI/CD pipelines and automation tools.
Understanding of Infrastructure as Code, and strong experience with automation tools like Terraform and/or Ansible.
Solid troubleshooting and debugging skills.
A team player with a strong can-do mentality.
This position is open to all candidates.
 
Similar jobs that may interest you
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Are you passionate about ensuring system reliability, scalability, and performance? Do you thrive in a dynamic environment where automation and operational excellence are key?
We are looking for a Site Reliability Engineer (SRE) to join our team and play a crucial role in designing, implementing, and maintaining our cloud-based infrastructure. In this role, you will collaborate across teams to drive automation, improve system resilience, and optimize performance while fostering a culture of reliability.

Responsibilities:
System Reliability: Ensure high availability and performance of services through effective monitoring, incident management, and root cause analysis.
Automation & Tooling: Develop and maintain automation for infrastructure provisioning, configuration management, and application deployment.
Performance Optimization: Analyze and enhance system performance, including load balancing, caching, and database tuning. Conduct regular capacity planning.
Incident Response & Troubleshooting: Lead incident response efforts, participate in on-call rotations, and troubleshoot complex infrastructure issues.
Security & Compliance: Collaborate with security teams to implement best practices and ensure compliance with relevant standards (ISO 27001, SOC 2, etc.).
Collaboration & Mentorship: Work closely with developers, DevOps, support, and product teams to enhance application reliability and implement SRE best practices.
Requirements:
5+ years in site reliability engineering, DevOps, or related roles.
Proven experience managing large-scale, cloud-based infrastructure in GCP, AWS, or Azure.
Expertise in container orchestration (Kubernetes, Docker) and microservices architecture.
Strong proficiency in scripting and programming languages (Python, Go, Bash, etc.).
Experience with CI/CD pipelines, infrastructure as code (Terraform, CloudFormation), and configuration management (Ansible, Puppet, Chef).
Hands-on experience with monitoring and observability tools (Datadog, Prometheus, Grafana, ELK Stack).
Deep understanding of networking concepts, DNS, load balancing, and distributed systems.
Strong problem-solving skills, excellent communication, and a proactive mindset.

Advantages:
Certifications: AWS Certified Solutions Architect, GCP Professional Cloud Architect, or Kubernetes certifications (CKA, CKAD).
This position is open to all candidates.
 
03/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Site Reliability Engineer (SRE) to join our Engineering team: someone with a passion for observability, monitoring, automation, and high-availability systems, and a desire to solve complex technological challenges with a proactive approach to continuous improvement.

We use an interesting and mixed technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.

In this position, you will use your expertise in building and scaling SRE operations, and will design, implement, and operate a world-class reliability strategy.
Key Responsibilities
Develop and maintain our monitoring, alerting, and logging systems, ensuring high visibility into production environments.
Implement automation to improve system reliability, scalability, and efficiency.
Troubleshoot and resolve production incidents, leading root cause analyses and implementing permanent fixes.
Collaborate with software engineers and DevOps teams to enhance application performance and resilience.
Continuously improve operational processes, focusing on reducing toil and improving reliability.
Requirements:
3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
Hands-on experience with monitoring and observability tools like Datadog, Prometheus, and Grafana.
Strong understanding of Linux systems, networking, and cloud-native architectures.
Experience with Kubernetes, Terraform, and CI/CD pipelines.
A problem solver, capable of finding creative solutions and getting things done.
Fluent in incident management, RCA processes, and operational best practices.
It would be great if you also have:
Experience in high-scale distributed systems.
Background in security and compliance for cloud infrastructure.
Familiarity with AWS (EKS, EC2, RDS, S3, networking configurations).
Proficiency in Python, Go, or Bash for automation and scripting.
Understanding of cost optimization and resource management in cloud environments.
Familiarity with machine learning or predictive analytics for proactive reliability management.
This position is open to all candidates.
 
07/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking to hire a talented, self-driven and passionate Senior Infrastructure Engineer to build and maintain the cloud infrastructure for our highly available SaaS application as well as our machine learning and data engineering stack.

As a Senior Infrastructure Engineer, you will be responsible for designing, implementing, and maintaining the cloud infrastructure and DevOps processes that power our products and internal tooling. You will work closely with all data and development teams and lead the company's security and compliance efforts. You will ensure a highly reliable, scalable, and secure infrastructure that supports our rapid growth and product innovation, while maintaining observability and cost-effectiveness of our cloud resources and data.

Responsibilities:
Cloud Infrastructure Management: Architect, deploy, and manage our cloud infrastructure (AWS), ensuring high availability, scalability, and security.
Software Engineering: Be a top-notch software engineer, applying your coding, architecture, and research skills to our infrastructure stack.
Infrastructure as Code (IaC): Define and maintain infrastructure using tools like Terraform, CloudFormation, or Pulumi to manage resources efficiently and reproducibly.
Monitoring & Incident Management: Build and manage monitoring and alerting systems to ensure uptime, and respond to incidents with root cause analysis and remediation.
DevOps & Automation: Implement and maintain CI/CD pipelines to streamline development workflows and automate deployment processes across development, staging, and production environments, and across different parts of our solution. While our development teams are expected to write and maintain their own CI, you will act as a supervisor and professional authority and maintain cross-team and complex automation.
Collaboration and technical leadership: Partner with software engineers, data engineers, and machine learning teams to support their infrastructure needs and guide the evolution of our infrastructure team.
Cost Optimization: Monitor cloud spend and optimize resources to ensure cost-effective infrastructure without sacrificing performance or security.
Security & Compliance: Implement security best practices, including access control, network security, and monitoring, and ensure the infrastructure complies with relevant industry standards (e.g., SOC2, GDPR).
Requirements:
Experience: 5+ years of hands-on experience in cloud infrastructure, DevOps and platform engineering in production environments.
Cloud Platforms and IaC: Expertise in managing cloud infrastructure on at least one of the major providers: AWS, GCP, Azure. Proficient in Infrastructure as Code tools such as Terraform, CloudFormation, or Pulumi.
Containerization & Orchestration: Solid experience with Docker and Kubernetes.
Monitoring & Logging: Hands-on experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk, or equivalent).
Software Engineering: Proficiency in software engineering and architecture, as well as in scripting languages such as Python, Bash, or Go. Full command of version control systems such as Git.
DevOps Tools: Strong experience with CI/CD pipelines and automation using Jenkins, CircleCI, GitHub Actions, GitLab CI, or similar.
Networking: Strong understanding of cloud networking, VPNs, VPCs, DNS, and firewalls.
Security Best Practices: Experience implementing cloud security best practices, including IAM, encryption, and key management.
Startup Experience: Previous experience in a fast-paced startup environment, where adaptability and hands-on execution are key.
Team Player: Strong communication skills and ability to work cross-functionally with different teams.

Advantages:
ML Infrastructure: Experience supporting machine learning pipelines and deploying ML models to production environments.
Data Engineering: Familiarity with data engineering tools like Apache Spark, Airflow, or similar.
This position is intended for women and men alike.
 
02/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We are seeking an experienced and motivated SRE Tech Lead to join our dynamic Site Reliability Engineering (SRE) team. As a Tech Lead, you will play a crucial role in enhancing the reliability, performance, and scalability of our systems and services. You will be part of a global commando team of highly skilled SREs, driving best practices and innovations for optimal system operations while protecting critical company systems in real time.
In this role, you will:
Drive incident response and post-mortem processes, fostering a culture of continuous improvement.
Design, build and improve internal tools and automation software to make maintaining production services easier and safer.
Lead reliability-focused practices such as SLO (Service Level Objective) design and implementation, Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others.
Participate in the on-call rotation, providing expertise and support during critical system incidents and ensuring timely resolution.
Requirements:
Minimum 5 years of software engineering experience with .NET, Node.js, or other object-oriented languages.
Knowledge of and experience with architecture and application design.
Excellent troubleshooting and debugging skills.
Excellent verbal and written communication skills in English.
Basic knowledge of AWS or other cloud platforms at the infrastructure level.
Preferred:
Experience building Azure DevOps CI/CD pipelines.
Experience working on large-scale, high-traffic platforms.
Distributed monitoring experience with logging, metrics and tracing using OpenTelemetry and Prometheus.
Additional scripting languages: Bash, PowerShell, Python.
Previous experience working as an SRE.
This position is open to all candidates.
 
02/04/2025
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking an experienced and motivated Engineering Backend Tech Lead to join our dynamic Site Reliability Engineering (SRE) team. As an Engineering Backend Tech Lead, you will play a crucial role in enhancing the reliability, performance, and scalability of our systems and services. You will be part of a global commando team of highly skilled SREs, driving best practices and innovations for optimal system operations while protecting critical company systems in real time.
In this role, you will:
Drive incident response and post-mortem processes, fostering a culture of continuous improvement.
Design, build and improve internal tools and automation software to make maintaining production services easier and safer.
Lead reliability-focused practices such as SLO (Service Level Objective) design and implementation, Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others.
Participate in the on-call rotation, providing expertise and support during critical system incidents and ensuring timely resolution.
Requirements:
Minimum 5 years of software engineering experience with .NET, Node.js, or other object-oriented languages.
Knowledge of and experience with architecture and application design.
Excellent troubleshooting and debugging skills.
Excellent verbal and written communication skills in English.
Basic knowledge of AWS or other cloud platforms at the infrastructure level.
Preferred:
Experience building Azure DevOps CI/CD pipelines.
Experience working on large-scale, high-traffic platforms.
Distributed monitoring experience with logging, metrics and tracing using OpenTelemetry and Prometheus.
Additional scripting languages: Bash, PowerShell, Python.
Previous experience working as an SRE.
This position is open to all candidates.
 
31/03/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a results-driven DevOps Team Lead to head the DevOps and Infrastructure team within our R&D organization. This role requires strategic vision, technical expertise, and a proactive approach to drive operational excellence, empower teams with robust tools and automation, and ensure high system reliability and scalability.

As the DevOps Team Lead, you will be instrumental in delivering critical KPIs, including system uptime, automation, incident management, and collaboration with development and QA teams to enable self-sufficiency. Additionally, you will serve as the leader for strategic projects, identifying opportunities to improve infrastructure and operational processes, setting long-term goals, and executing initiatives that align with Cyberint's business objectives and growth.

Key Responsibilities
Strategic Leadership

Identify and lead strategic projects to enhance Cyberint's platform scalability, reliability, and operational efficiency.
Develop and execute a roadmap for critical infrastructure and DevOps initiatives that drive business success.
Collaborate with senior stakeholders to align projects with organizational priorities and deliver measurable outcomes.
System Reliability & Uptime

Lead initiatives to ensure system reliability, minimize disruptions, and maintain high availability for Cyberint's SaaS platform.
Establish and manage proactive monitoring, alerting, and preventive maintenance strategies.
Drive incident prevention efforts, ensuring robust failover and disaster recovery mechanisms.
Develop and maintain playbooks to enable rapid diagnosis and resolution of issues.
Automation, Infrastructure as Code (IaC), & Self-Service Enablement

Champion the adoption of automation and IaC to streamline infrastructure management and deployments.
Build and enhance self-service tools and frameworks, empowering R&D teams to operate independently with minimal reliance on DevOps.
Continuously improve CI/CD pipelines to optimize deployment speed and reliability.
Collaboration & Support for Self-Sufficiency

Collaborate closely with development, QA, and support teams to deliver tools and frameworks that promote team autonomy and efficiency.
Advocate for cross-functional engagement to align operational processes with R&D objectives.
Provide training and mentorship to teams on using DevOps tools effectively.
Accountability, Ownership, & Scalability

Take ownership of all systems and infrastructure, ensuring solutions are scalable, resilient, and aligned with Cyberint's growth objectives.
Establish clear accountability frameworks for maintaining infrastructure and delivering on key projects.
Design and execute a roadmap to support self-service-oriented and scalable solutions.
Requirements:
5+ years of experience in DevOps or SRE roles, with 2+ years in a leadership capacity.
Proven expertise in building and maintaining highly available, cloud-native environments (AWS preferred).
Experience with Kubernetes, Terraform, CI/CD pipelines, and monitoring technology and tools (Prometheus, Grafana, Jenkins, ArgoCD, Elasticsearch, Redis, EKS, etc.).
Skills & Expertise

Strong understanding of automation, Infrastructure as Code (IaC), and self-service enablement.
Expertise in incident management and a track record of delivering reliable, scalable systems.
Hands-on experience with scripting and automation tools (Python, Bash).
Deep understanding of containerization, orchestration, and cloud-native architectures.
Familiarity with cost monitoring and optimization strategies to ensure infrastructure is both efficient and cost-effective.
Knowledge of security best practices for infrastructure and DevOps environments.
This position is open to all candidates.
 
02/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking an experienced and motivated Backend & DevOps Technical Lead to join our dynamic Site Reliability Engineering (SRE) team. As a Technical Lead, you will play a crucial role in enhancing the reliability, performance, and scalability of our systems and services. You will be part of a global commando team of highly skilled SREs, driving best practices and innovations for optimal system operations while protecting critical company systems in real time.
In this role, you will:
Drive incident response and post-mortem processes, fostering a culture of continuous improvement.
Design, build and improve internal tools and automation software to make maintaining production services easier and safer.
Lead reliability-focused practices such as SLO (Service Level Objective) design and implementation, Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others.
Participate in the on-call rotation, providing expertise and support during critical system incidents and ensuring timely resolution.
Requirements:
Minimum 5 years of software engineering experience with .NET, Node.js, or other object-oriented languages.
Knowledge of and experience with architecture and application design.
Excellent troubleshooting and debugging skills.
Excellent verbal and written communication skills in English.
Basic knowledge of AWS or other cloud platforms at the infrastructure level.
Preferred:
Experience building Azure DevOps CI/CD pipelines.
Experience working on large-scale, high-traffic platforms.
Distributed monitoring experience with logging, metrics and tracing using OpenTelemetry and Prometheus.
Additional scripting languages: Bash, PowerShell, Python.
Previous experience working as an SRE.
This position is open to all candidates.
 
1 day ago
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a highly skilled and experienced Architecture & Operations Lead to drive the development of infrastructure for automation testing, internal DevOps, CI/CD, and deployment.

This role is critical in designing and maintaining scalable and high-performance infrastructure for software development, testing, and production environments. The ideal candidate has a strong background in cloud infrastructure, automation, microservices architecture, performance monitoring, and software best practices.

Key Responsibilities:

Infrastructure & Automation:
Design and implement infrastructure for automation testing and internal DevOps processes.
Develop and manage CI/CD pipelines, ensuring smooth and automated software deployment.
Architect and maintain scalable infrastructure on AWS, leveraging Terraform and infrastructure-as-code (IaC) best practices.
Define and enforce software best practices, ensuring reliability, maintainability, and security.

Operations & Performance Monitoring:
Lead performance monitoring and optimization efforts using APM (Application Performance Monitoring) tools such as New Relic.
Implement Site Reliability Engineering (SRE) principles to enhance system reliability and scalability.
Monitor and improve system performance, ensuring high availability and fault tolerance.

Collaboration Across Teams:
Work closely with development, product, and DevOps teams to align infrastructure strategies with system architecture.
Conduct design reviews and provide recommendations to optimize software and infrastructure performance.
Oversee GitHub Actions workflows for efficient automation and deployment processes.

Security & Compliance:
Ensure infrastructure meets industry security and compliance standards.
Collaborate with security teams to perform vulnerability assessments and implement secure deployment strategies.

Software Development & Best Practices:
Define and enforce best practices for software development and deployment.
Ensure backward compatibility to prevent breaking API changes.
Drive automation initiatives to reduce manual effort and increase efficiency.
Requirements:
Key Experience and Qualifications Required:
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
8+ years of experience in software infrastructure, DevOps, or cloud architecture, including leadership roles.
Expertise in designing and managing CI/CD pipelines using GitHub Actions.
Strong experience with AWS, Terraform, and infrastructure-as-code (IaC) principles.
Proficiency in Python for automation and infrastructure management.
Strong understanding of microservices architecture and distributed systems.
Experience with performance monitoring tools such as New Relic and APM solutions.
Familiarity with containerization and orchestration (Docker, Kubernetes).
Hands-on experience with SRE methodologies and best practices.
Strong problem-solving skills with a focus on scalability and system-wide impact.

Preferred Skills:
Experience in high-availability system design and cloud-based infrastructure optimization.
Knowledge of compliance and security frameworks for cloud environments.
Strong analytical skills for performance tuning and optimization.
This position is open to all candidates.
 
09/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced Backend Infra Engineer with a good DevOps orientation to join our Infrastructure team and keep building out our products and services to support new and exciting functionality.

You Will:
Tackle complex challenges: Work on highly available, cloud-native applications used by millions of customers globally, handling millions of transactions monthly.
Immerse in the Fintech industry: Gain deep knowledge of Fintech, including payments, banking systems, fraud prevention, and more.
Sharpen technical skills: Collaborate with a strong engineering team to plan, implement, and maintain our cloud environment, profile and fine-tune services, and improve security.
Join a high-performing team: Work alongside highly experienced DevOps, Infrastructure, and Security engineers.

Responsibilities:
Manage cloud operations and monitoring, and develop tools, including integrating modern OSS tools.
Take ownership of our infrastructure and cloud architecture.
Develop and maintain CI/CD pipelines for cloud, web, and mobile applications.
Own the data infrastructure and continuously improve its scalability and readiness.
Guide R&D team developers in deploying their applications.
Build solutions to monitor, improve uptime, enhance performance, and ensure system stability (SRE).
Uphold high engineering standards for execution, code quality, and customer satisfaction.
Work across multiple cloud environments, including AWS, GCP, and Cloudflare.
Requirements:
Bachelor's degree in Computer Science, Software Engineering, or equivalent industry experience.
4+ years of hands-on experience in building and maintaining large-scale, highly available cloud production infrastructure.
Proficient coding skills in Node.js, Python, or Golang.
Strong understanding of distributed systems and complex production environments.
Experience with Infrastructure as Code (IaC) tools such as Terraform, Pulumi, or Ansible.
Solid understanding of security principles and hands-on experience with security tools and products.
Experience managing SQL and NoSQL databases, as well as message brokers like Kafka and RabbitMQ.
Knowledge of Kubernetes and CI/CD tools such as GitHub Actions, Jenkins, or GitLab CI.
Familiarity with networking settings, including VPC and cross-site VPN configurations.
Experience with modern observability/monitoring systems and incident management tools.
Strong skills in performance measurement and tuning.
Startup mindset: Move fast, take ownership, and strive for success.
This position is open to all candidates.
 
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for an experienced, highly motivated SRE team lead to guide our SRE team in applying methodologies and technologies to implement highly scalable and available production environments.

As an SRE team lead, you will lead a small but growing team. You will have the freedom to explore and implement the newest technologies while leading and mentoring the team. You will be responsible for designing and implementing monitoring and alerting infrastructure and defining the correct measurements for a highly available production environment. You will learn new things every minute of every day and constantly be challenged.

Responsibilities:
Lead and mentor the SRE team to design and implement reliable, highly available, and scalable production monitoring infrastructure.
Explore and implement new technologies, from POC through to production.
Ensure high uptime and reliability of the production environment.
Perform root cause analysis for complex failures and offer modern solutions and tools.
Analyze performance and stability issues.
Collaborate closely with DevOps, R&D, product, and support teams to define cross-organizational processes.
Design, develop, and drive troubleshooting and mitigation tools as part of a self-healing agenda.
Requirements:
At least 4 years of experience as an SRE or in a DevOps role.
At least 2 years of experience leading a team or as a tech leader.
Proven monitoring and alerting experience (ELK, Grafana, Prometheus, etc.).
Deep expertise in Kubernetes, container orchestration, and cloud infrastructure (AWS, Azure, or GCP).
Experience with a programming language (Python, Java, Go, Ruby, etc.).
Scripting and automation skills (Bash, Python, etc.).
Networking skills.
Experience with IaC tools such as Terraform.
This position is open to all candidates.
 