Jobs » Computers and Networks » Site Reliability Engineer (SRE)

Posted 1 day ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking an experienced Senior Site Reliability Engineer to join our SRE team as part of our Platform Engineering group. This role involves taking ownership of monitoring, deploying, and ensuring the reliability of production-grade modern SaaS platforms across Cloud and On-Premise environments.
Responsibilities:
Lead initiatives to enhance product reliability and system readiness.
Design and implement sophisticated monitoring solutions to ensure high availability and performance of our production platform.
Oversee and refine the entire product reliability pipeline.
Proactively troubleshoot and resolve issues across production environments.
Champion an "Everything as Code" approach using a wide range of technologies including Ansible, Terraform, Helm, Python and more.
Develop advanced tools for automation, deployment, monitoring, and operations.
Exhibit excellent communication and interpersonal skills to effectively collaborate within the team and across departments.
Promote best practices in reliability and system operations.
Requirements:
At least 4-5 years of experience as a DevOps or Site Reliability Engineer.
In-depth knowledge of microservices architectures and technologies such as Kubernetes.
Comprehensive understanding of cloud & on-prem environments and hybrid solutions.
Proficiency with one or more major cloud providers (AWS experience is an advantage).
Advanced experience with CI/CD technologies including Jenkins, GitHub Actions, and ArgoCD.
Proficient coding and scripting capabilities in Python, Bash, or similar languages.
Strong team player with proven ability to lead and inspire.
Advantages:
Prior experience with endpoint security products (agents, sensors, collectors).
Background in working with AI components (training, inference, serving).
Tech Stack: AWS, Kubernetes, EKS, RKE2, ECS, SageMaker, Jenkins, GitHub, Terraform, Python, Ansible, Docker + Compose, ArgoCD, MongoDB, RabbitMQ, Redis, Go, Neo4J, AI, and more.
This position is open to all candidates.
 
8162480
Similar jobs that may interest you
Posted 1 day ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking an experienced MLOps Engineer to join our Platform and DevOps team as part of our Engineering group. We are on the lookout for an individual who is passionate about software design, development, and deployment. The job involves writing production-grade, modern DevOps solutions that will be shipped to cloud and on-prem environments.
Responsibilities:
Design, develop, build and maintain the best solutions for our production platform.
Apply an Everything as Code (IaC) approach: run our infrastructure with a wide range of technologies, including Ansible, Terraform, and Kubernetes.
Work closely with our data scientists and developers to create training, inference and serving pipelines.
Build and maintain tools for automation, deployment, monitoring, and operations.
Troubleshoot issues in our development, production, and test environments.
Excellent communication and people skills.
Work well in a team!
Requirements:
At least 4-5 years of experience in one of the following roles: DevOps, MLOps.
Experience with design, build, development and maintenance of DevOps solutions.
Experience with one of the major cloud providers: AWS, GCP, Azure.
Experience working with cloud and on-prem environments and solutions.
Expert-level Linux system skills (a must).
Extensive experience with applications and tooling including Kubernetes, Helm, Terraform, Ansible, SQL/NoSQL/Graph DBs, MLflow, Jenkins, GitHub, etc.
Experience with CI/CD technologies.
Experience with bootstrapping projects, introducing new technologies, and building systems from scratch.
Good coding capabilities (Python, Bash, etc.).
Advantages:
Experience working on endpoint products (agents, sensors, collectors).
Experience working on AI components (training, inference, serving).
Tech stack:
AWS, Kubernetes, EKS, ECS, Jenkins, IaC, GitHub, Terraform, Python, Ansible, Docker + Compose, ArgoCD, MongoDB, RabbitMQ, Redis, Go, Neo4J, AI, MLflow, ClickHouse, Jupyter, and more.
This position is open to all candidates.
 
8162603
Posted 1 day ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced Senior Platform Engineer who is passionate about software design, development, and deployment. The job involves writing production-grade, modern DevOps/Platform solutions, and you will help build a Computing Services layer that can be deployed anywhere.
Responsibilities:
Develop, build and maintain the best services and solutions for our Computing Services platform.
Apply an Everything as Code (EaC) approach using technologies such as Python, Ansible, Terraform, and Kubernetes.
Build and develop services that will be part of Computing Services and will run on remote infrastructure.
Develop and maintain tools for automation, deployment, monitoring, and operations specifically tailored for remote environments.
Troubleshoot issues in our development, production, and test environments.
Collaborate effectively with team members and communicate complex technical issues clearly.
Requirements:
At least 4-5 years of experience with DevOps or Platform technologies.
Experience with the design, build, development, and maintenance of DevOps services and solutions.
Proven experience working with remote environments and solutions like on-premise or customer private cloud.
Experience with deployment technologies and CI/CD technologies in AWS for building and packaging.
Strong knowledge of networking and security principles, including developing and deploying encryption and signing services.
Proficient coding capabilities in Python and Bash.
Hands-on experience with developing infrastructure services that run remotely, including customized Kubernetes.
Proven track record of delivering packages to remote customer sites.
Experience with air-gapped on-premise solutions is a significant advantage.
Experience with deploying and maintaining robust and automated services/pipelines.
Must-Have Skills:
Kubernetes, Python, Bash, Ansible, Linux, Networking, Docker.
Advantages:
Experience with AI components (training, inference, serving).
Tech Stack:
Kubernetes, Jenkins, Ansible, Terraform, Docker + Compose, MLflow, KServe, MinIO, GitHub, Python, Bash, Linux, MongoDB, RabbitMQ, Redis, Neo4J.
This position is open to all candidates.
 
8162507
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Are you passionate about ensuring system reliability, scalability, and performance? Do you thrive in a dynamic environment where automation and operational excellence are key?
We are looking for a Site Reliability Engineer (SRE) to join our team and play a crucial role in designing, implementing, and maintaining our cloud-based infrastructure. In this role, you will collaborate across teams to drive automation, improve system resilience, and optimize performance while fostering a culture of reliability.

Responsibilities:
System Reliability: Ensure high availability and performance of services through effective monitoring, incident management, and root cause analysis.
Automation & Tooling: Develop and maintain automation for infrastructure provisioning, configuration management, and application deployment.
Performance Optimization: Analyze and enhance system performance, including load balancing, caching, and database tuning. Conduct regular capacity planning.
Incident Response & Troubleshooting: Lead incident response efforts, participate in on-call rotations, and troubleshoot complex infrastructure issues.
Security & Compliance: Collaborate with security teams to implement best practices and ensure compliance with relevant standards (ISO 27001, SOC 2, etc.).
Collaboration & Mentorship: Work closely with developers, DevOps, Support, and product teams to enhance application reliability and implement SRE best practices.
Requirements:
5+ years in site reliability engineering, DevOps, or related roles.
Proven experience managing large-scale, cloud-based infrastructure in GCP, AWS, or Azure.
Expertise in container orchestration (Kubernetes, Docker) and microservices architecture.
Strong proficiency in scripting and programming languages (Python, Go, Bash, etc.).
Experience with CI/CD pipelines, infrastructure as code (Terraform, CloudFormation), and configuration management (Ansible, Puppet, Chef).
Hands-on experience with monitoring and observability tools (Datadog, Prometheus, Grafana, ELK Stack).
Deep understanding of networking concepts, DNS, load balancing, and distributed systems.
Strong problem-solving skills, excellent communication, and a proactive mindset.

Advantages:
Certifications: AWS Certified Solutions Architect, GCP Professional Cloud Architect, or Kubernetes certifications (CKA, CKAD).
This position is open to all candidates.
 
8127121
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We are seeking a skilled Site Reliability Engineer (SRE) to join our team and help build, maintain, and improve the reliability, scalability, and performance of our systems. As an SRE, you will be responsible for owning observability tools, driving incident management processes, and implementing automation to enhance our infrastructure. This role involves collaborating across teams to ensure a robust and efficient technology stack supporting mission-critical systems.

You will:
Proactively enhance system reliability, scalability, and performance through automation, monitoring, and capacity planning.
Develop and maintain observability systems, including distributed tracing, logging, and metrics platforms.
Establish and maintain organizational standards for monitoring, leveraging tools like Prometheus, Grafana, and OpenTelemetry.
Drive incident management, root cause analysis, and continuous improvement initiatives.
Partner with development teams to integrate reliability best practices into the software development lifecycle.
Manage infrastructure at scale in cloud services (AWS advantage) and platforms like Kubernetes or ECS.
Optimize resource utilization to reduce costs while maintaining service quality.
Requirements:
At least 5 years of experience as an SRE.
Strong experience with Observability Tools: Proficiency with OpenTelemetry, Grafana, Prometheus, and ELK stack (Elasticsearch, Logstash, Kibana).
Experience with Cloud Platforms: In-depth knowledge of AWS services, including EC2, S3, RDS, and CloudFormation/Terraform for infrastructure-as-code.
Proficiency in scripting and/or development languages like Bash or Python.
Thorough understanding of CI/CD pipelines and automation tools.
Understanding of Infrastructure as Code, and strong experience with automation tools like Terraform and/or Ansible.
Solid troubleshooting and debugging skills.
A team player with a strong can-do mentality.
This position is open to all candidates.
 
8163101
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a highly skilled DevOps Engineer to join the TLV Foundation Services Team within the BDC organization. In this role, you will be responsible for designing, implementing, and maintaining scalable, highly available production systems. You will collaborate with developers and global infrastructure teams to ensure seamless deployment, automation, and system reliability.
What You'll Do:
System Reliability & Scalability: Build and maintain highly available, scalable, and resilient production systems.
Automation & Infrastructure as Code: Develop and manage infrastructure using Terraform and other IaC tools.
Cloud Infrastructure Management: Configure and manage AWS, Azure, or similar cloud platforms to optimize performance and cost efficiency.
Containerization & Orchestration: Work extensively with K8s and Istio to manage containerized environments.
CI/CD Pipelines: Design, build, and maintain CI/CD automation using Jenkins, GitHub, or similar tools.
Monitoring & Logging: Implement and manage observability tools for monitoring, logging, and metrics collection in large-scale production environments.
Incident Management: Rapidly identify and resolve production issues, ensuring minimal downtime.
Security & Compliance: Implement best practices for security, access control, and compliance within cloud and on-prem environments.
Collaboration: Work closely with developers and infrastructure teams to streamline deployment, automation, and operations.
Support & Maintenance: Provide on-call support as needed to ensure system reliability.
Requirements:
3+ years of DevOps experience in a cloud-based production environment.
Strong expertise in Docker, K8s, and containerized application management.
Experience with AWS, Azure, or similar cloud platforms.
Hands-on experience with Infrastructure as Code (IaC) tools like Terraform.
Proficiency in CI/CD automation, including Jenkins, GitHub Actions, or similar.
Knowledge of monitoring and logging tools.
Strong scripting skills in Bash and experience with programming languages like Python or Java.
Experience with GitOps methodologies.
Familiarity with Istio and service mesh architectures.
Bonus Points For:
Hands-on experience managing data lakes or data warehouses.
Prior experience in Enterprise environments.
Strong problem-solving abilities and a passion for learning new technologies.
Ability to thrive in a fast-paced, dynamic environment and tackle challenges head-on.
This position is open to all candidates.
 
8120388
03/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Site Reliability Engineer (SRE) to join our Engineering team: someone with a passion for observability, monitoring, automation, and high-availability systems, and a desire to solve complex technological challenges with a proactive approach to continuous improvement.

We use an interesting and mixed technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.

In this position, you will use your expertise in building and scaling SRE operations, and will design, implement, and operate a world-class reliability strategy.
Key Responsibilities
Develop and maintain our monitoring, alerting, and logging systems, ensuring high visibility into production environments.
Implement automation to improve system reliability, scalability, and efficiency.
Troubleshoot and resolve production incidents, leading root cause analyses and implementing permanent fixes.
Collaborate with software engineers and DevOps teams to enhance application performance and resilience.
Continuously improve operational processes, focusing on reducing toil and improving reliability.
Requirements:
3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
Hands-on experience with monitoring and observability tools like Datadog, Prometheus, and Grafana.
Strong understanding of Linux systems, networking, and cloud-native architectures.
Experience with Kubernetes, Terraform, and CI/CD pipelines.
A problem solver, capable of finding creative solutions and getting things done.
Fluent in incident management, RCA processes, and operational best practices.
It would be great if you also have:
Experience in high-scale distributed systems.
Background in security and compliance for cloud infrastructure.
Familiarity with AWS (EKS, EC2, RDS, S3, networking configurations).
Proficiency in Python, Go, or Bash for automation and scripting.
Understanding of cost optimization and resource management in cloud environments.
Familiarity with machine learning or predictive analytics for proactive reliability management.
This position is open to all candidates.
 
8127048
07/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a highly experienced DevOps Engineering Manager to lead our team. The DevOps Engineering Manager will be responsible for leading and shaping the implementation of DevOps practices across our organization. This role focuses on strategic automation, continuous integration and delivery, infrastructure as code, and DevOps culture adoption. The ideal candidate will possess a strong background in software development, cloud architecture, and system administration, with exceptional leadership and collaboration skills to drive cross-functional initiatives.
Responsibilities:
Lead and manage a team of DevOps engineers, providing mentorship, guidance, and support.
Develop and implement advanced DevOps strategies and practices to enhance efficiency and reliability.
Collaborate with software engineering teams to integrate DevOps practices into the development process.
Architect, build, and maintain deployment pipelines and automation tools for software releases.
Oversee the implementation and support of system and application security measures.
Develop and maintain infrastructure as code using tools like Terraform.
Monitor and troubleshoot production systems and implement automated remediation techniques.
Develop and maintain documentation for infrastructure, processes, and procedures.
Stay abreast of emerging technologies and trends in DevOps, cloud computing, and software development, and drive their adoption as appropriate.
Requirements:
7+ years of experience in DevOps or related fields, with experience in a leadership role.
Proven experience with container orchestration technologies like Docker and Kubernetes or Swarm.
Extensive experience with cloud computing platforms like AWS, Azure, or Google Cloud (Google Cloud is an advantage).
Experience with configuration management tools like Terraform, Ansible, Puppet, or Chef.
Experience with Kubernetes packaging and templating tools such as Helm, Kustomize, and Kompose (Helm is an advantage).
Advanced knowledge of CI/CD tools like GitLab CI/CD, Circle CI, Jenkins, or Argo CD.
Experience with database administration and management.
Advanced Bash scripting skills.
Proficiency in at least one programming language, such as Python or Go (a must).
Experience with monitoring and logging tools like Prometheus, Grafana, or Datadog.
Experience with microservices architecture, API gateways, or Reverse Proxy such as NGINX (an advantage).
Excellent communication and interpersonal skills, with the ability to influence and drive cross-functional initiatives.
Demonstrated leadership experience, including mentoring and guiding team members.
This position is open to all candidates.
 
8130969
07/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking to hire a talented, self-driven and passionate Senior Infrastructure Engineer to build and maintain the cloud infrastructure for our highly available SaaS application as well as our machine learning and data engineering stack.

As a Senior Infrastructure Engineer, you will be responsible for designing, implementing, and maintaining the cloud infrastructure and DevOps processes that power our products and internal tooling. You will work closely with all data and development teams and lead the company's security and compliance vectors. You will ensure a highly reliable, scalable, and secure infrastructure that supports our rapid growth and product innovation, while maintaining observability and cost-effectiveness of our cloud resources and data.

Responsibilities:
Cloud Infrastructure Management: Architect, deploy, and manage our cloud infrastructure (AWS), ensuring high availability, scalability, and security.
Software Engineering: Be a top-notch software engineer, applying your coding, architecture, and research skills to our infrastructure stack.
Infrastructure as Code (IaC): Define and maintain infrastructure using tools like Terraform, CloudFormation, or Pulumi to manage resources efficiently and reproducibly.
Monitoring & Incident Management: Build and manage monitoring and alerting systems to ensure uptime, and respond to incidents with root cause analysis and remediation.
DevOps & Automation: Implement and maintain CI/CD pipelines to streamline development workflows and automate deployment processes across development, staging, and production environments, and across different parts of our solution. While our development teams are expected to write and maintain their own CI, you will act as a supervisor and professional authority, and maintain cross team and complex automation.
Collaboration and technical leadership: Partner with software engineers, data engineers, and machine learning teams to support their infrastructure needs and guide the evolution of our infrastructure team.
Cost Optimization: Monitor cloud spend and optimize resources to ensure cost-effective infrastructure without sacrificing performance or security.
Security & Compliance: Implement security best practices, including access control, network security, monitoring and ensuring the infrastructure is compliant with relevant industry standards (e.g., SOC2, GDPR).
Requirements:
Experience: 5+ years of hands-on experience in cloud infrastructure, DevOps and platform engineering in production environments.
Cloud Platforms and IaC: Expertise in managing cloud infrastructure on at least one of the major providers: AWS, GCP, Azure. Proficient in Infrastructure as Code tools such as Terraform, CloudFormation, or Pulumi.
Containerization & Orchestration: Solid experience with Docker and Kubernetes.
Monitoring & Logging: Hands-on experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk, or equivalent).
Software Engineering: Proficiency in software engineering and architecture, as well as in scripting languages such as Python, Bash, or Go. Full command of version control systems such as Git.
DevOps Tools: Strong experience with CI/CD pipelines and automation using Jenkins, CircleCI, GitHub Actions, GitLab CI, or similar.
Networking: Strong understanding of cloud networking, VPNs, VPCs, DNS, and firewalls.
Security Best Practices: Experience implementing cloud security best practices, including IAM, encryption, and key management.
Startup Experience: Previous experience in a fast-paced startup environment, where adaptability and hands-on execution are key.
Team Player: Strong communication skills and ability to work cross-functionally with different teams.

Advantages:
ML Infrastructure: Experience supporting machine learning pipelines and deploying ML models to production environments.
Data Engineering: Familiarity with data engineering tools like Apache Spark, Airflow, or similar.
This position is open to women and men alike.
 
8131896
24/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a DevOps Team Lead to lead our geographically diverse team and take ownership of our Cloud Infrastructure and Platform Engineering strategy, enabling high-scale, cutting-edge GenAI products running across 40+ Kubernetes clusters on GCP and AWS.

This role combines technical leadership, team management, and hands-on engineering, requiring solid expertise in cloud-native technologies, Kubernetes at scale, and modern DevOps principles. You will collaborate closely with engineering teams to design scalable infrastructure solutions, optimize developer workflows, and ensure platform reliability and efficiency.

Role and Responsibilities
Team Leadership & Mentorship: Lead and manage a geographically distributed team, fostering growth, engagement, and professional development. Mentor engineers, conduct performance reviews and career growth planning, and encourage knowledge-sharing across R&D teams.
Cloud & Kubernetes Management: Guide the design and implementation of scalable multi-cluster Kubernetes environments across GCP & AWS.
Developer Experience & Enablement: Oversee the development of self-service tools and automation to improve efficiency for R&D teams.
Incident & Reliability Engineering: Collaborate with engineering teams to optimize cost, performance, and reliability of production infrastructure through monitoring, capacity planning, and scaling strategies.
Security & Governance: Drive best practices for RBAC, IAM, cloud security, and compliance, ensuring robust infrastructure security.
Automation & Infrastructure as Code: Promote adoption of GitOps workflows and Infrastructure as Code (Terraform, Helm, Crossplane) for improved automation and consistency.
Cross-Team Collaboration: Align cloud infrastructure goals with business needs by working closely with engineering, security, and product teams.
Requirements:
7+ years of DevOps, SRE, or Platform Engineering experience.
5+ years working with public cloud platforms (AWS/GCP) at scale.
Senior-level Kubernetes expertise, including experience managing enterprise-grade, multi-cluster environments.
Experience with Infrastructure as Code (Terraform, Helm) and familiarity with GitOps principles (ArgoCD, FluxCD, etc.).
Familiarity with observability and monitoring tools (Prometheus, Grafana, Datadog, OpenTelemetry, etc.).
Proficiency in scripting and automation (Python, Go, Bash) for infrastructure management.
Knowledge of cloud networking (VPC, load balancers, service meshes) and security best practices (RBAC, IAM, security groups, network policies).
Experience with CI/CD pipelines, optimizing for performance, security, and developer velocity.
This position is open to all candidates.
 
8152212
31/03/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a results-driven DevOps Team Lead to head the DevOps and Infrastructure team within our R&D organization. This role requires strategic vision, technical expertise, and a proactive approach to drive operational excellence, empower teams with robust tools and automation, and ensure high system reliability and scalability.

As the DevOps Team Lead, you will be instrumental in delivering critical KPIs, including system uptime, automation, incident management, and collaboration with development and QA teams to enable self-sufficiency. Additionally, you will serve as the leader for strategic projects, identifying opportunities to improve infrastructure and operational processes, setting long-term goals, and executing initiatives that align with Cyberint's business objectives and growth.

Key Responsibilities
System Reliability & Uptime
Lead initiatives to ensure system reliability, minimize disruptions, and maintain high availability for Cyberint's SaaS platform.
Establish and manage proactive monitoring, alerting, and preventive maintenance strategies.
Drive incident prevention efforts, ensuring robust failover and disaster recovery mechanisms.
Develop and maintain playbooks to enable rapid diagnosis and resolution of issues.
Automation, Infrastructure as Code (IaC), & Self-Service Enablement
Champion the adoption of automation and IaC to streamline infrastructure management and deployments.
Build and enhance self-service tools and frameworks, empowering R&D teams to operate independently with minimal reliance on DevOps.
Continuously improve CI/CD pipelines to optimize deployment speed and reliability.
Collaboration & Support for Self-Sufficiency
Collaborate closely with development, QA, and support teams to deliver tools and frameworks that promote team autonomy and efficiency.
Advocate for cross-functional engagement to align operational processes with R&D objectives.
Provide training and mentorship to teams on using DevOps tools effectively.
Accountability, Ownership, & Scalability
Take ownership of all systems and infrastructure, ensuring solutions are scalable, resilient, and aligned with Cyberint's growth objectives.
Establish clear accountability frameworks for maintaining infrastructure and delivering on key projects.
Design and execute a roadmap to support self-service-oriented and scalable solutions.
Strategic Leadership
Identify and lead strategic projects to enhance Cyberint's platform scalability, reliability, and operational efficiency.
Develop and execute a roadmap for critical infrastructure and DevOps initiatives that drive business success.
Collaborate with senior stakeholders to align projects with organizational priorities and deliver measurable outcomes.
Requirements:
5+ years of experience in DevOps or SRE roles, with 2+ years in a leadership capacity.
Proven expertise in building and maintaining highly available, cloud-native environments (AWS preferred).
Experience with Kubernetes, Terraform, CI/CD pipelines, and monitoring tools and related technologies (Prometheus, Grafana, Jenkins, ArgoCD, Elasticsearch, Redis, EKS, etc.).
Skills & Expertise

Strong understanding of automation, Infrastructure as Code (IaC), and self-service enablement.
Expertise in incident management and a track record of delivering reliable, scalable systems.
Hands-on experience with scripting and automation tools (Python, Bash).
Deep understanding of containerization, orchestration, and cloud-native architectures.
Familiarity with cost monitoring and optimization strategies to ensure infrastructure is both efficient and cost-effective.
Knowledge of security best practices for infrastructure and DevOps environments.
This position is open to all candidates.
 
8121466