Site Reliability Engineering Manager

7 hours ago
Location: Netanya and Tel Aviv-Yafo
Job Type: Full Time
At our company, we're reinventing DevOps to help the world's greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit and just all-around great people. Here, if you're willing to do more, your career can take off. And since software plays a central role in everyone's lives, you'll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust our company to manage, accelerate, and secure their software delivery from code to production -- a concept we call liquid software. Wouldn't it be amazing if you could join us on our journey?
We are looking for a Site Reliability Engineering Manager to lead our Israel SRE team. In this role, you'll drive best practices in reliability engineering, ensuring the stability, availability, and performance of our company's SaaS services. You'll collaborate with global SRE leaders, refine processes, and foster a culture of accountability and continuous improvement.
As a Site Reliability Engineering Manager at our company, you will:
Lead, mentor, and develop a high-performing SRE Israel team, fostering collaboration, innovation, and accountability
Ensure SaaS reliability, performance, and availability, meeting or exceeding service-level objectives
Drive SRE best practices, including capacity planning, incident management, chaos engineering, and disaster recovery
Implement proactive monitoring, alerting, and anomaly detection aligned with SaaS standards
Collaborate with P&E and Cloud engineering teams to embed reliability into the SDLC
Oversee incident management, ensuring swift identification, escalation, and resolution
Maintain comprehensive SRE documentation, including processes, incident reports, and system architecture
Evaluate and adopt tools, technologies, and methodologies to enhance uptime and reliability.
Requirements:
3+ years of management experience leading an SRE, DevOps, or similar SaaS team
Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
Strong expertise in cloud platforms (AWS, GCP, or Azure), containers (Kubernetes, Docker), and configuration management (Terraform, Ansible)
Proficiency in Python or Go for automation and system optimization, as well as GitOps experience with SCM tools (e.g., Git, Bitbucket)
Strong leadership, communication, and collaboration skills, working across globally distributed teams
Familiarity with Agile methodologies, CI/CD pipelines, and orchestration tools (Jenkins, ArgoCD, StackStorm)
Familiarity with Chaos Engineering (e.g., Gremlin, Litmus, Chaos Toolkit)
Hands-on with alerting & observability tools (e.g., PagerDuty, OpsGenie, New Relic, Coralogix)
Strong understanding of scalability, high availability, and security best practices in cloud & Kubernetes environments.
This position is open to all candidates.
 
8255508
Similar jobs that may interest you
7 hours ago
Location: Tel Aviv-Yafo and Netanya
Job Type: Full Time
At our company, we're reinventing DevOps to help the world's greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit and just all-around great people. Here, if you're willing to do more, your career can take off. And since software plays a central role in everyone's lives, you'll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust our company to manage, accelerate, and secure their software delivery from code to production -- a concept we call liquid software. Wouldn't it be amazing if you could join us on our journey?
Our company seeks a highly skilled Senior Site Reliability Engineer to join our team! In this role, you will drive best practices, optimize operational workflows, and mentor junior engineers, fostering a culture of collaboration and innovation. This is an exciting opportunity for someone passionate about building and integrating services and systems that ensure the availability, performance, and reliability of our company's SaaS environments. You will lead large-scale, cross-functional initiatives and work closely with P&E engineering and Cloud teams to design, build, and maintain scalable, resilient infrastructure while championing best practices for automation, monitoring, and incident response. If you're eager to make a significant impact in a fast-paced, high-growth environment, we encourage you to apply.
As a Senior Site Reliability Engineer in our company, you will:
Lead and guide the team toward technical solutions, drawing on a strong understanding of current technologies such as Kubernetes, Helm, Terraform, and more
Advocate, build, and manage scalable and reliable services and infrastructure to support our company's SaaS services
Apply SRE best practices, including incident management, performance and capacity planning, and disaster recovery flows
Drive the reliability, performance, and availability of our SaaS products, ensuring service-level objectives are met or exceeded
Design, develop, and manage large-scale systems with CI/CD in mind, to support multiple production environments and use cases
Tackle large-scale production issues and bring out-of-the-box thinking to the table
Evaluate new cloud-native technologies and vendor products to continuously improve our SaaS offering
Requirements:
5+ years of relevant DevOps or SRE experience in large-scale production environments
2+ years of infrastructure automation, configuration management, or container orchestration using Kubernetes, Docker, Terraform, and Ansible
2+ years in Python or any other advanced programming language
Strong ability to lead, design, and execute cross-organization projects
Experience in managing container and infrastructure orchestration tools (e.g. Kubernetes, Terraform)
Hands-on experience administering public clouds (AWS, GCP, or Azure)
Experience with building CI/CD pipelines for applications and microservices (Jenkins/ArgoCD)
Experience with chaos, alerting & observability tools (Gremlin, PagerDuty, Opsgenie, New Relic, Coralogix).
This position is open to all candidates.
 
8255520
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for a Site Reliability Engineer (SRE) to enhance the reliability, performance, and scalability of our production infrastructure. This role goes beyond keeping systems running: you'll be a key player in shaping the culture of reliability, driving self-healing mechanisms, proactive alerting strategies, and automation to reduce toil and improve operational efficiency. You'll work closely with engineering teams to ensure high availability, observability, and smooth incident management processes.
Responsibilities
Ensure reliability & scalability of our production environment across multiple cloud providers.
Define and implement SRE best practices, fostering a culture of ownership, continuous improvement, and automation.
Automate everything, from infrastructure deployment to self-healing mechanisms that eliminate manual intervention.
Design and improve observability solutions (monitoring, logging, tracing) to enable faster detection and resolution of issues.
Optimize alerting strategies to ensure actionable, high-quality alerts while minimizing noise and fatigue.
Improve system resilience, driving chaos engineering, failover strategies, and automatic recovery processes.
Enhance incident response processes, including on-call strategies, root cause analysis, and post-mortems to drive long-term stability.
Collaborate with development teams to build reliable, scalable, and efficient architectures, ensuring seamless deployment and rollback processes.
Promote a culture of reliability, educating teams on best practices, service ownership, and production-readiness.
Requirements:
3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
Strong expertise in Kubernetes and container orchestration in production.
Hands-on experience with cloud platforms (AWS, Azure, or GCP).
Proven experience with monitoring & observability tools (Prometheus, ELK, Grafana, Coralogix, etc.).
Strong scripting/programming skills (Python, Go, Bash, or similar).
Experience with Infrastructure as Code (IaC): Terraform, Helm, or similar tools.
Track record of improving system reliability, scalability, and performance.
Experience designing and implementing self-healing mechanisms to minimize human intervention.
Ability to foster a strong reliability culture across engineering teams, leading by example.
Excellent problem-solving skills, with a proactive and ownership-driven mindset.
This position is open to all candidates.
 
8228722
7 hours ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Site Reliability Engineer to join our DevOps team. You will ensure the reliability, performance, and scalability of our back-office solutions, which serve as the foundation for the entire purchasing process. This role will lead the development of SRE capabilities, meeting SLI/SLO/SLA targets, and establishing effective monitoring systems. You will enhance our Software Development Lifecycle by integrating reliability and scalability, working with cross-functional teams, and supporting production environments. Additionally, you will implement incident management processes and conduct post-mortem analyses to drive continuous improvement. If you have a strong engineering and automation background and are passionate about the E-commerce field, then we would love to hear from you.
Roles and Responsibilities:
Develop and implement SRE capabilities to enhance the reliability, availability, and performance of Admin solutions.
Design and maintain proactive monitoring and alerting systems for deep visibility into critical business flows, beyond simple statuses, to identify functional issues.
Drive improvements in the Software Development Lifecycle (SDLC) for reliability and scalability from design to deployment.
Collaborate with development and operations teams to troubleshoot production incidents affecting the purchase flow through root cause analysis.
Lead SRE initiatives to boost system resilience and operational efficiency.
Implement best practices for incident management and conduct blameless post-mortems, contributing to capacity planning and performance testing to ensure scalability.
Requirements:
5+ years of experience as a Site Reliability/DevOps Engineer
Deep understanding of E-commerce flows, specifically with back-office operations and order processing - must
Experience as an Automation/Software Engineer with a strong understanding of software development principles and in building, testing, and deploying distributed systems - must
Experience in designing, implementing, and utilizing monitoring and observability platforms such as DataDog, NewRelic, Prometheus/Grafana, or ELK stack - must
Proficiency in scripting and automation using languages such as Python, Java, etc. - must
Ability to create dashboards, alerts, and insightful queries - must
Experience with AWS services to build and operate scalable and resilient applications (e.g., EC2, ECS/EKS, RDS, S3, Lambda, CloudWatch) - plus
Experience in automating infrastructure provisioning, application deployments, and repetitive operational tasks - plus
Proactive approach with excellent problem-solving skills
Strong collaborator, with an ability to work with cross-functional teams
Proficient in English.
This position is open to all candidates.
 
8255386
Location: Tel Aviv-Yafo
Job Type: Full Time
Site Reliability Engineer - Infra
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer - Infra on our Infrastructure team at the TLV office, you will play a key role in ensuring the reliability, scalability, and performance of our critical systems. You will be responsible for managing and improving our core infrastructure, with a focus on automation, monitoring, and incident response. You will work with a wide range of technologies, including Kubernetes, monitoring and observability tools, configuration management systems, and core networking services.
How you'll make an impact:
As a Site Reliability Engineer, you will:
Ensure the reliability, availability, and performance of our infrastructure services.
Manage and maintain our Kubernetes infrastructure, including KubeVirt.
Design, implement, and maintain our monitoring and observability stack (SensuGo, VictoriaMetrics, Prometheus, ELK).
Automate infrastructure provisioning, configuration, and deployment processes using Puppet and Ansible.
Manage and maintain core services such as DNS and networking.
Troubleshoot and resolve complex infrastructure issues in a timely and efficient manner.
Participate in on-call rotations and incident response.
Develop and maintain infrastructure-as-code (IaC).
Identify and implement proactive measures to prevent incidents and improve system reliability.
Collaborate with development teams to ensure smooth and reliable deployments.
Contribute to the design and implementation of new infrastructure solutions.
Drive improvements in system architecture, processes, and tools.
Mentor and coach other team members.
Requirements:
To thrive in this role, you'll need:
5+ years of experience in a Site Reliability Engineering, Systems Engineering, or similar role.
Deep understanding of Site Reliability Engineering principles and practices.
Extensive experience with Kubernetes, including deployment, management, and troubleshooting.
Strong experience with monitoring and observability tools such as SensuGo, Zabbix, VictoriaMetrics, Prometheus, and ELK.
Proficiency in configuration management tools such as Puppet and Ansible.
Solid understanding of Linux internals and networking.
Experience with managing and maintaining core services such as DNS and networking.
Strong programming skills in Python and/or Go.
Experience with both on-premises and cloud environments.
Experience with KubeVirt.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration skills.
Ability to work in a fast-paced, dynamic environment.
Ability to participate in on-call rotations including weekends.
Preferred Qualifications:
Experience with large-scale, distributed systems.
Experience with other cloud providers (e.g., AWS, Azure, GCP).
Contributions to open-source projects.
This position is open to all candidates.
 
8205377
18/06/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a talented, self-driven and passionate Senior Infrastructure Engineer to build and maintain the cloud infrastructure for our highly available SaaS application as well as our machine learning and data engineering stack.

As a Senior Infrastructure Engineer, you will be responsible for designing, implementing, and maintaining the cloud infrastructure and DevOps processes that power our products and internal tooling. You will work closely with all data and development teams and lead the company's security and compliance vectors. You will ensure a highly reliable, scalable, and secure infrastructure that supports our rapid growth and product innovation, while maintaining observability and cost-effectiveness of our cloud resources and data.

What You'll Do

Cloud Infrastructure Management: Architect, deploy, and manage our cloud infrastructure (AWS), ensuring high availability, scalability, and security.
Software Engineering: Be a top-notch software engineer, applying your coding, architecture, and research skills to our infrastructure stack.
Infrastructure as Code (IaC): Define and maintain infrastructure using tools like Terraform, CloudFormation, or Pulumi to manage resources efficiently and reproducibly.
Monitoring & Incident Management: Build and manage monitoring and alerting systems to ensure uptime, and respond to incidents with root cause analysis and remediation.
DevOps & Automation: Implement and maintain CI/CD pipelines to streamline development workflows and automate deployment processes across development, staging, and production environments, and across different parts of our solution. While our development teams are expected to write and maintain their own CI, you will act as a supervisor and professional authority, and maintain cross team and complex automation.
Collaboration and technical leadership: Partner with software engineers, data engineers, and machine learning teams to support their infrastructure needs and guide the evolution of our infrastructure team.
Cost Optimization: Monitor cloud spend and optimize resources to ensure cost-effective infrastructure without sacrificing performance or security.
Security & Compliance: Implement security best practices, including access control, network security, monitoring and ensuring the infrastructure is compliant with relevant industry standards (e.g., SOC2, GDPR).
Requirements:
5+ years of hands-on experience in cloud infrastructure, DevOps and platform engineering in production environments.
Expertise in managing cloud infrastructure on at least one of the major providers: AWS, GCP, Azure. Proficient in Infrastructure as Code tools such as Terraform, CloudFormation, or Pulumi.
Solid experience with Docker and Kubernetes.
Monitoring & Logging: Hands-on experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk, or equivalent).
Proficiency in software engineering and architecture, as well as scripting languages such as Python, Bash, or Go. Full command of version control systems such as Git.
Strong experience with CI/CD pipelines and automation using Jenkins, CircleCI, GitHub Actions, GitLab CI, or similar.
Strong understanding of cloud networking, VPNs, VPCs, DNS, and firewalls.
Experience implementing cloud security best practices, including IAM, encryption, and key management.
Previous experience in a fast-paced startup environment, where adaptability and hands-on execution are key.
Strong communication skills and ability to work cross-functionally with different teams.
Advantages:

Experience supporting machine learning pipelines and deploying ML models to production environments.
Familiarity with data engineering tools like Apache Spark, Airflow, or similar ETL tools.
Experience with serverless technologies such as AWS Lambda, GCP Functions, or Azure Functions.
AWS Certified Security Specialty, or equivalent certifications in cloud security.
Experience and knowledge with regulatory complia
This position is open to all candidates.
 
8222324
18/06/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking an experienced Senior Site Reliability Engineer to join our SRE team as part of our Platform Engineering group. This role involves taking ownership of monitoring, deploying, and ensuring the reliability of production-grade modern SaaS platforms across Cloud and On-Premise environments.
Responsibilities:
Lead initiatives to enhance product reliability and system readiness.
Design and implement sophisticated monitoring solutions to ensure high availability and performance of our production platform.
Oversee and refine the entire product reliability pipeline.
Proactively troubleshoot and resolve issues across production environments.
Champion an "Everything as Code" approach using a wide range of technologies including Ansible, Terraform, Helm, Python and more.
Develop advanced tools for automation, deployment, monitoring, and operations.
Exhibit excellent communication and interpersonal skills to effectively collaborate within the team and across departments.
Promote best practices in reliability and system operations.
Requirements:
4+ years of experience as a DevOps or Site Reliability Engineer.
In-depth knowledge of microservices architectures and technologies such as Kubernetes.
Comprehensive understanding of cloud & on-prem environments and hybrid solutions.
Proficiency with one or more major cloud providers (AWS experience is an advantage).
Advanced experience with CI/CD technologies including Jenkins, GitHub Actions, and ArgoCD.
Proficient coding and scripting capabilities in Python, Bash, or similar languages.
Strong team player with proven ability to lead and inspire.
Advantages:
Prior experience with endpoint security products (agents, sensors, collectors).
Background in working with AI components (training, inference, serving).
Tech Stack: AWS, Kubernetes, EKS, RKE2, ECS, SageMaker, Jenkins, GitHub, Terraform, Python, Ansible, Docker + Compose, ArgoCD, MongoDB, RabbitMQ, Redis, Go, Neo4J, AI, and more.
This position is open to all candidates.
 
8221921
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Site Reliability Engineer
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer on the IT Production team in our TLV office, you'll play a vital role in building robust services and solving infrastructure challenges through automation, while working with cutting-edge technologies and pushing them to their limits on our mostly on-prem, cloud-like infrastructure.
How you'll make an impact:
As a Site Reliability Engineer, you will:
Ensure Reliability & Scalability: Design, implement and manage highly reliable and scalable distributed systems across our on-premise, cloud and AI/ML environments. Proactively optimize performance, efficiency, resource utilization and cloud cost.
Drive Automation: Automate repetitive tasks, infrastructure provisioning, configuration and deployments using IaC and scripting languages (e.g., Python, Go, Rust).
Develop Observability & Capacity: Implement comprehensive monitoring and alerting systems to ensure system health. Collaborate on capacity planning to meet future growth.
Maintain Security & Compliance: Integrate security best practices and ensure compliance with industry standards.
Lead Incident Management: Participate in on-call rotations, lead incident responses and conduct root cause analysis to minimize downtime.
Foster Collaboration & Improvement: Work closely with development, operations and security teams to drive shared responsibility and continuous improvement in SRE practices.
Our Tech Stack:
Linux, Kubernetes, nginx, Istio, AWS, GCP, Azure, Alicloud, Fastly, Terraform, Consul, Prometheus, Loki, Grafana, Airflow, Redis, Kafka, Vector, Hadoop, Cassandra, Vertica, MySQL, HDFS, ELK.
Requirements:
7 years of experience as an SRE, DevOps Engineer, or System Administrator in a large distributed environment, with a focus on Linux operating systems.
Experience supporting, troubleshooting and scaling large distributed systems in production.
Deep understanding of HTTP protocol, including HTTP/1.1, HTTP/2, caching semantics, TLS and gRPC delivery.
Experience configuring and operating CDN services (e.g., Akamai, Fastly, Cloudflare, AWS CloudFront).
Deep understanding of Linux system internals and system performance tuning.
Experience with Configuration Management Tools (Puppet, Ansible, Chef, Terraform).
Experience programming in at least one of the following languages (Python, Golang, Rust, Ruby, C++, Java).
Experience with monitoring and metrics collection systems (Prometheus, Grafana, ELK).
Experience with cloud providers and platforms (AWS, Azure, GCP, Alibaba).
Experience with containerization technologies (Kubernetes, Docker).
Deep understanding of networking principles (TCP/IP, DNS, load balancing).
This position is open to all candidates.
 
8205371
17/06/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Join our DeviantArt team as a Senior DevOps Engineer and play a pivotal role in maintaining and architecting a robust infrastructure that powers one of the largest online art communities. You'll be at the forefront of ensuring our platform's high availability, performance, and security, handling over 1.5 billion monthly page views.
The DeviantArt DevOps Team is a very small remote team that performs all tasks normally inclusive of SRE/DevOps/Infrastructure Engineers, with a bit of networking, security, and database administration mixed in. We are responsible for the day-to-day management and implementation of large-scale, mission-critical production systems that run on a public cloud.
This role requires wearing a lot of hats, and is equal parts fun and challenging. In this role, you will:
Architect and maintain a highly available infrastructure with a focus on proactive and reactive DDOS mitigation, autoscaling, self-healing, site performance, and cost optimization
Participate in a 24/7 on-call rotation, responding swiftly to outages or performance issues, and focus on less urgent alerts during normal work hours
Maintain and develop a developer environment and CI/CD pipelines in parity with production systems, for seamless testing and release of changes
Automate infrastructure provisioning and management using configuration management tools, complete with tests and documentation
Optimize and support sharded MySQL databases for efficient and reliable data handling amidst growing data reads and writes
Regularly update system components to avoid security issues and ensure up-to-date technology
We take our work seriously, but we don't take ourselves too seriously! We enjoy designing and building systems using open source tools and industry standards, and are in the fortunate position to be able to make decisions as a team about adopting newer technologies, and redesigning our infrastructure when appropriate.
This role is on a fully remote and distributed team, and asynchronous communication within and across teams is crucial. To be successful in this role, a candidate will need to work flexibly, balancing server and service issues, needs from development teams, security needs, and shifting priorities in our own tasks in managing our infrastructure.
Requirements:
5+ years of experience managing systems at scale as a DevOps Engineer, Site Reliability Engineer, or Platform Engineer
Excellent technical analytical skills with the ability to implement DDOS mitigation, troubleshoot complex problems, analyze system bottlenecks, and implement effective solutions, from frontend through backend systems, sometimes during production degradation or outage for a high traffic site
Exceptional command line Linux skills, with proficiency in Bash and Python for investigation of server and services issues, scripting, and automation
In-depth knowledge of AWS services, infrastructure as code using Terraform, GitOps tools and methodologies, and container orchestration using Docker, Helm, and Kubernetes
Experience with setup, administration, and maintenance of sharded MySQL database clusters while maintaining no downtime or data loss
Excellent communication skills with fluent English, and the ability to collaborate effectively across teams while articulating technical concepts to non-technical stakeholders
The ability to get up to speed on systems, make decisions, be flexible, and execute independently with attention to detail for production systems.
This position is open to all candidates.
 
8220324
3 days ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for a seasoned DevOps professional to join our team and play a key role in shaping the infrastructure and delivery processes that power our technology. In this role, you'll architect and build robust CI/CD pipelines, automate and optimize critical infrastructure, and work closely with cross-functional teams to deliver scalable, secure, and highly available systems.
Responsibilities:
Architect, develop, and optimize robust CI/CD pipelines to streamline software delivery and deployment processes.
Automate, orchestrate, and enhance mission-critical infrastructure while ensuring system scalability, security, and high availability.
Collaborate closely with R&D, product teams, and engineers to understand challenges and provide tailored DevOps solutions.
Continuously evaluate, refine, and innovate DevOps methodologies, tools, and best practices to maximize efficiency.
Troubleshoot and resolve complex issues across development, testing, and production environments, ensuring seamless system operations.
Operate with a DevOps-first mindset, championing collaboration, automation, and observability.
Stay ahead of the curve with emerging technologies, industry trends, and best practices, driving continuous improvement.
Requirements:
3+ years of hands-on experience in DevOps engineering, automation, and CI/CD strategies.
Hands-on experience with cloud platforms such as AWS, GCP, or Azure (e.g., managing services, networking, IAM, and automation).
Expert-level proficiency in Python, with the ability to develop robust automation scripts, tooling and integrations.
Deep expertise in Git methodologies (branching strategies, GitOps, CI/CD best practices, etc.).
Proven experience in designing, implementing, and maintaining CI/CD pipelines.
Hands-on experience with Kubernetes and Docker, including workload orchestration, container lifecycle management, and performance tuning.
Strong Linux administration skills.
Infrastructure as Code (IaC) expertise with Terraform, ensuring reproducibility and automation at scale.
Proficiency in observability and monitoring solutions (Prometheus, Grafana, ELK stack, etc.), enabling proactive system health management.
Experience working with Agile/Scrum methodologies (an advantage).
Networking fundamentals (DNS, TCP/IP, load balancing) (an advantage).
This position is open to all candidates.
 
8252949
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
At our company, we are building an AI-First company, leveraging cutting-edge technology to revolutionize climate intelligence.
Our company's engineering department is focused on building life-changing software and products at scale, from infrastructure that handles massive amounts of data to outstanding customer-centric user experiences in B2B, B2C and B2D products that change billions of lives worldwide.
We're looking for a DevOps Engineer who'll bring the dev stream from build through deploy to production, to maximum efficiency, reliability, security and stability. You'll integrate AI capabilities into our DevOps workflows, facilitate team independence, and maintain adaptive infrastructure that supports our business strategy. By continuously reducing costs and increasing velocity, you'll work in a team effort to achieve our goals. You'll help us make sure we are building the biggest weather platform in the world.
As a DevOps Engineer at our company, you'll:
Develop and adopt AI-powered tools to make Development and Operations processes more efficient
Collaborate with developers and weather scientists to optimize service performance, reliability, scale, security, and cost
Evolve and maintain adaptive cloud infrastructure to support our business strategy and enable smooth growth at scale
Build self-service platforms for scientists and developers to work independently
Introduce and integrate MLOps practices for GPU-based model deployment on Kubernetes
Maintain Production availability by participating in DevOps on-call shifts.
Requirements:
At least 3 years of experience as a DevOps/SRE Engineer in a Linux environment, experienced with AWS, GCP, or Azure and IaC, such as Terraform or Crossplane
Experience with CI/CD tools and deployment methodologies in Kubernetes
Strong sense of ownership and accountability for service reliability
Comfort with AI-powered development tools and willingness to experiment with new technologies and methods
Experience implementing and customizing monitoring systems (Datadog, Prometheus, ELK Stack)
Experience working in an agile environment with high-velocity teams
Proficiency with scripting languages like Python, Node.js, and Go
Adaptable problem-solving mindset - thriving in changing environments and requirements
Data-driven decision-making mindset, with a passion for leveraging AI, automation, and analytics to solve complex challenges.
This position is open to all candidates.
 
8239733