Senior Site Reliability Engineer

Posted: 7 hours ago
Location: Netanya and Tel Aviv-Yafo
Job Type: Full Time
At our company, we're reinventing DevOps to help the world's greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit, and just all-around great people. Here, if you're willing to do more, your career can take off. And since software plays a central role in everyone's lives, you'll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust our company to manage, accelerate, and secure their software delivery from code to production -- a concept we call liquid software. Wouldn't it be amazing if you could join us on our journey?
Our company seeks a highly skilled Senior Site Reliability Engineer to join our team! In this role, you will drive best practices, optimize operational workflows, and mentor junior engineers, fostering a culture of collaboration and innovation. This is an exciting opportunity for someone passionate about building and integrating services and systems that ensure the availability, performance, and reliability of our company's SaaS environments. You will lead large-scale, cross-functional initiatives and work closely with the P&E engineering and Cloud teams to design, build, and maintain scalable, resilient infrastructure while championing best practices for automation, monitoring, and incident response. If you're eager to make a significant impact in a fast-paced, high-growth environment, we encourage you to apply.
As a Senior Site Reliability Engineer at our company, you will:
Lead and guide the team toward technical solutions grounded in a strong understanding of the latest and greatest technologies, such as Kubernetes, Helm, Terraform, and more
Advocate, build, and manage scalable and reliable services and infrastructure to support our company's SaaS services
Apply SRE best practices, including incident management, performance and capacity planning, and disaster recovery flows
Drive the reliability, performance, and availability of our SaaS products, ensuring service-level objectives are met or exceeded
Design, develop, and manage large-scale systems with CI/CD in mind, to support multiple production environments and use cases
Tackle large-scale production issues and bring out-of-the-box thinking to the table
Evaluate new cloud-native technologies and vendor products to continuously improve our SaaS offering
Requirements:
5+ years of relevant DevOps or SRE experience in large-scale production environments
2+ years of infrastructure automation, configuration management, or container orchestration using Kubernetes, Docker, Terraform, and Ansible
2+ years in Python or any other advanced programming language
Strong ability to lead, design, and execute cross-organization projects
Experience in managing container and infrastructure orchestration tools (e.g. Kubernetes, Terraform)
Hands-on experience administering public clouds (AWS, GCP, or Azure)
Experience with building CI/CD pipelines for applications and microservices (Jenkins/ArgoCD)
Experience with chaos, alerting & observability tools (Gremlin, PagerDuty, Opsgenie, New Relic, Coralogix).
This position is open to all candidates.
 
Job ID: 8255520
Similar jobs you may be interested in
Posted: 7 hours ago
Location: Tel Aviv-Yafo and Netanya
Job Type: Full Time
At our company, we're reinventing DevOps to help the world's greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit, and just all-around great people. Here, if you're willing to do more, your career can take off. And since software plays a central role in everyone's lives, you'll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust our company to manage, accelerate, and secure their software delivery from code to production -- a concept we call liquid software. Wouldn't it be amazing if you could join us on our journey?
We are looking for a Site Reliability Engineering Manager to lead our Israel SRE team. In this role, you'll drive best practices in reliability engineering, ensuring the stability, availability, and performance of our company's SaaS services. You'll collaborate with global SRE leaders, refine processes, and foster a culture of accountability and continuous improvement.
As a Site Reliability Engineering Manager at our company, you will:
Lead, mentor, and develop a high-performing Israel SRE team, fostering collaboration, innovation, and accountability
Ensure SaaS reliability, performance, and availability, meeting or exceeding service-level objectives
Drive SRE best practices, including capacity planning, incident management, chaos engineering, and disaster recovery
Implement proactive monitoring, alerting, and anomaly detection aligned with SaaS standards
Collaborate with P&E and Cloud engineering teams to embed reliability into the SDLC
Oversee incident management, ensuring swift identification, escalation, and resolution
Maintain comprehensive SRE documentation, including processes, incident reports, and system architecture
Evaluate and adopt tools, technologies, and methodologies to enhance uptime and reliability.
Requirements:
3+ years of management experience leading a team of SRE, DevOps, or a similar SaaS role
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)
Strong expertise in cloud platforms (AWS, GCP, or Azure), containers (Kubernetes, Docker), and configuration management (Terraform, Ansible)
Proficiency in Python or Go for automation and system optimization, as well as GitOps experience with SCM tools (e.g., Git, Bitbucket)
Strong leadership, communication, and collaboration skills, working across globally distributed teams
Familiarity with Agile methodologies, CI/CD pipelines, and orchestration tools (Jenkins, ArgoCD, StackStorm)
Familiarity with Chaos Engineering (e.g., Gremlin, Litmus, Chaos Toolkit)
Hands-on with alerting & observability tools (e.g., PagerDuty, OpsGenie, New Relic, Coralogix)
Strong understanding of scalability, high availability, and security best practices in cloud & Kubernetes environments.
This position is open to all candidates.
 
Job ID: 8255508
Posted: 6 hours ago
Company: Confidential
Location: Tel Aviv-Yafo and Netanya
Job Type: Full Time
We are seeking a highly skilled Senior DevOps Engineer to join our team! In this role, you will drive best practices, optimize operational workflows, and mentor junior engineers, fostering a culture of collaboration and innovation. This is an exciting opportunity for someone passionate about building next-generation DevOps platforms at scale. You will lead large-scale, cross-functional initiatives, working closely with R&D architects to shape the best DevOps strategies for our SaaS solutions. If you're eager to make a significant impact in a fast-paced, high-growth environment, we encourage you to apply.
As a Senior DevOps Engineer at our company, you will:
Lead the team towards technical solutions guided by a strong understanding of the latest and greatest technologies like Kubernetes, Helm, Terraform, and more
Design, implement, and manage cloud and containerized architectures in GCP, AWS, and Azure
Design, develop, and manage large-scale systems with CI/CD in mind, to support multiple production environments and use cases
Tackle large-scale production issues and bring out-of-the-box thinking to the table
Evaluate new cloud-native technologies and vendor products to continuously improve our SaaS offerings.
Requirements:
5+ years of relevant DevOps experience in large-scale production environments
2+ years of infrastructure automation, configuration management, or container orchestration
2+ years in Python or any other advanced programming language
Strong ability to lead, design, and execute cross-organization projects
Experience in managing container and infrastructure orchestration tools (e.g. Kubernetes, Terraform)
Hands-on experience administering public clouds (AWS, GCP, or Azure)
Experience with building CI/CD pipelines for applications and microservices.
This position is open to all candidates.
 
Job ID: 8255830
Posted: 18/06/2025
Company: Confidential
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a talented, self-driven and passionate Senior Infrastructure Engineer to build and maintain the cloud infrastructure for our highly available SaaS application as well as our machine learning and data engineering stack.

As a Senior Infrastructure Engineer, you will be responsible for designing, implementing, and maintaining the cloud infrastructure and DevOps processes that power our products and internal tooling. You will work closely with all data and development teams and lead the company's security and compliance efforts. You will ensure highly reliable, scalable, and secure infrastructure that supports our rapid growth and product innovation, while maintaining observability and cost-effectiveness across our cloud resources and data.

What You'll Do

Cloud Infrastructure Management: Architect, deploy, and manage our cloud infrastructure (AWS), ensuring high availability, scalability, and security.
Software Engineering: Be a top-notch software engineer, applying your coding, architecture, and research skills to our infrastructure stack.
Infrastructure as Code (IaC): Define and maintain infrastructure using tools like Terraform, CloudFormation, or Pulumi to manage resources efficiently and reproducibly.
Monitoring & Incident Management: Build and manage monitoring and alerting systems to ensure uptime, and respond to incidents with root cause analysis and remediation.
DevOps & Automation: Implement and maintain CI/CD pipelines to streamline development workflows and automate deployment processes across development, staging, and production environments, and across the different parts of our solution. While our development teams are expected to write and maintain their own CI, you will act as the supervising professional authority and maintain cross-team and complex automation.
Collaboration and technical leadership: Partner with software engineers, data engineers, and machine learning teams to support their infrastructure needs and guide the evolution of our infrastructure team.
Cost Optimization: Monitor cloud spend and optimize resources to ensure cost-effective infrastructure without sacrificing performance or security.
Security & Compliance: Implement security best practices, including access control, network security, and monitoring, and ensure the infrastructure complies with relevant industry standards (e.g., SOC2, GDPR).
Requirements:
5+ years of hands-on experience in cloud infrastructure, DevOps and platform engineering in production environments.
Expertise in managing cloud infrastructure on at least one of the major providers: AWS, GCP, Azure. Proficient in Infrastructure as Code tools such as Terraform, CloudFormation, or Pulumi.
Solid experience with Docker and Kubernetes.
Monitoring & Logging: Hands-on experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk, or equivalent).
Proficiency in software engineering and architecture, as well as in scripting languages such as Python, Bash, or Go. Full command of version control systems such as Git.
Strong experience with CI/CD pipelines and automation using Jenkins, CircleCI, GitHub Actions, GitLab CI, or similar.
Strong understanding of cloud networking, VPNs, VPCs, DNS, and firewalls.
Experience implementing cloud security best practices, including IAM, encryption, and key management.
Previous experience in a fast-paced startup environment, where adaptability and hands-on execution are key.
Strong communication skills and ability to work cross-functionally with different teams.
Advantages:

Experience supporting machine learning pipelines and deploying ML models to production environments.
Familiarity with data engineering tools like Apache Spark, Airflow, or similar ETL tools.
Experience with serverless technologies such as AWS Lambda, GCP Functions, or Azure Functions.
AWS Certified Security Specialty, or equivalent certifications in cloud security.
Experience with and knowledge of regulatory compliance.
This position is open to women and men alike.
 
Job ID: 8222324
Posted: 16/06/2025
Company: Confidential
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a DevOps Engineer to take ownership of our Cloud Infrastructure and Platform Engineering strategy, enabling high-scale, cutting-edge GenAI products running across 40+ Kubernetes clusters on GCP and AWS.
This is a hands-on engineering role requiring deep expertise in cloud-native technologies, Kubernetes at scale, and modern DevOps principles. You will work closely with engineering teams to design and implement scalable infrastructure solutions, optimize developer workflows, and ensure reliability and efficiency across our platform.
Role and Responsibilities:
Cloud & Kubernetes Expertise: Design and implement highly scalable multi-cluster Kubernetes environments across GCP & AWS.
Developer Experience & Enablement: Lead the development of self-service tools and automation that improve efficiency for R&D teams.
Incident & Reliability Engineering: Work with engineering teams to optimize cost, performance, and reliability of production infrastructure through monitoring, capacity planning, and scaling strategies.
Security & Governance: Contribute to best practices for RBAC, IAM, cloud security, and compliance while ensuring infrastructure reliability.
Automation & Infrastructure as Code: Drive adoption of GitOps workflows and Infrastructure as Code (Terraform, Helm, Crossplane) to enhance automation and consistency.
Mentorship & Team Growth: Provide technical mentorship within the platform engineering team and contribute to knowledge-sharing across R&D.
Cross-Team Collaboration: Work closely with engineering teams to align cloud infrastructure goals with business needs and reliability requirements.
Technology Assessment: Assess and advocate for new technologies that improve reliability, efficiency, and scalability within the platform.
Requirements:
Technical Expertise:
5+ years of DevOps or SRE experience
3+ years working with public cloud platforms (AWS, GCP) at scale
Deep Kubernetes expertise, including managing large-scale, multi-cluster enterprise-grade Kubernetes environments
Experience designing and managing Custom Resource Definitions (CRDs) and custom controllers
Strong background in Infrastructure as Code (Terraform, Helm) and GitOps principles (ArgoCD, Crossplane, FluxCD, etc.)
Hands-on experience in observability & monitoring (Prometheus, Grafana, Datadog, OpenTelemetry, etc.)
Proficiency in scripting & automation (Python, Go, Bash) for infrastructure automation
Expertise in cloud networking (VPC, load balancers, service meshes) and security best practices (RBAC, IAM, security groups, network policies, etc.)
Experience with CI/CD pipelines, optimizing for performance, security, and developer velocity
Nice-to-Have:
Experience with self-hosted on-prem deployments and managed private VPC deployments (Bring Your Own Cloud models)
Advanced expertise in Helm and Crossplane for Kubernetes resource management.
Other cloud provider experience
Experience in GenAI or large-scale SaaS platforms
Familiarity with SQL/NoSQL databases and distributed systems
DevSecOps experience, with a strong understanding of security automation and compliance frameworks.
This position is open to all candidates.
 
Job ID: 8218726
Posted: 17/06/2025
Company: Confidential
Location: Tel Aviv-Yafo
Job Type: Full Time
Join our DeviantArt team as a Senior DevOps Engineer and play a pivotal role in maintaining and architecting a robust infrastructure that powers one of the largest online art communities. You'll be at the forefront of ensuring our platform's high availability, performance, and security, handling over 1.5 billion monthly page views.
The DeviantArt DevOps Team is a very small remote team that performs all tasks normally inclusive of SRE/DevOps/Infrastructure Engineers, with a bit of networking, security, and database administration mixed in. We are responsible for the day-to-day management and implementation of large-scale, mission-critical production systems that run on a public cloud.
This role requires wearing a lot of hats, and is equal parts fun and challenging. In this role, you will:
Architect and maintain a highly available infrastructure with a focus on proactive and reactive DDoS mitigation, autoscaling, self-healing, site performance, and cost optimization
Participate in a 24/7 on-call rotation, responding swiftly to outages or performance issues, and focus on less urgent alerts during normal work hours
Maintain and develop a developer environment and CI/CD pipelines at parity with production systems, for seamless testing and release of changes
Automate infrastructure provisioning and management using configuration management tools, complete with tests and documentation
Optimize and support sharded MySQL databases for efficient and reliable data handling amidst growing data reads and writes
Regularly update system components to avoid security issues and ensure up-to-date technology
We take our work seriously, but we don't take ourselves too seriously! We enjoy designing and building systems using open source tools and industry standards, and are in the fortunate position to be able to make decisions as a team about adopting newer technologies, and redesigning our infrastructure when appropriate.
This role is on a fully remote and distributed team, and asynchronous communication within and across teams is crucial. To be successful in this role, a candidate will need to work flexibly, balancing server and service issues, needs from development teams, security needs, and shifting priorities in our own tasks in managing our infrastructure.
Requirements:
5+ years of experience managing systems at scale as a DevOps Engineer, Site Reliability Engineer, or Platform Engineer
Excellent technical and analytical skills, with the ability to implement DDoS mitigation, troubleshoot complex problems, analyze system bottlenecks, and implement effective solutions from frontend through backend systems, sometimes during production degradation or outages on a high-traffic site
Exceptional command line Linux skills, with proficiency in Bash and Python for investigation of server and services issues, scripting, and automation
In-depth knowledge of AWS services, infrastructure as code using Terraform, GitOps tools and methodologies, and container orchestration using Docker, Helm, and Kubernetes
Experience with setup, administration, and maintenance of sharded MySQL database clusters while maintaining no downtime or data loss
Excellent communication skills with fluent English, and the ability to collaborate effectively across teams while articulating technical concepts to non-technical stakeholders
The ability to get up to speed on systems, make decisions, be flexible, and execute independently with attention to detail for production systems.
This position is open to all candidates.
 
Job ID: 8220324
Company: Confidential
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Senior DevOps Engineer, you will be responsible for building and maintaining scalable, reliable infrastructure and deployment pipelines with a strong emphasis on security integration throughout the software delivery lifecycle. You will work closely with development teams to improve development velocity while ensuring system reliability, security, and performance. This role is critical in bridging the gap between software development and operations, implementing DevSecOps best practices throughout our organization.

Main Responsibilities:
Infrastructure Management: Design, implement, and maintain cloud-based infrastructure using Infrastructure as Code principles
CI/CD Implementation: Build and optimize continuous integration and continuous deployment pipelines to enable rapid, reliable software delivery
Automation: Develop automation scripts and tools to streamline operations and eliminate manual processes
Containerization: Manage containerization strategies and orchestration using Docker and Kubernetes
Security Integration: Implement security scanning, testing, and validation throughout the CI/CD pipeline
Vulnerability Management: Conduct regular security assessments and remediate vulnerabilities in infrastructure and application code
Compliance Automation: Automate compliance checks and reporting to ensure adherence to security standards
Performance Optimization: Analyze and optimize system performance, scalability, and cost-efficiency
Documentation: Create and maintain thorough documentation for infrastructure, deployment processes, and operational procedures
Incident Response: Participate in on-call rotations and lead incident resolution with thorough post-mortem analysis
Requirements:
5+ years of experience in DevOps, DevSecOps, or similar roles
Cloud Platforms: Extensive hands-on experience with AWS services and architecture patterns
Infrastructure as Code: Proficiency with Terraform, AWS CloudFormation, or similar IaC tools
Containerization: Advanced knowledge of Docker and Kubernetes ecosystem
Kubernetes Technologies: Experience with ArgoCD, Prometheus, Grafana, and other Kubernetes tooling
Programming/Scripting: Strong coding skills in Python, Bash, or Go
CI/CD Tools: Experience implementing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or similar
Advantage - Security Tools: Experience with security scanning tools (SonarQube, OWASP ZAP, Snyk, etc.)
Advantage - Networking: Solid understanding of networking principles, load balancing, and security concepts
Exceptional problem-solving abilities and analytical thinking
Strong communication skills with the ability to explain complex technical concepts to various audiences
Collaborative mindset with experience working in cross-functional teams
Self-motivated with the ability to work independently
Proactive approach to identifying and resolving potential issues before they impact production

Advantages:
Experience with PostgreSQL, NoSQL, shell scripting, networking, firewalls, and system security.
Strong background in software development with security focus
This position is open to all candidates.
 
Job ID: 8220838
Location: Tel Aviv-Yafo
Job Type: Full Time
Wanted: Site Reliability Engineer (Infra)
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer (Infra) on our Infrastructure team at the TLV office, you will play a key role in ensuring the reliability, scalability, and performance of our critical systems. You will be responsible for managing and improving our core infrastructure, with a focus on automation, monitoring, and incident response. You will work with a wide range of technologies, including Kubernetes, monitoring and observability tools, configuration management systems, and core networking services.
How you'll make an impact:
As a Site Reliability Engineer, you will:
Ensure the reliability, availability, and performance of our infrastructure services.
Manage and maintain our Kubernetes infrastructure, including KubeVirt.
Design, implement, and maintain our monitoring and observability stack (SensuGo, VictoriaMetrics, Prometheus, ELK).
Automate infrastructure provisioning, configuration, and deployment processes using Puppet and Ansible.
Manage and maintain core services such as DNS and networking.
Troubleshoot and resolve complex infrastructure issues in a timely and efficient manner.
Participate in on-call rotations and incident response.
Develop and maintain infrastructure-as-code (IaC).
Identify and implement proactive measures to prevent incidents and improve system reliability.
Collaborate with development teams to ensure smooth and reliable deployments.
Contribute to the design and implementation of new infrastructure solutions.
Drive improvements in system architecture, processes, and tools.
Mentor and coach other team members.
Requirements:
To thrive in this role, you'll need:
5+ years of experience in a Site Reliability Engineering, Systems Engineering, or similar role.
Deep understanding of Site Reliability Engineering principles and practices.
Extensive experience with Kubernetes, including deployment, management, and troubleshooting.
Strong experience with monitoring and observability tools such as SensuGo, Zabbix, VictoriaMetrics, Prometheus, and ELK.
Proficiency in configuration management tools such as Puppet and Ansible.
Solid understanding of Linux internals and networking.
Experience with managing and maintaining core services such as DNS and networking.
Strong programming skills in Python and/or Go.
Experience with both on-premises and cloud environments.
Experience with KubeVirt.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration skills.
Ability to work in a fast-paced, dynamic environment.
Ability to participate in on-call rotations including weekends.
Preferred Qualifications:
Experience with large-scale, distributed systems.
Experience with other cloud providers (e.g., AWS, Azure, GCP).
Contributions to open-source projects.
This position is open to all candidates.
 
Job ID: 8205377
Posted: 8 hours ago
Company: Confidential
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Site Reliability Engineer to join our DevOps team. You will ensure the reliability, performance, and scalability of our back-office solutions, which serve as the foundation for the entire purchasing process. This role will lead the development of SRE capabilities, meeting SLI/SLO/SLA targets, and establishing effective monitoring systems. You will enhance our Software Development Lifecycle by integrating reliability and scalability, working with cross-functional teams, and supporting production environments. Additionally, you will implement incident management processes and conduct post-mortem analyses to drive continuous improvement. If you have a strong engineering and automation background and are passionate about the E-commerce field, then we would love to hear from you.
Roles and Responsibilities:
Develop and implement SRE capabilities to enhance the reliability, availability, and performance of Admin solutions.
Design and maintain proactive monitoring and alerting systems for deep visibility into critical business flows, beyond simple statuses, to identify functional issues.
Drive improvements in the Software Development Lifecycle (SDLC) for reliability and scalability from design to deployment.
Collaborate with development and operations teams to troubleshoot production incidents affecting the purchase flow through root cause analysis.
Lead SRE initiatives to boost system resilience and operational efficiency.
Implement best practices for incident management and conduct blameless post-mortems, contributing to capacity planning and performance testing to ensure scalability.
Requirements:
5+ years of experience as a Site Reliability/DevOps Engineer
Deep understanding of e-commerce flows, particularly back-office operations and order processing - must
Experience as an Automation/Software Engineer with a strong understanding of software development principles and in building, testing, and deploying distributed systems - must
Experience designing, implementing, and using monitoring and observability platforms such as Datadog, New Relic, Prometheus/Grafana, or the ELK stack - must
Proficiency in scripting and automation using languages such as Python, Java, etc. - must
Ability to create dashboards, alerts, and insightful queries - must
Experience with AWS services to build and operate scalable and resilient applications (e.g., EC2, ECS/EKS, RDS, S3, Lambda, CloudWatch) - plus
Experience in automating infrastructure provisioning, application deployments, and repetitive operational tasks - plus
Proactive approach with excellent problem-solving skills
Strong collaborator, with an ability to work with cross-functional teams
Proficient in English.
This position is open to all candidates.
 
Job ID: 8255386
Company: Confidential
Location: Tel Aviv-Yafo
Job Type: Full Time
Wanted: Site Reliability Engineer
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer on the IT Production team in our TLV office, you'll play a vital role in building robust services and solving infrastructure challenges through automation, while working with cutting-edge technologies and pushing them to their limits on our mostly on-prem, cloud-like infrastructure.
How you'll make an impact:
As a Site Reliability Engineer, you will:
Ensure Reliability & Scalability: Design, implement and manage highly reliable and scalable distributed systems across our on-premise, cloud and AI/ML environments. Proactively optimize performance, efficiency, resource utilization and cloud cost.
Drive Automation: Automate repetitive tasks, infrastructure provisioning, configuration and deployments using IaC and scripting languages (e.g., Python, Go, Rust).
Develop Observability & Capacity: Implement comprehensive monitoring and alerting systems to ensure system health. Collaborate on capacity planning to meet future growth.
Maintain Security & Compliance: Integrate security best practices and ensure compliance with industry standards.
Lead Incident Management: Participate in on-call rotations, lead incident responses and conduct root cause analysis to minimize downtime.
Foster Collaboration & Improvement: Work closely with development, operations and security teams to drive shared responsibility and continuous improvement in SRE practices.
Our Tech Stack:
Linux, Kubernetes, nginx, Istio, AWS, GCP, Azure, Alicloud, Fastly, Terraform, Consul, Prometheus, Loki, Grafana, Airflow, Redis, Kafka, Vector, Hadoop, Cassandra, Vertica, MySQL, HDFS, ELK.
Requirements:
7 years of experience as an SRE, DevOps Engineer, or System Administrator in a large distributed environment, with a focus on Linux operating systems.
Experience supporting, troubleshooting and scaling large distributed systems in production.
Deep understanding of the HTTP protocol, including HTTP/1.1, HTTP/2, caching semantics, TLS and gRPC delivery.
Experience configuring and operating CDN services (e.g., Akamai, Fastly, Cloudflare, AWS CloudFront).
Deep understanding of Linux system internals and system performance tuning.
Experience with Configuration Management Tools (Puppet, Ansible, Chef, Terraform).
Experience programming in at least one of the following languages (Python, Golang, Rust, Ruby, C++, Java).
Experience with monitoring and metrics collection systems (Prometheus, Grafana, ELK).
Experience with cloud providers and platforms (AWS, Azure, GCP, Alibaba).
Experience with containerization technologies (Kubernetes, Docker).
Deep understanding of networking principles (TCP/IP, DNS, load balancing).
This position is open to all candidates.
 
3 days ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
At UVeye, we're on a mission to redefine vehicle safety and reliability on a global scale. Founded in 2016, we have pioneered the world's first fully automated suite of vehicle inspection systems. At the heart of this innovation lies our advanced AI-driven technology, representing the pinnacle of machine learning, GenAI, and computer vision within the automotive sector. With close to $400M in funding and strategic partnerships with industry giants such as Amazon, General Motors, Volvo, and CarMax, UVeye stands at the forefront of automotive technological advancement. Our growing global team of over 200 employees is committed to creating a workplace that celebrates diversity and encourages teamwork. Our drive for innovation and pursuit of excellence are deeply embedded in our vibrant company culture, ensuring that each individual's efforts are recognized and valued as we unite to build a safer automotive world.
We are looking for a DevOps Engineer to join our DevOps R&D team. In this position, you will be responsible for bringing development and operations teams together, improving collaboration and productivity by automating infrastructure and workflows and continuously measuring application performance.
A day in the life and how you’ll make an impact:
* Establish, maintain, and evolve concepts in continuous integration and deployment (CI/CD) pipelines for existing and new services.
* Collaborate with Engineering and Operations teams to improve automation of workflows, infrastructure, code testing, and deployment of on-premise and cloud services.
* Remain up-to-date on industry trends, share knowledge among teams, and abide by industry best practices for configuration management and automation.
* Implement effective monitoring and increase the sophistication of our alerting and escalation mechanisms.
* Identify and resolve performance and scalability issues in products and infrastructure.
Requirements:
* 5+ years of experience in systems and production engineering and 3+ years of DevOps experience in a Linux environment
* Experience maintaining and deploying highly available, fault-tolerant systems at scale
* Experience developing in Python and scripting in Bash
* Practical experience with Docker containerization and clustering (Kubernetes)
* Experience with configuration management tools (e.g., Ansible, Terraform)
* Experience implementing CI/CD (e.g., Jenkins, GitHub Actions, Bitbucket Pipelines)
* Experience with cloud providers (e.g., AWS, GCP)
Ideally, we’re looking for:
* Bachelor's or master's degree in Computer Science
* AWS Certification
* Experience working in and advocating for agile environments
* Knowledge of Linux Kernel fundamentals, including job management, memory management, file systems, networking & debugging

Why UVeye:
* Pioneer Advanced Solutions: Harness cutting-edge technologies in AI, machine learning, and computer vision to revolutionize vehicle inspections.
* Drive Global Impact: Your innovations will play a crucial role in enhancing automotive safety and reliability, impacting lives and businesses on an international scale.
* Career Growth Opportunities: Participate in a journey of rapid development, surrounded by groundbreaking advancements and strategic industry partnerships.
This position is open to all candidates.
 