Posted 5 hours ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced SRE Engineer to drive the reliability, observability, and automation practices across our private cloud infrastructure and operations. In this role, you will be part of a team of site reliability engineers, own the engineering roadmap for monitoring and automation, and act as a key liaison between development, operations, and platform teams. You bring at least 4 years of hands-on experience and a deep technical background in SRE or DevOps disciplines.

What will you do?

Automation & Infrastructure
- Design, develop, and maintain automation tools to support infrastructure and operations teams at scale.
- Manage pipelines and infrastructure workflows using Jenkins, Ansible, Python, and Bash.
- Drive the adoption of infrastructure-as-code practices across the organization.
- Collaborate with system engineers to improve scalability, performance, and fault tolerance of critical systems.

Monitoring & Observability
- Build and extend monitoring and alerting systems using Grafana, the ELK (Elastic) stack, Zabbix, and custom scripts.
- Implement and enforce observability best practices to ensure full visibility into systems, applications, and infrastructure.
- Define and track SLIs, SLOs, and error budgets across key services.
- Partner with development teams to embed observability earlier in the software development lifecycle.

Database & Platform Support
- Support monitoring and infrastructure integration for databases including MongoDB and PostgreSQL.
- Maintain documentation and champion knowledge sharing around automation, monitoring, and reliability practices.
Requirements:
What you need:

4+ years of overall experience in SRE, DevOps, or infrastructure automation roles.

Strong scripting skills in Python and Bash; comfortable building and maintaining production-grade automation.

Hands-on experience with infrastructure automation tools, particularly Ansible.

Solid experience with monitoring and observability platforms - ELK stack, Grafana, and Zabbix.

Good understanding of CI/CD pipelines and related tooling, including Jenkins.

Familiarity with managing and monitoring MongoDB and PostgreSQL in a production environment.

Comfortable working in Linux-based environments.

Excellent problem-solving skills and strong written and verbal communication.


Ability to support the following:
Experience with cloud providers - AWS, GCP, or Azure.
Exposure to containerization technologies such as Docker and Kubernetes.
Familiarity with infrastructure provisioning using Terraform.
Experience introducing SRE practices (SLOs, error budgets, chaos engineering) at an organizational level.
Exposure to and experience with migrating or building AI tools to improve processes.
This position is open to all candidates.
 
Job ID: 8662378
Similar jobs that may interest you
Posted 3 hours ago
Hiring at CrowdStrike
Location: Tel Aviv-Yafo
Job Type: Full Time
CrowdStrike's Data Science Studio is seeking a pioneering Senior MLOps Engineer to establish and lead our MLOps function from the ground up. As the first MLOps engineer in the studio, you will play a foundational role in shaping how we build, deploy, and scale machine learning systems that protect thousands of organizations worldwide.

This is a unique opportunity to define the technical strategy, influence the technology stack, and architect the infrastructure that will power our AI/ML-driven security solutions for years to come.

This role combines strategic vision with hands-on execution. You'll work at the intersection of data science, engineering, and production operations - building production-grade systems that operate at immense scale while collaborating closely with highly technical data scientists and ML engineering teams across CrowdStrike.

What You'll Do:
- Architect MLOps infrastructure from the ground up: Design and implement the foundational MLOps platform, establishing best practices, tooling, and workflows that will scale with our growing data science initiatives
- Define technology strategy: Evaluate, select, and integrate MLOps technologies and platforms that best serve our needs - from experiment tracking and model versioning to deployment pipelines and monitoring systems
- Build production-grade ML pipelines: Develop robust, scalable pipelines for model training, validation, deployment, and monitoring that handle massive data volumes and ensure reliability in production
- Enable data scientist productivity: Create tools, frameworks, and automation that empower data scientists to move quickly from research to production while maintaining high quality and reliability standards
- Establish monitoring and observability: Implement comprehensive monitoring, logging, and alerting systems to ensure ML models perform optimally in production and issues are detected proactively
- Drive MLOps culture and practices: Champion best practices in ML engineering, CI/CD for ML, model governance, and reproducibility across the data science organization
- Collaborate cross-functionally: Partner closely with data scientists to understand their workflows and pain points, and work with ML engineering teams to ensure seamless integration with broader platform capabilities
- Scale for the future: Design systems with scalability, security, and maintainability in mind, anticipating the needs of a rapidly growing ML portfolio
Requirements:
- 6+ years of experience in MLOps, ML engineering, DevOps, or related infrastructure roles with a focus on machine learning systems
- Production ML systems expertise: Proven track record of building and operating ML systems at scale in production environments
- Strong infrastructure and automation skills: Deep knowledge of cloud platforms (AWS, Azure, or GCP), containerization (Docker, Kubernetes), and infrastructure-as-code (Terraform, CloudFormation)
- ML pipeline proficiency: Hands-on experience with ML workflow orchestration tools (e.g., Airflow, Kubeflow, MLflow, Metaflow) and building end-to-end ML pipelines
- Programming excellence: Strong coding skills in Python; experience with additional languages is a plus
- CI/CD and DevOps practices: Expertise in building automated deployment pipelines, version control, and modern DevOps methodologies
- Strategic and hands-on balance: Ability to think architecturally about long-term solutions while rolling up your sleeves to implement them
- Collaborative mindset: Excellent communication skills and ability to work effectively with data scientists, engineers, and stakeholders with varying technical backgrounds
- Startup mentality: Comfort with ambiguity and ability to build from scratch in a fast-paced environment
This position is open to all candidates.
 
Job ID: 8611396
Posted 5 hours ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an experienced SRE Team Lead to drive the reliability, observability, and automation practices across our private cloud infrastructure and operations. In this role, you will lead a team of site reliability engineers, own the engineering roadmap for monitoring and automation, and act as a key liaison between development, operations, and platform teams. You bring at least 3-4 years of hands-on people management experience and a deep technical background in SRE or DevOps disciplines.



What will you do?

Leadership & Team Management

Lead, mentor, and grow a team of SREs, providing technical direction, career development guidance, and day-to-day management.

Own the team roadmap for reliability, observability, and automation initiatives - prioritizing work, removing blockers, and driving delivery.

Conduct regular 1:1s, performance reviews, and hiring processes to build and sustain a high-performing team.

Foster a culture of operational excellence, blameless post-mortems, and continuous improvement.

Act as an escalation point for complex incidents and reliability issues, leading post-incident reviews and ensuring follow-through on action items.


Automation & Infrastructure

Design, develop, and maintain automation tools to support infrastructure and operations teams at scale.

Manage pipelines and infrastructure workflows using Jenkins, Ansible, Python, and Bash.

Drive the adoption of infrastructure-as-code practices across the organization.

Collaborate with system engineers to improve scalability, performance, and fault tolerance of critical systems.


Monitoring & Observability

Build and extend monitoring and alerting systems using Grafana, the ELK (Elastic) stack, Zabbix, and custom scripts.

Implement and enforce observability best practices to ensure full visibility into systems, applications, and infrastructure.

Define and track SLIs, SLOs, and error budgets across key services.

Partner with development teams to embed observability earlier in the software development lifecycle.


Database & Platform Support

Support monitoring and infrastructure integration for databases including MongoDB and PostgreSQL.

Maintain documentation and champion knowledge sharing around automation, monitoring, and reliability practices.
Requirements:
What you need:

Experience & Leadership

3-4+ years of experience in a people management or team lead capacity within SRE, DevOps, or infrastructure engineering.

5-8+ years of overall experience in SRE, DevOps, or infrastructure automation roles.

Proven track record of building, coaching, and retaining high-performing engineering teams.

Experience owning an engineering roadmap and driving cross-functional reliability initiatives.



Technical Skills

Strong scripting skills in Python and Bash; comfortable building and maintaining production-grade automation.

Hands-on experience with infrastructure automation tools, particularly Ansible.

Solid experience with monitoring and observability platforms - ELK stack, Grafana, and Zabbix.

Good understanding of CI/CD pipelines and related tooling, including Jenkins.

Familiarity with managing and monitoring MongoDB and PostgreSQL in a production environment.

Comfortable working in Linux-based environments.

Excellent problem-solving skills and strong written and verbal communication.



Ability to support the following:

Experience with cloud providers - AWS, GCP, or Azure.

Exposure to containerization technologies such as Docker and Kubernetes.

Familiarity with infrastructure provisioning using Terraform.

Experience introducing SRE practices (SLOs, error budgets, chaos engineering) at an organizational level.

Exposure to and experience with migrating or building AI tools to improve processes.
This position is open to all candidates.
 
Job ID: 8662300
05/05/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Senior IT SRE Engineer, you will be a key player in ensuring the reliability, scalability, and performance of our critical IT infrastructure. You will leverage SRE principles and an automation-first mindset to build and maintain resilient hybrid cloud environments. This role is ideal for a candidate who thrives in a fast-paced, innovative setting and is passionate about solving complex challenges with cutting-edge technology.
Key Responsibilities
Provision, configure, and support resilient hybrid cloud deployment architectures using an Infrastructure-as-Code framework.
Proactively collaborate with development teams to ensure new applications are production-ready, scalable, and reliable from inception.
Develop and maintain tools and frameworks to automate operational tasks, including deployment, monitoring, and recovery.
Conduct thorough root cause analysis of production issues and implement preventative measures to improve system resilience, demonstrating strong problem-solving skills.
Manage CI/CD platforms, Linux infrastructure, and contribute to capacity planning and operational runbooks.
Design and implement proactive service monitoring, alerting, and trend analysis to maintain service availability and performance SLAs.
Participate in an on-call rotation to support critical applications and services, responding to and resolving incidents efficiently.
Contribute to comprehensive documentation related to infrastructure design, deployment, and operational procedures.
Requirements:
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
6+ years of DevOps engineering experience on mission-critical, enterprise-level systems in a hybrid (both cloud and on-prem) environment.
3+ years of hands-on experience with cloud environments, preferably Google Cloud Platform (GCP).
Expertise in configuration management and Infrastructure-as-Code using frameworks such as Terraform and Ansible.
Strong programming/scripting knowledge in languages like Python, Bash, or Go for infrastructure automation.
Demonstrated experience with CI/CD pipelines (e.g., GitHub, Jenkins, Artifactory) and a strong foundation in Linux/Unix administration.
Preferred Qualifications
Experience with containerization and orchestration technologies, particularly Kubernetes.
Hands-on experience with monitoring and observability tools such as Datadog, Grafana, or Prometheus.
Understanding of networking principles including firewalls, load balancers, and complex network designs.
A curious and positive mindset with a passion for applied learning and challenging existing processes for continuous improvement.
This position is open to all candidates.
 
Job ID: 8637997
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
Wanted: AI Infrastructure & Reliability Engineer
What this role is really about
You'll join a 3-person platform team within our Business Technology group, owning the internal infrastructure that our AI platform and its users depend on. This isn't a product engineering role, and it isn't ticket work or babysitting pipelines someone else built. You're building and operating the internal foundation that the company runs on. The work covers the full stack of platform engineering: core cloud infrastructure (AWS, Kubernetes, IaC), CI/CD pipelines, AI-driven infrastructure components, and the SRE and observability practice that keeps it all honest (metrics, alerting, incident response, and reliability standards). As our AI capabilities grow, so does the complexity underneath them, and staying ahead of that is central to the role. If you treat infrastructure as a product (reusable, automated, observable, and built to last), this is your kind of role.
Job responsibilities
DevOps & AI-Driven Infrastructure - own CI/CD, deployment processes, and release reliability. Build and operate cloud infrastructure that is automated, intelligent, and continuously self-improving - not just managed.
Design and build our Terraform repository and IaC pipeline from scratch, with AI-assisted generation, drift detection, and policy enforcement built in.
Build AI-driven GitHub Actions pipelines: automated code review, risk assessment, and intelligent deployment decisions.
Manage Kubernetes workloads across AWS accounts with zero downtime, full automation, and nothing left behind.
Embed AI into the operational layer: proactive drift detection, automated remediation, and intelligent scaling toward a self-healing runtime.
Reliability & SRE - improve uptime, resilience, and incident response.
Define and enforce SLOs/SLIs, error budgets, and on-call practices.
Lead incident response, postmortems, and systemic reliability improvements.
Own AI-specific reliability: model latency SLOs, token quota monitoring, rate limit handling, fallback and retry strategies, and cost-per-request alerting.
Observability & Telemetry - increase visibility, reduce noise, improve troubleshooting.
Establish and continuously evolve the observability stack: metrics, logs, distributed tracing, and alerting tuned for both application and AI workloads.
AI / LLM Operations - bringing AI systems to production and operating them at scale, with a focus on reliability, performance, and trust.
Own the AI infrastructure layer: rate limits, quota management, latency SLOs, and fallback strategies (retries, circuit breakers).
Operate LLM APIs in production with resilience and cost attribution per team/model.
Requirements:
2-4 years of hands-on experience in DevOps, SRE, or infrastructure engineering in production SaaS environments.
Strong AWS experience: multi-account architecture, cross-account IAM, serverless and event-driven services (Lambda, SQS, SNS, EventBridge), and EKS cluster management.
Proven Kubernetes experience in production, including cross-account migrations and stateful workload management.
Proficiency with Terraform - repository structure design, module architecture, and CI/CD pipeline implementation.
Hands-on experience building and maintaining GitHub Actions pipelines for end-to-end CI/CD workflows.
Working Python proficiency for scripting, internal tooling, and workflow automation.
Practical experience implementing observability stacks from scratch: metrics, logging, distributed tracing, and alerting.
Experience owning reliability practices: SLOs, incident response, and postmortem culture.
Nice to have
Hands-on experience operating LLM APIs in production: rate-limit and quota management, cost attribution per team/model, latency monitoring, and resilience patterns (retries, fallbacks, circuit breakers).
FinOps experience across cloud, AI, and observability spend.
Experience introducing self-healing or auto-remediation patterns in production.
This position is open to all candidates.
 
Job ID: 8659781
05/05/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
As a Senior DevOps Engineer supporting our Cortex Research Group, you will lead all DevOps and infrastructure initiatives that empower our researchers to move quickly, securely, and reliably. You will be responsible for designing, building, and maintaining the group's cloud environments, ensuring scalability, stability, and performance across a wide range of experimental and production workloads. You'll serve as the primary point of contact between the Research Group and other critical stakeholders (including Security, Networking, and Compliance teams), ensuring that research projects align with organizational standards while still enabling rapid innovation.
Key Responsibilities
Own and evolve the Research Group's cloud infrastructure and CI/CD pipelines to enable reproducible, automated, and scalable experimentation.
Define and implement standards for infrastructure-as-code, observability, monitoring, and resource optimization tailored to research use cases.
Proactively collaborate with security and compliance teams to enforce best practices for data governance, access controls, and regulatory requirements.
Partner with networking and platform engineers to integrate research workloads into the broader company ecosystem, ensuring seamless operation.
Serve as the primary technical liaison between the Research Group and stakeholders like Security, Networking, and Platform teams.
Mentor engineers and researchers on DevOps best practices, helping to instill a culture of operational excellence and applied learning.
Requirements:
Your Experience:
5+ years of demonstrated experience in a DevOps, Site Reliability Engineering (SRE), or cloud infrastructure role.
Strong proficiency with infrastructure-as-code (IaC) tools such as Terraform or Ansible.
Hands-on experience building and maintaining CI/CD pipelines using tools like Jenkins, GitLab CI, or GitHub Actions.
In-depth knowledge of at least one major cloud provider (GCP, AWS, Azure).
Preferred Qualifications
Experience with containerization and orchestration technologies, particularly Docker and Kubernetes.
Proficiency in a scripting or programming language such as Python or Go.
Familiarity with monitoring and observability tools like Prometheus, Grafana, or the ELK stack.
Experience supporting machine learning or research-focused environments.
This position is open to all candidates.
 
Job ID: 8638096
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Senior DevOps Engineer to join our R&D team in developing the next rising product in the health tech landscape. If you are looking for a challenging, influential position and are passionate about making an impact, this might be the role for you.

As a Senior DevOps Engineer, you'll play a key role in the design, development, testing, deployment, and monitoring of our infrastructure and products. In this position, you'll make significant contributions to our observability stack, helping build and maintain robust systems for logs, metrics, traces, and alerting.

Our ideal candidate is passionate about DevOps and observability, has strong communication skills, and thrives on constant improvement for both technology and processes. If you enjoy working on multiple projects in parallel and are a proactive team player, you'll fit right in.

This is a unique opportunity to join the core team of a fast-growing startup, where your contributions will have a direct impact on our product and success.

Responsibilities
Support and collaborate with cross-functional engineering teams using cutting-edge technologies.
Contribute to the design, implementation, and maintenance of monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki).
Secure, scale, and manage our cloud environments (AWS and GCP).
Design and implement automation solutions for both development and production.
Manage and improve our CI/CD pipelines for fast and safe delivery.
Lead best practices in infrastructure, observability, configuration management, and system hardening.
Continuously assess and improve existing infrastructure in line with industry standards.
Requirements:
5+ years of experience as a DevOps Engineer or similar software engineering role.
Proven experience with Docker and Kubernetes (EKS preferred).
Hands-on experience with monitoring and observability tools, including Prometheus, Grafana, Datadog, or similar.
Expertise in Terraform for AWS infrastructure-as-code deployments.
Strong collaboration and interpersonal communication skills.
Excellent analytical thinking and problem-solving mindset.
Proficiency with relational databases.
Solid knowledge of Python and Bash scripting.
Experience with test automation - an advantage.
This position is open to all candidates.
 
Job ID: 8610670
13/05/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
Are you ready to kickstart your DevOps career and be part of the infrastructure powering the future of cybersecurity? Join Infinity Next, our company's next-generation, cloud-native platform delivering a suite of cutting-edge products such as WAF, SD-WAN, and more.
We are looking for a DevOps Engineer to join our dynamic and growing team. This is an opportunity to work alongside top engineers, gain hands-on experience in production environments, and help scale secure, high-performance services used by customers around the globe. You will be part of a fast-paced, startup-like environment, with the backing and stability of a global cybersecurity leader.
Key Responsibilities
Support the deployment and maintenance of scalable, multi-tenant environments across cloud platforms
Assist in automating infrastructure using Infrastructure-as-Code tools and CI/CD pipelines
Monitor and improve the reliability, performance, and security of platform services
Collaborate with development, product, and operations teams to deliver new features and improvements
Troubleshoot infrastructure and application issues in development and production environments
Implement custom user interfaces using the latest programming techniques and technologies
Design, develop, and maintain DevOps-related microservices that support platform automation and reliability
Design and integrate agentic AI capabilities into DevOps workflows to automate decision-making, incident response, and platform operations.
Requirements:
Bachelor's degree in Computer Science or a related technical field
At least 3 years of experience as a DevOps Engineer
Strong interest in cloud technologies, DevOps methodologies, and automation
Knowledge of containerization and container orchestration technologies, such as Amazon EKS
Experience in the design, operation, and troubleshooting of Kubernetes core components and API extensions for cloud-native, distributed systems
Familiarity with Linux, basic networking, containers, and scripting
Understanding of CI/CD, cloud infrastructure, and monitoring concepts
Experience building maintainable and testable codebases, including API design and unit testing techniques
Hands-on experience applying GitOps principles to manage Kubernetes infrastructure and application deployments
Nice to Have
Exposure to Kubernetes in cloud environments such as Amazon EKS
Familiarity with Ingress Controllers, Kubernetes Gateway API, CloudFront, and Global Accelerator
Experience designing and developing Kubernetes operators and controllers
Experience or coursework with tools such as Terraform, Pulumi, Crossplane, and Helm
Hands-on experience with observability technologies such as Prometheus, Grafana, OpenTelemetry, and centralized logging systems.
This position is open to all candidates.
 
Job ID: 8650194
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a hands-on DevOps Team Lead to take ownership of our infrastructure, DevOps practices, and automation pipelines.
You will be the technical and operational lead for a small but growing DevOps team, driving reliability, scalability, and security across our cloud environments.
In this role, you will split your time between leading and mentoring the team, designing and evolving infrastructure, and implementing solutions.
What You'll Do
Lead, mentor, and grow the DevOps team.
Define and enforce DevOps best practices across infrastructure, CI/CD, and security.
Manage the SeaPod Lab environment for developer and test usage.
Operate and evolve the SeaPod Server Linux infrastructure, deployed at scale worldwide, handling complex connectivity and security.
Maintain consistent baselines, update tools, and ensure fleet-wide monitoring and support.
Design, manage, and evolve AWS infrastructure (VPC, IAM, networking, RDS, EKS, etc.).
Operate and upgrade Kubernetes/EKS clusters, manage Helm charts, operators, and custom resources.
Define namespace policies, quotas, and resource allocations.
Drive security, compliance, and cost optimization.
Maintain and enhance GitLab CI pipelines for multiple workloads (Lambda, EKS, EC2, etc.).
Integrate testing, linting, and vulnerability scans into CI/CD workflows.
Build reusable pipeline components for microservices.
Own monitoring and alerting strategies (Grafana, CloudWatch, Coralogix, Prometheus).
Operate and tune PostgreSQL (RDS, Aurora) and manage backups/restores.
Manage distributed tracing. Lead the upgrade from Fluentd to OpenTelemetry.
Architect and deploy serverless solutions (Lambda, DynamoDB, API Gateway).
Integrate with event-driven services (SNS/SQS, Kinesis, RDS Proxy).
Manage IAM roles/policies, secrets, and security posture.
Requirements:
5+ years of hands-on DevOps experience, including 2+ years in a leadership or mentoring role.
Strong production experience with AWS services (VPC, RDS, EKS, IAM, Lambda).
Proven track record operating Kubernetes/EKS clusters at scale.
Expertise with Terraform (or similar IaC tools) and GitLab CI/CD (or equivalent).
Solid background in Linux systems administration, ideally managing large distributed fleets.
Practical experience with PostgreSQL in production (replication, tuning, backup/restore).
Hands-on with observability stacks (Prometheus, Grafana, CloudWatch, OpenTelemetry).
Experience designing and operating secure, compliant environments (SOC2/ISO27001 familiarity a plus).
This position is open to all candidates.
 
Job ID: 8610254
13/05/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
As the world's leading vendor of Cyber Security, facing the most sophisticated threats and attacks, we've assembled a global team of the most driven, creative, and innovative people. At our company, our employees are redefining the security landscape by meeting our customers' real-time needs and providing our cutting-edge technologies and services to an ever-growing customer base.
Our company has been honored by Time Magazine as one of the World's Best Companies, and Gartner recently rated our company's email security as a market leader for product, detection, and innovation. We've also earned a spot on the Forbes list of the World's Best Places to Work for five consecutive years (2020-2024) and have been recognized as one of the World's Top Female-Friendly Companies. If you're passionate about making the world a safer place and want to be part of an award-winning company culture, we invite you to join us.
Our company's Harmony Email Security and Collaboration (previously AVANAN) is a unique email solution that fully secures cloud email and cloud platforms using AI.
We are seeking a promising and talented DevOps Cloud Engineer to join our DevOps group. If you thrive in a fast-paced, dynamic environment, can handle multiple requests simultaneously, and enjoy working independently as part of a cutting-edge DevOps team, this is your opportunity to help make the world a safer place!
Key Responsibilities
Act as a DevOps Engineer within a highly skilled team, responsible for large-scale operations from development to production
Design, develop, and maintain Avanan's CI/CD solutions, including operating systems, containers, cloud orchestration, and full end-to-end automation
Implement tools and procedures for monitoring, deployment, and alerting across our SaaS multi-tenant product family
Participate in the large-scale migration of a highly complex system into a secured, regulation-compliant environment
Continuously improve our cloud infrastructure to ensure fault tolerance, scalability, and security
Plan capacity, stabilize, and enhance the performance of application infrastructure with cost efficiency and scaling in mind
Design and shape our monitoring and logging solutions
Execute all tasks with top-notch cloud infrastructure security as a guiding principle.
Requirements:
Hands-on mindset - we all write code daily!
3+ years of relevant DevOps experience building CI/CD pipelines for both development and production - must
2+ years of AWS Cloud experience working with high-traffic systems and multiple services - must
Strong scripting skills, with fluency in Python - must
Experience with AI SRE agents
Experience with containers and orchestration tools (Docker, Kubernetes, or ECS) - must.
Experience with CI integration tools such as Jenkins
Familiarity with AWS CloudFormation - an advantage
Exposure to a wide range of open-source technologies (Redis, Nagios, Grafana, Prometheus, etc.)
Knowledge of best practices in security, performance, and monitoring.
Proven ability to research, evaluate, and implement new technologies, including running proof of concepts and cost analysis.
This position is open to all candidates.
 
8650178
04/05/2026
Confidential Company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a skilled and motivated DevOps Infrastructure Engineer to join our DevOps Infra team. Our team is responsible for managing and evolving the cloud-native infrastructure that powers our microservices architecture. Core responsibilities span our EKS-based Kubernetes platform, ArgoCD-driven GitOps pipelines, infrastructure observability, Helm-based deployments, and mission-critical web services running on AWS.
We are looking for a DevOps engineer who can hit the ground running, take ownership of critical infrastructure components, and contribute meaningfully from day one. The ideal candidate brings deep Kubernetes expertise, strong hands-on experience with observability tooling, and the maturity to work independently.
In this role, you will be responsible for:
Managing and evolving our EKS-based Kubernetes platform and Helm-based deployment pipelines
Owning and maintaining GitOps workflows using ArgoCD, including troubleshooting sync and rollout issues
Designing, building, and maintaining observability solutions using Prometheus, VictoriaMetrics, and Grafana
Writing and maintaining infrastructure as code using Terraform, including modules, remote state, and CI/CD automation
Taking full ownership of AWS infrastructure components - including networking, compute, IAM, and storage - ensuring reliability, security, and operational excellence across environments
Collaborating with developers and SREs to support reliable, scalable, and secure AWS infrastructure
Requirements:
1-3 years of hands-on experience in DevOps or infrastructure engineering roles.
Deep expertise in Kubernetes and Helm, including production-grade deployments and live incident troubleshooting.
Strong proficiency in Terraform or equivalent IaC tooling.
Solid working knowledge of AWS core services (EC2, IAM, S3, VPC, CloudWatch, EKS).
Practical experience with Prometheus, VictoriaMetrics, Grafana, and alerting stack design.
Proven ability to work independently, take ownership end-to-end, and communicate effectively across engineering teams.
Agentic DevOps experience working with common AI assistant tools, MCPs and Agents.
Advantages:
Experience with cloud cost optimization strategies and tooling.
Background in cloud-native security practices (RBAC, policy enforcement, SSL, mTLS, etc.).
Prior involvement in designing or operating high-availability, fault-tolerant systems.
Experience with nginx and IIS web servers.
This position is open to women and men alike.
 
8636122