Senior Platform Engineer, Observability

7 days
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a Senior Platform Engineer, Observability to join our Observability team. This role offers the opportunity to work at the intersection of software development and platform engineering, contributing to the tools, systems, and practices that improve visibility, reliability, and operational excellence across our engineering organization.

This position is ideally suited for experienced software engineers who are passionate about building high-quality systems and are interested in expanding their expertise in observability, distributed systems, and developer experience. You will help design, build, and maintain systems that empower engineers across the company to monitor, understand, and troubleshoot their services more effectively.

Our observability team is responsible for delivering scalable and user-friendly solutions to over 150 engineers working across more than 20 teams. We're focused on enabling rapid incident detection and resolution, improving our reliability posture, and supporting a culture of continuous improvement.

What you'll be doing:
Design, build, and maintain observability tools and infrastructure that give our engineers actionable insights into the performance and reliability of our systems.
Collaborate with other engineers and teams to enhance the developer experience around monitoring, logging, alerting, and tracing.
Develop and evolve our internal tooling to simplify the process of instrumenting and observing services.
Partner with engineering teams to improve incident response and recovery workflows, and ensure systems meet internal SLOs/SLAs and reliability targets.
Support the migration from our legacy ELK stack to a modern observability platform using Prometheus, Mimir, Grafana, Honeycomb, Loki, Quickwit, and OpenTelemetry.
Contribute to knowledge sharing and the ongoing development of best practices in observability across the organisation.
Requirements:
What you'll need:
4+ years of professional experience as a software engineer, with a strong foundation in building and maintaining production systems.
Proficiency in one or more modern programming languages such as Python, Java, JavaScript, or Ruby.
Familiarity with Kubernetes, AWS, and infrastructure-as-code tools such as Terraform.
Experience working with observability tools and platforms (e.g. Prometheus, Grafana, ELK, Honeycomb, Loki, or similar).
A strong interest in developer experience and platform tooling, with the ability to empathise with engineering teams as internal customers.
Excellent communication skills, with the ability to collaborate effectively across teams and explain complex technical concepts clearly.
A proactive mindset focused on long-term impact, sustainable engineering practices, and continuous improvement.

Preferred Qualifications:
Experience with OpenTelemetry or distributed tracing systems.
Understanding of observability-driven development and service reliability principles (e.g. SRE, MTTR, SLIs/SLOs).
Experience optimising observability systems for cost and performance at scale.
Knowledge of microservices architectures and how to monitor and debug distributed systems.
Contributions to open-source projects in the observability or monitoring space
This position is open to all candidates.
 
8274690
Similar jobs that may interest you
21/07/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a Site Reliability Engineer who excels at bridging the gap between infrastructure and development. In this role, you will work closely with engineering teams to ensure the reliability, scalability, and performance of our systems. A strong emphasis will be placed on observability: designing and implementing effective monitoring, logging, tracing, and alerting solutions to provide deep visibility into system behavior. You should be comfortable collaborating with developers, presenting technical insights, and helping shape best practices. Your responsibilities will include incident management, automation and improvement of our observability solutions, and continuous performance tuning to ensure our platform can scale and evolve with our business needs.

Role:
Ensure production systems meet or exceed established SLAs and SLOs by actively maintaining and enhancing system performance and uptime.
Design and maintain end-to-end observability systems, including monitoring, logging, and distributed tracing, to detect anomalies and enable proactive issue resolution.
Work closely with engineering teams to improve how their applications are monitored and alerted on. Help define meaningful alerts, reduce noise, and ensure developers are accountable for the operational health of their services.
Optimize application performance on Kubernetes through resource tuning, scaling strategies, and deep performance analysis.
Provide guidance on reliability-first design, instrumenting code for observability, and using Grafana dashboards to drive decision-making and incident response.
Requirements:
5+ years in SRE, DevOps, or Production Engineering roles
Deep expertise in AWS, Kubernetes, Linux
Experience deploying and tuning monitoring tools such as Prometheus and Thanos, and time-series databases for storing metrics.
Experience with logging using the ELK stack, Loki, Grafana, or alternatives.
Experience with tracing tools such as OpenTelemetry, Tempo, and Jaeger.
Strong understanding of incident management processes and best practices.
Experience with automation tools and practices for deployment and infrastructure management.
Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
Ownership mindset, proactive and reliable
This position is open to all candidates.
 
8268431
21/07/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a Site Reliability Engineer who excels at bridging the gap between infrastructure and development. In this role, you will work closely with engineering teams to ensure the reliability, scalability, and performance of our systems. A strong emphasis will be placed on observability: designing and implementing effective monitoring, logging, tracing, and alerting solutions to provide deep visibility into system behavior. You should be comfortable collaborating with developers, presenting technical insights, and helping shape best practices. Your responsibilities will include incident management, automation and improvement of our observability solutions, and continuous performance tuning to ensure our platform can scale and evolve with our business needs.

Role:
Ensure production systems meet or exceed established SLAs and SLOs by actively maintaining and enhancing system performance and uptime.
Design and maintain end-to-end observability systems, including monitoring, logging, and distributed tracing, to detect anomalies and enable proactive issue resolution.
Work closely with engineering teams to improve how their applications are monitored and alerted on. Help define meaningful alerts, reduce noise, and ensure developers are accountable for the operational health of their services.
Optimize application performance on Kubernetes through resource tuning, scaling strategies, and deep performance analysis.
Provide guidance on reliability-first design, instrumenting code for observability, and using Grafana dashboards to drive decision-making and incident response.
Requirements:
5+ years in SRE, DevOps, or Production Engineering roles
Deep expertise in AWS, Kubernetes, Linux
Experience deploying and tuning monitoring tools such as Prometheus and Thanos, and time-series databases for storing metrics.
Experience with logging using the ELK stack, Loki, Grafana, or alternatives.
Experience with tracing tools such as OpenTelemetry, Tempo, and Jaeger.
Strong understanding of incident management processes and best practices.
Experience with automation tools and practices for deployment and infrastructure management.
Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
Ownership mindset, proactive and reliable
This position is open to all candidates.
 
8268705
01/07/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for a Senior DevOps Engineer to join our newly formed Foundations Team, a small, high-impact group responsible for the infrastructure, tools, and shared services that power our entire R&D organization.
In this role, you'll design, build, and evolve internal platform infrastructure, CI/CD systems, and developer enablement tooling. Your mission is to empower developers across the company to work autonomously by creating self-service tools, automation, and clear standards that reduce friction and increase reliability.
You'll collaborate closely with engineers across disciplines and partner with the Foundations Team Lead to shape DevOps practices that scale. This is a hands-on role for someone who thrives in high-velocity, mission-critical environments and is passionate about building tools that make developers faster, more productive, and confident in running their own services.
What You'll Do
Design and maintain scalable, developer-friendly CI/CD pipelines and deployment workflows.
Build self-service tooling and automation that enables teams to manage deployments, environments, secrets, and observability independently
Be responsible for cloud infrastructure and operations foundations
Implement and promote best practices for monitoring, logging, and alerting across services.
Operate and optimize Kubernetes-based production environments, ensuring performance, security, and stability.
Manage infrastructure using Infrastructure as Code (IaC) and ensure repeatability and traceability through tools like Terraform.
Collaborate with R&D teams to support onboarding to internal tooling and promote a culture of enablement over dependency.
Monitor cloud cost, ensuring our cloud operates efficiently.
Requirements:
4+ years of hands-on experience in DevOps or infrastructure engineering, ideally in high-velocity, mission-critical production environments.
Deep expertise in Kubernetes and containerized infrastructure, with experience deploying and managing workloads at scale.
Strong understanding of cloud infrastructure and operations, including networking, storage, compute, and security; GCP experience preferred.
Proficiency with Infrastructure as Code tools, especially Terraform, with a focus on automation and operational excellence.
Experience developing and managing CI/CD processes and tools, with a passion for improving developer workflows and release quality.
Strong debugging and problem-solving skills, with the ability to troubleshoot complex systems across the stack.
Highly self-motivated and organized, able to work independently in a fast-paced, collaborative environment.
This position is open to all candidates.
 
8238951
14/07/2025
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
Cyberint, a market leader in External Risk Management, empowers global organizations to detect, respond, and remediate external threats efficiently. Now part of our company, Cyberint continues to grow and innovate at the intersection of cybersecurity and cloud-native SaaS technologies.
Join Our Operations Team
We are seeking a proactive, experienced Site Reliability Engineer (SRE) to join our dynamic Operations team. You'll be working on a cutting-edge SaaS solution that runs on AWS (EKS-based Kubernetes infrastructure), supporting an architecture with many moving parts. If you're driven by reliability engineering, love automation, and want to make an impact on mission-critical platforms, this role is for you.
What You'll Do
As an SRE at Cyberint, you will be instrumental in ensuring the observability, stability, and scalability of our platform. You will develop automated solutions and monitoring tools to proactively detect and respond to incidents, improve system resilience, and collaborate with engineering teams across the company to embed operational excellence into our product lifecycle.
Additionally, you will help evolve our AI-driven operational and monitoring tooling, including our on-call assistant bot, which leverages AI technologies to streamline incident resolution, automate repetitive tasks, and support real-time decision-making for engineers.
Key Responsibilities
Design, implement, and maintain monitoring and alerting systems (e.g., Prometheus, Grafana) to detect and prevent reliability issues.
Develop tools and automation (Python, Bash, etc.) for improving infrastructure reliability and operational efficiency.
Collaborate with R&D and Product teams to embed reliability-first principles into every stage of the development process.
Participate in and improve incident response processes, including running blameless postmortems and implementing preventive measures.
Enhance our Infrastructure-as-Code (IaC) and CI/CD practices to streamline deployments and reduce risk.
Maintain and extend internal AI-driven tools, such as bots that support SRE workflows (on-call management, triaging, etc.).
Document infrastructure, playbooks, and operational procedures to facilitate onboarding and knowledge sharing.
Requirements:
3+ years of experience in an SRE, DevOps, or similar role in a SaaS/cloud-native environment.
Strong experience with Kubernetes, AWS, and cloud-based distributed systems.
Hands-on experience building or maintaining monitoring stacks such as Prometheus, Grafana, ELK, etc.
Proficiency in Python, Bash, or similar scripting languages.
Experience with Infrastructure as Code tools (Terraform, Helm, etc.).
Familiarity with CI/CD tools (e.g., GitHub Actions, Jenkins, ArgoCD).
Solid analytical and problem-solving skills with a passion for operational excellence.
Exposure to AI-based tooling (e.g., OpenAI API, LLM-based bots) to automate operations or enhance incident response processes.
Nice to Have
Experience with incident management platforms (e.g., PagerDuty).
Security-focused mindset and experience in the cybersecurity industry.
Experience with service mesh, zero-downtime deployments, or chaos engineering.
Contributions to AI-assisted SRE initiatives or platform operations & monitoring automation.
This position is open to all candidates.
 
8257631
Location: Tel Aviv-Yafo
Job Type: Full Time
Site Reliability Engineer - Infra
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer (Infra) on our Infrastructure team at the TLV office, you will play a key role in ensuring the reliability, scalability, and performance of our critical systems. You will be responsible for managing and improving our core infrastructure, with a focus on automation, monitoring, and incident response. You will work with a wide range of technologies, including Kubernetes, monitoring and observability tools, configuration management systems, and core networking services.
How you'll make an impact:
As a Site Reliability Engineer, you'll:
Ensure the reliability, availability, and performance of our infrastructure services.
Manage and maintain our Kubernetes infrastructure, including KubeVirt.
Design, implement, and maintain our monitoring and observability stack (SensuGo, VictoriaMetrics, Prometheus, ELK).
Automate infrastructure provisioning, configuration, and deployment processes using Puppet and Ansible.
Manage and maintain core services such as DNS and networking.
Troubleshoot and resolve complex infrastructure issues in a timely and efficient manner.
Participate in on-call rotations and incident response.
Develop and maintain infrastructure-as-code (IaC).
Identify and implement proactive measures to prevent incidents and improve system reliability.
Collaborate with development teams to ensure smooth and reliable deployments.
Contribute to the design and implementation of new infrastructure solutions.
Drive improvements in system architecture, processes, and tools.
Mentor and coach other team members.
Requirements:
5+ years of experience in a Site Reliability Engineering, Systems Engineering, or similar role.
Deep understanding of Site Reliability Engineering principles and practices.
Extensive experience with Kubernetes, including deployment, management, and troubleshooting.
Strong experience with monitoring and observability tools such as SensuGo, Zabbix, VictoriaMetrics, Prometheus, and ELK.
Proficiency in configuration management tools such as Puppet and Ansible.
Solid understanding of Linux internals and networking.
Experience with managing and maintaining core services such as DNS and networking.
Strong programming skills in Python and/or Go.
Experience with both on-premises and cloud environments.
Experience with KubeVirt.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration skills.
Ability to work in a fast-paced, dynamic environment.
Ability to participate in on-call rotations including weekends.
Preferred Qualifications:
Experience with large-scale, distributed systems.
Experience with other cloud providers (e.g., AWS, Azure, GCP).
Contributions to open-source projects.
This position is open to all candidates.
 
8272676
14/07/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We are looking for a Site Reliability Engineer (SRE) to join our Engineering team: someone who has a passion for observability, monitoring, automation, and high-availability systems, and who wants to solve complex technological challenges with a proactive approach to continuous improvement.
We use an interesting and mixed technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.
In this position, you will use your expertise in building and scaling SRE operations, and will design, implement, and operate a world-class reliability strategy.
About Us
We are a key player in the network security field, striving to provide the leading SASE platform in the market. Our innovative approach, merging cloud and on-device protection, redefines how businesses connect in the era of cloud and remote work.
Key Responsibilities
Develop and maintain our monitoring, alerting, and logging systems, ensuring high visibility into production environments.
Implement automation to improve system reliability, scalability, and efficiency.
Troubleshoot and resolve production incidents, leading root cause analyses and implementing permanent fixes.
Collaborate with software engineers and DevOps teams to enhance application performance and resilience.
Continuously improve operational processes, focusing on reducing toil and improving reliability.
Requirements:
3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
Hands-on experience with monitoring and observability tools like Datadog, Prometheus, and Grafana.
Strong understanding of Linux systems, networking, and cloud-native architectures.
Experience with Kubernetes, Terraform, and CI/CD pipelines.
A problem solver, capable of finding creative solutions and getting things done.
Fluent with incident management, RCA processes, and operational best practices.
It would be great if you also have:
Experience in high-scale distributed systems.
Background in security and compliance for cloud infrastructure.
Familiarity with AWS (EKS, EC2, RDS, S3, networking configurations).
Proficiency in Python, Go, or Bash for automation and scripting.
Understanding of cost optimization and resource management in cloud environments.
Familiarity with machine learning or predictive analytics for proactive reliability management.
This position is open to all candidates.
 
8258448
15/07/2025
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We are looking for a Site Reliability Engineering (SRE) & Production Team Leader to join our Engineering team: someone who has a passion for observability, monitoring, automation, and high-availability systems, and who wants to solve complex technological challenges with a proactive approach to continuous improvement.
We use an interesting and mixed technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.
In this position, you will use your expertise in building and scaling SRE operations, and will design, implement, and operate a world-class reliability strategy.
About Us
We are a key player in the network security field, striving to provide the leading SASE platform in the market. Our innovative approach, merging cloud and on-device protection, redefines how businesses connect in the era of cloud and remote work.
Key Responsibilities
Design, build, and manage our SRE framework to ensure observability, resilience, and high availability.
Develop and automate solutions for proactive monitoring, incident response, and performance optimization.
Improve and maintain our alerting and monitoring stack, leveraging tools like Datadog, Prometheus, and Grafana.
Lead post-mortem analysis and implement continuous improvement initiatives.
Collaborate with DevOps, Engineering, and Product teams to ensure smooth and efficient delivery of reliable services.
Requirements:
SRE & Production Manager with 5+ years of experience in SRE, Production Engineering, or DevOps, including 2+ years in a leadership role.
Experience with monitoring and observability tools like Datadog, Prometheus, and Grafana.
A problem solver, capable of finding creative solutions and getting things done.
Fluent with incident management, RCA processes, and operational best practices.
Experience with AWS (EKS, EC2, RDS, S3, networking configurations).
It would be great if you also have:
Experience in high-scale distributed systems.
Background in security and compliance for cloud infrastructure.
Understanding of cost optimization and resource management in cloud environments.
Familiarity with machine learning or predictive analytics for proactive reliability management.
Proficiency in Python, Go, or Bash for automation and scripting.
This position is open to all candidates.
 
8259881
15/07/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
Our company's Infinity External Risk Management, otherwise known as Cyberint, continuously reduces external cyber risk by managing and mitigating an array of digital threats with a unified solution.
At Cyberint, we help organizations protect their digital presence by delivering cutting-edge Attack Surface Management (ASM) and Threat Intelligence (TI) solutions. As a member of our R&D organization, you'll play a key role in ensuring the scalability, reliability, and performance of our cloud-native SaaS platform operating at scale.
Key Responsibilities
As a DevOps Engineer, you will be a core member of our DevOps & Infrastructure team, focused on building and maintaining distributed, scalable, and highly available systems in a dynamic SaaS environment. You will collaborate closely with development, QA, and support teams to enhance automation, improve CI/CD pipelines, and drive operational excellence across the board.
Key Responsibilities:
Design, build, and maintain infrastructure in a modern cloud-native SaaS ecosystem (primarily AWS).
Contribute to the scalability and reliability of distributed systems supporting high-volume data processing and real-time operations.
Develop and enhance CI/CD pipelines to support rapid and reliable deployments across multiple environments.
Implement and manage Infrastructure as Code (IaC) using Terraform for consistent, scalable infrastructure.
Operate and optimize Kubernetes (EKS) clusters to support distributed microservices architectures.
Monitor and respond to system alerts, troubleshoot issues, and contribute to incident prevention and response strategies.
Build self-service tools and automation frameworks to empower R&D teams and enhance delivery velocity.
Work cross-functionally with developers, QA, and support to ensure infrastructure meets evolving product needs.
Write and maintain scripts (Python, Bash) to automate recurring tasks and streamline operations.
Continuously identify and execute improvements in system performance, availability, and cost-efficiency.
Requirements:
Experience:
2-5 years of experience in DevOps, SRE, or infrastructure engineering roles, working with distributed systems and SaaS applications.
Hands-on experience with public cloud providers (AWS strongly preferred).
Production experience with tools such as Kubernetes, Terraform, CI/CD platforms (Jenkins, ArgoCD), and monitoring systems (Prometheus, Grafana).
Skills:
Solid grasp of Infrastructure as Code principles and best practices.
Strong knowledge of distributed systems, microservices, and orchestration technologies.
Proficiency in scripting (Python, Bash) for automation and tooling.
Familiarity with logging and monitoring stacks (e.g., Elasticsearch, Redis, CloudWatch, Grafana, Prometheus).
Awareness of DevOps security practices and cloud cost optimization strategies.
Mindset & Traits:
A strong sense of ownership and accountability for system health and performance.
Passion for automation, self-service, and continuous improvement.
Excellent communication and collaboration skills.
Comfortable working in fast-paced SaaS environments with cross-functional teams.
This position is open to all candidates.
 
8259928
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an AI Platform Engineer to revolutionize engineering productivity and drive AI-powered innovation across the organization. This role is part of our Platform Engineering initiative, focused on enabling engineers and product teams with cutting-edge AI tools, automation, and scalable AI platforms.
If you're an innovator at heart, passionate about leveraging AI to solve complex problems, and thrive in an environment where experimentation and entrepreneurship are encouraged, this role is for you.
Responsibilities:
Develop and integrate AI-powered solutions to enhance engineering productivity, automate workflows, and improve developer efficiency.
Evaluate and implement state-of-the-art AI platforms (ChatGPT, Claude, Gemini, custom models) to solve engineering and operational challenges.
Design and maintain AI-driven internal platforms such as knowledge management systems, AI-enhanced coding assistants, intelligent automation tools, and AI-powered chatbots.
Work alongside engineering, DevOps, and product teams to embed AI into everyday development workflows.
Lead Proof-of-Concept (PoC) projects, experimenting with LLMs, generative AI, and automation frameworks to create tangible business impact.
Stay ahead of emerging AI trends, researching and implementing cutting-edge AI models and tools.
Build scalable AI infrastructure that integrates with cloud environments (AWS, GCP, Azure) and engineering toolchains.
Promote a culture of innovation, empowering teams to embrace AI-driven solutions and fostering AI adoption across the organization.
Requirements:
Must:
BSc in Computer Science or related degree, or equivalent practical experience.
4+ years of hands-on experience as a software engineer.
Strong proficiency in Python and experience with AI/ML libraries (e.g., PyTorch, TensorFlow, Hugging Face).
Experience with AI APIs (OpenAI, Anthropic, Google AI, Microsoft AI) and integrating them into engineering workflows
Hands-on experience with developer productivity tools, AI-enhanced automation, and knowledge management systems
Strong problem-solving skills and ability to build AI-powered solutions that engineers love to use.
Nice to Have:
Deep understanding of LLMs, RAG (Retrieval-Augmented Generation), fine-tuning, and prompt engineering.
Experience with data pipelines, embeddings, and vector databases for AI-powered search and automation.
Experience working with cloud-based AI platforms and scalable AI architectures
Knowledge of containerization (Docker, Kubernetes) and CI/CD pipelines for deploying AI applications.
Experience with AI Ops, observability tools, and monitoring AI applications in production.
Familiarity with LangChain, AutoML, MLOps frameworks, and workflow automation tools.
A background in software engineering, DevOps, or developer tooling.
Strong entrepreneurial mindset, able to identify opportunities, move fast, and drive AI adoption in a high-impact environment.
Published research, open-source contributions, or side projects demonstrating innovative AI applications.
Why Join Us?
This role is for builders, problem-solvers, and AI enthusiasts who want to push the boundaries of engineering productivity with AI. You'll be at the forefront of AI-powered platform engineering, shaping how our company leverages AI to supercharge innovation, efficiency, and automation.
If you're looking to build game-changing AI platforms, experiment with cutting-edge tech, and make a real impact, join us!
This position is open to all candidates.
 
8258471
Location: Tel Aviv-Yafo
Job Type: Full Time
We're looking for a DevOps engineer to take the lead on one of the most critical technical challenges in our business: how we deploy software at scale.
Today, deploying our core system into each store is a complex, multi-stage process. Tomorrow, it needs to be seamless, automated, and capable of onboarding dozens of stores per week without sacrificing versatility and quality.
This role isn't just about CI/CD or scripting. It's about refining the automation infrastructure that enables repeatable, self-service deployments across hundreds of live, mission-critical environments. You'll sit at the heart of our Ops Technology team, working at the intersection of system engineering and in-store execution, and serving as the technical backbone for deployment scale.
You'll help shape and evolve our DevOps toolset, working hands-on with cutting-edge technologies to streamline deployments, boost reliability, and scale our platform with speed and confidence. For the right person, this role is a stepping stone toward technical leadership in one of our most strategic teams.
A day in the life:
Design and build the software deployment framework that powers in-store systems at scale
Lead remote deployments using Octopus Deploy, driving repeatability and automation
Set up and manage Kubernetes clusters for scalable microservices in diverse environments
Deploy and monitor services with a focus on resilience, observability, and recovery
Troubleshoot complex issues across CI/CD, environments, services, and infrastructure
Collaborate with System Engineering and SRE peers to ensure smooth end-to-end deployment flows
Continuously evolve our CI/CD pipelines, deployment logic, and infrastructure-as-code practices
Build tooling, templates, and documentation to enable fast, low-touch deployments by others
Serve as a technical leader and force multiplier within the Ops Tech team
Requirements:
3+ years in DevOps, Infrastructure, or SRE roles with deep end-to-end ownership
Solid background in Kubernetes, container orchestration, and microservices
Experience deploying and supporting systems in live production environments
Strong CI/CD skills with tools like GitHub Actions, GitLab CI, or Jenkins
Scripting proficiency in Bash, PowerShell, or Python
Familiarity with monitoring, alerting, and diagnostics (Prometheus, Grafana, etc.)
Experience with infrastructure-as-code tools like Terraform, Helm, etc.
Excellent troubleshooting skills and a bias toward automation and scale
Strong communication and the ability to work cross-functionally and independently
Nice to have:
Experience with Azure and GCP
Understanding of K8s networking, service meshes, ingress controllers
Passion for enabling others through tools, documentation, or mentorship
Proven ability to drive change in fast-moving, operationally complex environments
Expertise with Octopus Deploy and experience deploying software remotely
This position is open to all candidates.
 
8246111