Senior ML Engineer, AI Infrastructure & Benchmarking

Posted 9 hours ago
Location: Ra'anana
Job Type: Full Time and Hybrid work
The Senior Machine Learning Engineer (GenAI) is responsible for designing, implementing, and operating large-scale systems and tools for AI model benchmarking, optimization, and validation. Unlike a traditional ML Engineer focused primarily on model training, this role centers on building the infrastructure, automation, and services that enable systematic evaluation and performance tuning of LLMs at scale.

This position combines a deep understanding of model serving frameworks, GPU optimization, and benchmarking methodologies with strong software engineering skills to deliver reliable, reproducible, production-grade evaluation pipelines. The engineer will design and maintain validation-as-a-service platforms that allow internal and external stakeholders to assess models across latency, throughput, accuracy, and cost dimensions, integrating seamlessly with our AI ecosystem and industry-standard GenAI tooling.

A core aspect of this role is creating a robust, extensible benchmarking and validation framework capable of running across diverse inference engines, hardware configurations, and deployment environments, while providing actionable insights for model selection, optimization, and integration.

What you will do:
Benchmarking Platform Development: Design and implement scalable benchmarking pipelines for LLM performance measurement (latency, throughput, accuracy, cost) across multiple serving backends and hardware types.

Optimization Tooling: Build utilities and automation to profile, debug, and optimize inference performance (GPU utilization, memory footprint, CUDA kernels, parallelism strategies).

Validation-as-a-Service: Develop APIs and self-service platforms for model evaluation, enabling teams to run standardized benchmarks on demand.

Serving Integration: Integrate and operate high-performance serving frameworks (vLLM, TGI, LMDeploy, Triton) with cloud-native deployment patterns.

Dataset & Scenario Management: Create reproducible workflows for dataset preparation, augmentation, and scenario-based testing to ensure robust evaluation coverage.

Observability & Diagnostics: Implement real-time monitoring, logging, and metrics dashboards (Prometheus, Grafana) for benchmark and inference performance.

Cloud-Native Orchestration: Deploy and manage benchmarking workloads on Kubernetes (Helm, Argo CD, Argo Workflows) across AWS/GCP GPU clusters.

Integration with GenAI Tooling: Leverage Hugging Face Hub, OpenAI SDK, LangChain, LlamaIndex, and internal frameworks for streamlined evaluation workflows.

Performance Engineering: Identify bottlenecks, apply targeted optimizations, and document best practices for inference scalability.

Ecosystem Leadership: Track emerging frameworks, benchmarks, and optimization techniques to continuously improve the evaluation platform.
Requirements:
What you will bring:
Advanced Python for backend development, data processing, and ML/GenAI pipelines.

Kubernetes (Deployments, Services, Ingress) and Helm for large-scale distributed training and inference workloads.

LLM training, fine-tuning, and optimization (PyTorch, DeepSpeed, HF Transformers, LoRA/PEFT).

GPU optimization expertise: CUDA, mixed precision, tensor/sequence parallelism, memory management, and throughput tuning.

High-performance model serving with vLLM, TGI, LMDeploy, Triton, and API-based serving (OpenAI, Mistral, Anthropic).

Benchmarking and evaluation pipelines: dataset preparation, accuracy/latency/throughput measurement, and cost/performance trade-offs.

Multi-model, multi-engine comparative testing for optimal deployment decisions.

Hugging Face Hub for model/dataset management, including private hosting and pipeline integration.

GenAI development tools: OpenAI SDK, LangChain, LlamaIndex, Cursor, Copilot.

Argo CD & Argo Workflows for reproducible, automated ML pipelines.

CI/CD (GitHub Actions, Jenkins) for ML lifecycle automation.
This position is open to all candidates.
 
8381758
Similar jobs that may interest you
Posted 2 days ago
Location: Herzliya and Ra'anana
Job Type: Full Time and May Be Suitable For Moms
We're looking for a hands-on developer who's truly immersed in AI: building custom chats, creating intelligent agents, and integrating with APIs. This role requires sharp judgment, creativity, and a passion for exploring one of the most exciting and fast-growing fields out there.
Role Summary:

Develop and implement custom chat solutions using knowledge files and APIs.
Combine prompt engineering techniques with real-time integrations from reliable data sources.
Responsibilities:

Build tailored chat experiences around specific use cases.
Integrate models with trusted external APIs.
Design and validate Function/Tool Calling workflows.
Write prompts and output templates to reduce hallucinations.
Plan and implement guardrails and security measures.
Conduct A/B testing, regression testing, and ongoing monitoring.
Document processes and provide training to teams.
Requirements:
2+ years of software development experience.
Proficiency in JavaScript/TypeScript and at least one other major language (Python/Java).
Experience building chat solutions with LLMs (Custom GPTs / Assistants).
End-to-end API integration experience.
Strong understanding of prompt engineering patterns.
Hands-on experience with at least one model provider (OpenAI, Anthropic, Google, Cohere, etc.).
Experience working with real-time data from APIs.
Nice to Have:

Experience with GPT Actions / Assistants API.
Knowledge of OAuth2/OIDC and API security best practices.
Experience implementing guardrails and content filtering.
Experience writing automated tests for answer quality.
Cloud and serverless experience.
This position is open to all candidates.
 
8301417
05/10/2025
Location: Ra'anana
Job Type: Full Time and Hybrid work
We're looking for a Principal Software Engineer to join our AI Catalyst Platform team as a key enabler for rapid AI prototyping and operational efficiency. This role focuses on providing technical and operational support to the AI Catalyst team, which explores upstream AI initiatives and drives innovation across the organization.

You'll help accelerate the development of AI prototypes by ensuring seamless platform integration, CI/CD pipelines, and other critical infrastructure that enable high-speed experimentation and iteration.

What you'll do:

Platform Support and Optimization: Design and maintain scalable, secure, and efficient platforms to support AI Catalyst team initiatives, ensuring smooth integration of AI models and workflows.

Infrastructure Management: Provide expertise in Kubernetes and cloud platforms (GCP, AWS, Azure) for container orchestration, scalable deployments, and real-time operations.

Partner with the AI Catalyst team to identify bottlenecks, remove blockers, and optimize workflows for faster delivery of AI prototypes.

Technical Leadership: Lead the implementation of critical systems (APIs, orchestration, observability, deployment) to ensure speed, reliability, and maintainability.

Cross-Functional Collaboration: Work closely with engineering, product, and design teams to align technical priorities and drive impactful AI initiatives.

Mentorship: Guide and mentor engineers, fostering a culture of technical excellence, collaboration, and rapid execution.

Demonstrate proficiency in Kubernetes for container orchestration and scalable deployments.

Mentor senior engineers and contribute to a culture of technical excellence, velocity, and pragmatic decision-making.

Proactively utilize AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude Code) for code generation, auto-completion, and intelligent suggestions to accelerate development cycles and enhance code quality.

Explore and experiment with emerging AI technologies from the open source communities relevant to software development, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.
Requirements:
What you'll bring:

10+ years of software engineering experience.

Strong background in Python, plus experience with C, C++, Go, or Rust.

Proficiency in RHEL or other Linux distributions.

Solid background with MCP (Model Context Protocol) and AI agents.

Experience in working with upstream projects and Open Source communities.

PoC Experience: Proven ability to work on and deliver successful Proof of Concepts or initiatives, showcasing the ability to rapidly prototype and validate ideas.

Communication Skills: Strong ability to communicate technical tradeoffs and bring clarity to ambiguous situations.

Passion for AI Innovation: Enthusiasm for enabling AI initiatives that drive real-world impact and accelerate prototyping efforts.

Ability to move fast without compromising quality, thriving in environments where rapid iteration and high ownership are the norm.

Nice to have:

Experience with cloud platforms such as GCP, AWS, or Azure.

Experience in early-stage product incubation or 0→1 product delivery.

Contributions to internal AI platforms, model evaluation frameworks, or observability for AI systems.
This position is open to all candidates.
 
8365559
12/10/2025
Confidential company
Location: Ra'anana
Job Type: Full Time
We are seeking a motivated and hands-on IT Software Engineer with 2-5 years of proven experience in AI development. As part of our IT development team, you will design, build, and deploy AI-powered applications and solutions that drive innovation and improve business productivity. You will work closely with software engineers, business analysts, and business stakeholders to implement AI models, agents, and copilots using modern tools and platforms.
This is a great opportunity for someone passionate about AI, who enjoys building practical solutions and growing their expertise in enterprise-scale environments.
How will you make an impact?
Be an active member of the IT development team, contributing to design, coding, and deployment of AI-driven solutions.
Develop and implement AI agents, copilots, and automation workflows using tools such as Microsoft Copilot Studio, Azure OpenAI, Azure AI Foundry, and Python.
Integrate AI models and components into existing applications, systems, and business processes.
Write clean, maintainable, and efficient code for AI-driven features and applications.
Apply prompt engineering techniques to optimize large language model (LLM) performance.
Collaborate with cross-functional teams (data engineering, product, business) to translate requirements into working AI solutions.
Participate in code reviews, testing, debugging, and deployment of AI applications.
Stay up to date with emerging AI technologies and contribute ideas for their adoption.
Support the monitoring and maintenance of AI models in production, including performance tuning and troubleshooting.
Requirements:
2-5 years of experience in software development with a focus on AI/ML implementation.
Proven hands-on experience building AI applications, agents, or copilots using Python and platforms such as Microsoft Copilot Studio, Azure OpenAI, Azure AI Foundry, or similar.
Understanding of LLMs, prompt engineering, vector databases, and real-time data integration.
Familiarity with DevOps/MLOps practices for deploying and monitoring AI models.
Strong programming skills in Python (experience with frameworks such as LangChain, Hugging Face, etc.).
Experience working with APIs, cloud platforms (Azure preferred), and data pipelines.
Ability to work collaboratively in a team and communicate technical concepts clearly.
Passion for AI innovation and eagerness to continuously learn new tools and techniques.
You will have an advantage if you also have:
Bachelor's degree in Computer Science, Data Science, AI, or a related field.
Experience integrating AI into business applications (CRM, ERP, productivity tools).
Familiarity with responsible AI and governance practices.
Contributions to open-source AI projects.
This position is open to all candidates.
 
8369388
Posted 2 days ago
Location: Ra'anana
Job Type: Full Time and Hybrid work
Our team, a vital part of the Telco Verification Pillar, is developing and operating a cutting-edge CI/CD service to empower our Testing Teams with enhanced automation capabilities. As a Principal Software Engineer specializing in Test Automation, CI/CD, and DevOps, you will be instrumental in ensuring the quality, reliability, and efficiency of this critical service. You will collaborate closely with cross-functional teams to design, implement, and maintain robust automated testing frameworks, continuous integration/continuous deployment pipelines, and best-in-class DevOps practices.

What you will do:

Design, implement, and manage end-to-end CI/CD pipelines to automate build, test, and deployment processes.

Foster seamless collaboration with development, QA, and operations teams to fully integrate testing and deployment workflows.

Proactively monitor and analyze test results, identify root causes of issues, and work collaboratively with development teams for timely resolution.

Develop and maintain highly scalable and resilient infrastructure as code (IaC) using tools like Ansible or similar industry-standard solutions.

Implement and manage containerization and orchestration solutions using Docker and Kubernetes, with OpenShift knowledge being a significant advantage.

Identify and address test coverage gaps within existing test pipelines, proposing and developing new automated tests in close collaboration with Functional Testing (FT), System Testing (ST), and Solution Lifecycle Management (SLCM) teams.

Ensure the highest standards of security, scalability, and reliability for all CI/CD and DevOps processes.

Continuously research and integrate industry best practices and emerging technologies in test automation, CI/CD, and DevOps.

Act as a mentor and provide technical guidance to junior engineers, fostering a culture of best practices and continuous learning.

Leverage AI-driven insights to optimize test execution, prioritize critical test cases, and reduce overall testing cycles.

Integrate AI/ML models into test automation frameworks to enhance test case generation, anomaly detection in test results, and intelligent defect prediction.
Requirements:
Bachelor's or Master's degree in Computer Science, Engineering, or a closely related technical field.

Proven hands-on experience as a Senior Software Engineer with a strong focus on test automation, CI/CD, and DevOps principles.

Proficiency in modern programming languages such as Python and/or Go.

Extensive experience with leading CI/CD tools such as Jenkins, GitLab CI, CircleCI, or equivalent platforms.

Demonstrated expertise in containerization and orchestration tools, including Docker and Kubernetes.

Solid understanding and practical experience with infrastructure as code (IaC) tools like Ansible or similar.

Exceptional problem-solving abilities and meticulous attention to detail.

Outstanding communication and collaboration skills, with a proven ability to work effectively in cross-functional team environments.

Familiarity with data analysis and visualization techniques for interpreting AI/ML model outputs.
This position is open to all candidates.
 
8378019
Posted 2 days ago
Location: Ra'anana
Job Type: Full Time and Hybrid work
The Ecosystems Engineering group is seeking a Principal Software Engineer to join our rapidly growing team. This is a game-changing opportunity to join an open-source AI platform that harnesses the power of hybrid cloud to drive innovation. In this role, you will work with a diverse team of highly talented engineers on designing, implementing, and productizing new AI solutions, with a focus on deep integration of the AI stack, hardware accelerators, and leading OEMs and Cloud Computing Service Providers (CCSPs).

You'll play a critical role in shaping the next generation of hybrid cloud platforms by directly contributing to our innovative AI and Edge products. This is your chance to be at the forefront of AI's exciting evolution, joining an ecosystem that champions continuous learning, career growth, and professional development. You'll also collaborate closely with product management, other engineering teams, and key partners and lighthouse customers.

What You Will Do:

Architect and lead the implementation of new features and solutions for our AI and Edge products.

Explore deep code integration into various products, ensuring optimal integration between the company's portfolio, hardware accelerators, and partners.

Provide technical vision and leadership on critical and high-impact projects, ensuring non-functional requirements including security, resiliency, and maintainability are met.

Integrate software that leverages hardware accelerators (e.g., DPUs, GPUs, AIUs) and perform performance analysis and optimization of AI workloads with accelerators.

Work with major AI and hardware partners such as NVIDIA, AMD, Dell, and others on building joint integrations and products.

Collaborate closely with UX, UI, QE, and cross-functional teams to deliver a great experience to our partners and customers.

Coordinate with team leads, architects, and other engineers on the design and architecture of our offerings.

Become responsible for the quality of our offerings, participate in peer code reviews and continuous integration (CI), and respond to security threats.

Mentor, influence, and coach a distributed team of engineers, contributing to a culture of continuous improvement by sharing recommendations and technical knowledge.
Requirements:
What You Will Bring:

7+ years of relevant technical experience in software development.

Advanced experience working in a Linux environment with at least one language like Golang, Rust, Java, C, or C++.

Advanced experience with a container orchestration ecosystem such as Kubernetes or Red Hat OpenShift.

Strong experience with microservices architectures and concepts including APIs, versioning, monitoring, etc.

Experience with AI/ML technologies, including foundational frameworks, large language models (LLMs), Retrieval Augmented Generation (RAG) paradigms, vector databases, and LLM orchestration tools.

Ability to quickly learn and guide others on using new tools and technologies.

Proven ability to innovate and a passion for staying at the forefront of technology.

Excellent system understanding and troubleshooting capabilities.

Autonomous work ethic, thriving in a dynamic, fast-paced environment.

Technical leadership acumen in a global team environment.

Proficient written and verbal communication skills in English.

The following is considered a plus:
Experience with cloud development for public cloud services (AWS, GCE, Azure).

Familiarity with virtualization, networking, or storage.

Background in DevOps or site reliability engineering (SRE).

Experience with hardware accelerators (e.g., GPUs, FPGAs) for AI workloads.

Recent hands-on experience with distributed computation, either at the end-user or infrastructure provider level.

Experience with performance analysis tools.

Experience with Linux kernel development.
This position is open to all candidates.
 
8378022
Location: Ra'anana
Job Type: Full Time
We are looking for a highly skilled Senior Integration Automation Developer to join our R&D Integration Automation team. This role focuses on designing, developing, and maintaining advanced automation solutions across unit, integration, system, and end-to-end testing levels.
The ideal candidate is a hands-on developer with strong expertise in Python and Pytest, experienced in building and executing automated tests, and comfortable working with modern CI/CD pipelines and automation tools. The Senior Developer will act as a technical mentor and key contributor, ensuring best practices are followed and test automation is seamlessly embedded into the development lifecycle. Knowledge of RF systems is a strong plus.

Key Responsibilities:
Design, implement, and maintain automation frameworks and test scripts using Python and Pytest.
Write unit tests, integration tests, and regression suites to ensure comprehensive test coverage.
Develop and execute system-level and end-to-end automation, including API and UI layers (Selenium preferred).
Contribute to and optimize CI/CD pipelines (Jenkins, Docker, Git) for reliable and scalable test automation.
Collaborate closely with SW developers, RF, QA, and DevOps engineers to embed automated testing into every phase of the product lifecycle.
Perform code reviews, provide technical guidance, and ensure automation code meets high quality and maintainability standards.
Analyze test results, identify root causes of failures, and drive timely resolution in partnership with development teams.
Continuously evaluate and introduce new tools, libraries, and practices to improve automation efficiency.
Support integration activities and ensure smooth interaction between hardware, software, and automation systems.
Requirements:
Qualifications:
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
Strong proficiency in Python (object-oriented and scripting) with practical experience using Pytest.
Proven experience writing and maintaining unit, integration, and regression tests as part of agile development workflows.
Hands-on experience with CI/CD pipelines, Jenkins, Docker, and Git.
Strong problem-solving skills and ability to debug complex integration issues.
Experience working in cross-functional engineering environments (development, QA, DevOps, hardware).

Preferred Skills & Knowledge:
Experience with Selenium for UI automation.
Familiarity with RF testing and measurement techniques (e.g., VSGs, SAs, calibration).
Knowledge of monitoring and logging tools (e.g., ELK) for test infrastructure.
This position is open to all candidates.
 
8350845
Location: Ra'anana
Job Type: Full Time
We are seeking a highly motivated and skilled Software Engineer to join our Hardware Software team. In this role, you will be responsible for the development and integration of Board Support Packages (BSP) and low-level firmware components for our carrier-grade networking solutions: routers and switches designed for service provider and data center networks. The systems integrate ASICs and high-throughput backplanes supporting multi-terabit line rates. You will work closely with hardware, platform, and system architects to bring up new hardware platforms and support advanced network functionality in high-performance environments.
Key Responsibilities:
Develop, integrate, and maintain BSP components, including bootloaders (e.g., U-Boot), device trees, and hardware abstraction layers.
Design and implement firmware and low-level drivers for network-centric hardware platforms (e.g., ASICs, NICs, SoCs, CPLDs, FPGAs).
Support hardware bring-up and board validation, collaborating with hardware engineers and system integrators.
Work on performance optimization, debugging, and stability improvements of system software on embedded Linux platforms.
Interface with third-party SDKs and adapt them to fit within our company's software infrastructure.
Ensure compliance with industry standards and best practices for networking and embedded systems.
Requirements:
BSc or MSc in Computer Science, Electrical Engineering, or related technical field.
8+ years of experience in embedded software development, preferably in the networking or telecommunications industry.
Proficiency in C/C++ for low-level system development.
Strong experience with embedded Linux, bootloaders, kernel configuration, and driver development.
Familiarity with SoC architectures (e.g., ARM, MIPS) and board bring-up procedures.
Hands-on experience with hardware debugging tools (oscilloscopes, JTAG, logic analyzers).
Knowledge of networking protocols and hardware (Ethernet, switching/routing, PHYs) is a strong plus.
Experience with Broadcom SDKs, ONIE, or network operating systems (NOS) is an advantage.
Nice to Have:
Background in data center or service provider environments.
Exposure to high-end router or switch platforms.
Why us?
Work on cutting-edge cloud-native networking solutions that scale to the world's largest networks.
Be part of a fast-paced, innovative team that's transforming the telecom and hyperscale networking space.
Great growth opportunities in a global, technology-driven company.
This position is open to all candidates.
 
8351582
Posted 9 hours ago
Location: Ra'anana
Job Type: Full Time and Hybrid work
Join the Workload Availability team as a Software Engineer in Ecosystem Engineering. Your core mission is to safeguard the stability and availability of mission-critical workloads on OpenShift by developing and integrating robust proactive and reactive remediation mechanisms. You will focus on the complex challenges of node health checking, automated fencing, and self-healing within the cluster, ensuring these critical availability features integrate seamlessly with the diverse ecosystem of third-party hardware, cloud platforms, and infrastructure providers.

What You Will Do

Maintain critical components, controllers, and operators (primarily in Go) responsible for detecting, diagnosing, and automating the recovery of unhealthy cluster nodes on OpenShift.

Become responsible for the quality of our offerings, and participate in peer code reviews, continuous integration (CI), and the software release process using the Konflux CI/CD system.

Develop intelligent node mechanisms to accurately determine a node's operational status and trigger the appropriate self-healing or external remediation actions.

Contribute upstream to projects that focus on Kubernetes-native machine and node remediation to advance the platform's self-healing capabilities.

Troubleshoot and resolve complex cross-ecosystem availability and reliability failures, requiring deep debugging into kernel-level behavior, cloud infrastructure APIs, and Kubernetes control plane logic.

Proactively utilize AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude Code) for code generation, auto-completion, and intelligent suggestions to accelerate development cycles and enhance code quality.
Requirements:
What You Will Bring:

2+ years of experience working in a Linux environment with at least one language such as Golang, Python, Java, C, or C++.

Experience with a container ecosystem like Docker, Kubernetes, Red Hat OpenShift.

Excellent analytical and debugging skills, capable of diagnosing failures across operating system, container runtime, and Kubernetes control plane boundaries.

Strong communication and collaboration skills for successful engagement with both internal engineering teams and external ecosystem partners.

The following is considered a plus:

Familiarity with High Availability (HA) concepts, particularly focusing on node fencing (STONITH), cluster remediation, and machine lifecycle management.

Familiarity with the concepts and implementation of the Machine API in Kubernetes/OpenShift.

Familiarity with common fencing protocols or power management interfaces (e.g., IPMI, Redfish, or cloud-specific compute APIs).

Active contributions to upstream Kubernetes, OpenShift, or related cluster lifecycle projects.

Knowledge of storage-related HA concerns, such as the impact of node failure and fencing on data integrity.
This position is open to all candidates.
 
8381690
Posted 2 days ago
Location: Ra'anana
Job Type: Full Time and Hybrid work
We are looking for a Software Engineer with Kubernetes and MLOps (Machine Learning Operations) experience to join our rapidly growing engineering team. Our focus is to create a platform, partner ecosystem, and community through which enterprise customers can use AI to solve problems and accelerate business success. This is a very exciting opportunity to build and shape the next generation of hybrid cloud MLOps platforms, contribute to the development of the RHOAI product, participate in open source communities, and be at the forefront of the exciting evolution of AI. You'll join an ecosystem that fosters continuous learning, career growth, and professional development.

As a core developer for one of our OpenShift AI teams, you will have the opportunity to actively participate in one of our component teams as well as the affiliated open-source communities. You will work as part of an evolving development team to rapidly design, secure, build, test, and release new capabilities. The role is primarily that of an individual contributor who collaborates closely with other developers and cross-functional teams. You should have a passion for working in open-source communities and for developing solutions that integrate our own, open-source, and partner technologies into a cohesive platform.

What you will do:

Architect and lead implementation of new features and solutions for RHOAI.

Innovate in the MLOps domain by participating in upstream communities.

Provide technical vision and leadership on critical and high impact projects.

Ensure non-functional requirements including security, resiliency, and maintainability are met.

Write unit and integration tests and work with quality engineers to ensure product quality.

Use CI/CD best practices to deliver solutions as productization efforts into RHOAI.

Contribute to a culture of continuous improvement by sharing recommendations and technical knowledge with team members.

Collaborate with product management, other engineering teams, and cross-functional teams to analyze and clarify business requirements.

Communicate effectively with stakeholders and team members to ensure proper visibility of development efforts.

Give thoughtful and prompt code reviews.

Represent RHOAI in external engagements including industry events, customer meetings, and open source communities.

Mentor, influence, and coach a distributed team of engineers.
Requirements:
What you will bring:

Advanced experience developing applications in Go, Python, or another language.

Advanced experience in Kubernetes, OpenShift or other cloud-native technologies.

Ability to quickly learn and guide others on using new tools and technologies.

Experience with source code management tools such as Git.

Proven ability to innovate and a passion for staying at the forefront of technology.

Excellent system understanding and troubleshooting capabilities.

Autonomous work ethic, thriving in a dynamic, fast-paced environment.

Technical leadership acumen in a global team environment.

Excellent written and verbal communication skills.

The following will be considered a plus:

Master's degree or higher in computer science, machine learning, or a related discipline.

Understanding of how Open Source and Free Software communities work.

Experience with development for public cloud services (AWS, GCE, Azure).

Experience working with or deploying MLOps platforms.
This position is open to all candidates.
 
8378018
13/10/2025
Confidential company
Location: Tel Aviv-Yafo and Ra'anana
Job Type: Full Time
We have been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It's an outstanding legacy of innovation that's motivated by extraordinary technology and amazing people. We are looking for a highly motivated DevOps/SRE engineer to join the AIR team, which builds the Digital Twin for Data Center Simulation web application. AIR enables cloud-scale efficiency by creating identical replicas of real-world data center infrastructure deployments.

What you'll be doing: 

You will be part of the AIR team that is building the SaaS/IaaS platform for digital twins of AI data centers.

You will be specifically responsible for the infrastructure and Site Reliability Engineering (SRE) requirements of AIR.

Focus on efficiency by automating repetitive workflows.

Working on a microservices-based architecture.

Deploying and troubleshooting non-disruptive cloud operations with an emphasis on secure production infrastructure.

Continuously evaluating existing systems and driving improvements.

Managing deployments and upgrades of operating systems, Kubernetes (k8s) clusters, and/or other orchestration tools.

Providing day-to-day support for engineering activities using CI/CD tools such as Git and Jenkins.

Multitasking across different tracks to address evolving priorities efficiently.
Requirements:
What we need to see: 

BSc in Engineering, relevant certifications, or equivalent experience.

5+ years of experience with complex microservices-based architectures.

Proven experience applying the best practices and discipline of managing and monitoring highly available, secure production infrastructure.

Experience with the latest observability tools, such as the Prometheus stack and Datadog.

Experience with modern deployment architectures for non-disruptive cloud operations, including blue-green and canary rollouts.

Highly skilled in Kubernetes and Docker.

Experience in IaaS environment - deploying, configuring, and administering Linux-based bare metal servers.

Experience with relational databases (MySQL) and SQL.

Expert in AWS.


Ways to stand out from the crowd: 

Skills in Linux/Unix administration.

Experience with Prometheus/Grafana.

Experience with APM tools like Dynatrace, Datadog, AppDynamics, New Relic, etc.

Implemented robust metrics collection and alerting infrastructure.
This position is open to all candidates.
 
8369894