Principal Software Engineer - OpenShift AI

עדכון קורות החיים לפני שליחה

8350004

שירות זה פתוח ללקוחות VIP בלבד

משרות דומות שיכולות לעניין אותך

דיווח על תוכן לא הולם או מפלה

שמך המלאמה השם שלך?

מייל

תיאור

שליחה

תודה על שיתוף הפעולה

מודים לך שלקחת חלק בשיפור התוכן שלנו :)

המשרה נמחקה

תוכל לצפות בה בדף המשרות שלי

המשרה הוחזרה לרשימת תוצאות החיפוש

האם תרצה להסיר את המשרה מרשימת

המשרות השמורות שלך?

כן לא

אירעה שגיאה בשליחת פרטיך למשרה

1 ימים

Principal Machine Learning Engineer - OpenShift AI Observability

חברה חסויה

Location: Ra'anana

Job Type: Full Time and Hybrid work

The OpenShift team is looking for a Machine Learning Engineer with experience in building, scaling, and monitoring AI/ML systems to join our rapidly growing engineering team. Our focus is to create a platform, partner ecosystem, and community by which enterprise customers can solve problems to accelerate business success using AI. This is a very exciting opportunity to shape the observability and reliability of GenAI workloads, contribute to the development of the RHOAI product, participate in open source communities, and be at the forefront of the exciting evolution of AI. Youll join an ecosystem that fosters continuous learning, career growth, and professional development.
As a core ML engineer for one of our OpenShift AI teams, you will have the opportunity to design and build systems that monitor, validate, and improve AI model performance in production. You will work as part of an evolving development team to rapidly design, secure, build, test, and release new capabilities. The role is primarily an individual contributor who collaborates closely with other ML engineers, software developers, and cross-functional teams. You should have a passion for observability, MLOps, and building robust systems for real-world AI.
What you will do:
Architect and lead implementation of new features and solutions for RHOAI, focusing on observability, insights, and optimizations for large-scale GenAI workloads running on Kubernetes
Innovate in the MLOps domain by participating in leading upstream communities such as llm-d
Provide technical vision and leadership on critical and high impact projects
Use CI/CD best practices to deliver solutions as productization efforts into RHOAI
Proactively utilize AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude Code) for code generation, auto-completion, and intelligent suggestions to accelerate development cycles and enhance code quality.
Collaborate with product management, other engineering and cross-functional teams to analyze and clarify business requirements
Collaborate with cross-functional teams to identify opportunities for AI integration within the software development lifecycle, driving continuous improvement and innovation in engineering practices
Contribute to a culture of continuous improvement by sharing recommendations and technical knowledge with team members
Communicate effectively to stakeholders and team members to ensure proper visibility of development efforts
Represent RHOAI in external engagements including industry events, customer meetings, and open source communities
Explore and experiment with emerging AI technologies relevant to software development, proactively identifying opportunities to incorporate new AI
Mentor, influence, and coach a distributed team of engineers.

Requirements:
Advanced experience in machine learning engineering, with a focus on production-grade systems
Advanced experience in Kubernetes, OpenShift or other cloud-native technologies
Ability to quickly learn and guide others on using new tools and technologies
Experience with source code management tools such as Git
Proven ability to innovate and a passion for staying at the forefront of technology.
Excellent system understanding and troubleshooting capabilities
Autonomous work ethic, thriving in a dynamic, fast-paced environment.
Technical leadership acumen in a global team environment
Excellent written and verbal communication skills
The following will be considered a plus:
Masters degree or higher in computer science, machine learning, or related discipline
Understanding of how Open Source and Free Software communities work
Experience with development for public cloud services (AWS, GCE, Azure)
Experience working with or deploying MLOps platforms
Demonstrate proficiency in utilizing LLMs (e.g., Google Gemini), as relevant, for tasks such as brainstorming solutions, deep research, summarizing technical documentation, drafting communications.

This position is open to all candidates.

עדכון קורות החיים לפני שליחה

8349997

שירות זה פתוח ללקוחות VIP בלבד

שמך המלאמה השם שלך?

מייל

תיאור

שליחה

תודה על שיתוף הפעולה

מודים לך שלקחת חלק בשיפור התוכן שלנו :)

המשרה נמחקה

תוכל לצפות בה בדף המשרות שלי

המשרה הוחזרה לרשימת תוצאות החיפוש

האם תרצה להסיר את המשרה מרשימת

המשרות השמורות שלך?

כן לא

אירעה שגיאה בשליחת פרטיך למשרה

4 ימים

Senior Principal Software Engineer - AI & Edge

חברה חסויה

Location: Ra'anana

Job Type: Full Time and Hybrid work

The Ecosystems Engineering group is seeking a Senior Principal Software Engineer to join our rapidly growing team. This is a game-changing opportunity to join an open-source AI platform that harnesses the power of hybrid cloud to drive innovation. In this role, you will work with a diverse team of highly talented engineers on designing, implementing, and productizing new AI solutions, with a focus on deep integration of the AI stack, hardware accelerators, and leading OEMs and Cloud Computing Service Providers (CCSPs).
You'll play a critical role in shaping the next generation of hybrid cloud platforms by directly contributing to our innovative AI and Edge products. This is your chance to be at the forefront of AI's exciting evolution, joining an ecosystem that champions continuous learning, career growth, and professional development. You'll also collaborate closely with product management, other engineering teams, and key partners and lighthouse customers.
What You Will Do:
Architect and lead the implementation of new features and solutions for our AI and Edge products.
Explore deep code integration into various products, ensuring optimal integration between our portfolio, hardware accelerators and partners.
Provide technical vision and leadership on critical and high-impact projects, ensuring non-functional requirements including security, resiliency, and maintainability are met.
Integrate software that leverages hardware accelerators (e.g., DPUs, GPUs, AIUs) and perform performance analysis and optimization of AI workloads with accelerators.
Work with major AI and hardware partners such as NVIDIA, AMD, Dell, and others on building joint integrations and products.
Collaborate closely with UX, UI, QE, and cross-functional teams to deliver a great experience to our partners and customers.
Coordinate with team leads, architects, and other engineers on the design and architecture of our offerings.
Become responsible for the quality of our offerings, participate in peer code reviews and continuous integration (CI), and respond to security threats.
Mentor, influence, and coach a distributed team of engineers, contributing to a culture of continuous improvement by sharing recommendations and technical knowledge.

Requirements:
10+ years of relevant technical experience in software development.
Advanced experience working in a Linux environment with at least one language like Golang, Rust, Java, C, or C++.
Advanced experience with a container orchestration ecosystem like Kubernetes, or OpenShift.
Strong experience with microservices architectures and concepts including APIs, versioning, monitoring, etc.
Experience with AI/ML technologies, including foundational frameworks, large language models (LLMs), Retrieval Augmented Generation (RAG) paradigms, vector databases, and LLM orchestration tools.
Ability to quickly learn and guide others on using new tools and technologies.
Proven ability to innovate and a passion for staying at the forefront of technology.
Excellent system understanding and troubleshooting capabilities.
Autonomous work ethic, thriving in a dynamic, fast-paced environment.
Technical leadership acumen in a global team environment.
Proficient written and verbal communication skills in English.
The Following is Considered a Plus
Experience with cloud development for public cloud services (AWS, GCE, Azure).
Familiarity with virtualization, networking, or storage.
Background in DevOps or site reliability engineering (SRE).
Experience with hardware accelerators (e.g., GPUs, FPGAs) for AI workloads.
Recent hands-on experience with distributed computation, either at the end-user or infrastructure provider level.
Experience with performance analysis tools.

This position is open to all candidates.

עדכון קורות החיים לפני שליחה

8345118

שירות זה פתוח ללקוחות VIP בלבד

שמך המלאמה השם שלך?

מייל

תיאור

שליחה

תודה על שיתוף הפעולה

מודים לך שלקחת חלק בשיפור התוכן שלנו :)

המשרה נמחקה

תוכל לצפות בה בדף המשרות שלי

המשרה הוחזרה לרשימת תוצאות החיפוש

האם תרצה להסיר את המשרה מרשימת

המשרות השמורות שלך?

כן לא

אירעה שגיאה בשליחת פרטיך למשרה

1 ימים

Principal Solution Engineer - AI Agents & Ecosystems

חברה חסויה

Location: Ra'anana

Job Type: Full Time and Hybrid work

We are looking for a Principal Solution Engineer to join our Ecosystem Engineering team, focusing on the optimization of AI/ML models inference and serving. In this role, you will identify, build, and optimize emerging use-case patterns within the AI and vertical industries. These patterns will leverage our products, partner offerings, and open source projects deployed on our portfolio, with a specific emphasis on model serving, inference, and MLOps workflows. You will collaborate closely with our engineering and product management teams, using insights gained from interactions with partners and customers to influence product adoption and development. As part of a geographically distributed team, you will engage with multiple engineering teams and open source communities globally. Success in this role requires strong motivation, curiosity, a passion for problem-solving, and hands-on experience with Linux technologies and open source.
What you will do:
Identify emerging patterns for applying our offerings to business problems
Discover and describe what differentiates our solutions from competitive alternatives by working directly with key partners and customers who deploy and operate our solutions
Create reference architectures for optimized AI workloads running on/with our portfolio
Create and conduct on-demand demo labs, providing an initial practical experience for key partners and customers
Provide technical vision and leadership on critical and high impact projects
Communicate and promote the results of reference architectures globally through publication of blogs and speaking at webinars and conferences
Contribute to a culture of continuous improvement by sharing recommendations and technical knowledge.

Requirements:
Passionate about technology and continuous learning; an innovator with the ability to quickly master new tools and technologies and work independently.
Technical leadership acumen
Advanced experience with Kubernetes, OpenShift, or other cloud-native technologies
Experience with AI and Machine Learning platforms, tools, and frameworks, such as LangChain/LangGraph, PyTorch, vLLM, MCP, and Kubeflow
Ability to work on your own in a fast-paced, ever-changing environment
Excellent written and verbal communication skills, with proven experience in technical leadership, publishing content, or presenting at industry events
Passion for open source and community-based software development models
Bachelor's degree in a technical field or equivalent experience
The following will be considered a plus:
Bachelor's degree in statistics, mathematics, computer science, operations research, or a related quantitative field, or equivalent expertise; Masters or PhD is a big plus
Experience in a customer-facing role, such as: solution architecture or consulting, focused on deploying complex AI/ML solutions on Kubernetes.
Advanced development experience in Python or Go.
Familiarity with AI/ML services across major public clouds (AWS, Azure, GCP) and/or hardware accelerators (CUDA, ROCm).
Experience working with automation tools/frameworks (e.g., Ansible, GitOps) and with MLOps/LLMOps platforms
Knowledge and interest in developing tools and solutions using Agentic workflows.

This position is open to all candidates.

עדכון קורות החיים לפני שליחה

8350012

שירות זה פתוח ללקוחות VIP בלבד

שמך המלאמה השם שלך?

מייל

תיאור

שליחה

תודה על שיתוף הפעולה

מודים לך שלקחת חלק בשיפור התוכן שלנו :)

המשרה נמחקה

תוכל לצפות בה בדף המשרות שלי

המשרה הוחזרה לרשימת תוצאות החיפוש

האם תרצה להסיר את המשרה מרשימת

המשרות השמורות שלך?

כן לא

אירעה שגיאה בשליחת פרטיך למשרה

1 ימים

Principal Software Engineer - Telco Partner Architecture

חברה חסויה

Location: Ra'anana

Job Type: Full Time and Hybrid work

Our Global Engineering organization is looking for a Principal Software Engineer to join the Telco Partner Architecture team in Telco Engineering. You will be part of a team responsible for designing and implementing the container platform for 5G telecommunication networks, contributing to industry-leading technologies in the Kubernetes and Telecom ecosystem with Telco Partners in EMEA or APAC, for example CNCF projects, and O-RAN.
As a part of a geographically distributed team, you will work with multiple teams and open source communities around the globe.
To be successful in this role, you will need to have motivation, curiosity, passion for problem solving, and experience with Linux technologies, Kubernetes and open source development models. We can hire you in any EMEA or APAC country where we have a legal presence.
What you will do
Define, contribute to, and collaborate with Telco partners on Blueprints and system architectures pairing platforms with Partner technology
Establish long-term technical relationships with key Telco partners, gathering and analyzing partner requirements and use cases to deliver meaningful business outcomes
Play an active hands-on role in researching and then architecting various OpenShift and Telco specific features into a unified solution, proactively test the involved technologies, experiment and provide demonstrations
Collaborate across teams (Product Management, Engineering, QE, Consulting, Support) to influence future directions and be an advocate for feature development and support needed for strategic partners next generation offerings
Evangelize the team's work through blogs, web postings, or conference talks
Collaborate with cross-functional teams to identify opportunities for AI integration, driving continuous improvement and innovation in engineering practices.

Requirements:
Strong architectural experience developing solution designs / reference implementations from concept to delivery
Telco specific experience and knowledge in Edge designs, High Availability, Hybrid Cloud, NFV architecture and containerized workload characteristics
Understanding of how open source and free software communities work
Experience working with Partners to develop and implement new technologies, with an ability to adapt and quickly learn
Hands on Kubernetes and/or OpenShift technologies experience - 2+ years
Comfortable working on complex multidisciplinary problems and bringing together a diverse set of technical options to a clear path forward.
Experience with Linux system programming in a distributed telecom environment, or experience and designing and integrating distributed systems in a telecom environment
Excellent written and verbal communication skills in English
The following are considered a plus:
7+ years of experience in a Linux environment with at least one of the following languages: Golang, Python, Java, or C/C++. Leading or contributing to open source communities or being an open source maintainer.
Experience with cloud-native design principles especially in the context of container technologies (docker, cri-o) and workloads (CNFs) on kubernetes
System and performance engineering analysis and a proven track record of unlocking performance in constrained environments, including latency sensitive workload tuning (IEQ lines, CPU pinning, NUMA affinity, etc)
Comfortable using Gen AI as a productivity enhancement and for building tooling and automations
Strong experience with automation tools and/or scripting languages (Ansible, bash, python)
Certified RHEL/Kubernetes Administrator.

This position is open to all candidates.

עדכון קורות החיים לפני שליחה

8350041

שירות זה פתוח ללקוחות VIP בלבד

שמך המלאמה השם שלך?

מייל

תיאור

שליחה

תודה על שיתוף הפעולה

מודים לך שלקחת חלק בשיפור התוכן שלנו :)

המשרה נמחקה

תוכל לצפות בה בדף המשרות שלי

המשרה הוחזרה לרשימת תוצאות החיפוש

האם תרצה להסיר את המשרה מרשימת

המשרות השמורות שלך?

כן לא

אירעה שגיאה בשליחת פרטיך למשרה

4 ימים

Principal Machine Learning Engineer GenAI Benchmarking & Validation Infrastructure

חברה חסויה

Location: Ra'anana

Job Type: Full Time and Hybrid work

Required Principal Machine Learning Engineer GenAI Benchmarking & Validation Infrastructure
The Principal Machine Learning Engineer GenAI is responsible for hands-on design, development, and operation of large-scale systems and tools for AI model benchmarking, optimization, and validation.
Unlike traditional ML Engineers focused mainly on training models, this role centers on building, running, and continuously improving the infrastructure, automation, and services that enable rigorous, repeatable, and production-grade model evaluation at scale.
This is a hands-on principal role that combines strategic technical leadership with active engineering execution.
You will own the architecture, implementation, and optimization of benchmarking and validation capabilities across our AI ecosystem. This includes architecting Validation-as-a-Service platforms, delivering high-performance benchmarking pipelines, integrating with leading GenAI frameworks, and setting industry standards for model evaluation quality and reproducibility.
The role demands deep GenAI domain expertise, architectural foresight, and direct coding involvement to ensure evaluation platforms are flexible, extensible, and optimized for real-world, large-scale use.
What you will do
Architect and lead scalable benchmarking pipelines for LLM performance measurement (latency, throughput, accuracy, cost) across multiple serving backends and hardware types.
Build optimization & profiling tools for inference performance, including GPU utilization, memory footprint, CUDA kernel efficiency, and parallelism strategies.
Develop Validation-as-a-Service platforms with APIs and self-service tools for standardized, on-demand model evaluation.
Integrate and optimize model serving frameworks (vLLM, TGI, LMDeploy, Triton) and API-based serving (OpenAI, Mistral, Anthropic) in production environments.
Establish dataset & scenario management workflows for reproducible, comprehensive evaluation coverage.
Implement observability & diagnostics systems (Prometheus, Grafana) for real-time benchmark and inference performance tracking.
Deploy and manage workloads in Kubernetes (Helm, Argo CD, Argo Workflows) across AWS/GCP GPU clusters.
Lead performance engineering efforts to identify bottlenecks, apply optimizations, and document best practices.
Stay ahead of the GenAI ecosystem by tracking emerging frameworks, benchmarks, and optimization techniques, and integrating them into the platform.

Requirements:
Advanced Python for ML/GenAI pipelines, backend development, and data processing.
Kubernetes (Deployments, Services, Ingress) with Helm for large-scale distributed workloads.
Deep expertise in LLM serving frameworks (vLLM, TGI, LMDeploy, Triton) and API-based serving (OpenAI, Mistral, Anthropic).
GPU optimization mastery: CUDA, mixed precision, tensor/sequence parallelism, memory optimization, kernel-level profiling.
Design and operation of benchmarking/evaluation pipelines with metrics for accuracy, latency, throughput, cost, and robustness.
Experience with Hugging Face Hub for model/dataset management and integration.
Familiarity with GenAI tools: OpenAI SDK, LangChain, LlamaIndex, Cursor, Copilot.
Argo CD and Argo Workflows for reproducible ML orchestration.
CI/CD (GitHub Actions, Jenkins) for ML workflows.
Cloud expertise (AWS/GCP) for provisioning, running, and optimizing GPU workloads (A100, H100, etc.).
Monitoring and observability (Prometheus, Grafana) and database experience (PostgreSQL, SQLAlchemy).
Nice to Have
Distributed training across multi-node, multi-GPU environments.
Advanced model evaluation: bias/fairness testing, robustness analysis, domain-specific benchmarks.
Experience with OpenShift/RHOAI for enterprise AI workloads.Benchmarking frameworks: GuideLLM, HELM (Holistic Evaluation of Language Models), Eval Harness.
Security scanning for ML artifacts and containers (Trivy, Grype).

This position is open to all candidates.

עדכון קורות החיים לפני שליחה

8345088

שירות זה פתוח ללקוחות VIP בלבד