The OpenShift team is looking for a Machine Learning Engineer with experience in building, scaling, and monitoring AI/ML systems to join our rapidly growing engineering team. Our focus is to create a platform, partner ecosystem, and community by which enterprise customers can solve problems to accelerate business success using AI. This is an exciting opportunity to shape the observability and reliability of GenAI workloads, contribute to the development of the RHOAI product, participate in open source communities, and be at the forefront of the evolution of AI. You'll join an ecosystem that fosters continuous learning, career growth, and professional development.
As a core ML engineer for one of our OpenShift AI teams, you will have the opportunity to design and build systems that monitor, validate, and improve AI model performance in production. You will work as part of an evolving development team to rapidly design, secure, build, test, and release new capabilities. This is primarily an individual contributor role in which you will collaborate closely with other ML engineers, software developers, and cross-functional teams. You should have a passion for observability, MLOps, and building robust systems for real-world AI.
What you will do:
Architect and lead implementation of new features and solutions for RHOAI, focusing on observability, insights, and optimizations for large-scale GenAI workloads running on Kubernetes
Innovate in the MLOps domain by participating in leading upstream communities such as llm-d
Provide technical vision and leadership on critical and high impact projects
Use CI/CD best practices to productize and deliver solutions into RHOAI
Proactively use AI-assisted development tools (e.g., GitHub Copilot, Cursor, Claude Code) for code generation, auto-completion, and intelligent suggestions to accelerate development cycles and enhance code quality
Collaborate with product management, other engineering teams, and cross-functional teams to analyze and clarify business requirements
Collaborate with cross-functional teams to identify opportunities for AI integration within the software development lifecycle, driving continuous improvement and innovation in engineering practices
Contribute to a culture of continuous improvement by sharing recommendations and technical knowledge with team members
Communicate effectively to stakeholders and team members to ensure proper visibility of development efforts
Represent RHOAI in external engagements including industry events, customer meetings, and open source communities
Explore and experiment with emerging AI technologies relevant to software development, proactively identifying opportunities to incorporate new AI
Mentor, influence, and coach a distributed team of engineers
Requirements:
Advanced experience in machine learning engineering, with a focus on production-grade systems
Advanced experience in Kubernetes, OpenShift or other cloud-native technologies
Ability to quickly learn and guide others on using new tools and technologies
Experience with source code management tools such as Git
Proven ability to innovate and a passion for staying at the forefront of technology
Excellent system understanding and troubleshooting capabilities
Autonomous work ethic, thriving in a dynamic, fast-paced environment
Technical leadership acumen in a global team environment
Excellent written and verbal communication skills
The following will be considered a plus:
Master's degree or higher in computer science, machine learning, or related discipline
Understanding of how Open Source and Free Software communities work
Experience with development for public cloud services (AWS, GCE, Azure)
Experience working with or deploying MLOps platforms
Demonstrated proficiency in using LLMs (e.g., Google Gemini), as relevant, for tasks such as brainstorming solutions, deep research, summarizing technical documentation, and drafting communications
This position is open to all candidates.