We're looking for Senior Software Engineers, Principal Software Engineers, and Senior Principal Software Engineers to join the AI Platform Core Components team. The AI Platform team helps power our enterprise AI products, like RHEL AI and OpenShift AI, through our high-performing and secure AI-building platform.
This senior role offers opportunities to architect and drive critical components of our AI platform, with a focus on automating and optimizing the integration and deployment of software, ensuring high quality and fast delivery. You will shape the technical direction that powers our enterprise AI products like RHEL AI and OpenShift AI, while mentoring team members and establishing best practices.
What you will do:
Design, implement, and maintain the product delivery pipeline by automating build, test, and deployment processes to enhance efficiency and reliability
Monitor CI/CD processes and performance, identify bottlenecks, and implement improvements
Work closely with Product Management and other engineering teams to ensure smooth deployment cycles and coordinate releases
Integrate automated testing into the delivery pipeline, ensuring code quality and reducing manual testing efforts
Maintain clear and comprehensive documentation for CI/CD processes, guidelines, and best practices
Implement security best practices in the CI/CD pipelines, ensuring compliance with industry standards
Diagnose and resolve issues in the CI/CD pipelines and assist with deployment failures
Coach and mentor junior members of the team
Participate in upstream AI/ML communities with a focus on learning more about the various technologies and how they might be used within our offerings
Requirements:
Hands-on experience with implementing and managing CI/CD pipelines and proficiency in CI/CD tools, such as Jenkins, GitHub Actions, Tekton, or GitLab CI
Deep expertise in developing and architecting applications in Go or Python, and understanding of scripting languages, such as Bash or Groovy
Experience with Kubernetes, OpenShift, Docker, or other cloud-native technologies
Demonstrable experience with implementing and owning complex features individually and in collaboration with others
Problem-solving and troubleshooting skills with a focus on root cause analysis
Experience in agile development, Jira, and Git
Ability to quickly learn and use new tools and technologies
Excellent written and verbal communication skills
The following would be considered a plus:
Experience with cloud platforms, such as AWS, Microsoft Azure, and Google Cloud Platform
Familiarity with infrastructure as code (IaC) tools, such as Terraform or Ansible
Relevant certifications, such as AWS Certified DevOps Engineer or Certified Jenkins Engineer (CJE)
Understanding of open source development models
This position is open to all candidates.