We seek a highly skilled Senior Site Reliability Engineer to join our team! In this role, you will drive best practices, optimize operational workflows, and mentor junior engineers, fostering a culture of collaboration and innovation. This is an exciting opportunity for someone passionate about building and integrating services and systems that ensure the availability, performance, and reliability of our SaaS environments. You will lead large-scale, cross-functional initiatives and work closely with P&E engineering and Cloud teams to design, build, and maintain scalable, resilient infrastructure while championing automation, monitoring, and incident response. If you're eager to make a significant impact in a fast-paced, high-growth environment, we encourage you to apply.
As a Senior Site Reliability Engineer, you will:
Lead and mentor the team toward technical solutions, guided by a strong understanding of current technologies such as Kubernetes, Helm, and Terraform
Advocate, build, and manage scalable and reliable services and infrastructure to support our SaaS services
Apply SRE best practices, including incident management, performance and capacity planning, and disaster recovery flows
Drive the reliability, performance, and availability of our SaaS products, ensuring service-level objectives are met or exceeded
Design, develop, and manage large-scale systems with CI/CD in mind, to support multiple production environments and use cases
Tackle large-scale production issues and bring creative problem-solving to the table
Evaluate new cloud-native technologies and vendor products to continuously improve our SaaS offering.
Requirements:
5+ years of relevant DevOps or SRE experience in large-scale production environments
2+ years of experience with infrastructure automation, configuration management, or container orchestration using Kubernetes, Docker, Terraform, and Ansible
2+ years of programming experience in Python or another high-level programming language
Strong ability to lead, design, and execute cross-organization projects
Experience managing container and infrastructure orchestration tools (e.g., Kubernetes, Terraform)
Hands-on experience administering public clouds (AWS, GCP, or Azure)
Experience building CI/CD pipelines for applications and microservices (Jenkins/ArgoCD)
Experience with chaos engineering, alerting, and observability tools (Gremlin, PagerDuty, Opsgenie, New Relic, Coralogix)
This position is open to all candidates.