We are looking for a Production Engineer to join our existing DevOps team. You will work alongside senior engineers to keep our production systems reliable, observable, and operationally sustainable.
Responsibilities
Lead incident response and on-call rotation.
Help define and refine SLOs and SLIs; tune alerts to reduce noise and improve signal quality.
Write and maintain runbooks; contribute to blameless postmortems and follow-up actions.
Build and improve dashboards, giving product teams clear visibility into the health of their services.
Automate repetitive operational work in Python, C#, and Bash.
Support our Kubernetes workloads, Terraform-managed infrastructure, and observability pipelines across our multi-cloud environment.
Collaborate with backend and platform engineers to make new services production-ready.
Requirements
B.Sc. in Computer Science or Software Engineering, or equivalent practical experience.
Comfort with Linux fundamentals and shell environments.
Hands-on scripting experience in Python and C#, or comparable object-oriented languages.
Strong understanding of monitoring tools such as Prometheus, OpenTelemetry, and Grafana.
Solid understanding of Kubernetes, cloud platforms, and distributed systems.
Strong written and verbal communication; analytical mindset; willingness to participate in on-call after ramp-up.
Working proficiency in English.
Nice to Have
Hands-on experience with Google Cloud.
Exposure to Terraform.
Prior incident response experience.
Familiarity with SRE concepts: SLO / SLI / error budgets, toil, blameless culture.
This position is open to all candidates.