The role
Own, design, and evolve our production cloud platform (AWS), Kubernetes, and IaC (Terraform) so teams ship reliably, safely, and fast.
What you'll do
* Architect, build, and run resilient, scalable cloud infrastructure; drive AWS Well-Architected pillars (operational excellence, security, reliability, performance, cost, sustainability). 
* Champion GitOps (e.g., Argo CD): declarative configs, PR-driven changes, continuous reconciliation. 
* Implement and evolve CI/CD (GitHub Actions/Argo CD), secrets management, policy-as-code, and environment promotion. 
* Build first-class observability (OpenTelemetry + Prometheus) across apps and infra. 
* Partner with internal and external teams (engineers, data, vendors, customers) to deliver platform capabilities and services.
* Lead/mentor engineers; drive Terraform standards, modules, and reviews. 
* Optimize cost and efficiency (FinOps) while maintaining reliability. 
* Define SLOs/SLIs and error budgets; lead incident readiness, response, and post-mortems.  
About us: 
We are the leading provider of security and safety solutions for online experiences, safeguarding more than 3 billion users, top foundation models, and the world's largest enterprises and tech platforms every day. As a trusted ally to major technology firms and Fortune 500 brands that build user-generated and GenAI products, our company empowers security, AI, and policy teams with low-latency Real-Time Guardrails and a continuous Red Teaming program that pressure-tests systems with adversarial prompts and emerging threat techniques. Powered by deep threat intelligence, unmatched harmful-content detection, and coverage of 117+ languages, our company enables organizations to deliver engaging and trustworthy experiences at global scale while operating safely and responsibly across all threat landscapes.
Requirements:
What you bring (must-haves)
* 5+ years of hands-on DevOps/SRE experience in large-scale production.
* Deep production experience: AWS (or major public cloud), Kubernetes, Terraform. 
* Proven end-to-end ownership: design, implement, release, operate, and improve (independently and as part of a team).
* Excellent communication; comfortable collaborating with external stakeholders.
Nice to have
* Linux, networking (DNS/HTTP/TCP/IP), and security fundamentals.
* CI/CD with GitHub Actions/Argo CD; service mesh; policy-as-code. 
* Observability: OpenTelemetry, Prometheus, Grafana. 
* SRE practices (SLOs/error budgets); experience improving DORA-style outcomes. 
* FinOps experience; Python/Node.js; DynamoDB/MongoDB/OpenSearch.
This position is open to all candidates.