As a DevOps Engineer, you won't just "maintain" infrastructure; you will own it.
You will lead the charge in cloud security automation, using an AI-first mindset to drive extreme efficiency. Your mission is to identify and rectify misconfigurations and bottlenecks, and to prevent them before they occur. You are expected to operate with high autonomy, shaping the future of secure cloud environments by being a proactive force, not a reactive one.
Key Responsibilities:
Innovate & Implement: Design and implement cloud infrastructure solutions with a focus on GCP, including compute reservations, BigQuery, Pub/Sub, GCS, and networking.
Release Engineering: Lead weekly production upgrade cycles across global multi-region environments, including branch-out processes, hotfix management, version gating, and rollback procedures.
Service Deployment & Lifecycle: Own end-to-end service deployments on Kubernetes, from Helm chart creation and Flux/GitOps configuration to production rollout and scaling.
Database Administration: Manage and optimize database infrastructure including MySQL, Redis, BigQuery, Neo4j, Scylla, MongoDB, and PostgreSQL in production environments.
AI Integration: Use AI-assisted tools to optimize coding, automate repetitive tasks, and solve complex architectural problems. Contribute to AI-native infrastructure such as Vertex AI and AI Gateway services.
Tenant & Customer Infrastructure: Manage customer-specific infrastructure including dedicated compute reservations, tenant provisioning, licensing configuration, and feature flag management across multi-tenant and single-tenant environments.
Infrastructure Automation & Tooling: Develop internal CLI tools and automation scripts (Python) to streamline operations.
Cost Optimization: Drive cloud cost optimization through resource right-sizing, reservation management, database disk reduction, and efficiency improvements.
Service Reliability: Enhance uptime by establishing SLAs, setting up comprehensive monitoring (Prometheus, Grafana, Stackdriver), and participating in the production on-call rotation (including off-hours support).
Requirements:
3+ years in DevOps/SRE with a focus on multi-region production environments.
AI-First Workflow: Must be proficient in using LLM-based agents (Cursor, Claude Code, etc.) for coding and architecture.
Cloud & Containers: Deep expertise in GCP (GKE), including Kubernetes orchestration (HPA, Node Pools) and Terraform for IaC.
Automation: Strong Python/Bash skills and experience with GitOps workflows (Flux/ArgoCD) and CI/CD (GitLab/Jenkins).
Data & Infrastructure: Experience managing production databases (SQL/NoSQL) and standard Linux/Networking troubleshooting.
Nice to have:
Experience with AI-native infrastructure (Vertex AI, AI Gateways).
Experience with observability stacks (Prometheus/Grafana) and managing multi-tenant SaaS platforms.
Willingness to participate in on-call rotations.
This position is open to all candidates.