-Cloud Infrastructure and Platform Ownership:
Design, implement, and maintain scalable infrastructure in a multi-account AWS organization.
Manage and deploy applications using Helm charts and GitOps workflows with ArgoCD.
Support CI/CD pipelines and release processes.
-Observability and Reliability:
Implement and maintain monitoring and logging solutions.
Define alerts, dashboards, and SLOs to ensure system health and operational excellence.
-MLOps Enablement:
Support and operate ML infrastructure in AWS.
Enable reliable model deployment, monitoring, and lifecycle management.
-DevSecOps and Security:
Implement security best practices across infrastructure and CI/CD pipelines.
Enforce IAM least-privilege policies and secure networking configurations.
Integrate security ownership and compliance controls.
-DevFinOps and Cost Optimization
-IT Operational Support
Requirements:
-3-5 years of experience working in a DevOps or Platform Engineering role.
-BSc degree in Computer Science, Engineering, or equivalent (required).
-Strong hands-on experience operating in an AWS environment.
-Solid understanding of networking fundamentals, including VPC design, subnets, routing tables, load balancers, and NAT/Internet gateways.
-Experience maintaining Kubernetes clusters at scale, including managing and deploying Helm charts.
-Strong background in observability and monitoring using Prometheus, Grafana, Loki, and CloudWatch.
-Experience with GitOps workflows and continuous delivery using ArgoCD.
-Proficiency with Infrastructure as Code (Terraform/CloudFormation), preferably using AWS CDK (Python and/or TypeScript).
-Core understanding of MLOps principles, including model deployment, monitoring, versioning, and lifecycle management.
This position is open to all candidates.