We are looking for a hands-on DevOps Tech Lead to join our R&D organization and lead the design, scalability, and reliability of our SaaS platform, a large-scale, multi-tenant system fully operated on AWS (EKS, Lambda, S3, RDS, Kafka, Redis, and more).
The DevOps Tech Lead will shape our infrastructure roadmap, optimize CI/CD, guide engineers on best practices, and ensure our system runs securely, efficiently, and at scale, while driving automation and intelligent operations through AI-assisted tools and observability.
Why Join Us:
Be a key player in scaling and modernizing a global cyber intelligence SaaS serving leading enterprises.
Collaborate with top-tier engineers and architects driving automation and intelligent operations.
Take ownership and lead initiatives that directly affect uptime, reliability, and efficiency.
Work in an environment that encourages innovation, experimentation, and adoption of AI and automation in day-to-day operations.
Key Responsibilities:
Lead the DevOps domain: define architecture, automation strategy, and reliability goals for the entire R&D organization.
Own infrastructure scalability and performance: ensure our Kubernetes (EKS)-based environments are resilient, efficient, and cost-optimized.
Develop and maintain CI/CD pipelines using GitHub Actions, Jenkins, or ArgoCD to support fast, reliable, and automated delivery.
Drive observability and reliability initiatives: monitor system health via Prometheus, Grafana, and CloudWatch; define metrics, alerts, and SLOs.
Leverage AI/automation tooling (e.g., anomaly detection, alert classification, cost prediction) to enhance monitoring, response, and efficiency.
Manage infrastructure as code (Terraform, Helm, CloudFormation) and enforce IaC best practices.
Collaborate with engineering teams to design infrastructure for new services, improve developer experience, and ensure secure deployments.
Ensure system uptime and production readiness: lead root cause analysis, incident response, and capacity planning.
Mentor DevOps engineers on cloud architecture, observability, and automation excellence.
Continuously evaluate emerging technologies, including AI-driven ops tools, to improve scalability, reliability, and delivery velocity.
Requirements:
Must-Have:
5+ years of experience as a DevOps / SRE / Infrastructure engineer, with at least 2 years in a technical leadership role.
Proven experience managing large-scale SaaS systems on AWS (EKS, RDS, Kafka, Redis, S3, Lambda, CloudWatch).
Deep understanding of Kubernetes architecture and container orchestration at scale.
Hands-on experience with Terraform, Helm, and CI/CD automation (GitHub Actions, Jenkins, or ArgoCD).
Strong scripting skills in Python, Bash, or Go.
Familiarity with monitoring and alerting tools (Prometheus, Grafana, Loki, ELK).
Experience using or integrating AI-assisted tools (e.g., for observability, auto-remediation, or developer productivity).
Excellent troubleshooting skills and a proactive mindset for reliability and performance optimization.
Nice-to-Have:
Experience in multi-environment / multi-tenant SaaS or cybersecurity / threat intelligence systems.
Knowledge of AI/ML pipelines or AIOps concepts.
Background in cost optimization and FinOps practices.
Familiarity with Kafka scaling, Redis clustering, and AWS service-level tuning.
This position is open to all candidates.