As a Site Reliability Engineer on the SASE Platform team, you will play a critical role in building and operating highly available, secure, and globally distributed services. Your mission is to ensure our cloud-native security and networking platform is reliable, scalable, and performant from day one, protecting the users, applications, and data of the world's largest enterprises as they adopt cloud, remote work, and AI.
Your Impact:
Proactively collaborate with development teams to embed reliability, scalability, and operability into services from the earliest design stages.
Design, review, and evolve cloud-native architectures to improve availability, performance, cost efficiency, and fault tolerance.
Build and operate automation for provisioning, deploying, and managing global infrastructure using Infrastructure as Code (IaC).
Improve CI/CD pipelines and release processes to enable safe, fast, and repeatable deployments.
Drive observability best practices, including metrics, logs, traces, and SLIs/SLOs, to enable data-driven incident analysis.
Participate in on-call rotations, reducing mean time to resolution (MTTR) through automation and proactive reliability improvements.
Challenge existing processes and champion reliability, security, and operational maturity across the organization.
Your Experience:
5+ years of experience working with Unix/Linux systems, including shell, tools, networking, and kernel concepts.
2+ years of hands-on experience with microservices architectures running on Kubernetes and container platforms.
Proven experience operating workloads in public cloud environments (e.g., AWS, GCP, Azure) at scale.
Proficiency in building automation and tools in at least one scripting or programming language (e.g., Python, Go, Java).
Strong experience with Infrastructure as Code (IaC) tools such as Terraform or Ansible.
Bachelor's degree in Engineering, Computer Science, or a related technical field, or equivalent practical experience.
Nice to have:
Deep expertise in designing and operating monitoring, alerting, and observability systems (e.g., Prometheus, Grafana, ELK Stack).
Advanced networking expertise, including TCP/IP, DNS, BGP, routing, and cloud networking concepts relevant to SASE architectures.
Prior experience operating or supporting SASE, SD-WAN, Zero Trust, or network security platforms.
Familiarity with using AI/LLM technologies to improve operational workflows (e.g., incident analysis, automation).
This position is open to all candidates.