Your Career
The SASE Platform team builds and operates highly available, secure, and globally distributed services that protect users, applications, and data for some of the world's largest enterprises. Our mission is to deliver cloud-native capabilities that seamlessly converge networking and security at scale. As enterprises accelerate their adoption of cloud, remote work, and AI-driven workloads, the need for resilient, observable, and secure SASE platforms has never been greater. As an SRE, you will play a critical role in ensuring our platform is reliable, scalable, performant, and secure from day one.

Your Impact
As a Site Reliability Engineer, you will be an integral part of the product and platform lifecycle, partnering closely with software engineers, security experts, and infrastructure teams. You will:
- Collaborate with development teams to embed reliability, scalability, and operability into services from the earliest design stages
- Design, review, and evolve cloud-native architectures to improve availability, performance, cost efficiency, and fault tolerance
- Build and operate automation for provisioning, deploying, and managing infrastructure at global scale using Infrastructure as Code
- Improve CI/CD pipelines and release processes to enable safe, fast, and repeatable deployments
- Drive observability best practices, including metrics, logs, traces, SLIs/SLOs, and data-driven incident analysis
- Participate in on-call rotations, continuously reducing MTTR through automation, runbooks, and proactive reliability improvements
- Mentor and guide engineers on large-scale cloud and SASE deployments, fostering a strong SRE culture
- Participate in architecture and design reviews, bringing a reliability and operational excellence mindset
- Champion reliability, security, and operational maturity across the organization
Requirements: Your Experience
- Bachelor's degree in Engineering, Computer Science, or a related technical field (or equivalent practical experience)
- 5+ years of experience working with Unix/Linux systems (shell, tools, networking, storage, kernel concepts)
- 2+ years of hands-on experience with microservices architectures running on Kubernetes and container platforms
- Strong understanding of distributed systems design, fault tolerance, scalability patterns, and high-availability architectures
- Experience operating workloads in public cloud environments (AWS, GCP, Azure, or hybrid) at medium to large scale
- Proficiency in building automation and tools in Python, Java, or similar languages for production environments
- Strong Infrastructure as Code experience (Terraform, Ansible, Chef, Puppet, or similar)
- Experience designing and operating monitoring, alerting, and observability systems at scale
- A tools-first mindset with a passion for reducing toil and increasing engineering efficiency
- Excellent communication skills and the ability to lead discussions across engineering and security teams
- Experience applying reliability and security frameworks to design, review, and operate production systems

Nice to have:
- Networking expertise, including TCP/IP, DNS, BGP, routing, load balancing, proxies, VPNs, and cloud networking concepts, especially as they apply to SASE architectures
- Experience operating or supporting SASE, SD-WAN, Zero Trust, or network security platforms
- Familiarity with AI/LLM technologies, including:
  - Using LLMs to improve operational workflows (incident analysis, alert enrichment, runbooks, automation)
  - Integrating AI/ML services into production systems
  - Understanding the reliability, security, and governance considerations for AI-driven services
This position is open to all candidates.