We are seeking a results-driven Platform Engineering Team Lead to head the DevOps and Infrastructure team within our R&D organization.
Job Id: 22175
This role requires strategic vision, technical expertise, and a proactive approach to drive operational excellence, empower teams with robust tools and automation, and ensure high system reliability and scalability.
As the Platform Engineering Team Lead, you will be instrumental in delivering on critical KPIs, including system uptime, automation, and incident management, and in collaborating with development and QA teams to enable their self-sufficiency. You will also lead strategic projects: identifying opportunities to improve infrastructure and operational processes, setting long-term goals, and executing initiatives that align with our business objectives and growth.
Key Responsibilities:
Strategic Leadership
Identify and lead strategic projects to enhance our platform scalability, reliability, and operational efficiency.
Develop and execute a roadmap for critical infrastructure and DevOps initiatives that drive business success.
Collaborate with senior stakeholders to align projects with organizational priorities and deliver measurable outcomes.
System Reliability & Uptime
Lead initiatives to ensure system reliability, minimize disruptions, and maintain high availability for our SaaS platform.
Establish and manage proactive monitoring, alerting, and preventive maintenance strategies.
Drive incident prevention efforts, ensuring robust failover and disaster recovery mechanisms.
Develop and maintain playbooks to enable rapid diagnosis and resolution of issues.
Automation, Infrastructure as Code (IaC), & Self-Service Enablement
Champion the adoption of automation and IaC to streamline infrastructure management and deployments.
Build and enhance self-service tools and frameworks, empowering R&D teams to operate independently with minimal reliance on DevOps.
Continuously improve CI/CD pipelines to optimize deployment speed and reliability.
Collaboration & Support for Self-Sufficiency
Collaborate closely with development, QA, and support teams to deliver tools and frameworks that promote team autonomy and efficiency.
Advocate for cross-functional engagement to align operational processes with R&D objectives.
Provide training and mentorship to teams on using DevOps tools effectively.
Accountability, Ownership, & Scalability
Take ownership of all systems and infrastructure, ensuring solutions are scalable, resilient, and aligned with our growth objectives.
Establish clear accountability frameworks for maintaining infrastructure and delivering on key projects.
Design and execute a roadmap to support self-service-oriented and scalable solutions.
Requirements:
Experience
5+ years of experience in DevOps or SRE roles, with 2+ years in a leadership capacity.
Proven expertise in building and maintaining highly available, cloud-native environments (AWS preferred).
Experience with Kubernetes (EKS), Terraform, CI/CD pipelines (Jenkins, ArgoCD), monitoring tools (Prometheus, Grafana), and related technologies such as Elasticsearch and Redis.
Skills & Expertise
Strong understanding of automation, Infrastructure as Code (IaC), and self-service enablement.
Expertise in incident management and a track record of delivering reliable, scalable systems.
Hands-on experience with scripting and automation tools (Python, Bash).
Deep understanding of containerization, orchestration, and cloud-native architectures.
Familiarity with cost monitoring and optimization strategies to ensure infrastructure is both efficient and cost-effective.
Knowledge of security best practices for infrastructure and DevOps environments.
Leadership & Collaboration
Demonstrated ability to lead technical teams, manage priorities, and deliver high-impact results.
Excellent communication skills to effectively collaborate with stakeholders and align team efforts with organizational goals.
This position is open to women and men alike.