We are looking for a seasoned leader to build and manage our Quality and Reliability domain, overseeing QA, SRE functions, and our 24/7 operational center. This pivotal role is essential for upholding the quality, robustness, and resilience of our SaaS solutions throughout the software development lifecycle (SDLC) and while in use by our customers. By working collaboratively with developers, product managers, and other key stakeholders, this leader will drive initiatives that enhance quality and resilience, prevent incidents, and facilitate the swift remediation of issues. The ideal candidate will have a blend of technical expertise and managerial skills and experience.
As the Head of Quality and Reliability, you will:
Formulate and guide a comprehensive strategy for quality and reliability, in collaboration with stakeholders from R&D, Product, Support, and Customer Success, ensuring alignment with the company's goals and plans.
Develop and enforce quality standards and key performance indicators (KPIs) in collaboration with Engineering to guarantee that our SaaS products are both high-quality and reliable.
Implement standardized methods, tools, and automated processes to bolster platform and service quality and reliability. This includes strategies for issue prevention, identification, and remediation, as well as practices for debriefing and correction.
Manage and mentor the QA, NOC, and SRE teams spread across Israel and Ireland, establishing objectives and ensuring efficient and effective operations. Additionally, you will supervise the professional growth of both managers and staff within these teams.
Establish and manage the SRE domain, which involves proactive monitoring, establishing fault tolerance, and formulating disaster recovery strategies. Additionally, implement tools and methods to improve service and system reliability, scalability, and performance.
Design monitoring strategies and implement tools for proactive early detection and incident management. Produce "war room" playbooks for detailed monitoring routines and responsive incident management in both production and demonstration environments.
Oversee and consistently elevate the quality and reliability of the company's solutions while promoting a culture of continuous improvement.
Requirements:
15 years of experience in the high-tech/SaaS industry in QA and/or SRE domains, with at least 5 years as a manager.
A strong background in resilience engineering practices and principles, preferably in a SaaS context.
A Bachelor's degree in Computer Science, Engineering, or a related technical field.
Familiarity with Continuous Integration and Continuous Deployment (CI/CD) tools and best practices.
The ability to lead cross-organization initiatives through collaboration, excellent project management skills, and change management practices.
Knowledge of and experience with a range of test automation and monitoring frameworks and tools (an advantage).
Experience with AWS cloud, Java, and Python.
A high sense of ownership and accountability.
Excellent spoken and written English and Hebrew.
This position is open to all candidates.