Required Site Reliability Engineer
Herzliya, Israel
Job Description
Our world is transforming, and we are leading the way. Our software brings the physical and digital worlds together, enabling companies to improve operations, create better products, and empower people in all aspects of their business.
Our people make all the difference in our success. Today, we are a global team of nearly 7,000, and our main objective is to create opportunities for our team members to explore, learn, and grow, all while seeing their ideas come to life and celebrating the differences that make us who we are and the work we do possible.
Your Day-to-Day Work:
Automation
SREs build automation tools to manage platform operations, with the aim of automating functions rather than performing them manually. Such functions include:
Continuous delivery and deployment
Toil reduction and operations automation
Monitoring
Rapid incident response
Alerts
Monitoring
SREs are responsible for ensuring that the underlying infrastructure runs smoothly and that systems and tools work as expected.
They also monitor critical applications and services to minimize downtime and ensure their availability.
Issue resolution
The engineer works closely with developers, especially when issues arise, helping with troubleshooting and providing consultation when alerts are issued.
When a developer runs into a problem, the engineer investigates and resolves it. After the incident is resolved, the engineer revisits the issue and determines the root cause to ensure it doesn't happen again.
Cross-team collaboration
SREs work across different teams, mainly operations and development. By building reliable systems and providing support, SREs give these teams more time to focus on building new features and delivering them to customers faster.
Requirements: Must-Have Skills:
3-5 years of hands-on SRE experience working with developers and customers, along with DevOps skills.
Experience building automated monitoring with tools such as Datadog, Dynatrace, Splunk, SumoLogic, etc.
Experience in logging/monitoring/insights: Zabbix, Grafana, Azure Monitoring, OpenTelemetry.
Knowledge of ITIL, ITSM, Incident Management Tools.
Experience with tools such as ServiceNow, PagerDuty, CatchPoint, and PingDom.
Ability to perform root cause analysis and write runbooks.
Well versed in application, platform, and operational security.
Expert knowledge of continuous delivery and deployment of SaaS products is a must. The candidate must be proficient in deployment methodologies: canary, rolling, blue/green (B/G), A/B, feature flag management, etc.
Should have profound working knowledge of cloud-native technologies, especially on Azure, and be up to date on IaaS and PaaS services with appropriate backup, rollback, and HA/DR technologies. Must possess deep working knowledge of infrastructure, including IAM solutions (AAD/LDAP), networking (DNS, firewalls, gateways, load balancers), storage, etc.
Must be proficient in programming/scripting with Java, Go, Python, shell, and Groovy. Familiarity with DSLs such as YAML is expected.
Practical experience working with and troubleshooting Oracle.
This position is open to all candidates.