Required: Site Reliability Python Engineer
Responsibilities:
Monitor, manage, and operate our cloud services, including incident management.
Scale our service with required monitoring and alerting capabilities.
Develop tools and automation in C# (.NET) and Python to support our operations and growth.
Work closely with R&D to make sure new features are reliable, easily deployable, and support the requirements of the service in terms of scale and security.
Establish a regular operational feedback cycle into our engineering teams.
Manage the Service Operations team to operate with a business- and customer-centric culture by maintaining our company SLA for each service, including incident response, problem management, and service upgrades.
Develop and drive, as the primary owner, the communication strategy for internal and external stakeholders (including customers) to convey service health, tracking against SLAs, current and historical incidents, upcoming events, or upgrades.
Ensure all technical procedures are documented, reviewed, and updated, and actively contribute to the maintenance of operational standards and policies.
Collaborate with the company Support team to understand and improve user experience, performance, incident response, and the serviceability of our offerings.
Collaborate with the internal R&D team to automate infrastructure services and system administration tasks wherever possible and implement a monitoring strategy to provide rapid feedback and diagnostics in the event of a service disruption.
Create relationships with other departments, including Marketing, Product Management, Engineering, and Customer Success, to make sure we provide services with high availability and superior performance for all our customers.
Requirements:
At least 3 years of relevant industry experience maintaining a high-availability production environment (SRE or automation).
At least 2 years of experience developing Python applications.
In-depth understanding of the entire web development process (design, development, and deployment).
Substantial experience in operating a high-availability cloud infrastructure.
Ability to adapt quickly to new technologies
Good interpersonal skills
BSc in computer science or a related field
Advantages:
Frontend development (Angular or equivalent)
Experience designing and coding large, scalable systems (multi-threaded and asynchronous development, high-performance processes in SQL)
Experience with Microsoft Azure or other cloud platforms (GCP, AWS)
Experience with Agile development, including CI/CD, and coding for automated testing.
This position is open to all candidates.