SRE Team Leader

Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We're growing and looking to hire an SRE Team Leader who embodies our core values: People First, Customer Obsession, Strive for Excellence, and Integrity.
Responsibilities
As an SRE Team Leader, your impact will be:
Site Reliability Engineering (SRE)
Production Gatekeeper: Design and enforce the rollout strategy for new technologies and oversee their execution to ensure minimal disruption to existing systems.
Production On-Call: Act as the first line of response for critical incidents, assessing issues, triaging, and coordinating with the team to prevent further issues and swiftly restore services.
Monitor Production Performance and Degradation: Keep a close eye on system performance metrics and detect any degradation early to prevent outages and disruptions.
Production Maintenance: Conduct regular infrastructure upgrades to accommodate changes, developments, and advancements in the technological landscape.
Manage Release Flow: Oversee the release of updates and new functionalities, ensuring a seamless transition while handling any potential negative impacts on production.
Staging Management: Oversee the management of the staging environment, ensuring that it accurately represents the production environment for effective testing and simulation.
Network Operations Center (NOC)
Build Playbooks: Develop and maintain comprehensive playbooks for managing system issues and incidents, setting guidelines for troubleshooting, escalation, and resolution processes.
Build Monitoring Dashboards: Design, set up, and maintain monitoring dashboards to visualize and track system performance and incidents in real-time.
Alerts and Incident Management: Establish protocols for issuing alerts in the event of system issues or anomalies and lead the team in incident resolution.
Requirements:
What do you need to succeed in this role?
Proven experience in SRE/DevOps roles (NOC role - advantage) and team management experience
Strong leadership qualities and team management skills.
Tech stack - Jenkins, TF, Ansible, Bash, Python, AWS, Argo
Expertise in system monitoring and incident management tools
Exceptional problem-solving and analytical skills
Excellent written and verbal communication abilities.
A Bachelor's degree in Computer Science, Information Technology, or a related field - Advantage
Familiarity with Agile methodologies.
This position is open to all candidates.
 
8125434
Similar jobs that may interest you
02/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking an experienced and motivated Backend & DevOps Technical Lead to join our dynamic Site Reliability Engineering (SRE) team. As a Technical Lead, you will play a crucial role in enhancing the reliability, performance, and scalability of our systems and services. You will be part of a global commando team of highly skilled SREs, driving best practices and innovations for optimal system operations while protecting critical company systems in real time.
In this role, you will be responsible for:
Drive incident response and post-mortem processes, fostering a culture of continuous improvement.
Design, build and improve internal tools and automation software to make maintaining production services easier and safer.
Lead reliability-focused practices such as SLO (Service Level Objective) design and implementation, Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others.
Participate in the on-call rotation, providing expertise and support during critical system incidents and ensuring timely resolution.
Requirements:
Minimum 5 years of Software Engineering experience with .NET, Node.js, or other object-oriented languages.
Knowledge of and experience with architecture and application design.
Excellent troubleshooting and debugging skills.
Excellent verbal and written communication skills in English.
Basic knowledge of AWS or other cloud platforms on the infrastructure level
Preferred:
Experience building Azure DevOps CI/CD pipelines
Experience working on large-scale, high-traffic platforms.
Distributed monitoring experience with logging, metrics and tracing using OpenTelemetry and Prometheus.
Additional scripting languages: Bash, PowerShell, Python
Previous experience working as an SRE
This position is open to all candidates.
 
8125381
27/03/2025
Confidential company
Location: Tel Aviv-Yafo and Netanya
Job Type: Full Time
We seek a highly skilled Site Reliability Engineer to join our team! In this role, you will drive best practices, optimize operational workflows, and mentor junior engineers, fostering a culture of collaboration and innovation. This is an exciting opportunity for someone passionate about building and integrating services and systems that ensure the availability, performance, and reliability of our SaaS environments. You will work closely with P&E engineering and Cloud teams to build and maintain scalable, resilient infrastructure while championing best practices for automation, monitoring, and incident response. If you're eager to make a significant impact in a fast-paced, high-growth environment, we encourage you to apply.
As a Site Reliability Engineer, you will:
Help build and manage scalable, reliable services and infrastructure for our SaaS services
Drive the reliability, performance, and availability of our SaaS products, ensuring service-level objectives are met or exceeded
Apply SRE best practices, including incident management, performance and capacity planning, and disaster recovery flows
Adhere to the incident management framework, ensuring timely identification, escalation, and resolution of incidents
Develop and manage large-scale systems with CI/CD in mind, to support multiple production environments and use cases
Tackle large-scale production issues and bring out-of-the-box thinking to the table
Implement SRE tools, technologies, and methodologies that align with meeting our SaaS uptime & reliability goals.
Requirements:
2+ years of relevant DevOps or SRE experience in large-scale production environments
1+ years of infrastructure automation, configuration management, or container orchestration using Kubernetes, Docker, Terraform, and Ansible
1+ years in Python or any other advanced programming language
Excellent communication and collaboration skills, with an ability to work effectively across globally distributed teams
Experience in managing container and infrastructure orchestration tools (e.g. Kubernetes, Terraform)
Hands-on experience administering public clouds (AWS, GCP, or Azure)
Experience with building CI/CD pipelines for applications and microservices (Jenkins/ArgoCD)
Experience with chaos engineering, alerting, and observability tools (Gremlin, PagerDuty, Opsgenie, New Relic, Coralogix).
This position is open to all candidates.
 
8118181
31/03/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a results-driven DevOps Team Lead to head the DevOps and Infrastructure team within our R&D organization. This role requires strategic vision, technical expertise, and a proactive approach to drive operational excellence, empower teams with robust tools and automation, and ensure high system reliability and scalability.

As the DevOps Team Lead, you will be instrumental in delivering critical KPIs, including system uptime, automation, incident management, and collaboration with development and QA teams to enable self-sufficiency. Additionally, you will lead strategic projects, identifying opportunities to improve infrastructure and operational processes, setting long-term goals, and executing initiatives that align with Cyberint's business objectives and growth.

Key Responsibilities

System Reliability & Uptime
Lead initiatives to ensure system reliability, minimize disruptions, and maintain high availability for Cyberint's SaaS platform.
Establish and manage proactive monitoring, alerting, and preventive maintenance strategies.
Drive incident prevention efforts, ensuring robust failover and disaster recovery mechanisms.
Develop and maintain playbooks to enable rapid diagnosis and resolution of issues.

Automation, Infrastructure as Code (IaC), & Self-Service Enablement
Champion the adoption of automation and IaC to streamline infrastructure management and deployments.
Build and enhance self-service tools and frameworks, empowering R&D teams to operate independently with minimal reliance on DevOps.
Continuously improve CI/CD pipelines to optimize deployment speed and reliability.

Collaboration & Support for Self-Sufficiency
Collaborate closely with development, QA, and support teams to deliver tools and frameworks that promote team autonomy and efficiency.
Advocate for cross-functional engagement to align operational processes with R&D objectives.
Provide training and mentorship to teams on using DevOps tools effectively.

Accountability, Ownership, & Scalability
Take ownership of all systems and infrastructure, ensuring solutions are scalable, resilient, and aligned with Cyberint's growth objectives.
Establish clear accountability frameworks for maintaining infrastructure and delivering on key projects.
Design and execute a roadmap to support self-service-oriented and scalable solutions.

Strategic Leadership
Identify and lead strategic projects to enhance Cyberint's platform scalability, reliability, and operational efficiency.
Develop and execute a roadmap for critical infrastructure and DevOps initiatives that drive business success.
Collaborate with senior stakeholders to align projects with organizational priorities and deliver measurable outcomes.
Requirements:
5+ years of experience in DevOps or SRE roles, with 2+ years in a leadership capacity.
Proven expertise in building and maintaining highly available, cloud-native environments (AWS preferred).
Experience with Kubernetes, Terraform, CI/CD pipelines, and monitoring tools and related technologies (Prometheus, Grafana, Jenkins, ArgoCD, Elasticsearch, Redis, EKS, etc.).
Skills & Expertise

Strong understanding of automation, Infrastructure as Code (IaC), and self-service enablement.
Expertise in incident management and a track record of delivering reliable, scalable systems.
Hands-on experience with scripting and automation tools (Python, Bash).
Deep understanding of containerization, orchestration, and cloud-native architectures.
Familiarity with cost monitoring and optimization strategies to ensure infrastructure is both efficient and cost-effective.
Knowledge of security best practices for infrastructure and DevOps environments.
This position is open to all candidates.
 
8121466
27/03/2025
Location: Tel Aviv-Yafo and Netanya
Job Type: Full Time
We are looking for a Site Reliability Engineering Manager to lead our Israel SRE team. In this role, you'll drive best practices in reliability engineering, ensuring the stability, availability, and performance of our SaaS services. You'll collaborate with global SRE leaders, refine processes, and foster a culture of accountability and continuous improvement.
As a Site Reliability Engineering Manager, you will:
Lead, mentor, and develop a high-performing SRE Israel team, fostering collaboration, innovation, and accountability
Ensure SaaS reliability, performance, and availability, meeting or exceeding service-level objectives
Drive SRE best practices, including capacity planning, incident management, chaos engineering, and disaster recovery
Implement proactive monitoring, alerting, and anomaly detection aligned with SaaS standards
Collaborate with P&E and Cloud engineering teams to embed reliability into the SDLC
Oversee incident management, ensuring swift identification, escalation, and resolution
Maintain comprehensive SRE documentation, including processes, incident reports, and system architecture
Evaluate and adopt tools, technologies, and methodologies to enhance uptime and reliability.
Requirements:
3+ years of management experience leading an SRE, DevOps, or similar SaaS team
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)
Strong expertise in cloud platforms (AWS, GCP, or Azure), containers (Kubernetes, Docker), and configuration management (Terraform, Ansible)
Proficiency in Python or Go for automation and system optimization, as well as GitOps experience with SCM tools (e.g., Git, Bitbucket)
Strong leadership, communication, and collaboration skills, working across globally distributed teams
Familiarity with Agile methodologies, CI/CD pipelines, and orchestration tools (Jenkins, ArgoCD, StackStorm)
Familiarity with Chaos Engineering (e.g., Gremlin, Litmus, Chaos Toolkit)
Hands-on with alerting & observability tools (e.g., PagerDuty, OpsGenie, New Relic, Coralogix)
Strong understanding of scalability, high availability, and security best practices in cloud & Kubernetes environments.
This position is open to all candidates.
 
8118211
02/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
We are seeking an experienced and motivated SRE Tech Lead to join our dynamic Site Reliability Engineering (SRE) team. As a Tech Lead, you will play a crucial role in enhancing the reliability, performance, and scalability of our systems and services. You will be part of a global commando team of highly skilled SREs, driving best practices and innovations for optimal system operations while protecting critical company systems in real time.
In this role, you will be responsible for:
Drive incident response and post-mortem processes, fostering a culture of continuous improvement.
Design, build and improve internal tools and automation software to make maintaining production services easier and safer.
Lead reliability-focused practices such as SLO (Service Level Objective) design and implementation, Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others.
Participate in the on-call rotation, providing expertise and support during critical system incidents and ensuring timely resolution.
Requirements:
Minimum 5 years of Software Engineering experience with .NET, Node.js, or other object-oriented languages.
Knowledge of and experience with architecture and application design.
Excellent troubleshooting and debugging skills.
Excellent verbal and written communication skills in English.
Basic knowledge of AWS or other cloud platforms on the infrastructure level
Preferred:
Experience building Azure DevOps CI/CD pipelines
Experience working on large-scale, high-traffic platforms.
Distributed monitoring experience with logging, metrics and tracing using OpenTelemetry and Prometheus.
Additional scripting languages: Bash, PowerShell, Python
Previous experience working as an SRE
This position is open to all candidates.
 
8125103
30/03/2025
Confidential company
Location: Tel Aviv-Yafo and Netanya
Job Type: Full Time
We are looking for a highly experienced and visionary Senior Director to oversee our Cloud Production Operations and FinOps domains.
This role is pivotal in ensuring the reliability, scalability, and cost-effectiveness of our cloud-based infrastructure. You will lead a global team, including existing managers for each domain, and collaborate closely with engineering, product, customer success teams, and customers.
As a Senior Director of Production, you will:
Lead and mentor a globally distributed team of managers and engineers, fostering professional growth and collaboration across diverse geographies
Ensure the reliability, scalability, and performance of our multi-cloud SaaS infrastructure by driving the implementation and continuous improvement of SRE practices, with a focus on proactive monitoring, automation, and disaster recovery planning
Establish and maintain robust processes for production and incident management, including well-defined playbooks, escalation protocols, root cause analysis frameworks, and actionable post-mortem reviews to drive continuous improvement
Lead cross-functional initiatives, collaborating closely with engineering, product, and customer success teams to ensure operational priorities are aligned with business goals and deliver maximum impact
Provide strategic leadership to FinOps efforts, ensuring efficient use of cloud resources while balancing business and engineering priorities in cloud expenditure strategies
Align cloud cost optimization initiatives with overarching business objectives, collaborating with engineering, finance, and cloud vendors to maximize cost efficiency and transparency.
Requirements:
10+ years of professional experience, with 5+ years in leadership roles managing global teams
Proven track record in building and scaling teams and processes for cloud-based products
Strong background in cloud production operations, including expertise in SRE principles
Solid understanding of at least one cloud-based environment and its operational nuances
Experience overseeing FinOps initiatives, including cost analysis, optimization, and collaboration with cloud vendors
Proven experience developing and implementing robust production, incident, and knowledge management processes
Demonstrated ability to collaborate effectively with internal stakeholders and customers, with a proven track record of influencing and delivering results
Exceptional communication, stakeholder management, and strategic planning skills
Familiarity with DevSecOps practices is highly desirable.
This position is open to all candidates.
 
8119369
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are growing and are looking for future workers who value personal and career growth, team-work, and winning!
About us:
We are revolutionizing how IT and security teams gain comprehensive visibility and control over their digital assets and relationships. As the system of record for digital infrastructure, we solve complex challenges by delivering the critical context required to manage and secure devices, users, software, SaaS applications, and cloud services. By connecting to hundreds of data sources and automating key processes, we empower organizations to close security gaps and prevent incidents.
Learn more about us and take our Product tour.
What your day will look like:
As the Team Leader for the Production Team, you will play a pivotal role in the production and delivery processes of our company.
Your primary responsibility will be managing the team that develops high-quality production artifacts and ensures their timely delivery to customers.
You will collaborate closely with cross-functional teams to maintain production schedules, manage resources, and uphold the highest standards of quality and efficiency.
Key Responsibilities:
Responsible for the orchestration of our product into an appliance, administration of the underlying Linux OS, and management of host services.
Management and development of the application boot process.
Responsible for solving difficult problems regarding the architecture and deployment of the company's Product.
All deployment operations to our online customer environments.
Leading and building new deployment mechanisms for complicated environments.
Support for multiple Linux distributions and variants (e.g., Red Hat, Ubuntu, CentOS).
Optimization of Linux configurations for better performance.
Compliance with industry-standard security and regulatory requirements.
Requirements:
Over 2 years of experience managing an engineering team.
At least 5 years of proven experience with Linux system administration across multiple Linux distributions (e.g., Ubuntu, Red Hat, CentOS).
Experience deploying and operating services based on Linux containers and virtualization (Docker, etc.).
Proven understanding of architecture principles across infrastructure platforms, security, data, integration, and application layers.
Hands-on dev experience in one or more programming languages - Python, Golang, etc.
Experience with CI/CD methodologies.
Effective communication skills, both verbal and written.
Experience with configuration management systems. Ansible - Advantage.
Experience with Cloud - AWS advantage.
Experience in optimizing Linux production environments for performance.
This role is an exciting opportunity for a dynamic and skilled leader to contribute to the core of our company's technology landscape, driving innovation and excellence in our engineering practices.
Join us in redefining the standards of technological engineering.
This position is open to all candidates.
 
8131803
02/04/2025
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking an experienced and motivated Engineering Backend Tech Lead to join our dynamic Site Reliability Engineering (SRE) team. As an Engineering Backend Tech Lead, you will play a crucial role in enhancing the reliability, performance, and scalability of our systems and services. You will be part of a global commando team of highly skilled SREs, driving best practices and innovations for optimal system operations while protecting critical company systems in real time.
In this role, you will be responsible for:
Drive incident response and post-mortem processes, fostering a culture of continuous improvement.
Design, build and improve internal tools and automation software to make maintaining production services easier and safer.
Lead reliability-focused practices such as SLO (Service Level Objective) design and implementation, Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others.
Participate in the on-call rotation, providing expertise and support during critical system incidents and ensuring timely resolution.
Requirements:
Minimum 5 years of Software Engineering experience with .NET, Node.js, or other object-oriented languages.
Knowledge of and experience with architecture and application design.
Excellent troubleshooting and debugging skills.
Excellent verbal and written communication skills in English.
Basic knowledge of AWS or other cloud platforms on the infrastructure level
Preferred:
Experience building Azure DevOps CI/CD pipelines
Experience working on large-scale, high-traffic platforms.
Distributed monitoring experience with logging, metrics and tracing using OpenTelemetry and Prometheus.
Additional scripting languages: Bash, PowerShell, Python
Previous experience working as an SRE
This position is open to all candidates.
 
8125295
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a highly skilled DevOps Engineer to join the TLV Foundation Services Team within the BDC organization. In this role, you will be responsible for designing, implementing, and maintaining scalable, highly available production systems. You will collaborate with developers and global infrastructure teams to ensure seamless deployment, automation, and system reliability.
What You'll Do:
System Reliability & Scalability: Build and maintain highly available, scalable, and resilient production systems.
Automation & Infrastructure as Code: Develop and manage infrastructure using Terraform and other IaC tools.
Cloud Infrastructure Management: Configure and manage AWS, Azure, or similar cloud platforms to optimize performance and cost efficiency.
Containerization & Orchestration: Work extensively with K8s and Istio to manage containerized environments.
CI/CD Pipelines: Design, build, and maintain CI/CD automation using Jenkins, GitHub, or similar tools.
Monitoring & Logging: Implement and manage observability tools for monitoring, logging, and metrics collection in large-scale production environments.
Incident Management: Rapidly identify and resolve production issues, ensuring minimal downtime.
Security & Compliance: Implement best practices for security, access control, and compliance within cloud and on-prem environments.
Collaboration: Work closely with developers and infrastructure teams to streamline deployment, automation, and operations.
Support & Maintenance: Provide on-call support as needed to ensure system reliability.
Requirements:
3+ years of DevOps experience in a cloud-based production environment.
Strong expertise in Docker, K8s, and containerized application management.
Experience with AWS, Azure, or similar cloud platforms.
Hands-on experience with Infrastructure as Code (IaC) tools like Terraform.
Proficiency in CI/CD automation, including Jenkins, GitHub Actions, or similar.
Knowledge of monitoring and logging tools.
Strong scripting skills in Bash and experience with programming languages like Python or Java.
Experience with GitOps methodologies.
Familiarity with Istio and service mesh architectures.
Bonus Points For:
Hands-on experience managing data lakes or data warehouses.
Prior experience in Enterprise environments.
Strong problem-solving abilities and a passion for learning new technologies.
Ability to thrive in a fast-paced, dynamic environment and tackle challenges head-on.
This position is open to all candidates.
 
8120388
07/04/2025
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking to hire a talented, self-driven and passionate Senior Infrastructure Engineer to build and maintain the cloud infrastructure for our highly available SaaS application as well as our machine learning and data engineering stack.

As a Senior Infrastructure Engineer, you will be responsible for designing, implementing, and maintaining the cloud infrastructure and DevOps processes that power our products and internal tooling. You will work closely with all data and development teams and lead the company's security and compliance vectors. You will ensure a highly reliable, scalable, and secure infrastructure that supports our rapid growth and product innovation, while maintaining observability and cost-effectiveness of our cloud resources and data.

Responsibilities:
Cloud Infrastructure Management: Architect, deploy, and manage our cloud infrastructure (AWS), ensuring high availability, scalability, and security.
Software Engineering: Be a top-notch software engineer, applying your coding, architecture, and research skills to our infrastructure stack.
Infrastructure as Code (IaC): Define and maintain infrastructure using tools like Terraform, CloudFormation, or Pulumi to manage resources efficiently and reproducibly.
Monitoring & Incident Management: Build and manage monitoring and alerting systems to ensure uptime, and respond to incidents with root cause analysis and remediation.
DevOps & Automation: Implement and maintain CI/CD pipelines to streamline development workflows and automate deployment processes across development, staging, and production environments, and across different parts of our solution. While our development teams are expected to write and maintain their own CI, you will act as a supervisor and professional authority, and maintain cross-team and complex automation.
Collaboration and technical leadership: Partner with software engineers, data engineers, and machine learning teams to support their infrastructure needs and guide the evolution of our infrastructure team.
Cost Optimization: Monitor cloud spend and optimize resources to ensure cost-effective infrastructure without sacrificing performance or security.
Security & Compliance: Implement security best practices, including access control, network security, monitoring and ensuring the infrastructure is compliant with relevant industry standards (e.g., SOC2, GDPR).
Requirements:
Experience: 5+ years of hands-on experience in cloud infrastructure, DevOps and platform engineering in production environments.
Cloud Platforms and IaC: Expertise in managing cloud infrastructure on at least one of the major providers: AWS, GCP, Azure. Proficient in Infrastructure as Code tools such as Terraform, CloudFormation, or Pulumi.
Containerization & Orchestration: Solid experience with Docker and Kubernetes.
Monitoring & Logging: Hands-on experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk, or equivalent).
Software Engineering: Proficiency in software engineering and architecture, as well as scripting languages such as Python, Bash, or Go. Full command of version control systems such as Git.
DevOps Tools: Strong experience with CI/CD pipelines and automation using Jenkins, CircleCI, GitHub Actions, GitLab CI, or similar.
Networking: Strong understanding of cloud networking, VPNs, VPCs, DNS, and firewalls.
Security Best Practices: Experience implementing cloud security best practices, including IAM, encryption, and key management.
Startup Experience: Previous experience in a fast-paced startup environment, where adaptability and hands-on execution are key.
Team Player: Strong communication skills and ability to work cross-functionally with different teams.

Advantages:
ML Infrastructure: Experience supporting machine learning pipelines and deploying ML models to production environments.
Data Engineering: Familiarity with data engineering tools like Apache Spark, Airflow, or similar.
This position is open to women and men alike.
 
8131896