Senior HPC AI Cluster Engineer

1 day ago
Confidential company
Location: More than one
Job Type: Full Time
We are looking for an experienced HPC Engineer to join the E2E software verification HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for an outstanding senior HPC architect to be a key player working with the most exciting computing hardware and software and to contribute to the latest breakthroughs in artificial intelligence and GPU computing. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms.
What you will be doing:
Design, implement and maintain large scale HPC/AI clusters with monitoring, logging and alerting
Manage Linux job/workload schedulers and orchestration tools
Develop and maintain continuous integration and delivery pipelines
Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources
Deploy monitoring solutions for the servers, network and storage
Troubleshoot bottom-up across the bare metal, operating system, software stack, and application levels
Being a technical resource, develop, re-define and document standard methodologies to share with internal teams
Support Research & Development activities and engage in POCs/POVs for future improvements.
Requirements:
A degree in Computer Science, Engineering, or a related field
5+ years of experience
Knowledge of HPC and AI solution technologies from CPUs and GPUs to high speed interconnects and supporting software
Experience with job scheduling and workload orchestration tools such as Slurm and K8s
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalld, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols, e.g. TCP, DHCP, DNS, etc.
Experience with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS. Familiarity with newer and emerging storage technologies.
Python programming and Bash scripting experience.
Comfortable with automation and configuration management tools such as Jenkins, Ansible, and Puppet/Chef
Deep knowledge of networking protocols such as InfiniBand and Ethernet
Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)
Ways to stand out from the crowd:
Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)
Knowledge of CPU and/or GPU architecture
Knowledge of Kubernetes and container-related microservice technologies
Experience with GPU-focused hardware/software (DGX, CUDA)
Background with RDMA (InfiniBand or RoCE) fabrics.
This position is open to all candidates.
 
Similar jobs that may interest you
1 hour ago
Confidential company
Location: Yokne'am
Job Type: Full Time
We are looking for an HPC and AI Data Center Engineer to join the networking cloud solutions HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for a lab manager to be a key player working with the most exciting computing hardware and software and to contribute to the latest breakthroughs in artificial intelligence and GPU computing. You will take part in building large-scale compute and deep learning software and hardware platforms, and work with and support many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions.
What you will be doing:
Plan and build complex clusters and supercomputers in various data centers and labs
Rack, stack, and manage cabling to ensure efficient use of space and easy maintenance
Ensure power and cooling efficiency in data centers and labs while optimizing rack space utilization
Handle daily operation and support of data centers and labs
Install a variety of infrastructure and solutions: cloud, VMs, storage, network, HPC, and AI
Troubleshoot network, optic cabling, bare metal, and operating system issues.
Support Research & Development activities.
Requirements:
MCSE or MCITP/CCNA certification
3+ years of experience as a lab manager
Experience in supporting large and complex data centers
Proven hands-on experience in Linux troubleshooting, with good problem identification and resolution skills.
In-depth knowledge of Linux and Windows core services: DHCP, DNS, NIS, AD, etc.
Team player, service-oriented, organized
Ways to stand out from the crowd:
Scripting experience in Bash and/or Python
Experience with widely used configuration management tools (e.g. Ansible, Puppet)
CI and well-known job scheduler tools (e.g. Jenkins, Slurm)
Virtualization: KVM / VMware / Hyper-V
Experience with L2 & L3 network protocols.
This position is open to all candidates.
 
Location: Tel Aviv-Yafo
Job Type: Full Time
Site Reliability Engineer - Infra
Realize your potential by joining the leading performance-driven advertising company!
As a Site Reliability Engineer - Infra on our Infrastructure team at the TLV office, you will play a key role in ensuring the reliability, scalability, and performance of our critical systems. You will be responsible for managing and improving our core infrastructure, with a focus on automation, monitoring, and incident response. You will work with a wide range of technologies, including Kubernetes, monitoring and observability tools, configuration management systems, and core networking services.
How you'll make an impact:
As a Site Reliability Engineer, you will:
Ensure the reliability, availability, and performance of our infrastructure services.
Manage and maintain our Kubernetes infrastructure, including KubeVirt.
Design, implement, and maintain our monitoring and observability stack (SensuGo, VictoriaMetrics, Prometheus, ELK).
Automate infrastructure provisioning, configuration, and deployment processes using Puppet and Ansible.
Manage and maintain core services such as DNS and networking.
Troubleshoot and resolve complex infrastructure issues in a timely and efficient manner.
Participate in on-call rotations and incident response.
Develop and maintain infrastructure-as-code (IaC).
Identify and implement proactive measures to prevent incidents and improve system reliability.
Collaborate with development teams to ensure smooth and reliable deployments.
Contribute to the design and implementation of new infrastructure solutions.
Drive improvements in system architecture, processes, and tools.
Mentor and coach other team members.
Requirements:
5+ years of experience in a Site Reliability Engineering, Systems Engineering, or similar role.
Deep understanding of Site Reliability Engineering principles and practices.
Extensive experience with Kubernetes, including deployment, management, and troubleshooting.
Strong experience with monitoring and observability tools such as SensuGo, Zabbix, VictoriaMetrics, Prometheus, and ELK.
Proficiency in configuration management tools such as Puppet and Ansible.
Solid understanding of Linux internals and networking.
Experience with managing and maintaining core services such as DNS and networking.
Strong programming skills in Python and/or Go.
Experience with both on-premises and cloud environments.
Experience with KubeVirt.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration skills.
Ability to work in a fast-paced, dynamic environment.
Ability to participate in on-call rotations including weekends.
Preferred Qualifications:
Experience with large-scale, distributed systems.
Experience with other cloud providers (e.g., AWS, Azure, GCP).
Contributions to open-source projects.
This position is open to all candidates.
 
3 hours ago
Location: Ra'anana
Job Type: Full Time
We are seeking a hands-on Software Manager to lead an engineering team developing next-generation, cloud-native infrastructure based on Kubernetes for AI and HPC workloads. You'll manage a high-impact team building scalable systems powered by DPUs and our company's advanced networking technologies. In this role, you'll collaborate closely with architecture and marketing teams to shape system design, align technical direction with business goals, and ensure strong execution. This is a unique opportunity to lead a top-tier team delivering infrastructure at scale.
What you'll be doing:
Lead and coordinate a team building K8s-based infrastructure for AI and HPC.
Oversee feature delivery with active involvement in design, development, and debugging.
Guide the team to build scalable, high-performance systems using our company's compute and networking technologies.
Collaborate with architecture and marketing to align strategy and influence design.
Drive recruitment, mentorship, and foster a culture of innovation and excellence.
Requirements:
Bachelor's degree in Computer Science or equivalent experience.
8+ years of overall software development experience, including 2+ years in leadership roles managing teams.
Deep hands-on expertise with K8s and the cloud-native ecosystem.
Strong proficiency in Go and Python programming languages.
Experience designing and operating large-scale distributed systems.
Proven ability to work effectively in remote and cross-functional teams.
Ways to stand out from the crowd:
Background in AI, HPC, or cloud infrastructure.
Familiarity with our company hardware, including DPUs, BlueField, and ConnectX.
Active involvement in open source projects or communities.
This position is open to all candidates.
 
Location: Ra'anana
Job Type: Full Time
We are a leader in disaggregated high-scale networking solutions for service providers and AI infrastructures. Founded in December 2015, our company created a radical new way to build networks by adapting the architectural model of the cloud to telco-grade networking. This solution accelerates network deployment, improves the network's economic model, and radically simplifies network operations. Customers include Comcast, Orange, and KDDI, and over 80% of AT&T's network traffic now runs through a disaggregated core powered by our company's software. Our company's Network Cloud-AI solution, based on the same technology, was introduced to the market in 2023, providing the highest-performance Ethernet-based AI networking solution, and is already deployed by hyperscalers, neoclouds, and enterprises. Having raised over $587 million in three funding rounds, our company continues to deploy the most innovative network infrastructure and is looking for the most talented people to be part of this journey.
The Role
We are seeking a highly motivated and skilled Software Engineer to join our Hardware Software team. In this role, you will be responsible for the development and integration of Board Support Packages (BSP) and low-level firmware components for our carrier-grade networking solutions: carrier-grade routers and switches designed for service provider or data center networks. The systems integrate ASICs and high-throughput backplanes supporting multi-terabit line rates. You will work closely with hardware, platform, and system architects to bring up new hardware platforms and support advanced network functionalities in high-performance environments.
Key Responsibilities:
Develop, integrate, and maintain BSP components, including bootloaders (e.g., U-Boot), device trees, and hardware abstraction layers.
Design and implement firmware and low-level drivers for network-centric hardware platforms (e.g., ASICs, NICs, SoCs, CPLDs, FPGAs).
Support hardware bring-up and board validation, collaborating with hardware engineers and system integrators.
Work on performance optimization, debugging, and stability improvements of system software on embedded Linux platforms.
Interface with third-party SDKs and adapt them to fit within our company's software infrastructure.
Ensure compliance with industry standards and best practices for networking and embedded systems.
Requirements:
BSc or MSc in Computer Science, Electrical Engineering, or related technical field.
8+ years of experience in embedded software development, preferably in the networking or telecommunications industry.
Proficiency in C/C++ for low-level system development.
Strong experience with embedded Linux, bootloaders, kernel configuration, and driver development.
Familiarity with SoC architectures (e.g., ARM, MIPS) and board bring-up procedures.
Hands-on experience with hardware debugging tools (oscilloscopes, JTAG, logic analyzers).
Knowledge of networking protocols and hardware (Ethernet, switching/routing, PHYs) is a strong plus.
Experience with Broadcom SDKs, ONIE, or network operating systems (NOS) is an advantage.
Nice to Have:
Background in data center or service provider environments.
Exposure to high-end router or switch platforms.
Why us?
Work on cutting-edge cloud-native networking solutions that scale to the world's largest networks.
Be part of a fast-paced, innovative team that's transforming the telecom and hyperscale networking space.
Great growth opportunities in a global, technology-driven company.
This position is open to all candidates.
 
5 days ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a talented, highly motivated, experienced SW engineer to join one of our growing, inspiring development teams.
You will work on a multi-tenant, high-scale, distributed SaaS ecosystem on top of a K8s platform, which is used for managing the cloud security services infrastructure, customers' self-service configuration, monitoring and reporting, analytics, and more.

As a SW engineer, you will manage and work with the different Engineering teams and architects in order to design, develop, monitor, scale and optimize the large-scale architecture of a winning SaaS security service.

What will you do?

Implement our next-generation back-end infrastructure to help us scale our SaaS-based infrastructure.

Be part of a team building tools to make our infrastructure scalable and robust.

Leverage Generative AI tools for code generation, optimization, debugging, documentation, and prototyping.

Continuously research and integrate new AI-driven developer productivity tools.

Design and develop an always-available cloud-based SaaS platform in AWS.

Lead and design the development of robust CI/CD pipelines for containerized applications running on Kubernetes.

Design and build strong application and system monitoring and automated self-healing procedures.

Maintain and support application deployments, build new systems, and upgrade existing ones.

Work closely with all the Engineering and DevOps teams, taking full responsibility and ownership from conception to post-deployment in a collaborative, fast-paced environment.
Requirements:
6+ years of experience in infrastructure and Backend SW development roles.

Experience managing infrastructure on AWS.

Experience with architecture methodologies and paradigms like micro-services, distributed systems and more.

Experience integrating and actively using GenAI tools (e.g., GitHub Copilot, Claude, ChatGPT, etc.) in daily development is a must.

Open-minded to new workflows and AI-driven innovation.

An agile/DevOps way of thinking.

Experience with CI/CD tools (Jenkins, Argo, Nexus, and similar).

Experience with the K8S platform and tools (Helm charts and similar).

Experience with the following technologies/tools/fields: Elasticsearch, ClickHouse, messaging (Kafka, NATS, Redis, etc.), monitoring and visibility (Prometheus, Grafana, Loki, etc.).

Programming languages: Golang/Java.

Functioning well under pressure.

Strong problem-solving ability and a "Can-do approach".

Working in an agile environment.

Excellent communication and interpersonal skills.
This position is open to all candidates.
 
2 hours ago
Location: More than one
Job Type: Full Time
We are looking for an outstanding, passionate, and talented Senior SW & System Architect to join our SW Architecture group. The position includes researching new technologies, with a focus on architecture definition of groundbreaking technologies in different domains: networking, security, virtualization, and orchestration. Our company's Architecture group consists of world-class architects responsible for designing the next-generation, state-of-the-art architecture for our DPU and NIC technologies. You will play a key role in defining the future of the cloud solution stack, from HW to application level, including orchestration, provisioning, network programmability, and SDN. You will be working with various teams around the world, including SW architects, R&D, product, solution architects, and external customers. This position offers a unique and exceptional opportunity to have real impact in a dynamic, technology-focused company shaping the future of data center technologies.
What you will be doing:
Lead architecture for cloud-networking including orchestration, provisioning and security solutions
Design state-of-the-art system architecture for DPUs & NICs technologies
Build end-to-end solutions from application level to HW
Write effective, clear, and reliable architecture specifications
Evaluate new technologies and innovate & rapidly develop POC prototypes that can then be developed into full-fledged products/solutions
Work closely with different company teams around the world including SW & HW architects, R&D, product, solution architects, application and field engineers and more
Work with high profile customers on advanced and future technologies and solutions.
Requirements:
B.Sc./M.Sc./PhD degree in Computer Science, Computer Engineering, or Electrical Engineering
5+ years of experience as SW Architect/System Architect and/or SW developer
Deep knowledge and experience with C, Python
Hands-on Linux development, Docker, and container-based technologies
Experience with cloud and Data Center networking
Wide knowledge and understanding of networking protocols and common network topologies
Strong design, coding, analytical, debugging and problem-solving skills
Ability to work concurrently with multiple groups locally and abroad in the organization
Excellent communication, documentation and presentation skills
Ways to stand out from the crowd:
Development experience with networking/security devices (NICs, DPUs, switches, routers, firewalls, etc.)
Experience with DPDK, OVS, OVN
Background with Kubernetes components & subsystems, CRDs, Operators, system plugins and CNI plugin development (Calico, Flannel)
Experience with OpenStack/OpenShift and/or Cloud APIs
Familiarity with different automation tools such as Ansible.
This position is open to all candidates.
 
Location: Ra'anana
Job Type: Full Time
We are a leader in cloud-native networking software for hyperscalers and service providers who are building the largest infrastructures in the world for network services and AI platforms. Founded in December 2015, our company disrupted some of the most challenging high-scale markets, transforming the way networks are built, scaled, and consumed. We also built the largest network in the world, with more than half of AT&T's backbone running on our company's Network Cloud. Our company has raised $587 million in three funding rounds, which enables us to dream big and bring on the most talented people.
The Role
We are looking for a Full Stack Software Engineer to join our Network Orchestration group. Our group is responsible for developing scalable, high-performance distributed systems that support complex network infrastructures.
In this role, you will be working on both backend and frontend development, contributing to the design, implementation, and optimization of our software solutions. You will work end to end on features, from design and development to deployment and monitoring in production.
You will collaborate closely with team members, QA engineers, Product Managers, Project Managers, and UI/UX Designers to deliver high-quality, production-ready applications.
We embrace an agile mindset: you should be comfortable with context switching, handling multiple priorities, and adapting quickly to changing requirements.
Responsibilities
Develop and maintain backend services using Node.js (TypeScript, NestJS/Express) and frontend applications with Angular.
Work end to end on features, from design and development to deployment and monitoring in production.
Write clean, maintainable, and well-tested code, following best practices.
Work with relational and NoSQL databases, designing efficient data models and queries.
Collaborate with QA engineers, Product Managers, Project Managers, and UI/UX Designers to deliver high-quality features.
Participate in code reviews, providing constructive feedback and ensuring best practices.
Troubleshoot and debug application issues, ensuring smooth functionality in production.
Work with distributed systems, understanding their challenges and ensuring scalability and reliability.
Stay updated with modern development trends, frameworks, and best practices.
Requirements:
Full Stack Development: Strong hands-on experience with Node.js (TypeScript, NestJS/Express) for backend and Angular for frontend.
Technical Expertise: 3+ years of experience developing production-grade applications.
Backend Development: Knowledge of REST APIs, microservices, and working with structured data.
Frontend Development: Experience with Angular, TypeScript, RxJS, and understanding of component-based architecture.
Databases: Experience working with SQL (PostgreSQL, CockroachDB) and NoSQL (MongoDB, Redis).
Distributed Systems: Understanding of scalable architectures, with some experience working in distributed environments.
Testing & Debugging: Experience writing unit tests and automation tests, and debugging applications.
Agile & Adaptability: Ability to work in a fast-paced, dynamic environment, handling multiple priorities and adapting to changes effectively.
Collaboration & Communication: Ability to work closely with QA, Product, Project Managers, and UI/UX Designers to deliver high-quality features.
Nice to Have
Experience with Cloud providers (AWS, GCP, Azure).
Understanding of Kubernetes (k8s) and cloud-native deployments.
Experience with React.
Familiarity with Nx platform for monorepo management.
Knowledge of advanced security concepts such as TLS, encryption, and authentication mechanisms.
Experience working with event-driven architectures and messaging systems (RabbitMQ, Kafka, etc.).
Knowledge of gNMI, Netconf, gRPC, or other network management protocols.
Background in real-time telemetry and network monitoring.
Education and/or relevant experience.
This position is open to all candidates.
 
1 day ago
Location: Tel Aviv-Yafo and Yokne'am
Job Type: Full Time
We are at the forefront of AI-driven innovation in VLSI design automation. Join us to shape the future of semiconductor design with cutting-edge AI tools and make a significant impact in a collaborative, high-performance environment. Are you ready to push the boundaries of what's possible in VLSI CAD? Come be part of our pioneering team!
What you'll be doing:
You will be responsible for developing and integrating advanced CAD solutions and automation flows using AI and machine learning for VLSI design, verification, and implementation.
Work closely with design, verification, and CAD teams to identify areas for improving VLSI workflows using advanced tools and methods.
Research, prototype, and deploy AI-based algorithms.
Develop and maintain scripts and automation infrastructure to enable seamless adoption of AI tools in the VLSI design process.
Continuously review emerging AI technologies and methodologies to keep our CAD environment up-to-date.
Provide technical support and training to engineering teams on AI-enabled CAD flows and best practices.
Requirements:
B.Sc./M.Sc. in Electrical Engineering, Computer Engineering, Computer Science, or equivalent experience.
5+ years of experience in VLSI CAD tool development, with a strong focus on integrating AI/ML techniques into EDA workflows.
Proficiency in Python and at least one AI/ML framework (such as TensorFlow, PyTorch, or scikit-learn).
Hands-on experience with VLSI physical design and familiarity with industry-standard EDA tools (e.g., Synopsys, Cadence).
Knowledge of data preprocessing, feature engineering, and model deployment as applied to VLSI design challenges.
Experience developing and maintaining automation scripts (Python, Perl, Tcl, Make).
Strong analytical skills in evaluating the impact of AI solutions on design quality, performance, and productivity.
Excellent communication skills and the ability to work cross-functionally in a fast-paced environment.
Self-motivation, attention to detail, and a track record of delivering robust solutions to production.
Ways to stand out from the crowd:
Demonstrated experience deploying AI/ML models in production VLSI CAD environments.
Contributions to open-source AI/EDA projects or publications in relevant conferences/journals.
Deep understanding of VLSI design challenges (such as timing closure, power optimization, or yield enhancement) and how AI can address them.
Experience with cloud-based or distributed compute environments for large-scale AI training and inference.
Strong ownership, curiosity, and a passion for continuous learning and innovation.
This position is open to all candidates.
 
3 hours ago
Location: Ra'anana
Job Type: Full Time
We are looking for an excellent Software Engineer for the Switch SDK Group. You will join the SDK group and take our product to the next level, working closely with various other design and architecture teams and gaining a deep understanding of our company's products and technologies. Our company has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people.
Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As a worker, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.
What you'll be doing:
Design, develop, optimize and maintain APIs, tools and libraries for Switching, Routing, Analytics, Telemetry and many other modules
Collaborate with team members, Architects, QA teams, and customers (both external and internal)
Innovate & rapidly develop POC prototypes that can then be developed into full-fledged products/solutions.
Requirements:
B.Sc. in Software Engineering, Computer Science, or a related field; equivalent work experience will be considered as well
10+ years of experience as a Software Engineer, including experience with C programming
Experience with embedded/RT embedded systems
Excellent C programming skills, with a keen eye for performance and writing optimized code
Strong analytical skills, deep knowledge of algorithms and proficiency with data structures
Excellent communication and documentation skills
Ways to stand out from the crowd:
Previous experience with Ethernet Switching or Routing protocols
Hands-on Linux development, in user space and/or kernel space.
This position is open to all candidates.
 
24/07/2025
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a Senior Platform Engineer, Observability to join our Observability team. This role offers the opportunity to work at the intersection of software development and platform engineering, contributing to the tools, systems, and practices that improve visibility, reliability, and operational excellence across our engineering organization.

This position is ideally suited for experienced software engineers who are passionate about building high-quality systems and are interested in expanding their expertise in observability, distributed systems, and developer experience. You will help design, build, and maintain systems that empower engineers across the organization to monitor, understand, and troubleshoot their services more effectively.

Our observability team is responsible for delivering scalable and user-friendly solutions to over 150 engineers working across more than 20 teams. We're focused on enabling rapid incident detection and resolution, improving our reliability posture, and supporting a culture of continuous improvement.

What you'll be doing:
Design, build, and maintain observability tools and infrastructure that help our engineers provide actionable insights into the performance and reliability of our systems.
Collaborate with other engineers and teams to enhance the developer experience around monitoring, logging, alerting, and tracing.
Develop and evolve our internal tooling to simplify the process of instrumenting and observing services.
Partner with engineering teams to improve incident response and recovery workflows, and ensure systems meet internal SLOs/SLAs and reliability targets.
Support the migration from our legacy ELK stack to a modern observability platform using Prometheus, Mimir, Grafana, Honeycomb, Loki, Quickwit, and OpenTelemetry.
Contribute to knowledge sharing and the ongoing development of best practices in observability across the organisation.
Requirements:
What you'll need:
4+ years of professional experience as a software engineer, with a strong foundation in building and maintaining production systems.
Proficiency in one or more modern programming languages such as Python, Java, JavaScript, or Ruby.
Familiarity with Kubernetes, AWS, and infrastructure-as-code tools such as Terraform.
Experience working with observability tools and platforms (e.g. Prometheus, Grafana, ELK, Honeycomb, Loki, or similar).
A strong interest in developer experience and platform tooling, with the ability to empathise with engineering teams as internal customers.
Excellent communication skills, with the ability to collaborate effectively across teams and explain complex technical concepts clearly.
A proactive mindset focused on long-term impact, sustainable engineering practices, and continuous improvement.

Preferred Qualifications:
Experience with OpenTelemetry or distributed tracing systems.
Understanding of observability-driven development and service reliability principles (e.g. SRE, MTTR, SLIs/SLOs).
Experience optimising observability systems for cost and performance at scale.
Knowledge of microservices architectures and how to monitor and debug distributed systems.
Contributions to open-source projects in the observability or monitoring space.
This position is open to all candidates.
 