Senior Software Architect, Advanced Development

1 day ago
Location: Yokne`am
Job Type: Full Time
As a senior architect in the Advanced Development team, you will explore technological challenges in accelerated networking and in building AI data centers, and research new transport functions and semantics for optimizing AI workloads. You will also lead architectural and development efforts across numerous technological fields related to the modern data center, such as distributed AI and deep learning solutions, data analytics, high-performance computing (HPC), software-defined networking (SDN), virtualization, storage, and more.
What you'll be doing:
Enhance GPU networking offerings for accelerating AI workloads, such as Dynamo or NVIDIA NIXL.
Identify and evaluate new technologies, innovations, and partner relationships for alignment with our technology roadmap and business value.
Lead the architecture and design of such technologies.
Lead proof-of-concept development to evaluate and drive such technologies.
Requirements:
What we need to see:
An M.Sc. or Ph.D. in computer science, electrical engineering, or computer engineering from a leading university (or equivalent experience).
12+ years of industry experience (or equivalent) in systems architecture or related fields.
Experience in virtualization, networking, and storage.
Experience with either Windows or Linux drivers, with a very good background in the other OS.
Deep understanding of performance profiling and optimization techniques, together with defining and using HW offloads.
A teammate with a can-do attitude, high energy, and excellent interpersonal skills.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
This position is open to all candidates.
 
Similar jobs that may interest you
02/03/2026
Location: Yokne`am
Job Type: Full Time
The Networking Advanced Development Software team develops groundbreaking technologies that open new markets for the company and tighten customer relationships. These are emerging technologies in networking and distributed computing for the booming AI factories and data centers. They span areas such as AI neural networks, Deep Learning, High Performance Computing (HPC), Storage, Cloud, Software-Defined Networking, Network Function Virtualization, and more. We develop the solutions top-down, all the way from application behavioral analysis to architecture definition and down to the implementation, using our world-leading devices. The development traverses any needed component: application SW, middleware SW, OS kernel subsystems, device drivers, embedded SW (firmware), and CUDA GPU code. We collaborate with partners and key customers in the analysis processes and engage with open source communities to introduce our leading features.

What you'll be doing:

Design and implement solutions throughout all layers from high level application, OS and driver subsystem to firmware.

Work on impactful projects involving state-of-the-art high-performance computing hardware and software.

Provide insight and technical guidance and collaborate with peers from across the company - including software architecture, chip architecture, and engineering departments to improve our future technology.

Collaborate with our partners and customers.
Requirements:
What we need to see:

B.Sc. in Computer Science, Electrical Engineering, Computer Engineering, or a related field.

5+ overall years of industry experience in system programming or related fields.

Understanding of multi core hardware, operating systems design, concurrency, virtual memory, caching, interrupts, device drivers, real-time

Excellent programming skills.

Ability to learn complex concepts in a fast-paced environment.

A teammate with a can-do attitude, high energy and excellent interpersonal skills.

Ways to stand out from a crowd:

Familiarity with networking protocols.

Hands-on experience with CUDA programming and GPU acceleration.

Hands-on experience with LLM serving frameworks.

Experience with open-source projects (coursework, personal, or contributions).

Working in a fast-paced and dynamic environment.
This position is open to all candidates.
 
02/03/2026
Location: Ra'anana and Yokne`am
Job Type: Full Time
We are currently seeking a hard-working Senior System and Hardware TimeSync Architect who flourishes with these types of challenges to join our Time Synchronization Architecture team.

In this role, you will be exposed to the newest technologies and will help define how our GPUs, CPUs, and networking devices use timing to power extraordinary products and applications. Specifically, you will focus on next-generation Radio Access Network (RAN) platforms for 5G and 6G, delivering innovative, scalable, and power-efficient TimeSync solutions. By working across hardware and software stacks, you will support sophisticated AI acceleration and drive industry-wide standardization. If you want to lead the industry and help us define the next generation of data center and telecommunications technology, this is where you belong.

What you'll be doing:

Master our Time Synchronization Technology

Define hardware and system architectures

Research and evaluate algorithms currently used in related applications

Develop complex proof-of-concepts to demonstrate ideas

Architect Time Synchronization hardware tailored to the requirements of next generation RAN workloads, including distributed, centralized, and small cell deployment scenarios.

Collaborate with software architects to define new features and robust SW-HW interfaces for diverse networking use cases.

Drive standardization and interoperability to support broad industry adoption.
Requirements:
What We Need to See:

M.Sc. or equivalent experience in Electrical Engineering or Computer Science from a leading university.

7+ years of experience in the industry, specifically in HW/SW architecture groups.

Familiarity with networking concepts, terms, and software stack.

Consistent record of quickly adapting to new technologies and investigating emerging areas.

Hands-on programming capabilities in Python, C/C++.

Passion for problem-solving and algorithms research and development.

Strong capability to work independently, collaborate with multi-functional teams, and guide R&D efforts.

Excellent communication and presentation skills.

Ways To Stand Out From The Crowd:

Experience with IEEE 1588 PTP, Synchronous Ethernet, GPS/GNSS, oscillators and clock control algorithms.

Experience in networking hardware and software architecture, ideally with relation to telecom and RAN networks.

Knowledgeable about O-RAN architecture.

An understanding of clocks and signal processing.
This position is open to all candidates.
 
1 day ago
Location: Yokne`am
Job Type: Full Time
Come be part of our company, the industry leader in AI data centers. We are innovating the future of the data center and defining the next generation of networking solutions. Our team is dedicated to pushing boundaries and overcoming the challenges involved in providing high-performance data centers. As a network RDMA algorithms architect, you'll contribute to our creative and cooperative setting, focusing on the ConnectX network adapter, the SPC-X end-to-end solution, and more exciting technologies.
What you'll be doing:
Conduct research and analysis on networking solutions and end-to-end algorithms.
Work with a creative and experienced team to outline the next generation of our RDMA load balancing and congestion control algorithms.
Work in a simulation environment and on real HW systems.
Engage with other research teams to develop proofs of concept using our technology.
Requirements:
What we need to see:
2+ years of experience.
B.Sc. in electrical engineering or computer engineering.
High motivation to learn and explore new fields.
Proven problem-solving skills.
Excellent interpersonal skills.
Knowledge and understanding of compute and networking systems is an advantage.
Passion, attention to detail, and a strong focus on building quality.
Ways to stand out from the crowd:
Passion and love for system architecture, including CPU/GPU/memory/storage/networking.
Background with AI workloads.
Background with networking.
Experience in the development of simulation environments.
Our company values diversity in employees. We are an equal opportunity employer, not discriminating in hiring or promotions based on various protected characteristics.
This position is open to all candidates.
 
1 day ago
Location: Yokne`am
Job Type: Full Time
Join our growing software architecture research team. The ideal candidate will conduct cutting-edge research at the intersection of networking, security, and communications, working alongside top experts in these fields. With incredible resources in networking, you will be able to impact, contribute to, and advance these domains for scalable accelerated computing. Topics include but are not limited to remote direct memory access, hardware offloading and hardware acceleration, distributed accelerator networks, AI for networking and security, storage management, and cryptography accelerators and architecture. With its unique open culture, NVIDIA is one of the best industry labs in which to do accelerated computing research.
 
What you'll be doing:
Develop novel HW architecture models.
Develop simulations ranging from specific components to complete data center environments.
Develop SDKs for novel HW capabilities.
Design and implement services, runtime systems, and applications on top of the SDKs.
Evaluate and optimize application performance.
Partner and collaborate with other forward-thinking team members and external researchers.
Participate and speak at conferences and events.
Work with intelligent networking machines powered by AI systems that can learn, reason, and interact with other network components.
Requirements:
What we need to see:
A B.Sc. or M.Sc. in computer science, electrical engineering, or computer engineering from a leading university (or equivalent experience).
0-2 years of industry experience (or equivalent) in systems architecture or related fields.
Knowledge of networking, operating systems, accelerator programming, and systems.
Track record of research excellence.
Good communication skills.
 
This position is open to all candidates.
 
5 days ago
Location: Yokne`am
Job Type: Full Time
We are seeking a forward-thinking, innovative, and experienced Solutions Architect with hands-on experience.

By joining the team, you'll take part in the design and implementation of complex data center solutions, including solution architecture, HW and SW deployment, performance, and return on investment analysis. You will work with software developers and architects to research and analyze new, groundbreaking technologies.

The candidate is expected to have hands-on experience with data center technologies, including operating systems, virtualization and containerization technologies, storage, networking, and infrastructure as code (IaC) tools such as Ansible. A candidate for this position must excel at working independently as well as in a group, in a multidisciplinary, dynamic environment.

What you'll be doing:
Implement cutting-edge, end-to-end data center solutions with modern applications.
Architect, develop, deploy, automate, analyze, formalize findings, and release summary reports.
Develop and release reference deployment guides and how-to tutorials.
Support company business units on complex solutions delivery by consulting, providing solutions designs, running POCs, etc.
Research technology trends and deliver best practices for integrating NVIDIA products and technologies.
Requirements:
What we need to see:
Bachelor's degree or higher in an exact science, or equivalent experience.
5+ years of field experience as a hands-on Solution Engineer, Solution Architect, or Senior DevOps.
Experience with the design and implementation of complex data center solutions.
Vast experience with Linux, Kubernetes, Docker containers, and QEMU/KVM virtualization.
Understanding of Networking layers and services (L2/L3, TCP/IP, Firewalls, DNS, DHCP, etc.).

Ways to stand out from the crowd:
Experience with distributed systems and system analysis, with modern big data frameworks, and with AI / ML frameworks and use cases.
High level of personal responsibility and the ability to optimally prioritize and complete tasks in a fast-paced and frequently changing environment.
Excellent Hebrew/English interpersonal and written communication skills.
Fast ramp up, quick learning, high motivation, independent and effective troubleshooting and problem-solving skills.
This position is open to all candidates.
 
9 hours ago
Confidential company
Location: Yokne`am
Job Type: Full Time
In this role, you will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments. You will work closely with hardware, networking, firmware, and software teams to collect, analyze, and interpret performance data from live systems. This is a fast-paced R&D environment where system behavior and requirements evolve rapidly, requiring adaptable engineering solutions and strong analytical thinking.
What you'll be doing:
Profile, benchmark, and analyze AI and HPC workloads on GPU and CPU clusters.
Explore performance characteristics of high-performance networking and collective communications (e.g., NCCL, RDMA, MPI, RoCE).
Identify performance bottlenecks across networking, compute, memory, and system architecture.
Develop and enhance performance analysis, benchmarking, and diagnostic tools.
Define performance test plans and establish expectations for new technologies and platforms.
Collaborate across hardware, firmware, networking, systems, and software teams to provide actionable performance insights.
Support telemetry collection and data refinement efforts to enable accurate performance analysis.
Maintain high standards for data quality, reproducibility, and traceability of performance results.
Requirements:
What we need to see:
B.Sc. or M.Sc. in computer science, computer engineering, software engineering, or equivalent experience.
5+ years of experience in performance analysis, systems engineering, or HPC/AI infrastructure.
Demonstrated expertise in performance analysis skills and methodologies.
Hands-on experience with high-performance networking (RDMA, MPI, NCCL, congestion control).
Strong understanding of system performance metrics (latency, throughput, resource utilization).
Exposure to hardware, firmware, or embedded telemetry environments.
Strong analytical, problem-solving, and communication skills.
Ability to work effectively in cross-functional, fast-paced R&D teams.
Ways to stand out from the crowd:
Knowledge of CUDA, NCCL internals, and congestion control algorithms.
Deep system-level understanding of CPU architectures, GPUs, HCAs, memory, and PCIe.
Experience with NVIDIA GPUs, CUDA, and deep learning frameworks such as PyTorch or TensorFlow.
Experience with cloud platforms.
Proficiency in Python; experience with Bash and C/C++ is a plus, as is strong experience working in Linux environments.
This position is open to all candidates.
 
18/03/2026
Confidential company
Location: Tel Aviv-Yafo and Yokne`am
Job Type: Full Time
We are looking for a creative and experienced Senior Firmware Engineer to join our PCIe Firmware team, someone passionate about using artificial intelligence to engineer the foundational hardware of the AI revolution.

As an integral part of our team, you'll architect and implement the core of our next-generation devices. This senior role places you at the center of innovation, where you will have a direct impact on our business and technology by solving sophisticated technical challenges. It's a unique opportunity to shape our technology and empower customers to build the supercomputers and AI fabrics of tomorrow.

What You'll Be Doing:

Lead the architectural design, development, and optimization of cutting-edge PCIe firmware, using AI-driven modeling and insights to deliver exceptional performance.

Serve as a trusted technical expert by investigating, debugging, and resolving challenging PCIe firmware issues for our most important customers.

Collaborate closely with our Chip Design, Verification, Software, and Architecture engineers to find root causes and develop robust, long-term solutions.

Champion the integration of AI-assisted diagnostics and generative AI tools across the entire development lifecycle to boost team productivity and innovation.

Translate customer needs and field data into actionable feedback that directly shapes the future of our products.
Requirements:
What We Need to See:

A degree in Electrical Engineering, Computer Science, Computer Engineering, or equivalent practical experience.

8+ years of significant professional experience in embedded firmware development, with a deep understanding of PCIe.

A strong foundation in computer architecture, operating systems, and object-oriented programming.

Proficiency in scripting languages like Python to automate tasks and workflows.

An innovative approach with a genuine desire to apply AI and machine learning to accelerate firmware development.

Ways to Stand Out from the Crowd:

Track record of applying AI-powered tools like Cursor to accelerate the development lifecycle.

Previous experience in a customer-facing or application engineering role.

Direct, hands-on experience with PCIe switch architecture and its firmware in high-performance applications.

Deep knowledge of hardware verification concepts and tools (e.g., C++, Python, Jenkins).

Extensive knowledge of networking protocols and the Linux operating system.
This position is open to all candidates.
 
23 hours ago
Location: Yokne`am
Job Type: Full Time
We are looking for a senior software program manager who will be responsible for software programs and projects. The PM should drive planning and execution of FW/SW projects while aligning with corporate priorities and constraints.
We are a leading supplier of innovative end-to-end InfiniBand and Ethernet connectivity solutions and services for servers and storage. We offer best-in-class solutions that include adapter cards, switches, cables, and software to support networking technologies. Our products optimize data center performance and deliver industry-leading bandwidth and scalability. In addition, we serve a wide range of markets including high-performance computing, enterprise, data centers, cloud computing, big data, and Web 2.0. We are constantly reinventing ourselves to stay ahead of the market and bring groundbreaking products and services to the industry. Our product line focuses on delivering the most optimized Ethernet solutions for industries like media and entertainment, as well as any other industry that can benefit from our datastream and TCP/IP acceleration.
What you'll be doing:
Manage the networking software programs for NVIDIA's next-generation AI data centers.
Coordinate between all project stakeholders, such as marketing, engineering teams in IL and around the world, operations, etc., from initial requirements definition through the architectural stage, execution, and delivery.
Develop and execute feature planning and prioritization of perception capabilities to meet the software programs' needs.
Identify risks, gaps, and bottlenecks in time, and find resolutions with technical leaders and project management.
Work with product managers, architects, and engineers to ensure consistency with company strategy, commitments, and goals.
Requirements:
What we need to see:
B.Sc. or M.Sc. in computer science, electrical engineering, or a related field.
Expertise with software project management methodologies and tools.
8+ years of experience in software project management or leadership.
Experience in software development over hardware/silicon products.
A teammate who is independent, responsible, capable of multitasking, and able to drive people and tasks.
Excellent verbal and written communication skills, with English proficiency.
Ability and willingness to work in a dynamic environment and flexible hours, with teams all over the world.
Ways to stand out from the crowd:
Technical orientation, including the ability to conduct technical discussions.
Experience with tools such as MS Excel, MS Project, and Power BI.
Networking background.
Experience in coordinating multiple groups.
Familiarity with SW agile concepts.
If you're creative and autonomous, we want to hear from you! NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
This position is open to all candidates.
 
18/03/2026
Location: Yokne`am
Job Type: Full Time
We are looking for a Senior Networking Test Engineer with strong system‑level debugging skills to join our End‑to‑End Verification team. You will work on cutting‑edge Ethernet‑based AI clusters, owning complex issues across hardware, system software and AI workloads. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

What you'll be doing:

Design and review test and product requirements across the Ethernet / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.

Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.

Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.

Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.

Collaborate closely with development teams to debug NCCL, RoCE/RDMA and related networking components using logs, code inspection and targeted experiments.

Define tests and guide the automation team to implement robust suites that produce actionable logs, metrics and traces.

Run Regression, Performance, Functional and Scale testing, analyze results and provide clear, data‑driven reports to stakeholders.

Profile and benchmark deep learning training and inference workloads, correlating model‑level metrics with system and network telemetry to uncover bottlenecks.
Requirements:
What we need to see:

B.A./B.Sc. in Computer Science, Electrical Engineering, or equivalent IT/Network/Systems experience.

5+ years of hands‑on networking or system‑level testing and debugging on Linux.

Strong Linux networking and debugging skills (for example perf, tcpdump, ethtool, iproute2).

Proven production‑grade debugging experience: forming hypotheses, running experiments, and driving issues to root cause under pressure.

Expertise in host‑side NIC validation and tuning (offloads, queues, interrupts, firmware/driver interactions).

Strong knowledge of AI networking libraries (such as NCCL) and protocols (such as RoCE and RDMA), including performance and correctness debugging.

Ability to read and reason about source code (C/C++/Python or similar) and collaborate closely with developers on fixes.

Solid scripting and automation skills with Bash / Python / Ansible for setup, log collection, and experiment orchestration.

Fast learner, familiar with modern AI tools and workflows, able to adapt quickly.

Excellent analytical, problem‑solving and communication skills, with strong ownership and a collaborative mindset.

Ways to stand out from the crowd:

Hands‑on debugging of collective communication libraries (for example NCCL) or large‑scale LLM training / inference clusters.

Experience with large cluster environments (tens to thousands of GPUs or nodes), including incident response and post‑mortem analysis.

Deep expertise in tuning and debugging congestion control and lossless Ethernet for AI workloads (for example DCQCN, ECN, PFC).

Familiarity with NVIDIA networking technologies (for example BlueField / BF3, ConnectX NICs) and their software stack and diagnostics.

Experience debugging issues that span multiple layers (L2/L3, transport, AI frameworks) or contributing to open‑source networking / AI systems.
This position is open to all candidates.
 
5 days ago
Location: Yokne`am
Job Type: Full Time
We are looking for a Senior Networking Test Engineer with strong system‑level debugging skills to join our End‑to‑End Verification team. You will work on cutting‑edge Ethernet‑based AI clusters, owning complex issues across hardware, system software and AI workloads. We are widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

What you'll be doing:

Design and review test and product requirements across the Ethernet / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.

Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.

Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.

Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.

Collaborate closely with development teams to debug NCCL, RoCE/RDMA and related networking components using logs, code inspection and targeted experiments.

Define tests and guide the automation team to implement robust suites that produce actionable logs, metrics and traces.

Run Regression, Performance, Functional and Scale testing, analyze results and provide clear, data‑driven reports to stakeholders.

Profile and benchmark deep learning training and inference workloads, correlating model‑level metrics with system and network telemetry to uncover bottlenecks.
Requirements:
What we need to see:

B.A./B.Sc. in Computer Science, Electrical Engineering, or equivalent IT/Network/Systems experience.

5+ years of hands‑on networking or system‑level testing and debugging on Linux.

Strong Linux networking and debugging skills (for example perf, tcpdump, ethtool, iproute2).

Proven production‑grade debugging experience: forming hypotheses, running experiments, and driving issues to root cause under pressure.

Expertise in host‑side NIC validation and tuning (offloads, queues, interrupts, firmware/driver interactions).

Strong knowledge of AI networking libraries (such as NCCL) and protocols (such as RoCE and RDMA), including performance and correctness debugging.

Ability to read and reason about source code (C/C++/Python or similar) and collaborate closely with developers on fixes.

Solid scripting and automation skills with Bash / Python / Ansible for setup, log collection, and experiment orchestration.

Fast learner, familiar with modern AI tools and workflows, able to adapt quickly.

Excellent analytical, problem‑solving and communication skills, with strong ownership and a collaborative mindset.

Ways to stand out from the crowd:

Hands‑on debugging of collective communication libraries (for example NCCL) or large‑scale LLM training / inference clusters.

Experience with large cluster environments (tens to thousands of GPUs or nodes), including incident response and post‑mortem analysis.

Deep expertise in tuning and debugging congestion control and lossless Ethernet for AI workloads (for example DCQCN, ECN, PFC).

Familiarity with our networking technologies (for example BlueField / BF3, ConnectX NICs) and their software stack and diagnostics.

Experience debugging issues that span multiple layers (L2/L3, transport, AI frameworks) or contributing to open‑source networking / AI systems.
This position is open to all candidates.
 