Senior Software Advanced Developer

8465368

Similar jobs that may interest you

18/11/2025

VLA Deep Learning Engineer, End-To-End Autonomous Driving

Confidential company

Location: Tel Aviv-Yafo, Ra'anana, Yokne'am

Job Type: Full Time

We have been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. You'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are in need of skilled engineers to join our autonomous driving team to invent, execute, and deploy pioneering end-to-end autonomous driving systems. Our strategy has progressed from AI 1.0 (constructing a driver from the ground up) to AI 2.0 (training an intelligent agent to drive). This is achieved by developing LLMs, VLMs, and VLAs that offer exceptional reasoning, planning capabilities, and interaction with the driving system for autonomous driving and general robotics. Let's innovate the future of autonomy together!

What you will be doing:

Build and train innovative large-scale models, including generative, imitation, and reinforcement learning, to improve the planning and reasoning capabilities of our driving systems.

Explore novel data generation and collection strategies to improve diversity and quality of training datasets. Develop, pre-train, and optimize LLM/VLM/VLA models for autonomous driving and robotics applications.

Collaborate cross-functionally to deploy and integrate AI models into vehicle firmware.

Deliver production-quality, safety-critical software that meets performance, safety, and reliability standards.

Requirements:
What we need to see:

PhD or Master's degree with equivalent experience.

8+ years of experience.

Hands-on experience training LLMs/VLMs/VLAs from scratch, or a proven record as a top-tier ML engineer/researcher passionate about autonomous systems.

Strong programming skills in Python and proficiency with major deep learning frameworks. Basic familiarity with C++ for model deployment and integration in safety-critical systems.

Comprehensive grasp of current deep learning structures and improvement methods. Consistent track record of deploying production-grade ML models for self-driving, robotics, or related fields at scale.

Ways to stand out from the crowd:

Experience developing and shipping LLM/VLM/VLA solutions for autonomous vehicles or general robotics products.

Publications, contributions to open-source projects, or victories in competitions connected to LLM/VLM/VLA systems.

Profound comprehension of behavior and motion planning in real-world autonomous vehicle (AV) applications.

Experience building and training large-scale datasets and models and/or training agents with reinforcement learning.

This position is open to all candidates.

8418914

18/11/2025

Senior Software Research Architect, AI Networking

Confidential company

Location: Tel Aviv-Yafo

Job Type: Full Time

We are in search of a Senior Software Architect: a creative, forward-thinking, and practical researcher to improve the framework for widespread LLM learning and prediction. As part of our dynamic E2E Architecture group, you will design and optimize systems driving generative AI workloads, working at the intersection of software and hardware on some of the most advanced GPU clusters worldwide. You will define how AI models are deployed and scaled in production using the NVIDIA Spectrum-X Networking Platform, influencing decisions from inter-node communication and compute scheduling to system-level optimization. This is an opportunity to collaborate with best-in-class engineers and researchers and shape the future of generative AI in real-world applications. Your work will make a lasting impact by enabling generative AI technologies to reach real-world applications and improve global computing capabilities.

What You'll Be Doing:

Lead research and development of end-to-end networking solutions for distributed AI training and inference at scale, with a focus on job completion time, failure resiliency, telemetry, scheduling, and placement.

Analyze current deployments, develop prototypes, and recommend architectural improvements.

Stay abreast of the latest research; become the team's authority in emerging networking techniques and technologies.

Design, simulate, and validate new systems using NSX, a novel, scalable network simulator.

Develop and test prototypes on large-scale GPU clusters (e.g., Israel-1).

Collaborate across hardware, firmware, and software teams to translate ideas into real networking product features.

Publish patents and present research at leading conferences.

Requirements:
What We Need to See:

M.Sc. or PhD (preferred) in Computer Science, Electrical/Computer Engineering, or related field, or B.Sc. with research experience and publications.

5+ years of relevant experience.

Deep expertise in networking and communication internals (NCCL, RDMA, congestion control, routing).

Strong software engineering skills in C++ and/or Python.

Excellent system-level design and problem-solving abilities.

Outstanding communication and collaboration skills across technical domains.

Ways to Stand Out from the Crowd:

Proven passion for solving sophisticated technical problems and delivering impactful solutions.

Record of publications in top-tier conferences.

Experience in designing and building large-scale AI training clusters.

Post-PhD research experience.

Practical understanding of deep learning systems, GPU acceleration, and AI model execution flows.

This position is open to all candidates.

8418932

2 hours ago

Senior System Software Performance Engineer

Confidential company

Location: Tel Aviv-Yafo

Job Type: Full Time

Our company has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. We seek a Senior SW Performance Engineer to join our performance verification team. As a Performance Engineer at our company, you will work closely with our company's development and architecture teams responsible for the Ethernet AI solution and gain a deep understanding of our company's products and technologies.
What you'll be doing:
Participate in an international team of software engineers working on products for testing our company's products
Build an automated verification environment for high-end hardware and software at the forefront of innovation
Identify, analyze, and report software defects, inconsistencies, and other quality issues.
Drive improvements in the performance, quality, and stability of SW acceleration solutions.
Stay up to date with industry standard methodologies, new technologies, and emerging trends in software verification.

Requirements:
B.Sc. degree or equivalent experience in Engineering/Computer Science/related field
4+ years of experience as a Software Engineer
Strong programming skills in Python
Expertise in networking & compute infrastructure (servers, switches, routers, TCP/UDP).
Knowledge of how to tune an environment for the best performance and deploy infrastructure based on innovative technologies and high-end hardware.
Strong technical abilities, problem-solving skills, coding, and design skills
Ability to lead feature development, take full ownership and deliver independently
Linux knowledge: a general understanding of Linux operating system concepts
Ways to stand out from the crowd:
Knowledge in performance testing scenarios and creation of performance reports.
Proven experience in a leadership role, with a track record of successfully leading scrums and projects
Strong communication and interpersonal skills, with the ability to motivate and inspire others.
Knowledge in one or more Networking areas: Ethernet, VLANs, TCP/UDP/IP, QoS, L2-L3 protocols
Prior software testing experience, with an understanding of Software Testing Tools and Methodologies and Python expertise.

This position is open to all candidates.

8465384

3 hours ago

Software Advanced Development Engineer

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

The company's Networking Advanced Development Software team develops groundbreaking new technologies to gain new market share for the company and tighten customer relationships. These are emerging technologies in networking and distributed computing for the booming AI factories and data centers. They span areas such as AI neural networks, Deep Learning, High Performance Computing (HPC), Storage, Cloud, SW Defined Network, Network Function Virtualization, 5G NR and more. We develop the solutions top-down, all the way from application behavioral analysis, to architecture definition and down to the implementation, using the company's world-leading devices. The development traverses any needed component: application SW, middleware SW, OS kernel subsystems, device drivers, embedded SW (Firmware) and CUDA GPU. We collaborate with partners and key customers in the analysis processes and engage with open source communities introducing our leading features.
What you'll be doing:
Design and implement solutions throughout all layers from high level application, OS and driver subsystem to firmware
Work on impactful projects involving state-of-the-art high-performance computing hardware and software
Provide insight and technical guidance and collaborate with peers from across the company - including software architecture, chip architecture, and engineering departments to improve our future technology
Collaborate with our company partners and customers.

Requirements:
B.Sc. in Computer Science, Electrical Engineering, Computer Engineering, or a related field
Understanding of multi-core hardware, operating systems design, concurrency, virtual memory, caching, interrupts, device drivers, and real-time systems
Programming skills
Ability to learn complex concepts in a fast-paced environment.
A teammate with a can-do attitude, high energy and excellent interpersonal skills
Ways to stand out from the crowd:
Familiarity with networking protocols
Experience with open-source projects (coursework, personal, or contributions)
Experience working in a fast-paced and dynamic environment.

This position is open to all candidates.

8465199

3 hours ago

Senior Manager, Software Advanced Development

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

The company's Networking Advanced Development Software team develops groundbreaking new technologies to gain new market share for the company and tighten customer relationships. These are emerging technologies in networking and distributed computing for the booming AI factories and data centers. They span areas such as AI neural networks, Deep Learning, High Performance Computing (HPC), Storage, Cloud, SW Defined Network, Network Function Virtualization, 5G NR and more. We develop the solutions top-down, all the way from application behavioral analysis, to architecture definition and down to the implementation, using the company's world-leading devices. The development traverses any needed component: application SW, middleware SW, OS kernel subsystems, device drivers, embedded SW (Firmware) and CUDA GPU. We collaborate with partners and key customers in the analysis processes and engage with open source communities introducing our leading features.
What you'll be doing:
Lead a team of 5 engineers developing advanced technologies
Design and implement solutions throughout all layers from high level application, OS and driver subsystem to firmware
Work on impactful projects involving state-of-the-art high-performance computing hardware and software
Provide insight and technical guidance and collaborate with peers from across the company - including software architecture, chip architecture, and engineering departments to improve our future technology
Collaborate with our company partners and customers.

Requirements:
B.Sc. in Computer Science, Electrical Engineering, Computer Engineering, or a related field, or equivalent practical experience
10+ years of overall industry experience in system programming or related fields, and 3+ years of experience leading a team
Understanding of multi-core hardware, operating systems design, concurrency, virtual memory, caching, interrupts, device drivers, and real-time systems
Excellent programming skills
Ability to learn complex concepts in a fast-paced environment
A teammate with a can-do attitude, high energy and excellent interpersonal skills
Ways to stand out from the crowd:
Familiarity with networking protocols
Experience with open-source projects (coursework, personal, or contributions)
Experience working in a fast-paced and dynamic environment.

This position is open to all candidates.

8465195

18/11/2025

Senior Software Engineer, AI Infra

Confidential company

Location: Tel Aviv-Yafo

Job Type: Full Time

We have improved AI infrastructure by merging GPU virtualization with Kubernetes-native tech to power innovative AI factories. We aim to speed up enterprise AI projects with smart orchestration and scalability for AI workloads. We are seeking a skilled Senior Software Engineer for our Infrastructure Group to innovate AI technology. The Infrastructure Group is tasked with composing and evolving the core systems responsible for thousands of GPUs and nodes driving enterprise AI. We invent the foundation that facilitates elastic, secure, and observable AI operations at extensive scale. We are seeking engineers who are passionate about distributed systems, modern cloud-native infrastructure, and AI performance optimization.

What you'll be doing:

Crafting and developing enterprise-grade systems with a strong focus on scalability, reliability, and performance.

Building and optimizing microservices-based architectures using Kubernetes and cloud-native technologies.

Collaborating closely with backend engineers, product managers, and other partners to deliver impactful solutions.

Writing clean, maintainable, and testable code in Go, contributing to our CI/CD pipelines.

Conducting code and build reviews to uphold high-quality standards and mentor team members.

Leading the development and implementation of advanced identity management systems that secure our innovative AI and GPU cloud.

Developing scalable multi-tenant solutions that allow our diverse clientele to harness the power of our platforms securely and efficiently.

Collaborating with multi-functional teams to integrate identity and access management features seamlessly into our products, from cloud services to edge computing devices.

Requirements:
What we need to see:

B.Sc. in Computer Science or a related field (or equivalent experience).

5+ years of experience.

Experience in backend software development, including system design and architecture.

Proficiency in at least one backend programming language (Go preferred).

Strong knowledge in microservices architecture, RESTful APIs, and relational databases.

Proficient knowledge of security guidelines and experience applying them in large-scale systems.

Expertise in implementing OAuth, OIDC, SAML, and other modern authentication protocols (an advantage).

Ways to stand out from the crowd:

Expertise in Kubernetes internals and advanced cloud-native technologies.

Experience working in Linux environments with knowledge of networking, security, and virtualization.

Contributions to open-source projects or active participation in tech communities.

Agile approach and familiarity with standard methodologies.

This position is open to all candidates.

8418975

1 hour ago

Senior Software Architect, GPU Networking

Confidential company

Location: Tel Aviv-Yafo, Ra'anana, Yokne'am

Job Type: Full Time

Our company has been defining computer graphics, PC gaming, and accelerated computing for more than 25 years. As a Senior Software Architect in the GPU Networking Architecture team, you will define Software Defined Networking architectural solutions. You'll also be part of a team of specialists who span numerous technological fields related to the modern data center, such as distributed AI and deep learning systems, Networking Operating Systems, Virtualization, Storage, and more.
What you'll be doing:
Define system and software architecture for Software Defined Networking (SDN) of groundbreaking emerging AI networks, involving innovative software and hardware.
Be an active member in setting the use cases and metrics for monitoring the control plane of complex high-speed networks.
Work closely with various groups within our company to bring AI network technologies to reality, including GPU and Switch HW and SW teams, Product as well as fellow architects.

Requirements:
Hold a B.Sc., M.Sc. or Ph.D. in Computer Science, Electrical or Computer Engineering (or equivalent experience).
8+ years of proven experience as a software architect.
Proven networking experience.
A teammate with a can-do attitude, high energy and excellent interpersonal skills.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
SDN definition/development experience
InfiniBand hands-on experience
Experience in Kubernetes.
Stellar communication skills.

This position is open to all candidates.

8465511

2 hours ago

Research Software Engineer, Advanced Development

Confidential company

Location: Tel Aviv-Yafo

Job Type: Full Time

We are searching for world-class Software Engineers to join our growing software architecture Research team. The ideal candidate will be conducting cutting-edge research at the intersection of Networking, Security, Communications, AI and Distributed GPU computing, and working alongside top experts in these fields. With incredible resources in networking and compute, you will be able to impact, contribute and advance these domains for scalable accelerated computing. Topics include but are not limited to remote direct memory access, hardware offloading and hardware acceleration, distributed accelerator networks, AI for networking and security, storage management, cryptography accelerators and architecture, LLM network traffic optimizations and AI collectives. With our unique open culture, we are one of the best industry labs to do Accelerated Computing research.
What you'll be doing:
Enhance our company's GPU Networking offerings for accelerating AI workloads, such as our company's Dynamo or NIXL.
Develop and evaluate new technologies and innovations relevant to scientific, Deep Learning, and data-intensive workloads.
Create proofs of concept to evaluate and drive such new technologies.
Work on impactful projects involving state-of-the-art high-performance computing software and hardware.
Design and implement services, runtime systems, and applications over SDK
Partner and collaborate with other forward-thinking team members and external researchers.

Requirements:
Hold a B.Sc. or M.Sc. or Ph.D. in Computer Science, Electrical or Computer Engineering from a leading university.
0-2 years of industry experience (or equivalent) in system programming or related fields.
Background in algorithm design, system programming, and computer architecture.
Strong programming and software development skills.
A teammate with a can-do attitude, high energy and excellent interpersonal skills.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Proven research track record.
Experience and passion for system architecture, CPU/GPU/Memory/Storage/Networking.
Stellar communication skills.
Knowledge in Deep Learning frameworks and AI communication libraries (NCCL, UCX, MPI and equivalents).

This position is open to all candidates.

8465302

3 days ago

Senior Performance and Scale Engineer - Distributed LLM Inference

Confidential company

Location: Ra'anana

Job Type: Full Time

This role needs a seasoned engineer who thinks creatively, adapts to rapid change, and has the willingness to learn and apply new technologies. You will be joining a vibrant open source culture, and helping promote performance and innovation in this company's engineering team. The broader mission of the Performance and Scale team is to establish performance and scale leadership of the company's product and cloud services portfolio. The scope includes component level, system and solution analysis and targeted enhancements. The team collaborates with engineering, product management, product marketing and customer support as well as our company's hardware and software ecosystem partners.
At our company, our commitment to open source innovation extends beyond our products; it's embedded in how we work and grow. Our workers embrace change, especially in our fast-moving technological landscape, and have a strong growth mindset. That's why we encourage our teams to proactively, thoughtfully, and ethically use AI to simplify their workflows, cut complexity, and boost efficiency. This empowers our associates to focus on higher-impact work, creating smarter, more innovative solutions that solve our customers' most pressing challenges.
What you will do:
Define and track key performance indicators (KPIs) and service level objectives (SLOs) for large-scale, distributed LLM inference services in Kubernetes/OpenShift
Participate in the performance roadmap for distributed inference, including multi-node and multi-GPU scaling studies, interconnect performance analysis, and competitive benchmarking
Formulate performance test plans and execute performance benchmarks to characterize performance, drive improvements, and detect performance issues through data analysis and visualization
Develop and maintain tools, scripts, and automated solutions that streamline performance benchmarking tasks.
Collaborate with cross-functional engineering teams to identify and address performance issues.
Partner with DevOps to bake performance gates into GitHub Actions/OpenShift Pipelines.
Explore and experiment with emerging AI technologies relevant to software development, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.
Triage field and customer escalations related to performance; distill findings into upstream issues and product backlog items.
Publish results, recommendations, and best practices through internal reports, presentations, external blogs, and official documentation.
Represent the team at internal and external conferences, presenting key findings and strategies.

Requirements:
3+ years in performance engineering or systems-level software design
Hands-on expertise with Kubernetes/OpenShift
Basic understanding of AI and LLM fundamentals
Fluency in Python (data & ML), strong Bash/Linux skills
Exceptional communication skills - able to translate raw performance numbers into customer value and executive narratives
Commitment to open-source values
The following is considered a plus:
Master's or PhD in Computer Science, AI, or a related field
History of upstream contributions and community leadership
Hands-on experience with Kubernetes/OpenShift
Familiarity with performance observability stacks such as perf/eBPF-tools, Nsight Systems, PyTorch Profiler, among others
Hands-on experience with modern LLM inference server stack (e.g., vLLM, TensorRT-LLM, TGI, Triton Inference Server).

This position is open to all candidates.

8463103

3 hours ago

Senior Network Performance Exploration Engineer

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

We seek a highly motivated Network Performance Exploration Engineer to join our team of experts and help shape the foundational infrastructure for the AI revolution. Our next-generation networking systems are at the forefront of connecting and powering the world's most advanced AI clusters. As a key member of our architecture team, you will be responsible for exploring and identifying critical network optimization opportunities across our entire hardware and software stack, analyzing how system-level changes impact application-level performance.
What You'll Be Doing:
Explore and validate end-to-end application performance, defining comprehensive test plans and critical metrics to identify optimization opportunities in both hardware and software.
Establish and maintain a comprehensive database of benchmark results, tracking performance across releases to drive data-informed decisions.
Conduct deep-dive analysis into communication libraries (like NCCL), system software, and hardware configurations to investigate performance characteristics, validate architectural theories, and identify bottlenecks.
Provide critical performance data to correlate and enhance simulation tools, ensuring our models accurately predict real-world hardware behavior.
Analyze application-level traffic patterns (e.g., LLMs) on our advanced networking fabrics to identify hardware and software optimization opportunities and tune system parameters.
Lead Proof-of-Concept (POC) projects to prototype and evaluate potential hardware and software optimizations and their impact on application performance.

Requirements:
B.Sc. or M.Sc. degree in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.
5+ years of relevant industry or research experience in high-performance computing, computer architecture, or computer networks.
Hands-on programming skills in Python and/or C/C++ for system analysis, automation, and customizing benchmarks.
Excellent understanding of large-scale system behavior and the effect of distributed computing workloads on network and system performance.
Proven experience in performance analysis, benchmarking, and identifying system bottlenecks.
Exceptional analytical, problem-solving, and systems-thinking skills, with the ability to dive deep into complex software and hardware interactions.
Ability to thrive in a fast-paced, dynamic environment and work concurrently with multiple cross-functional teams.
Ways To Stand Out From The Crowd:
Deep understanding of and hands-on experience with communication libraries such as NCCL, UCX, or MPI.
Direct experience debugging or modifying the source code of a major communication library.
Expertise in the architecture and system-level requirements of large-scale, distributed Deep Learning workloads (e.g., LLMs).
Expertise in high-performance network protocols (Ethernet, InfiniBand, RoCE) and interconnect technologies like NVLink.
Familiarity with the PyTorch ecosystem, especially for distributed workloads.

This position is open to all candidates.

8465097
