We seek a highly motivated Network Performance Exploration Engineer to join our team of experts and help shape the foundational infrastructure for the AI revolution. Our next-generation networking systems are at the forefront of connecting and powering the world's most advanced AI clusters. As a key member of our architecture team, you will be responsible for exploring and identifying critical network optimization opportunities across our entire hardware and software stack, analyzing how system-level changes impact application-level performance.
What You'll Be Doing:
Explore and validate end-to-end application performance, defining comprehensive test plans and critical metrics to identify optimization opportunities in both hardware and software.
Establish and maintain a comprehensive database of benchmark results, tracking performance across releases to drive data-informed decisions.
Conduct deep-dive analysis into communication libraries (like NCCL), system software, and hardware configurations to investigate performance characteristics, validate architectural theories, and identify bottlenecks.
Provide critical performance data to calibrate and improve simulation tools, ensuring our models accurately predict real-world hardware behavior.
Analyze application-level traffic patterns (e.g., from LLM workloads) on our advanced networking fabrics to identify hardware and software optimization opportunities and tune system parameters.
Lead Proof-of-Concept (POC) projects to prototype and evaluate potential hardware and software optimizations and their impact on application performance.
Requirements:
B.Sc. or M.Sc. degree in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.
5+ years of relevant industry or research experience in high-performance computing, computer architecture, or computer networks.
Hands-on programming skills in Python and/or C/C++ for system analysis, automation, and customizing benchmarks.
Excellent understanding of large-scale system behavior and the effect of distributed computing workloads on network and system performance.
Proven experience in performance analysis, benchmarking, and identifying system bottlenecks.
Exceptional analytical, problem-solving, and systems-thinking skills, with the ability to dive deep into complex software and hardware interactions.
Ability to thrive in a fast-paced, dynamic environment and work concurrently with multiple cross-functional teams.
Ways To Stand Out From The Crowd:
Deep understanding of and hands-on experience with communication libraries such as NCCL, UCX, or MPI.
Direct experience debugging or modifying the source code of a major communication library.
Expertise in the architecture and system-level requirements of large-scale, distributed Deep Learning workloads (e.g., LLMs).
Expertise in high-performance network protocols (Ethernet, InfiniBand, RoCE) and interconnect technologies like NVLink.
Familiarity with the PyTorch ecosystem, especially for distributed workloads.
This position is open to all candidates.