The System & Network Architecture Team within the Computing Network Innovation Lab is responsible for next-generation computing network architecture research, ranging from network architecture evolution to large-scale network technology (e.g. Ethernet/IB/RoCE), bus network technology (PCIe/CXL), chiplet interconnect technology, and strategic technology planning.
DESCRIPTION:
With the explosive growth of AI technologies and the Internet industry, data centers have become the digital hubs and infrastructure of the Internet industry in the digital economy era. Computing networks, as core components of data centers, have features such as topology, scalability, throughput, reliability, and latency that directly affect data center functionality and performance. The computing cluster networking lab explores architectural and technological innovation to meet the challenges of future large-scale AI and HPC data centers. The lab's mission is to lead our company to achieve differentiated competitiveness in high-performance computing cluster network infrastructure, and to support Huawei's industry-leading computing cluster.
Position Overview
In this role, you will be responsible for several teams of architects, engineers, and software developers, all working together to conduct state-of-the-art R&D in system and network architecture. As the group lead, you will guide and mentor the individual team leads and also do hands-on work, leading architecture, technology innovation, and technical planning for high-performance computing cluster networks oriented toward AI, HPC, and big data.
Responsibilities
You will perform a wide range of duties including:
Architecture Innovation:
Analyzing in depth the advantages and disadvantages of mainstream network systems to find opportunities for network architecture innovation.
Developing insight into technology trends in the high-performance computing network field and leading the corresponding technology planning.
Exploring new architectures for high-performance computing network systems and efficiently integrating communication libraries, topologies, and network protocols to solve performance bottlenecks.
Technical breakthroughs in networking and cluster routing algorithms:
Analyze computing cluster network performance and lead the development of computing cluster network technologies
Research and optimize the heterogeneous interconnection topology of key computing chips to continuously improve the competitiveness of Huawei's heterogeneous computing chipsets
Research data center network technologies and guide network topology design and routing algorithm development
Group leadership:
Lead the development of a comprehensive system architecture for AI Fabric and HPC Fabric solutions
Manage and mentor highly skilled team leaders to ensure that the group works together in pursuit of common goals
Foster a collaborative and innovative work environment
Provide technical guidance and support to team members
Collaborate closely with cross-functional teams internationally, including hardware, software, and ucode design teams, to ensure alignment of architectural decisions with common product and platform objectives
Initiate and supervise collaborations with top academic researchers in Israel and abroad
Stay up to date with emerging technologies and industry trends in the AI, HPC, and big data industries
Requirements
At least 10 years of hands-on experience in system architecture design, or equivalent research experience
Demonstrated experience in leading R&D teams
Familiarity with high-performance computing cluster services and system architectures, such as AI, HPC and big data.
In-depth understanding of computer networks, communication libraries, and design of AI or HPC cluster networks.
This position is open to all candidates.