Software Engineer, DOCA

Collected from a website
31/10/2024
Confidential company
Job Type: Full Time
We are seeking an enthusiastic individual to join our DOCA NVQual team for Data Processing Units (DPUs) and ConnectX (CX) as a Junior Software Engineer. In this role, you will have the opportunity to learn and contribute to various features related to our product integration for AI cluster systems, helping to develop the next generation of advanced data centers worldwide.

What you'll be doing:
Work on NVQual (our Qualification), a software validation package for our partners to integrate enterprise products into their systems.
Analyze next-generation AI systems from NVIDIA and its partners, focusing on HW integration aspects.
Design and develop system workloads.
Be responsible for the end-to-end development of features within the DOCA NVQual framework.
Requirements:
What we need to see:
B.Sc. or equivalent experience in Electrical Engineering/Computer Science or SW/Computer Engineering.
1-3 years of programming in Python.
Motivated, responsive, and keen on process improvement.
Strong analytical, debugging, and problem-solving skills.

Ways to stand out from the crowd:
Experience with Python.
Knowledge of Linux.
Background in networking or low-level programming.
This position is open to all candidates.
 
7921759
Similar jobs that may interest you
Collected from a website
31/10/2024
Location: Tel Aviv-Yafo and Ra'anana
Job Type: Full Time
Are you ready to build innovative, next-generation infrastructure for AI supercomputers and data-centers?

We are looking for an excellent Senior Software Developer to work on our next-generation cloud platforms. We are seeking an experienced engineer who is deeply technical, hands-on, and has a wide system view. You will craft, build, and deploy high-performance and scalable clouds based on our outstanding GPU/NVLink, ConnectX NICs and BlueField DPUs.

The team is responsible for developing high-performance computing and cloud infrastructure for the world's largest supercomputers and data-centers. The work environment is educational, dynamic, and challenging, as our employees work on innovative, next-generation products at the forefront of technology in terms of performance, scalability, and features.

What you'll be doing:
Design and build innovative features for High-Performance Networking of IaaS in both private and public cloud environments, enhancing functionality and performance.
Develop a high-speed networking solution that accelerates HPC and AI workloads using our advanced technologies in cloud environments, e.g. DPU, ConnectX and GPU/NVLink.
Take part in developing our pioneering AI supercomputer.
Work closely with other teams on new products or features/improvements of existing products.
Support, maintain and document software functionality.
Requirements:
What we need to see:
BSc in Computer Science or equivalent program.
5+ years of hands-on experience in software development, preferably with C, Python, Rust and Golang.
Extensive hands-on experience with high-speed networking, e.g. IB, RoCE and NVLink.
Experience with Jenkins, GitLab and/or GitHub.
Strong background in designing, implementing, and debugging sophisticated software.
Highly motivated with strong interpersonal skills, ability to work successfully with multi-functional teams, developers, and architects.
Coordinate effectively across organizational boundaries and geographies.
Strong self-initiative, independence, and flexibility in adopting new technologies.

Ways to stand out from the crowd:
R&D background with OpenStack or cloud IaaS
Experience with working on open-source projects
Understanding of HPC/AI systems and related technologies
This position is open to all candidates.
 
7921941
Collected from a website
29/10/2024
Location: More than one
Job Type: Full Time
We are looking for an outstanding Software Verification Engineer for our SW Host Verification group. You will work closely with our Driver and SDK developers and our performance team, and gain a deep understanding of our Networking products and technologies on top of our BlueField network cards. You will lead feature verification from design through implementation to integration into frameworks, develop robust infrastructure, and work collaboratively with cross-functional teams.

This position offers the opportunity to have real impact in a dynamic, technology-focused company impacting product lines that empower the most advanced data centers in the world. We have crafted a team of outstanding people stretching around the globe, whose mission is to push the frontiers of what is possible today and define the platform for the future of computing. We are strong believers in developing our employees and giving them the tools to succeed.

What you'll be doing:

In this role, you will work closely with developers to test new components including crafting and executing unit, functional, and performance tests. 

Develop a verification environment using Python to qualify the product from functional and performance perspectives. 

Analyze coverage measures to identify verification holes and to show progress toward product development and releases. 

Identify and write all types of coverage measures for stimulus and corner cases. 

Be responsible for verification of system design and software using advanced verification methodologies. 
Requirements:
What we need to see: 

B.Sc. or equivalent experience in Computer Science or SW/Computer Engineering. 

5+ years of work experience in software development. 

Strong programming skills in Python and/or C. 

Knowledge of Networking and protocols.

Strong debugging and analytical skills. 

Creative, motivated, and results-driven worker. 


Ways to stand out from the crowd: 

Background in operating systems: Windows, Linux, VMware.

Knowledge of virtualization.

Strong background in designing, implementing, and debugging software.
This position is open to all candidates.
 
7917798
Collected from a website
29/10/2024
Confidential company
Location: More than one
Job Type: Full Time
We are seeking an outstanding Team Leader for the Driver Verification team. Our team collaborates closely with Driver developers to define and verify the next-generation network technologies of the Data Center. This position offers a great opportunity to lead a team of 4-5 people, with potential for future team expansion, and acquire extensive knowledge in the newest company technologies. We are strong believers in developing our employees and providing them with the tools to succeed.

What you'll be doing:

Lead and manage a team of engineers in developing new code and creating and implementing verification tests to assess the functionality of customer features.

This position requires a hands-on approach, where you will actively participate in design, coding, debugging, and the continuous improvement of verification tests and infrastructure alongside your team.

Build and continuously improve verification infrastructure and methodologies to meet the demands of next-generation networking cards.

Collaborate with design engineers to debug tests and develop code, ensuring the delivery of high-quality functionality.

Analyse coverage measures to identify verification gaps and demonstrate progress in product development and releases.
Requirements:
What we need to see:

5+ years of overall experience with networking, protocols, and kernel drivers.

2+ years of work experience as a team or technical leader.

Proven managerial skills with the ability to lead and develop a team.

Excellent analytical, debugging, and problem-solving skills with attention to detail.

Strong programming skills in Python, C/C++.

Ability to work independently with other teams, excellent communication skills, self-motivated and well-organized.

B.Sc. in Computer Science or SW/Computer Engineering.

Ways to stand out from the crowd:

Strong knowledge of VMware technologies and ESXi/Linux kernel drivers.

Strong understanding of Windows drivers.

Prior experience in operating systems.

Familiarity with Smart NICs, Storage, and Virtualization.
This position is open to all candidates.
 
7917812
Collected from a website
29/10/2024
Job Type: Full Time
We are looking for a Senior Networking (ETH/IB) Solutions Architect to join our Infrastructure Specialist Team. Academic and commercial groups around the world are using our products to revolutionize deep learning and data analytics, and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world! We are looking for someone with the ability to work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role involves interacting with customers, partners and internal teams to analyze, define and implement large-scale networking projects. The scope of these efforts includes a combination of Networking, System Design and Automation, and being the face to the customer!

What you'll be doing:

Primary responsibilities will include building AI/HPC infrastructure for new and existing customers.

Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.

Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.

Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.

Provide feedback to internal teams such as opening bugs, documenting workarounds, and suggesting improvements.

Worldwide travel is required for on-site visits with customers.
Requirements:
What we need to see:

BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering fields with at least 8 years work or research experience in networking fundamentals, TCP/IP stack, and data center architecture.

8+ years of experience configuring, testing, validating, and resolving issues in LAN and InfiniBand networking, including use of validation tools for InfiniBand health and performance in medium- to large-scale HPC/AI network environments.

Knowledge and experience with Linux system administration/dev ops, process management, package management, task scheduling, kernel management, boot procedures, troubleshooting, performance reporting/optimization/logging, and network-routing/advanced networking (tuning and monitoring).

Driven focus on customer needs and satisfaction. Self-motivated with excellent leadership skills including working with customers.

Extensive knowledge of automation, delivering fully automated network provisioning solutions using Ansible, Salt, and Python.

Strong written, verbal, and listening skills in English are essential.

Ways to stand out from the crowd:

Linux or Networking Certifications.

Experience with high-performance computing architectures. Understanding of how job schedulers (Slurm, PBS) work.

Proven knowledge of Python or Bash. Infrastructure Specialist's delivery experience.

Cluster management technologies knowledge (bonus credit for BCM, Base Command Manager).

Experience with GPU (Graphics Processing Unit) focused hardware/software.

Experience with MPI (Message Passing Interface).
This position is open to all candidates.
 
7917769
Collected from a website
29/10/2024
Confidential company
Location: Tel Aviv-Yafo and Yokne`am
Job Type: Full Time
We are looking for an ASIC Design Engineer to join the DFT design team and develop the next generation of DFT technologies.

As a design engineer in the DFT design team, you will participate in the definition and implementation of our DFT technologies in various projects. This position offers the opportunity to have real impact in a dynamic, technology-focused company impacting Switches, NIC and SoC product lines. We work closely with a wide range of disciplines: chip design, backend, verification and production testing. We work on the most advanced technologies and sophisticated products. Our DFT solutions are unique and innovative, and we are continuously improving and evolving them to meet challenging goals.

What you'll be doing:

In this position, you will be responsible for defining, coding and integrating sophisticated DFT components into various projects and using state-of-the-art technologies.

As a member of our DFT design team, you will participate in defining various DFT features and improvements, write micro-architecture documents, code design blocks, integrate them into various projects, bring your design to silicon tape-out and silicon testing and production.

Strong collaboration with architects, other design teams, verification, back-end and production testing to accomplish your tasks.
Requirements:
What we need to see:

B.Sc. in Electrical Engineering or Computer engineering or equivalent experience.

1+ years of practical experience.

Exposure to RTL implementation and coding.

Familiarity with verification tools.

Strong debugging, problem solving and analytical skills.

Strong communication and social skills are required.

Ability to work in a geographically diverse team environment.

Self-motivated, independent and target-oriented.

Ways to stand out from the crowd:

Prior Design or Verification experience.

Experience in developing sophisticated design blocks.

Integration of design elements to large cluster or full-chip.

Experience in working with back-end on area, power and timing closures.

Scripting ability.
This position is open to all candidates.
 
7917985
Collected from a website
31/10/2024
Location: Tel Aviv-Yafo and Ra'anana
Job Type: Full Time
We have been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. We seek a SW Automation Senior Engineer to join our performance verification team. As a Performance Automation Engineer, you will work closely with our development and architecture teams responsible for the Ethernet AI solution and gain a deep understanding of our products and technologies.

What you'll be doing:

Participate in an international team of software engineers working on products for testing our products

Build automated verification environment for high-end hardware and software which is at the forefront of innovation

Identify, analyze, and report software defects, inconsistencies, and other quality issues.

Drive improvements for performance, quality, stability around SW acceleration solutions.

Stay up to date with industry standard methodologies, new technologies, and emerging trends in software verification.
Requirements:
What we need to see:

B.Sc. degree or equivalent experience in Engineering/Computer Science/related field.

4+ years of experience as a Software Engineer.

Experience in developing modern Software Verification System/QA automation and contribution with a real passion for automation.

Strong programming skills in Python.

Expertise in networking & compute infrastructure (servers, switches, routers, TCP/UDP).

Knowledge of how to tune an environment for the best performance and deploy infrastructure based on innovative technologies and high-end hardware.

Strong technical abilities, problem-solving skills, coding, and design skills.

Ability to lead feature development, take full ownership and deliver independently.

Linux knowledge: a general understanding of Linux operating system concepts.

Ways to stand out from the crowd:

Knowledge in performance testing scenarios and creation of performance reports.

Proven experience in a leadership role, with a track record of successfully leading scrums and projects.

Strong communication and interpersonal skills, with the ability to motivate and inspire others.

Knowledge in one or more Networking areas: Ethernet, VLANs, TCP/UDP/IP, QoS, L2-L3 protocols.

Prior software testing experience, with an understanding of Software Testing Tools and Methodologies and Python expertise.
This position is open to all candidates.
 
7921412
Collected from a website
29/10/2024
Job Type: Full Time
We are looking for a Senior Cloud Infrastructure/DevOps Solutions Architect to join our Infrastructure Specialist Team. Academic and commercial groups around the world are using our products to revolutionize deep learning and data analytics, and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world! We are looking for someone with the ability to work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role involves interacting with customers, partners and internal teams to analyze, define and implement large-scale networking projects. The scope of these efforts includes a combination of Networking, System Design and Automation, and being the face to the customer!

What you'll be doing:

Design, implement and maintain large-scale HPC/AI clusters with monitoring, logging and alerting.

Manage Linux job/workload schedulers and orchestration tools.

Develop and maintain continuous integration and delivery pipelines.

Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources.

Deploy monitoring solutions for the servers, network and storage.

Perform troubleshooting bottom up from bare metal, operating system, software stack and application level.

As a technical resource, develop, refine and document standard methodologies to share with internal teams.

Support Research & Development activities and engage in POCs/POVs for future improvements.

Worldwide travel is required for on-site visits with customers.
Requirements:
What we need to see:

BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering fields with at least 8 years work or research experience in networking fundamentals, TCP/IP stack, and data center architecture.

Knowledge of HPC and AI solution technologies from CPUs and GPUs to high speed interconnects and supporting software.

Direct design, implementation and management experience with cloud computing platforms (e.g. AWS, Azure, Google Cloud).

Experience with job scheduling workloads and orchestration technologies such as Slurm, Kubernetes and Singularity.

Excellent knowledge of Windows and Linux (Redhat/CentOS and Ubuntu) networking (sockets, firewalld, iptables, wireshark, etc.) and internals, ACLs and OS level security protection and common protocols e.g. TCP, DHCP, DNS, etc.

Experience with multiple storage solutions such as Lustre, GPFS, zfs and xfs. Familiarity with newer and emerging storage technologies.

Python programming and bash scripting experience.

Comfortable with automation and configuration management tools including Jenkins, Ansible, Puppet/Chef, etc.

Deep knowledge of networking protocols such as InfiniBand and Ethernet.

Deep understanding of and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix).

Strong written, verbal, and listening skills in English are critical.

Ways to stand out from the crowd:

Knowledge of CPU and/or GPU architecture.

Knowledge of Kubernetes and container-related microservice technologies.

Experience with GPU-focused hardware/software (DGX, CUDA).

Background with RDMA (InfiniBand or RoCE) fabrics.
This position is open to all candidates.
 
7917834
Collected from a website
29/10/2024
Job Type: Full Time
We are looking for a Senior NIC/DPU Solutions Architect to join our Infrastructure Specialist Team. Academic and commercial groups around the world are using our products to revolutionize deep learning and data analytics, and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world! We are looking for someone with the ability to work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role involves interacting with customers, partners and internal teams to analyze, define and implement large-scale networking projects. The scope of these efforts includes a combination of Networking, System Design and Automation, and being the face to the customer!

What you'll be doing:

Support GPU, NIC, and networking applications on the converged GPU/DPU/NIC and x86 platforms.

Work on customer production activities, introducing and integrating our networking products to new and existing customers.

Gain customers' trust and understand their needs.

Work closely with support and cross-functional teams, optimize customer environments, and maintain resiliency.

Help with customer production requirements alongside engineering and product teams.

Address sophisticated and obvious customer issues.

Worldwide travel is required for on-site visits with customers.
Requirements:
What we need to see:

BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering fields with at least 8 years work or research experience in networking fundamentals, TCP/IP stack, and data center architecture.

8+ years of experience configuring, testing, validating, and resolving issues in LAN and InfiniBand networking, including use of validation tools for InfiniBand health and performance in medium- to large-scale HPC/AI network environments.

Knowledge and experience with Linux system administration/dev ops, process management, package management, task scheduling, kernel management, boot procedures, troubleshooting, performance reporting/optimization/logging, and network-routing/advanced networking (tuning and monitoring).

Driven focus on customer needs and satisfaction. Self-motivated with excellent leadership skills including working with customers.

Strong written, verbal, and listening skills in English are critical.

Ways to stand out from the crowd:

Familiarity with the InfiniBand protocol and RDMA concepts.

Experience with GPUs, CUDA, GPUDirect or NVIDIA's BlueField Data Processing Unit (DPU).

Experience with high-performance computing architectures. Understanding of how job schedulers (Slurm, PBS) work.

Coding and development experience with multiple programming languages (from the low-level C programming language to high-level languages such as Python/Bash).

Cluster management technologies knowledge, with bonus credit for BCM (Base Command Manager).
This position is open to all candidates.
 
7917721
Collected from a website
31/10/2024
Confidential company
Location: Tel Aviv-Yafo and Yokne`am
Job Type: Full Time
Our technology has no boundaries! We are building the world's most groundbreaking, state-of-the-art compute platforms for the world to use. It's because of our work that scientists, researchers and engineers can advance their ideas. At its core, our visual computing technology not only enables an outstanding computing experience, it is also energy efficient! We pioneered a supercharged form of computing loved by the most fast-paced computer users in the world: scientists, designers, artists, and gamers.

We are now looking for a motivated engineer to use creativity and problem-solving skills to work on the ConnectX network adapter and BlueField Data Processing Unit with a highly inventive and knowledgeable team.

What you'll be doing:

Be part of the team that defines the Network Interface Card (NIC) and Data Processing Unit (DPU) architecture end to end from the market requirements through design and all product life cycles (post/pre-silicon, on deployments).

Perform research and analysis with our simulation model to define the next generation of our products.

Collaborate with other research teams.

Develop Proof of Concepts using our technology, collaborating with our most sophisticated customers on state-of-the-art innovations.
Requirements:
What we need to see:

B.Sc. in Electrical or Computer Engineering (or equivalent experience).

Programming skills.

Knowledge and understanding of compute and networking systems.

A can-do attitude and high energy, with leadership and excellent interpersonal skills and the ability to learn complex concepts in a fast-paced environment.

Passion for and attention to detail in design, and a high focus on design quality.

Ways to stand out from the crowd:

Experience and love for system architecture, CPU/GPU/Memory/Storage/Networking.

Background with AI workloads.

Experience in development of simulation environments.
This position is open to all candidates.
 
7921623
Collected from a website
29/10/2024
Job Type: Full Time
We are looking for a Senior HPC/AI Solutions Architect to join our Infrastructure Specialists Team. Academic and commercial groups around the world are using our products to revolutionize deep learning and data analytics, and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world! We are looking for someone with the ability to work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role involves interacting with customers, partners and internal teams to analyze, define and implement large-scale AI/HPC projects. The scope of these efforts includes a combination of Networking, System Design and Automation, and being the face to the customer!

What You'll Be Doing:

Primary responsibilities will include building robust AI/HPC infrastructure for new and existing customers.

Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.

Engage in and improve the whole lifecycle of services from inception and design through deployment, operation, and refinement.

Your primary focus would be on understanding the AI workload and how it interacts with other parts of the system like networking, storage, deep learning frameworks, data cleaning tools, etc.

Help maintain services once they are live by measuring and monitoring progress of AI jobs and helping engineering design solutions for more robust training at scale.

Provide feedback to internal teams such as opening bugs, documenting workarounds, and suggesting improvements.

Worldwide travel is required for on-site visits with customers.
Requirements:
What We Need to See:

BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields, with at least 8 years of work or research experience in Python, C++, or other software development.

Track record of medium to large scale AI training and understanding of key libraries used for NLP/LLM/VLA training (NeMo Framework, DeepSpeed etc.)

Experience with integration and deployment of software products in production enterprise environments, and microservices software architecture.

You are excited to work with multiple levels and teams across organisations (Engineering, Product, Sales and Marketing teams). Capable of working in a constantly evolving environment without losing focus. Ability to multitask in a fast-paced environment.

Driven with strong analytical and problem-solving skills. Strong time-management and organization skills for coordinating multiple initiatives, priorities and implementations of new technology and products into very sophisticated projects.

You are a self-starter with a growth mindset, a passion for continuous learning, and a habit of sharing findings across the team.

Technical leadership and strong understanding of NVIDIA technologies, and success in working with customers.

Excellent verbal, written communication, and technical presentation skills in English.

Ways to Stand Out from The Crowd:

Experience working with large transformer-based architectures for NLP, CV, ASR or other. Experience running large scale distributed DL training.

Understanding of HPC systems: data center design, high speed interconnect InfiniBand, Cluster Storage and Scheduling related design and/or management experience.

Proven experience with one or more Tier-1 Clouds (AWS, Azure, GCP or OCI) and cloud-native architectures and software.

Expertise with parallel filesystems (e.g. Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects (InfiniBand, Omni Path, and Gig-E).

Strong coding and debugging skills, and demonstrated expertise in one or more of the following areas: Machine Learning, Deep Learning, Slurm, Docker/Kubernetes, Singularity, MPI, MLOps, LLMOps, Ansible, Terraform, and other high-performance AI cluster
This position is open to all candidates.
 
7917855