Senior Software Engineer, AI-Driven Performance Engineering

8321836

Similar jobs that may interest you

10 hours ago

Senior System Software Engineer, NCCL - Partner Enablement

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

We are leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars.
Come work for the team that brought you NCCL, NVSHMEM & GPUDirect. Our GPU communication libraries are crucial for scaling Deep Learning and HPC applications! We are looking for a motivated Partner Enablement Engineer to guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking (InfiniBand, RoCE, Ethernet). This is an outstanding opportunity to get an end-to-end understanding of the AI networking stack. Are you ready to contribute to the development of innovative technologies and help realize our company's vision?
What you will be doing:
Engage with our partners and customers to root cause functional and performance issues reported with NCCL
Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
Document and conduct trainings/webinars for NCCL
Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support.

Requirements:
B.S./M.S. degree in CS/CE or equivalent experience with 5+ years of relevant experience. Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
Experience working with engineering or academic research community supporting HPC or AI
Practical experience with high performance networking: Infiniband/RoCE/Ethernet networks, RDMA, topologies, congestion control
Expert in Linux fundamentals and a scripting language, preferably Python
Familiar with containers, cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
Adaptability and passion to learn new areas and tools
Flexibility to work and communicate effectively across different teams and timezones
Ways to stand out from the crowd:
Experience conducting performance benchmarking and developing infrastructure on HPC clusters. Prior system administration experience, especially for large clusters. Experience debugging network configuration issues in large-scale deployments
Familiarity with CUDA programming and/or GPUs. Good understanding of Machine Learning concepts and experience with Deep Learning frameworks such as PyTorch and TensorFlow
Deep understanding of technology and passionate about what you do.

This position is open to all candidates.

8321595

2 days ago

Senior VLSI CAD and AI Automation Engineer

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

We are at the forefront of AI-driven innovation in VLSI design automation. Join us to shape the future of semiconductor design with cutting-edge AI tools and make a significant impact in a collaborative, high-performance environment. Are you ready to push the boundaries of what's possible in VLSI CAD? Come be part of our pioneering team!
What you'll be doing:
You will be responsible for developing and integrating advanced CAD solutions and automation flows using AI and machine learning for VLSI design, verification, and implementation.
Work closely with design, verification, and CAD teams to identify areas for improving VLSI workflows using advanced tools and methods.
Research, prototype, and deploy AI-based algorithms.
Develop and maintain scripts and automation infrastructure to enable seamless adoption of AI tools in the VLSI design process.
Continuously review emerging AI technologies and methodologies to keep our CAD environment up-to-date.
Provide technical support and training to engineering teams on AI-enabled CAD flows and best practices.

Requirements:
B.Sc./M.Sc. in Electrical Engineering, Computer Engineering, Computer Science, or equivalent experience.
5+ years of experience in VLSI CAD tool development, with a strong focus on integrating AI/ML techniques into EDA workflows.
Proficiency in Python and at least one AI/ML framework (such as TensorFlow, PyTorch, or scikit-learn).
Hands-on experience with VLSI physical design and familiarity with industry-standard EDA tools (e.g., Synopsys, Cadence).
Knowledge of data preprocessing, feature engineering, and model deployment as applied to VLSI design challenges.
Experience developing and maintaining automation scripts (Python, Perl, Tcl, Make).
Strong analytical skills in evaluating the impact of AI solutions on design quality, performance, and productivity.
Excellent communication skills and the ability to work cross-functionally in a fast-paced environment.
Self-motivation, attention to detail, and a track record of delivering robust solutions to production.
Ways to stand out from the crowd:
Demonstrated experience deploying AI/ML models in production VLSI CAD environments.
Contributions to open-source AI/EDA projects or publications in relevant conferences/journals.
Deep understanding of VLSI design challenges (such as timing closure, power optimization, or yield enhancement) and how AI can address them.
Experience with cloud-based or distributed compute environments for large-scale AI training and inference.
Strong ownership, curiosity, and a passion for continuous learning and innovation.

This position is open to all candidates.

8318297

10 hours ago

Senior HPC Performance Engineer

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

We are leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars.
Come work for the team that brought you NCCL, NVSHMEM & GPUDirect. Our GPU communication libraries are crucial for scaling Deep Learning and HPC applications! We are looking for a motivated Performance Engineer to influence the roadmap of our communication libraries. Today's DL and HPC applications have huge compute demands and run at scales of up to tens of thousands of GPUs. The GPUs are connected with high-speed interconnects (e.g. NVLink, PCIe) within a node and with high-speed networking (e.g. InfiniBand, Ethernet) across nodes. Communication performance between the GPUs has a direct impact on end-to-end application performance, and the stakes are even higher at huge scales! This is an outstanding opportunity for someone with an HPC and performance background to advance the state of the art in this space. Are you ready to contribute to the development of innovative technologies and help realize our company's vision?
What you will be doing:
Conduct in-depth performance characterization and analysis on large multi-GPU and multi-node clusters.
Study the interaction of our libraries with all HW (GPU, CPU, Networking) and SW components in the stack
Evaluate proof-of-concepts, conduct trade-off analysis when multiple solutions are available
Triage and root-cause performance issues reported by our customers
Collect a lot of performance data; build tools and infrastructure to visualize and analyze the information
Collaborate with a very dynamic team across multiple time zones.

Requirements:
M.S. (or equivalent experience) or Ph.D. in Computer Science or a related field, with relevant performance engineering and HPC experience
3+ years of experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
Experience conducting performance benchmarking and triage on large scale HPC clusters
Good understanding of computer system architecture, HW-SW interactions and operating systems principles (aka systems software fundamentals)
Ability to implement micro-benchmarks in C/C++ and to read and modify the code base when required
Ability to debug performance issues across the entire HW/SW stack. Proficient in a scripting language, preferably Python
Familiar with containers, cloud provisioning and scheduling tools (Kubernetes, SLURM, Ansible, Docker)
Adaptability and passion to learn new areas and tools. Flexibility to work and communicate effectively across different teams and timezones
Ways to stand out from the crowd:
Practical experience with InfiniBand/Ethernet networks in areas like RDMA, topologies, congestion control
Experience debugging network issues in large-scale deployments
Familiarity with CUDA programming and/or GPUs
Experience with Deep Learning frameworks such as PyTorch and TensorFlow.

This position is open to all candidates.

8321604

10 hours ago

Senior Software Architect - Deep Learning and HPC Communications

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

We are leading groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU -- our invention -- serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables groundbreaking creativity and discovery, and powers inventions that were once considered science fiction, from artificial intelligence to autonomous cars. Come work for the team that brought you NCCL, NVSHMEM & GPUDirect. Our GPU communication libraries are crucial for scaling Deep Learning and HPC applications! We're seeking a Senior Software Architect to help co-design next-gen data center platforms and scalable communications software.
DL and HPC applications have huge compute demands and already run at scales of up to tens of thousands of GPUs. GPUs are connected with high-speed interconnects (e.g. NVLink, PCIe) within a node and with high-speed networking (e.g. InfiniBand, Ethernet) across nodes. Efficient and fast communication between GPUs directly impacts end-to-end application performance. This impact continues to grow with the increasing scale of next-generation systems. This is an outstanding opportunity to advance the state of the art, break performance barriers, and deliver platforms the world has never seen before. Are you ready to build the new and innovative technologies that will help realize our company's vision?
What you will be doing:
Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems.
Design and implement new communication technologies to accelerate AI and HPC workloads.
Explore innovative solutions in HW and SW for our next generation platforms as part of co-design efforts involving GPU, Networking, and SW architects.
Build proofs-of-concept, conduct experiments, and perform quantitative modeling to evaluate and drive new innovations.
Use simulation to explore the performance of large GPU clusters (think scales of hundreds of thousands of GPUs).

Requirements:
M.S./Ph.D. degree in CS/CE or equivalent experience.
5+ years of relevant experience.
Excellent C/C++ programming and debugging skills.
Experience with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
Deep understanding of operating systems, computer and system architecture.
Solid in fundamentals of network architecture, topology, algorithms, and communication scaling relevant to AI and HPC workloads.
Strong experience with Linux.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Expertise in related technology and passion for what you do. Experience with CUDA programming and our company's GPUs. Knowledge of high-performance networks like InfiniBand, RoCE, NVLink, etc.
Experience with Deep Learning frameworks such as PyTorch, TensorFlow, etc. Knowledge of deep learning parallelisms and mapping to the communication subsystem. Experience with HPC applications.
Strong collaborative and interpersonal skills and a proven track record of effectively guiding and influencing within a dynamic and multi-functional environment.

This position is open to all candidates.

8321599

2 days ago

Senior Software Developer

Confidential company

Location: More than one (Tel Aviv-Yafo, Ra'anana, Yokne'am)

Job Type: Full Time

We are looking for an enthusiastic software engineer to join our AI networking acceleration team, to work on a groundbreaking open-source library, using hardware offloads, GPU Kernels and RDMA network cards. Our product is a performance-oriented low-level infrastructure, crafted to change the way inference works.
We thrive as a team in a deeply strong environment, and we're passionate about innovation. The rewards are sweet and include working with some of the brightest people in the industry, an aggressive compensation plan that rewards top performers, and the opportunity to collaborate on products that transform the way people work and play every day.
What you'll be doing:
Developing a highly optimized inference framework
Running on the world's largest supercomputers and data centers.
The work environment is dynamic and challenging as our employees work on innovative, next-generation products at the forefront of technology in terms of performance, scalability, and features.

Requirements:
B.Sc. or equivalent experience in Computer Science or Software Engineering
5 years of experience in modern C++ / C / Rust development
3 years of experience in a Linux environment and familiarity with development tools
Deep knowledge of the TCP/IP network stack
Understanding of computer architecture and operating systems concepts
Ways to stand out from the crowd:
Background in Linux internals and low-level software optimizations (benchmarking, bottleneck research, performance tuning)
Experience in programming CUDA kernels is an advantage
Familiarity with ML frameworks and LLMs
Background in parallel programming / high-performance computing / RDMA technology.

This position is open to all candidates.

8318133

9 hours ago

Senior Software Engineer, DOCA

Confidential company

Location: Ra'anana and Yokne'am

Job Type: Full Time

We are looking for a Senior Software Engineer. You will work with highly experienced engineers to provide outstanding SmartNIC products for cloud computing, research, medical, automotive, finance, weather, telco, and more. We are developing some of the core libraries of the company's DOCA SDK, rapidly growing DOCA functionality and use cases. With DOCA, developers can program the data center infrastructure by creating software-defined, cloud-native, secured, HW-accelerated services.
We also take a significant part in the Linux Foundation DPDK (dpdk.org) project, and expand the company-Mellanox PMD in particular, providing the framework and common API for fast packet processing in user space. Our goal is to enable breakthrough network performance using our company's SmartNIC hardware capabilities, and to address the performance, scale and security demands of modern software-defined enterprise data centers and public cloud infrastructure.
What you'll be doing:
You will architect, design, and develop the next-generation technology in network acceleration, as well as work with best-in-class technical leaders in this domain
Engage with customers and architects to understand the requirements and derive the software design accordingly
Collaborate with other engineering teams that develop upper-layer applications such as virtual switches (OVS, VPP, etc.) and lower layers such as driver, kernel, FW, and HW.

Requirements:
B.Sc. (or equivalent experience) in computer science/software engineering
5+ years of confirmed experience programming in C/C++
5+ years of confirmed experience with the Linux environment and tools
Deep experience with networking protocols, mainly Ethernet, and with security protocols
Experience with virtualization technologies
Strong analytical, debugging, and problem-solving skills
Deep knowledge of computer architecture and operating systems.
Experience in performance optimizations
Ways to stand out from the crowd:
Knowledge and experience in DPDK
Knowledge and experience with designing SDKs
Open-source software contributor to relevant projects (OvS, DPDK, Linux kernel, etc.)
A positive demeanor, a growth mindset, and excellent interactions with colleagues.

This position is open to all candidates.

8321760

9 hours ago

Senior Software Developer

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

We are spearheading the AI revolution and the creation of state-of-the-art accelerated compute platforms for global utilization. Our Network Modeling and Performance Insights group is seeking a skilled and driven Software Developer for the design and development of our infrastructure for a complex networking simulation as a service. In this role, you will be responsible for developing and optimizing our network simulation software and for enhancing its performance and quality. You will work on integrating this infrastructure with cloud computation services for various use cases and ensure the simulation is available as a service for internal and external customers. If you're passionate about tackling intricate challenges and contributing to comprehensive software solutions, we want to hear from you.
What you'll be doing:
Enhance simulation runtime and memory consumption through innovative optimization techniques.
Improve the quality of the simulation as a software product, ensuring robustness and reliability.
Expand the simulation's versatility to accommodate new, varied, and complex user use cases and bleeding-edge requirements.
Design and expose the simulation as a service to facilitate easier access for different stakeholders.
Integrate a new simulation management system, making simulated experiments data accessible to all users.
Design and develop a CI/CD infrastructure for our complex networking simulation tool, ensuring efficient deployment and smooth integration processes.

Requirements:
BSc or above in Computer Science, Computer Engineering, or a related field, or equivalent experience.
5+ years of relevant practical experience in software development, including working on a large-scale software product, preferably with strict performance considerations.
Proficiency in C++ and optimization techniques for improving code performance
In-depth knowledge of computer science fundamentals, and computer architecture.
Strong communication skills.
Experience with simulation environments (specifically, network related) - a significant advantage
Prior experience with multi-core computation and parallel code acceleration
Familiarity with cloud computing and parallelization of computational workloads - an advantage.
Experience in developing CI/CD pipelines and integrating services - an advantage.

This position is open to all candidates.

8321816

9 hours ago

Senior Software Engineer, DPU Platform

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

We are looking for a versatile Senior Software Engineer for the company's DPU Platform team. This position offers the opportunity to have real impact in a multifaceted, technology-focused company, affecting product lines that empower the most advanced data centers in the world. Using your deep knowledge of embedded platforms, operating systems, and software distribution technologies, you will work with a world-wide development team to solve the unique challenges of delivering the world's most powerful platforms.
What you'll be doing:
Develop system software components including processor firmware and bootloaders, kernel drivers/modules, and user space applications and libraries
Collaborate with hardware and product design teams to develop software for sophisticated SOC platform designs
Assist world-wide teams with various customer and internal DPU projects
Tackle complex system-level optimization and resource utilization challenges
Participate across all levels of a product development lifecycle that values high standards for clear requirements, software quality, and performance
Collaborate within a worldwide matrixed software development team, and have broad impact within our highly-dynamic and technology-focused company.

Requirements:
Bachelor's degree in Computer Science/Engineering or equivalent experience
5+ years developing software for embedded systems (C is required, Python)
Proven understanding of the system software stack, with a focus on software/hardware interaction, including platform firmware, device drivers, Linux kernel, and how user-space applications utilize system services to achieve high performance
A deep knowledge of high-performance processor architecture including CPU and cache coherency concepts, as well as hardware accelerators
Well-rounded engineering skills, including technical investigation, design, testing, and agile software engineering process
Outstanding written and oral communication skills
Must be proficient in the C programming language
Experienced with build environment tools (gcc, git, github, make, bitbake, shell scripts, gerrit, jenkins, etc)
Ways to stand out from the crowd:
Background with ARMv8 microarchitecture, ATF and/or UEFI software is a strong plus
Experience with multiple Linux distributions, with the ability to compare and contrast them
Experience developing security key management solutions is very desirable
Exposure to secure boot flows and/or trusted computing environments.

This position is open to all candidates.

8321946

9 hours ago

Senior Software Engineer, DPU BMC Platform

Confidential company

Location: Tel Aviv-Yafo and Yokne'am

Job Type: Full Time

We are seeking a highly motivated Senior Software Engineer with expertise in embedded software development to join our Data Processing Unit (DPU) Software Group. We are looking for a candidate with the ability to thrive in an environment with sophisticated software and hardware designs, take ownership, and lead the SW development of key components of the DPU. The role includes working closely with HW, FW, and SW teams all over the world and taking our product to the next level.
What you'll be doing:
Design and develop high-performance networking solutions based on our company's outstanding BlueField networking card hardware
Engage closely with customers and partners.
Collaborate with multiple teams in our multi-functional environment on developing new features/improvements.
Stay up to date with industry best practices, new technologies, and emerging trends in software verification.
Write fast, effective, maintainable, reliable and well documented code
Innovate! Make our company's DPU products shine in the customer's view.

Requirements:
Bachelor's degree in Computer Science, Software Engineering, or a related field (or equivalent work experience).
5+ years of experience in writing programs using C/C++.
Experience with embedded SW development
Good background in designing, implementing, and debugging Software.
Experience in development under a Linux environment.
Extensive knowledge of software debugging and strong problem-solving skills.
Strong design, coding, analytical, debugging and problem-solving skills
Ability to work concurrently with multiple groups in the organization
Creative, motivated, and value driven person
Ways to stand out from the crowd:
Experience with networking applications and protocols.
Expertise in driver development along with deep knowledge of modern C++ programming.
Proficiency in Python development.
Background in BMC, UEFI, Secure Boot, U-Boot, ATF, and Yocto.
Previous experience working closely with hardware and board design teams.
Experience in software development within the Linux kernel.

This position is open to all candidates.

8321937

2 days ago

Senior HPC AI Cluster Engineer

Confidential company

Location: More than one

Job Type: Full Time

We are looking for an experienced HPC Engineer to join the E2E software verification HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for an outstanding senior HPC architect to be a key player, working with the most exciting computing hardware and software and contributing to the latest breakthroughs in artificial intelligence and GPU computing. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and Deep Learning software and hardware platforms, and with many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop and bring up large-scale performance platforms.
What you will be doing:
Design, implement and maintain large scale HPC/AI clusters with monitoring, logging and alerting
Manage Linux job/workload schedules and orchestration tools
Develop and maintain continuous integration and delivery pipelines
Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources
Deploy monitoring solutions for the servers, network and storage
Perform troubleshooting bottom up from bare metal, operating system, software stack and application level
As a technical resource, develop, redefine and document standard methodologies to share with internal teams
Support Research & Development activities and engage in POCs/POVs for future improvements.

Requirements:
A degree in Computer Science, Engineering, or a related field
5+ years of experience
Knowledge of HPC and AI solution technologies from CPUs and GPUs to high speed interconnects and supporting software
Experience with job scheduling workloads and orchestration tools such as Slurm, K8s
Excellent knowledge of Windows and Linux (Redhat/CentOS and Ubuntu) networking (sockets, firewalld, iptables, wireshark, etc.) and internals, ACLs and OS level security protection and common protocols e.g. TCP, DHCP, DNS, etc.
Experience with multiple storage solutions such as Lustre, GPFS, ZFS and XFS. Familiarity with newer and emerging storage technologies.
Python programming and bash scripting experience.
Comfortable with automation and configuration management tools such as Jenkins, Ansible, Puppet/Chef
Deep knowledge of Networking Protocols like InfiniBand, Ethernet
Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)
Ways to stand out from the crowd:
Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)
Knowledge of CPU and/or GPU architecture
Knowledge of Kubernetes, container related microservice technologies
Experience with GPU-focused hardware/software (DGX, CUDA)
Background with RDMA (InfiniBand or RoCE) fabrics.

This position is open to all candidates.

8317649