Senior Software Engineer, Cloud Platforms

27/08/2025
Location: Yokne'am and Ra'anana
Job Type: Full Time
We are looking for an excellent Senior Software Developer to work on open-source cloud platforms such as Kubernetes. We are seeking an experienced engineer who is deeply technical, hands-on, and has a wide system view. You will design, build, and deploy high-performance, scalable clouds based on our company's superior ConnectX NICs and BlueField DPUs. We are looking to grow our teams with the smartest people in the world. If you're creative and autonomous, we want to hear from you!
What you'll be doing:
Design and implement new features to accelerate networking and storage.
Work closely with open-source communities and participate in the relevant working groups.
Work with different teams across our company.
Mentor members of the team, enabling them to deliver high-quality software.
Requirements:
BSc in Computer Science or equivalent program experience
5+ years of hands-on experience in software development, preferably with C/Python/Golang
Highly motivated with strong communication skills, ability to work successfully with multi-functional teams, developers, and architects
Ability to coordinate effectively across organizational boundaries and geographies
Strong self-initiative, independence, and flexibility in adopting new technologies
Deep understanding of network protocols, virtualization, and containers
Strong background in designing, implementing, and debugging complex software
Wide hands-on experience with the Kubernetes or OpenStack ecosystems
Ways to stand out from the crowd:
Experience working on open-source projects.
Background with SR-IOV, K8s, K8s controllers, and CNI.
Wide hands-on experience with OVN and OVS.
This position is open to all candidates.
 
8321730
Similar jobs that may interest you
26/08/2025
Location: Ra'anana
Job Type: Full Time
We are seeking an exceptional Senior K8s Software Engineer to help design and build our next-generation cloud platforms. We're looking for a deeply technical, hands-on engineer with a broad systems perspective and a passion for scalable cloud infrastructure powered by ConnectX, BlueField NICs, and GPUs. You'll join a dynamic team developing high-performance computing infrastructure used in some of the world's largest supercomputers and data centers. This is a fast-paced, collaborative environment where you'll work on innovative, next-gen products at the forefront of performance, scalability, and functionality.
What you'll be doing:
Design and develop scalable, cloud-native solutions to accelerate HPC and AI workloads using our company's advanced technologies (GPUs, DPUs, ConnectX).
Contribute to our company's AI supercomputing platforms.
Collaborate with cross-functional teams to deliver new features and improve existing products.
Support, maintain, and document robust software systems.
Requirements:
BSc in Computer Science or equivalent program.
5+ years of software development experience with Go and Python.
Strong hands-on development experience with the K8s ecosystem.
Familiarity with CI/CD tools such as Jenkins, GitLab, or GitHub.
Proven ability to design, debug, and maintain complex distributed systems.
Excellent communication skills and the ability to collaborate across teams and geographies.
Self-starter with adaptability and eagerness to learn new technologies.
Ways to stand out from the crowd:
Experience building K8s operators/controllers.
Background in HPC or AI infrastructure technologies.
This position is open to all candidates.
 
8319793
28/08/2025
Location: Yokne'am
Job Type: Full Time
We are seeking an exceptional DevOps & Software Engineer to join our innovative development team. This team is at the heart of our company's software infrastructure, building and maintaining a wide range of solutions including an internal cloud environment, scalable build systems, automation frameworks, and AI-driven tools. You will have a direct impact on our company's products and development workflows. This role blends hands-on development with deep DevOps practices. It's ideal for engineers passionate about scalability, infrastructure, automation, and modern development platforms.
What You'll Be Doing:
Develop and maintain internal cloud solutions based on a microservices architecture to enable efficient, high-quality software development and delivery.
Design and implement automation tools, infrastructure services, and advanced build systems.
Work across a wide variety of operating systems, building virtualization and system-level capabilities.
Provide resilient solutions for a sophisticated infrastructure stack with the latest network devices.
Contribute to a positive and collaborative team culture that values creativity and agility.
Partner with engineering teams across our company to deliver scalable and reliable infrastructure solutions.
Requirements:
B.Sc. in Computer Science, Computer Engineering, or equivalent technical field.
5+ years of hands-on experience in software development or DevOps roles.
Proficiency in Python and familiarity with Linux environments.
Solid understanding of software design, implementation, and debugging.
Strong analytical skills, ability to troubleshoot complex systems.
Self-driven, quick learner, comfortable with multitasking and dynamic environments.
Ways to Stand Out from the Crowd:
Experience with virtualization and operating system internals.
Familiarity with tools like Kubernetes, Rancher, MongoDB, Redis, Docker, Vagrant, Ansible, or similar.
Strong background in CI/CD practices, particularly using Jenkins and infrastructure-as-code.
Knowledge of networking fundamentals and protocols.
Prior experience with cloud infrastructure or grid computing environments.
This position is open to all candidates.
 
8322800
27/08/2025
Location: Tel Aviv-Yafo and Yokne'am
Job Type: Full Time
We are leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars.
Come work for the team that brought you NCCL, NVSHMEM & GPUDirect. Our GPU communication libraries are crucial for scaling Deep Learning and HPC applications! We are looking for a motivated Partner Enablement Engineer to guide our key partners and customers with NCCL. Most DL/HPC applications run on large clusters with high-speed networking (InfiniBand, RoCE, Ethernet). This is an outstanding opportunity to get an end-to-end understanding of the AI networking stack. Are you ready to contribute to the development of innovative technologies and help realize our company's vision?
What you will be doing:
Engage with our partners and customers to root-cause functional and performance issues reported with NCCL
Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
Document and conduct trainings/webinars for NCCL
Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support.
Requirements:
B.S./M.S. degree in CS/CE or equivalent experience, with 5+ years of relevant experience
Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
Experience working with engineering or academic research community supporting HPC or AI
Practical experience with high-performance networking: InfiniBand/RoCE/Ethernet networks, RDMA, topologies, congestion control
Expert in Linux fundamentals and a scripting language, preferably Python
Familiar with containers, cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
Adaptability and passion to learn new areas and tools
Flexibility to work and communicate effectively across different teams and timezones
Ways to stand out from the crowd:
Experience conducting performance benchmarking and developing infrastructure on HPC clusters
Prior system administration experience, especially for large clusters
Experience debugging network configuration issues in large-scale deployments
Familiarity with CUDA programming and/or GPUs
Good understanding of Machine Learning concepts and experience with Deep Learning frameworks such as PyTorch and TensorFlow
Deep understanding of technology and passionate about what you do.
This position is open to all candidates.
 
8321595
03/09/2025
Location: Yokne'am
Job Type: Full Time
We are looking for an outstanding Senior Software Performance Engineer for our Linux Drivers group. You'll work closely with our company's driver developers, verification teams, and performance architects, gaining a deep understanding of our company's networking products and technologies built on top of our company's ConnectX and BlueField network cards. You will lead feature verification from design through implementation to integration into frameworks, develop robust infrastructure, and work collaboratively with cross-functional teams.
This position offers the opportunity to have real impact in a dynamic, technology-focused company, influencing product lines that empower the most advanced data centers in the world. We've built a team of outstanding people stretching around the globe, whose mission is to push the frontiers of what's possible today and define the platform for the future of computing. We strongly believe in developing our employees and giving them the tools to succeed.
What you'll be doing:
Work closely with developers to test new components, including crafting and executing unit, functional, and performance tests.
Develop a verification environment using Python to qualify the product from functional and performance perspectives.
Investigate performance-related issues in networking Linux drivers.
Analyze coverage measures to identify verification holes and to show progress toward product development and releases.
Identify and write all types of coverage measures for stimulus and corner cases.
Requirements:
B.Sc. or equivalent experience in Computer Science or SW/Computer Engineering.
5+ years of work experience in software development.
Strong programming skills in Python and/or C.
Background in networking and protocols.
Experience working with and debugging Linux kernel drivers.
Strong debugging and analytical skills.
Creative, motivated, and results-driven worker.
Ways to stand out from the crowd:
Background in Linux operating systems.
Knowledge of virtualization.
Strong background in designing, implementing, and debugging software.
This position is open to all candidates.
 
8331727
27/08/2025
Location: Tel Aviv-Yafo and Yokne'am
Job Type: Full Time
We are seeking a highly motivated Senior Software Engineer with expertise in embedded software development to join our Data Processing Unit (DPU) Software Group. We are looking for a candidate who can thrive in an environment with sophisticated software and hardware designs, take ownership, and lead the SW development of key components of the DPU. The role includes working closely with HW, FW, and SW teams all over the world and taking our product to the next level.
What you'll be doing:
Design and develop high-performance networking solutions based on our company's outstanding BlueField networking card hardware.
Engage closely with customers and partners.
Collaborate with multiple teams in our multi-functional environment on developing new features/improvements.
Stay up to date with industry best practices, new technologies, and emerging trends in software verification.
Write fast, effective, maintainable, reliable, and well-documented code.
Innovate! Make our company's DPU products shine in the customer's eyes.
Requirements:
Bachelor's degree in Computer Science, Software Engineering, or a related field (or equivalent work experience).
5+ years of experience in writing programs using C/C++.
Experience with embedded SW development
Good background in designing, implementing, and debugging software.
Experience in development under a Linux environment.
Extensive software debugging knowledge and problem-solving skills.
Strong design, coding, analytical, debugging, and problem-solving skills.
Ability to work concurrently with multiple groups in the organization.
Creative, motivated, and value-driven person.
Ways to stand out from the crowd:
Experience with networking applications and protocols.
Expertise in driver development along with deep knowledge of modern C++ programming.
Proficiency in Python development.
Background in BMC, UEFI, Secure Boot, U-Boot, ATF, and Yocto.
Previous experience working closely with hardware and board design teams.
Experience in software development within the Linux kernel.
This position is open to all candidates.
 
8321937
04/09/2025
Location: More than one
Job Type: Full Time
We are leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. We are looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence.
We are looking for highly motivated Senior Software Engineers to work on our GPU NVLink Fabric Networking team. You'll be part of a team responsible for defining the next-generation communication standards and products, building on our current NVLink and NVSwitch technology.
What you will be doing:
Design, develop, and maintain system-level software to enable high-performance GPU-to-GPU communication.
Collaborate closely with cross-functional teams, including hardware, firmware, and system software, to build and deliver next-generation GPU networking solutions.
Contribute to scalable and reliable GPU fabric architecture for large compute clusters.
Align software development with customer needs and real-world deployment environments.
Requirements:
B.S./M.S./Ph.D. in computer science or a related field with 5+ years of relevant experience.
Excellent C/C++ programming and debugging skills, with some familiarity with Python.
Experience writing software applications that interface with device drivers and expose associated hardware functionality.
Solid understanding of computer system architecture, operating system and kernel internals.
Experience with Linux development; familiarity with Windows is a plus.
Background in multi-core / multi-process / multi-threaded programming environment.
Strong understanding of networking fundamentals and high-performance interconnects (e.g., InfiniBand, Ethernet).
Familiarity with OS virtualization technologies like KVM/QEMU/Hyper-V, etc.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Understanding of the CUDA programming model and our company's GPUs.
Knowledge of memory coherence and consistency models.
Familiarity with static and dynamic code analysis, fuzzing, negative testing, and other techniques.
This position is open to all candidates.
 
8333391
01/09/2025
Location: Yokne'am
Job Type: Full Time
We are looking for a creative and independent Software Engineer to develop tools, infrastructure, and workflows for the IC test and product engineering group in our company's Networking Business Unit.
Our company's Networking Business Unit has continuously reinvented itself over two decades. Our high-speed buses and network products lead their markets with innovative ways to improve speed and bandwidth from one generation to the next, and today we are known as the go-to place for End-to-End High-Speed Ethernet and InfiniBand Solutions.
We're looking to grow our company and build our teams with smart people who can join us at the cutting edge of technology. We need a creative individual who will help move network silicon IC products (Switch, NIC, SmartNIC) from design to mass production. You will work with test engineers, the test house, Design, IT, and many other professionals in the organization to develop tools and test infrastructure that speed time to market and enable next-generation test capabilities, characterization, and data analysis.
If you are passionate about enabling the highest-quality network products in the market, we want to hear from you!
What you'll be doing:
Design, develop, and maintain mission-critical engineering applications and automation tools.
Build systems that automate test program validation, execution, and release processes.
Architect infrastructure for scalable test and data workflows targeting next-generation network silicon.
Collaborate with cross-functional teams to enhance HW/SW automation flows and characterization pipelines.
Support integration and deployment in manufacturing environments and Contract Manufacturers (CM).
Enable new capabilities in the CM.
Leverage DevOps best practices (CI/CD, version control, infrastructure automation) to accelerate internal development cycles.
Work with various teams at our company to improve and automate data analysis capabilities for all engineering and characterization test results.
Requirements:
BSc or higher in Computer Science or related field, with 7+ years of hands-on software development experience.
Proficiency in C# and Python; C/C++ experience is a strong plus.
Proven experience in GUI, application development, and tool integration; web/cloud background is advantageous.
High proficiency in Git.
Outstanding customer orientation.
Hands-on experience with CI/CD (Jenkins, GitLab pipelines), Git-based workflows, Linux environments, shell scripting, and virtualized infrastructure.
Passion for "it just works" automation and for eliminating repetitive tasks.
Excellent communication skills with diverse teams and functional groups.
Agile and self-learning, with high standards of execution quality.
Innovative approach to problem solving.
Ways to stand out from the crowd:
VBA or VB6 experience is a huge plus.
Semiconductor test knowledge or hands-on experience with ATE/DFT workflows.
Experience with HW/SW interfaces.
This position is open to all candidates.
 
8328340
28/08/2025
Confidential company
Location: Yokne'am
Job Type: Full Time
We are seeking a highly skilled DevOps Engineer to join our Networking IC Product Engineering Group (ICPE). This is a unique opportunity to become a cornerstone of our DevOps practice, owning the critical systems that power our engineering innovation. You'll be responsible for the entire DevOps lifecycle, from robust CI/CD pipelines to production line package releases, driving efficiency, scalability, and reliability across the organization. You will work with a high degree of autonomy and will be expected to independently lead initiatives, design and implement optimal solutions, and collaborate with both internal stakeholders and external partners. If you're a self-motivated engineer who thrives in dynamic environments, takes initiative without waiting for direction, and enjoys improving and scaling engineering ecosystems, we want you with us.
What You'll Be Doing:
Develop and maintain robust, scalable CI/CD pipelines to ensure seamless software integration and delivery.
Collaborate with cross-functional teams to enhance build system reliability and efficiency.
Monitor, troubleshoot, and optimize system performance to ensure continuous, reliable operation.
Diagnose and resolve complex issues affecting the stability and performance of development and production environments.
Requirements:
Bachelor's degree in computer science, computer engineering, or equivalent experience.
5+ years of hands-on experience in CI/CD pipeline development and automation (e.g., Jenkins, GitLab CI/CD).
5+ years of experience in Python development.
5+ years of working with Linux distributions (e.g., RedHat, Ubuntu).
Proficiency in scripting languages (e.g., Bash, Ruby, Groovy) in a Unix/Linux environment.
Strong background in configuration and deployment management.
Expertise in version control systems (e.g., GitLab, Gerrit).
Exceptional problem-solving skills, with a focus on identifying root causes and implementing long-term fixes.
Excellent communication and interpersonal skills; strong team spirit and cross-team collaboration mindset.
Proven ability to work independently, prioritize tasks, and drive initiatives without constant supervision.
Ways To Stand Out From The Crowd:
Experience with PyTest or other testing frameworks.
Previous leadership experience or a track record of mentoring/team-leading.
Familiarity with databases (e.g., MongoDB or similar).
Hands-on experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
This position is open to all candidates.
 
8322916
26/08/2025
Location: Ra'anana
Job Type: Full Time
Our company has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As a worker, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.
We are looking for a Senior Software Development Engineer to contribute to the cutting-edge Network Management System of the most powerful supercomputers in the world. Our team is growing, and we are looking for hardworking and self-motivated engineers to develop and verify advanced, high-scale SDN management solutions. You will be part of a dynamic team, working with amazing people.
What You'll Be Doing:
You will have a significant impact in developing the next-generation Unified Fabric Manager (UFM) product.
Help drive the underlying technology stack and implementation methodology, ensuring it competes at a world-class level.
Collaborate closely with other SW R&D teams and SW Architects to successfully implement ambitious projects.
Engage in performance tuning and automation to build a flawless operational environment.
Design and implement micro-services architecture to support our advanced, high-scale SDN management solutions.
Work in an agile environment, ensuring continuous improvement and innovation.
Requirements:
We are looking for candidates with the following proven qualifications and experience:
B.Sc. or equivalent experience in Computer Science or a related field.
10+ years of hands-on experience with system software design, development, and maintenance, particularly in C/C++ programming.
Debugging and performance analysis skills are strictly required.
Significant advantage if you have Python programming experience.
Proficiency with Docker, Kubernetes, and other orchestration tools.
Background with RESTful web services and experience with Continuous Integration and Continuous Delivery.
Excellent interpersonal and written communication skills to foster collaboration and inclusion.
Ways to stand out from the crowd:
Extensive knowledge and deep understanding of Linux system programming.
A track record of solving sophisticated problems with elegant solutions.
Demonstrated ability to deliver complex projects in previous roles.
Experience building infrastructures and tools to speed up development, testing, and release.
Experience in agile software development methodology.
This position is open to all candidates.
 
8319866
27/08/2025
Confidential company
Location: Tel Aviv-Yafo and Yokne'am
Job Type: Full Time
We are leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars.
Come work for the team that brought you NCCL, NVSHMEM & GPUDirect. Our GPU communication libraries are crucial for scaling Deep Learning and HPC applications! We are looking for a motivated Performance Engineer to influence the roadmap of our communication libraries. The DL and HPC applications of today have a huge compute demand and run at scales of up to tens of thousands of GPUs. The GPUs are connected with high-speed interconnects (e.g., NVLink, PCIe) within a node and with high-speed networking (e.g., InfiniBand, Ethernet) across nodes. Communication performance between the GPUs has a direct impact on end-to-end application performance, and the stakes are even higher at huge scales! This is an outstanding opportunity for someone with an HPC and performance background to advance the state of the art in this space. Are you ready to contribute to the development of innovative technologies and help realize our company's vision?
What you will be doing:
Conduct in-depth performance characterization and analysis on large multi-GPU and multi-node clusters.
Study the interaction of our libraries with all HW (GPU, CPU, Networking) and SW components in the stack
Evaluate proof-of-concepts, conduct trade-off analysis when multiple solutions are available
Triage and root-cause performance issues reported by our customers
Collect a lot of performance data; build tools and infrastructure to visualize and analyze the information
Collaborate with a very dynamic team across multiple time zones.
Requirements:
M.S. (or equivalent experience) or Ph.D. in Computer Science or a related field, with relevant performance engineering and HPC experience
3+ years of experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
Experience conducting performance benchmarking and triage on large scale HPC clusters
Good understanding of computer system architecture, HW-SW interactions and operating systems principles (aka systems software fundamentals)
Ability to implement micro-benchmarks in C/C++ and to read and modify the code base when required
Ability to debug performance issues across the entire HW/SW stack
Proficient in a scripting language, preferably Python
Familiar with containers, cloud provisioning and scheduling tools (Kubernetes, SLURM, Ansible, Docker)
Adaptability and passion to learn new areas and tools. Flexibility to work and communicate effectively across different teams and timezones
Ways to stand out from the crowd:
Practical experience with InfiniBand/Ethernet networks in areas like RDMA, topologies, congestion control
Experience debugging network issues in large scale deployments
Familiarity with CUDA programming and/or GPUs
Experience with Deep Learning frameworks such as PyTorch and TensorFlow.
This position is open to all candidates.
 
8321604