Senior HPC and AI Cluster Administrator

11/02/2026
This job has been marked by the employer as no longer active.
Location: Multiple locations
Job Type: Full Time
Similar jobs that may interest you:
26/03/2026
Location: Yokne'am
Job Type: Full Time
We are looking for a Senior HPC and AI Cluster Administrator to join the Networking Clusters Solutions HPC/AI infrastructure team. We are building supercomputers and AI clusters based on groundbreaking technologies, and we are looking for a system administrator to be a key player in working with the most exciting computing hardware and software and in contributing to the latest breakthroughs in artificial intelligence and GPU computing.
You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers, to craft improved workflows and develop new, leading, differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms. Does this sound like you? If so, we would love to hear from you!
What you will be doing:
Deploy, manage, and maintain large-scale HPC/AI clusters
Manage Linux job/workload schedulers and orchestration tools
Support and maintain continuous integration and delivery (CI/CD) pipelines
Troubleshoot and fix issues bottom-up, from bare metal through the operating system and software stack to the application level
Support research and development activities and engage in PoCs/PoVs for future improvements
Requirements:
What we need to see:
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
5+ years of experience
Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes (K8s)
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols (e.g., TCP, DHCP, DNS)
Experience with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS; familiarity with newer and emerging storage technologies
Python programming and Bash scripting experience; experience with automation and configuration management tools and practices such as Jenkins, Ansible, and GitOps
Knowledge of networking protocols such as InfiniBand and Ethernet
Experience with virtual systems (for example, VMware, Hyper-V, KVM)
Familiarity with cloud computing platforms (e.g., AWS, Azure, Google Cloud)
Ways to stand out from the crowd:
Knowledge of CPU and/or GPU architecture
Knowledge of Kubernetes and container-related microservice technologies
Experience with GPU-focused hardware/software (DGX, CUDA)
Background with RDMA (InfiniBand or RoCE) fabrics
Our company has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. We have a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.
#il-hybrid
This position is open to all candidates.
 
Job ID: 8593421

26/03/2026
Location: Yokne'am
Job Type: Full Time
We are looking for a Data Center Network Deployment Engineer to join the Networking Clusters Solutions HPC/AI infrastructure team. We are building supercomputers and AI clusters based on groundbreaking technologies, and we are looking for a network/system engineer to be a key player in working with the most exciting computing hardware and software and in contributing to the latest breakthroughs in artificial intelligence and GPU computing.
You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers, to craft improved workflows and develop new, leading, differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms. Does this sound like you? If so, we would love to hear from you!
What you'll be doing:
Deploy, manage, and maintain large-scale AI data centers: control, network, and storage stack
Work with multiple software and hardware teams to optimize the clusters' networking health and performance
Develop and implement automation scripts for network, compute, and storage operations and deployments
Support research and development activities and engage in PoCs/PoVs for future improvements
Requirements:
What we need to see:
B.Sc. in Engineering or CCNP certification
3+ years of proficiency in networking fundamentals: configuring Ethernet switches, understanding the TCP/IP stack, and data center architecture
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols (e.g., TCP, DHCP, DNS)
A proactive individual with the ability to work independently, prioritizing tasks to optimize technology and enhance customer experience
Ability to provide ad-hoc knowledge transfers, develop handover materials, and offer deployment support for engagements
Ways to stand out from the crowd:
A combination of interpersonal skills and technical competence
Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
Experience with multiple storage solutions such as Lustre and GPFS, as well as newer and emerging storage technologies
Automation tooling background (Ansible, Salt, Puppet, etc.)
We are widely considered to be one of the technology world's most desirable employers! We have some of the most forward-thinking and hardworking individuals in the world working for us. If you're creative and autonomous, we want to hear from you!
#il-hybrid
This position is open to all candidates.
 
Job ID: 8593381

27/03/2026
Confidential company
Location: Yokne'am
Job Type: Full Time
In this role, you will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments. You will work closely with hardware, networking, firmware, and software teams to collect, analyze, and interpret performance data from live systems. This is a fast-paced R&D environment where system behavior and requirements evolve rapidly, requiring adaptable engineering solutions and strong analytical thinking.
What you'll be doing:
Profile, benchmark, and analyze AI and HPC workloads on GPU and CPU clusters
Explore performance characteristics of high-performance networking and collective communications (e.g., NCCL, RDMA, MPI, RoCE)
Identify performance bottlenecks across networking, compute, memory, and system architecture
Develop and enhance performance analysis, benchmarking, and diagnostic tools
Define performance test plans and establish expectations for new technologies and platforms
Collaborate across hardware, firmware, networking, systems, and software teams to provide actionable performance insights
Support telemetry collection and data refinement efforts to enable accurate performance analysis
Maintain high standards for data quality, reproducibility, and traceability of performance results
Requirements:
What we need to see:
B.Sc. or M.Sc. in Computer Science, Computer Engineering, Software Engineering, or equivalent experience
5+ years of experience in performance analysis, systems engineering, or HPC/AI infrastructure
Demonstrated expertise in performance analysis methodologies
Hands-on experience with high-performance networking (RDMA, MPI, NCCL, congestion control)
Strong understanding of system performance metrics (latency, throughput, resource utilization)
Exposure to hardware, firmware, or embedded telemetry environments
Strong analytical, problem-solving, and communication skills
Ability to work effectively in cross-functional, fast-paced R&D teams
Ways to stand out from the crowd:
Knowledge of CUDA, NCCL internals, and congestion control algorithms
Deep system-level understanding of CPU architectures, GPUs, HCAs, memory, and PCIe
Experience with NVIDIA GPUs, CUDA, and deep learning frameworks such as PyTorch or TensorFlow
Experience with cloud platforms
Proficiency in Python; experience with Bash and C/C++ is a plus, as is strong experience working in Linux environments
This position is open to all candidates.
 
Job ID: 8594112

18/03/2026
Location: Yokne'am
Job Type: Full Time
We are looking for a Senior Networking Test Engineer with strong system‑level debugging skills to join our End‑to‑End Verification team. You will work on cutting‑edge Ethernet‑based AI clusters, owning complex issues across hardware, system software, and AI workloads. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

What you'll be doing:

Design and review test and product requirements across the Ethernet / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.

Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.

Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.

Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.

Collaborate closely with development teams to debug NCCL, RoCE/RDMA and related networking components using logs, code inspection and targeted experiments.

Define tests and guide the automation team to implement robust suites that produce actionable logs, metrics and traces.

Run regression, performance, functional, and scale testing, analyze results, and provide clear, data‑driven reports to stakeholders.

Profile and benchmark deep learning training and inference workloads, correlating model‑level metrics with system and network telemetry to uncover bottlenecks.
Requirements:
What we need to see:

B.A./B.Sc. in Computer Science, Electrical Engineering, or equivalent IT/Network/Systems experience.

5+ years of hands‑on networking or system‑level testing and debugging on Linux.

Strong Linux networking and debugging skills (for example perf, tcpdump, ethtool, iproute2).

Proven production‑grade debugging experience: forming hypotheses, running experiments, and driving issues to root cause under pressure.

Expertise in host‑side NIC validation and tuning (offloads, queues, interrupts, firmware/driver interactions).

Strong knowledge of AI networking libraries (such as NCCL) and protocols (such as RoCE and RDMA), including performance and correctness debugging.

Ability to read and reason about source code (C/C++/Python or similar) and collaborate closely with developers on fixes.

Solid scripting and automation skills with Bash / Python / Ansible for setup, log collection, and experiment orchestration.

Fast learner, familiar with modern AI tools and workflows, able to adapt quickly.

Excellent analytical, problem‑solving and communication skills, with strong ownership and a collaborative mindset.

Ways to stand out from the crowd:

Hands‑on debugging of collective communication libraries (for example NCCL) or large‑scale LLM training / inference clusters.

Experience with large cluster environments (tens to thousands of GPUs or nodes), including incident response and post‑mortem analysis.

Deep expertise in tuning and debugging congestion control and lossless Ethernet for AI workloads (for example DCQCN, ECN, PFC).

Familiarity with NVIDIA networking technologies (for example BlueField / BF3, ConnectX NICs) and their software stack and diagnostics.

Experience debugging issues that span multiple layers (L2/L3, transport, AI frameworks) or contributing to open‑source networking / AI systems.
This position is open to all candidates.
 
Job ID: 8584095

18/03/2026
Confidential company
Location: Tel Aviv-Yafo and Yokne'am
Job Type: Full Time
We are now looking for an HPC Operations Engineer to join our mission and continue improving our HPC infrastructure. A meaningful part of our strength is our unique and advanced development tools and environments, which enable our incredible pace of innovation. We are looking for architects to help us evolve the way our private compute cloud is architected and optimized.

What you'll be doing:

Troubleshoot incoming support requests in a large-scale HPC environment.

Contribute enhancements to existing deployment automation, configuration management, observability, and operational monitoring, and improve day-to-day operations through automation.

Ensure compute servers are running the correct operating system and configuration.

Troubleshoot Complex Issues: Perform comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency.

Collaborate with specialist teams to drive issues to closure.

Collaborate with domain experts to improve how our chip development process utilizes our infrastructure.

Directly contribute to the overall quality and improve time to market for our next-generation chips.
Requirements:
What we need to see:

BS in Computer Science or a similar degree, or equivalent experience

2+ years of experience administering CentOS/RHEL Linux distributions.

Understanding of container technologies such as Docker.

Proficiency in Python and UNIX scripting languages such as bash.

Excellent problem-solving skills, with the ability to analyze complex systems, identify bottlenecks, and implement scalable solutions.

Excellent communication and teamwork skills, with the ability to work effectively with diverse teams and individuals.

Solid understanding of cluster configuration management tools such as Ansible.

Ways to stand out from the crowd:

Understanding of key Linux technologies such as NFS, automounter, LDAP, DNS, and TCP/IP networking in Red Hat Linux distribution flavors.

Familiarity with job scheduler administration (e.g., IBM Spectrum LSF or Slurm) and experience building/operating large-scale compute infrastructure.

Knowledge of the FlexLM license management system.

Proficiency in Perl for maintaining legacy automation scripts.

Familiarity with High-Speed Networking (InfiniBand, RDMA, RoCE etc.) and fast, distributed storage systems (Lustre, GPFS, etc.).
This position is open to all candidates.
 
Job ID: 8583522

22/03/2026
Location: Yokne'am
Job Type: Full Time
We are looking for a Senior Networking Test Engineer with strong system‑level debugging skills to join our End‑to‑End Verification team. You will work on cutting‑edge Ethernet‑based AI clusters, owning complex issues across hardware, system software, and AI workloads. We are widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

What you'll be doing:

Design and review test and product requirements across the Ethernet / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.

Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.

Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.

Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.

Collaborate closely with development teams to debug NCCL, RoCE/RDMA and related networking components using logs, code inspection and targeted experiments.

Define tests and guide the automation team to implement robust suites that produce actionable logs, metrics and traces.

Run regression, performance, functional, and scale testing, analyze results, and provide clear, data‑driven reports to stakeholders.

Profile and benchmark deep learning training and inference workloads, correlating model‑level metrics with system and network telemetry to uncover bottlenecks.
Requirements:
What we need to see:

B.A./B.Sc. in Computer Science, Electrical Engineering, or equivalent IT/Network/Systems experience.

5+ years of hands‑on networking or system‑level testing and debugging on Linux.

Strong Linux networking and debugging skills (for example perf, tcpdump, ethtool, iproute2).

Proven production‑grade debugging experience: forming hypotheses, running experiments, and driving issues to root cause under pressure.

Expertise in host‑side NIC validation and tuning (offloads, queues, interrupts, firmware/driver interactions).

Strong knowledge of AI networking libraries (such as NCCL) and protocols (such as RoCE and RDMA), including performance and correctness debugging.

Ability to read and reason about source code (C/C++/Python or similar) and collaborate closely with developers on fixes.

Solid scripting and automation skills with Bash / Python / Ansible for setup, log collection, and experiment orchestration.

Fast learner, familiar with modern AI tools and workflows, able to adapt quickly.

Excellent analytical, problem‑solving and communication skills, with strong ownership and a collaborative mindset.

Ways to stand out from the crowd:

Hands‑on debugging of collective communication libraries (for example NCCL) or large‑scale LLM training / inference clusters.

Experience with large cluster environments (tens to thousands of GPUs or nodes), including incident response and post‑mortem analysis.

Deep expertise in tuning and debugging congestion control and lossless Ethernet for AI workloads (for example DCQCN, ECN, PFC).

Familiarity with our networking technologies (for example BlueField / BF3, ConnectX NICs) and their software stack and diagnostics.

Experience debugging issues that span multiple layers (L2/L3, transport, AI frameworks) or contributing to open‑source networking / AI systems.
This position is open to all candidates.
 
Job ID: 8586994

26/03/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a strong, technical Senior Architect to join us in shaping the future. Senior architects are innovators who can translate business needs into workable technology solutions. Their expertise is deep and broad. They are hands-on, producing both detailed technical work and high-level architectural designs.
As a Senior Architect in the AI Networking Research team, you will explore technological challenges in accelerated networking and in building AI data centers. You will research new transport functions and semantics for optimizing AI workloads, AI system communication and acceleration, and much more. You will also lead architectural and development efforts across numerous technological fields related to the modern AI data center, such as distributed AI and deep learning solutions, data analytics, high-performance computing (HPC), software-defined networking (SDN), virtualization, storage, and more.
What you'll be doing:
Co-design hardware features (e.g., in GPUs, DPUs, or interconnects) that accelerate data movement and enable new capabilities for inference and model serving.
Identify and evaluate new technologies, innovations, and partner relationships for alignment with our technology roadmap and business value.
Lead the architecture and design of new technologies and innovations such as runtime systems, communication libraries, and AI-specific technologies.
Lead proof-of-concept development to evaluate and drive such technologies.
Requirements:
What we need to see:
M.Sc. or Ph.D. in Computer Science, Electrical Engineering, or Computer Engineering from a leading university (or equivalent experience).
5+ years of industry experience (or equivalent) in system architecture, AI systems architecture, scaling of AI, parallelism of AI frameworks, or deep learning training workloads.
Experience in algorithm design, system programming, computer architecture, and operating systems.
Experience in virtualization, networking, and storage.
Deep understanding of performance profiling and optimization techniques, together with defining and using hardware features.
Strong programming and software development skills.
Ability and flexibility to work and communicate effectively in a multinational, multi-time-zone corporate environment.
Ways to stand out from the crowd:
A proven research track record.
Experience with and passion for system architecture (CPU/GPU/memory/storage/networking).
Stellar communication skills.
Knowledge of deep learning frameworks and AI communication libraries (NCCL, UCX, MPI, and equivalents).
Deep understanding of inference and training workloads and optimizations, such as prefill/decode, data parallelism, tensor parallelism, FSDP, and others.
This position is open to all candidates.
 
Job ID: 8593803

19/03/2026
Confidential company
Location: Ra'anana
Job Type: Full Time
We are seeking an experienced IT/Lab Manager to lead the planning, deployment, and operations of our physical lab environment and IT systems. This role will focus on building and maintaining scalable, reliable, and secure environments to support engineering teams involved in research, quality assurance, validation, and related activities. It will also support internal collaborators. You will have an outstanding opportunity to drive innovation in a multidimensional, technology-focused company that is crafting the future of data center and lab technologies. If you strive for perfection, think creatively when solving issues as they arise, and enjoy working with distributed teams, your place is with us!


What You'll Be Doing:
Own day-to-day operations, planning, and roadmap for the engineering lab and IT infrastructure (servers, storage, networking, and related services).
Lead and mentor an IT/Lab team, driving guidelines, standards, and a culture of ownership, partnership, and continuous improvement.
Collaborate closely with R&D, QE, Verification, and other engineering teams to design, provision, and maintain environments that meet their performance, reliability, and security needs.
Lead all aspects of running data center and lab operations, including rack layout, cabling, power and cooling, hardware lifecycle, and resource availability.
Lead procurement and vendor management for hardware, software, and services, including evaluation, negotiation, and ongoing relationship management.
Implement and maintain automation for system provisioning, configuration, and operations using tools such as shell/Perl/Ansible.
Design and maintain monitoring, logging, and alerting for servers, network, and storage systems to ensure high availability and rapid incident response.
Investigate and resolve sophisticated infrastructure issues across OS, networking, storage, virtualization, and application layers.
Requirements:
What we need to see:
B.Sc. or B.A. in Computer Science, Engineering, or a related field, or equivalent practical experience.
At least 10 years of overall experience in IT / systems administration, including extensive hands-on work with Linux/Unix environments.
At least 3 years of experience in a managerial or team-lead position within IT, lab, or infrastructure teams.
Extensive experience with Linux/Unix system administration, including installation, configuration, troubleshooting, and performance tuning.
Demonstrable experience collaborating with engineering organizations (R&D, QE, Verification, etc.) and supporting their infrastructure needs.
Solid experience with data center and lab management, including server, network, and storage equipment deployment and lifecycle.
Demonstrated experience in procurement and vendor management for infrastructure hardware and software.
Proficiency in automation and scripting (e.g., shell, Perl, Ansible) for provisioning, configuration, and operational tasks.
Hands-on experience with monitoring and alerting solutions for infrastructure and services.
Strong debugging skills and experience resolving complex, cross-domain technical issues.

Ways To Stand Out From The Crowd:
Experience with Kubernetes (K8s) in on-prem or hybrid environments.
Hands-on work with Slurm, HPC clusters, and large-scale compute environments.
Background in HPC, large-scale Linux clusters, or performance-sensitive engineering environments.
This position is open to all candidates.
 
Job ID: 8585125

22/03/2026
Location: Yokne'am and Tel Hai
Job Type: Full Time
We are looking for a Senior Software Engineer to join the NSV (Network Solutions Validation) Tools group. As a senior team member, you will be part of the development of high-performing software automation systems for our data center environments. You will work with NIC, OS, switch, HCA, CPU, and GPU compute technologies, as well as with architects, network engineers, and developers. We drive the data growth of the world's biggest companies. With talented engineers around the globe, the work environment is dynamic, meaningful, and fast-paced. Are you ready for the challenge?

What you'll be doing:

Design and develop an automation platform used to provision, configure, and monitor HPC data centers.

Implement scalable, reliable, and maintainable services that enhance cluster visibility and improve operational efficiency.

Collaborate closely with internal and external stakeholders to understand requirements and deliver robust full-cycle solutions.

Improve stability and performance across the provisioning pipeline through architectural enhancements and code optimizations.

Troubleshoot issues in distributed environments and contribute to system observability and reliability improvements.

Work cross-functionally with architects, DevOps engineers, product managers and stakeholders to ensure high-quality releases.

Participate in code reviews, technical design discussions, and continuous improvement activities within the team.
Requirements:
What we need to see:

B.Sc. in Computer Science, Engineering, or a related field (or equivalent practical experience).

5+ years of strong hands-on experience with Linux-based platforms.

Proficient scripting and automation skills (Bash, Python, Ansible).

Background in DevOps and Network Engineering practices.

Hands-on experience with large-scale network architectures, switches/routers, OVS, SR-IOV, and network operating/management systems.

Networking expertise: Ethernet, VLANs, TCP/UDP/IP, QoS, L2/L3 protocols, BGP, EVPN/VXLAN, and common network topologies.

Practical experience with containers and cloud-native technologies (Docker, Kubernetes) and networking performance.

Experience with version control systems (Git) and CI/CD pipelines.

Independent, fast learner with strong ownership mindset, excellent debugging and problem-solving skills, and effective communication abilities.

Ways to stand out from the crowd:

Experience as a team lead, Scrum Master, or in a similar leadership role.

Experience in planning, tracking, and delivering projects.

Familiarity with DevOps methodologies and tools (e.g., Jenkins, Ansible).

Hands-on experience with Docker and containerized environments.

Experience with agentic AI development.
This position is open to all candidates.
 
Job ID: 8586566

26/03/2026
Location: Ra'anana
Job Type: Full Time
We are looking for an outstanding Senior Technical Instructor to join our Education Services team. In this role, you will provide advanced training on NVIDIA's data center, AI, and HPC platforms to customers and partners around the world. You will also contribute to developing training content, including presentation decks, hands-on manuals, and supporting materials. You will help engineers, architects, and operations teams build the skills needed to design, deploy, and operate AI data centers powered by our company's technology with confidence. As part of this, you will take part in designing hands-on workshops and lab environments that simulate real systems and data center components, enabling learners to practice realistic end-to-end scenarios.
What you'll be doing:
Deliver in-person and remote technical training on our data center, AI, and HPC platforms for customers and partners, with up to 25% travel as needed.
Lead lab-based training sessions that walk learners through end-to-end workflows on our GPUs, DGX/GB200 systems, high-speed networking, and AI/HPC software stacks.
Collaborate with internal experts and content developers to build agendas, exercises, and demos that reflect real customer use cases and guidelines.
Serve as a subject-matter expert (SME) for a dedicated production team and learning developers who build the primary content. Partner with them on slideware, lab guides, videos, simulations, visual demos, and exercises, ensuring technical accuracy and realism.
Act as a technical ambassador for our company in front of partner and customer audiences, answering inquiries, facilitating discussions, and guiding learners through fix and what-if scenarios.
Requirements:
What we need to see:
Bachelor's degree or equivalent experience in Computer Science, Engineering, or a related field.
8+ years of experience in data center or cloud infrastructure, including Linux server administration and deployment of AI/HPC workloads.
Hands-on experience with core infrastructure technologies such as Kubernetes, Docker, monitoring and observability tools (e.g., Grafana), virtualization platforms, and at least one major cloud provider (AWS, Azure, or GCP).
Solid understanding of networking and storage concepts relevant to modern AI/HPC data centers.
Demonstrated experience delivering technical training or workshops, with excellent presentation, facilitation, and English communication skills.
Ability to work effectively in a matrixed organization and collaborate with product, engineering, and enablement teams.
Ways to stand out from the crowd:
Direct hands-on experience with our technologies, such as DGX systems, GB200/SuperPOD, InfiniBand networking, our AI Enterprise, or related AI/HPC platforms.
Prior experience building or delivering hands-on labs that simulate data center environments, including GPUs, networking, storage, and management tools.
Relevant certifications (e.g., CKA/CKAD, cloud architect certifications, Red Hat, or advanced networking certifications).
A proven track record as a technical instructor for enterprise customers or partners, especially in AI, HPC, or large-scale infrastructure domains.
This position is open to all candidates.
 
Job ID: 8593587