Senior HPC and AI Cluster Administrator

17 hours ago
Location: More than one
Job Type: Full Time
We are looking for a Senior HPC and AI Cluster Administrator to join the Networking clusters solutions HPC/AI Infrastructure team. We are building supercomputers and AI clusters based on groundbreaking technologies. We are looking for a system administrator to be a key player working with the most exciting computing hardware and software and to contribute to the latest breakthroughs in artificial intelligence and GPU computing.

You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms. Does this sound like you? If so, we would love to hear from you!

What you will be doing:

Deploy, manage, and maintain large-scale HPC/AI clusters.

Manage Linux job/workload schedulers and orchestration tools.

Support and maintain continuous integration and delivery pipelines.

Troubleshoot and fix issues bottom-up, from bare metal through the operating system and software stack to the application level.

Support Research & Development activities and engage in POCs/POVs for future improvements.
Requirements:
What we need to see:
Bachelor's Degree in Computer Science, Engineering, or a related field; or equivalent experience.

5+ years of experience.

Knowledge of HPC and AI solution technologies from CPUs and GPUs to high speed interconnects and supporting software.

Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes (K8s).

Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols such as TCP, DHCP, and DNS.

Experience with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS. Familiarity with newer and emerging storage technologies.

Python programming and Bash scripting experience, plus automation and configuration management tools and practices such as Jenkins, Ansible, and GitOps.

Knowledge of networking protocols such as InfiniBand and Ethernet.

Experience with virtual systems (for example VMware, Hyper-V, KVM).

Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud).

Ways to stand out from the crowd:

Knowledge of CPU and/or GPU architecture.

Knowledge of Kubernetes and container-related microservice technologies.

Experience with GPU-focused hardware/software (DGX, CUDA).

Background with RDMA (InfiniBand or RoCE) fabrics.
This position is open to all candidates.
 
8542260
Similar jobs that may interest you
18/01/2026
Confidential company
Location: More than one
Job Type: Full Time
We are looking for an HPC and AI Data Center Engineer to join the networking cloud solutions HPC/AI Infrastructure team. We are focused on building supercomputers and HPC clusters based on groundbreaking technologies. We are looking for a lab manager to be a key player working with the most exciting computing hardware and software and to contribute to the latest breakthroughs in artificial intelligence and GPU computing. You will take part in building large-scale compute and deep learning software and hardware platforms, and work with and support many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions.

What you will be doing:

Plan and build complex clusters and supercomputers in various data centers and labs.

Rack, stack, and manage cabling to ensure efficient use of space and easy maintenance.

Ensure power and cooling efficiency in data centers and labs while optimizing rack space utilization.

Handle daily operation and support of data centers and labs.

Install a variety of infrastructure and solutions: cloud, VMs, storage, network, HPC, and AI.

Troubleshoot network, optical cabling, bare metal, and operating system issues.

Support Research & Development activities.
Requirements:
What we need to see:

MCSE or MCITP/CCNA certification.

3+ years of experience as a lab manager.

Experience in supporting large and complex data centers.

Proven hands-on experience in Linux troubleshooting, with strong problem identification and resolution skills.

In-depth knowledge of Linux and Windows core services: DHCP, DNS, NIS, AD, etc.

Team player, service-oriented, and organized.

Ways to stand out from the crowd:

Scripting experience in Bash and/or Python.

Experience with widely used configuration management tools (e.g. Ansible, Puppet).

Experience with CI tools and common job schedulers (e.g. Jenkins, Slurm).

Virtualization: KVM / VMware / Hyper-V.

Experience with L2 & L3 network protocols.
This position is open to all candidates.
 
8506713
3 days ago
Confidential company
Location: Yokne'am
Job Type: Full Time
We are looking for an AI Test Architect to join the E2E Verification group and profile innovative large-scale distributed training on NVIDIA AI end-to-end solutions in large-scale supercomputing clusters. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with researchers, developers, and customers, to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, switch, HCA, CPU and GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms.

What you'll be doing:

Profiling, benchmarking, and analyzing deep learning models to identify areas for optimization and improvement in terms of performance, efficiency, and accuracy, with a strong emphasis on networking aspects.

Collaborating closely with data scientists, researchers, and development and automation teams to design and implement scalable training pipelines and frameworks that demonstrate large-scale, high-performance networking capabilities.

Staying up-to-date with the latest advancements in deep learning algorithms, architectures, NVIDIA GPU technologies, and high-performance networking solutions.

Optimizing deep learning models for performance, memory usage, and power efficiency while maximizing high-performance networking features on NVIDIA supercomputers.

Providing insights and recommendations based on the analysis of large-scale training results, specifically focusing on networking bottlenecks and optimizations, to improve model outcomes and achieve business objectives.

Collaborating with hardware engineers to guide the development and integration of efficient networking solutions for deep learning, including exploring network architecture optimizations and bringing to bear technologies such as RDMA or InfiniBand.
Requirements:
What we need to see:

B.Sc. in Computer Science, Software Engineering, or equivalent experience.

Strong understanding and practical experience with machine learning algorithms and techniques, with a specialization in deep learning and expertise in high-performance networking.

8+ years of overall experience, with CUDA programming for deep learning frameworks such as TensorFlow and PyTorch, combined with expertise in networking libraries and protocols.

Ability to profile and optimize deep learning workflows, focusing on networking-related bottlenecks and optimizations, to improve overall performance and efficiency.

Exceptional analytical and problem-solving skills, with keen attention to detail, particularly in identifying and resolving networking performance issues.

Excellent communication and collaboration skills, enabling effective teamwork and cooperation.

Familiarity with supercomputers, parallel computing, distributed systems, and high-performance networking technologies like RDMA or InfiniBand.

Ways to stand out from the crowd:

Demonstrated experience in successfully profiling and optimizing large-scale deep learning training on our supercomputers, with a significant focus on high-performance networking enhancements.

Experience with distributed deep learning, distributed training frameworks, or large-scale data pipelines enhanced by high-performance networking solutions.

Expertise in optimizing networking parameters, such as bandwidth, latency, or congestion control, for deep learning workloads.

Familiarity with NVIDIA's networking technologies, such as Mellanox InfiniBand, and their integration with deep learning workflows.

Strong understanding of high-performance networking protocols and standards and their application to deep learning.
This position is open to all candidates.
 
8536135
11/01/2026
Confidential company
Location: Yokne'am
Job Type: Full Time
We are looking for an AI Test Architect to join the E2E Verification group and profile innovative large-scale distributed training on our AI end-to-end solutions in large-scale supercomputing clusters. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with researchers, developers, and customers, to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, switch, HCA, CPU and GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms.

What you'll be doing:

Profiling, benchmarking, and analyzing deep learning models to identify areas for optimization and improvement in terms of performance, efficiency, and accuracy, with a strong emphasis on networking aspects.

Collaborating closely with data scientists, researchers, and development and automation teams to design and implement scalable training pipelines and frameworks that demonstrate large-scale, high-performance networking capabilities.

Staying up-to-date with the latest advancements in deep learning algorithms, architectures, our GPU technologies, and high-performance networking solutions.

Optimizing deep learning models for performance, memory usage, and power efficiency while maximizing high-performance networking features on our supercomputers.

Providing insights and recommendations based on the analysis of large-scale training results, specifically focusing on networking bottlenecks and optimizations, to improve model outcomes and achieve business objectives.

Collaborating with hardware engineers to guide the development and integration of efficient networking solutions for deep learning, including exploring network architecture optimizations and bringing to bear technologies such as RDMA or InfiniBand.
Requirements:
What we need to see:

B.Sc. in Computer Science, Software Engineering, or equivalent experience.

Strong understanding and practical experience with machine learning algorithms and techniques, with a specialization in deep learning and expertise in high-performance networking.

8+ years of overall experience, with CUDA programming for deep learning frameworks such as TensorFlow and PyTorch, combined with expertise in networking libraries and protocols.

Ability to profile and optimize deep learning workflows, focusing on networking-related bottlenecks and optimizations, to improve overall performance and efficiency.

Exceptional analytical and problem-solving skills, with keen attention to detail, particularly in identifying and resolving networking performance issues.

Excellent communication and collaboration skills, enabling effective teamwork and cooperation.

Familiarity with supercomputers, parallel computing, distributed systems, and high-performance networking technologies like RDMA or InfiniBand.

Ways to stand out from the crowd:

Demonstrated experience in successfully profiling and optimizing large-scale deep learning training on our supercomputers, with a significant focus on high-performance networking enhancements.

Experience with distributed deep learning, distributed training frameworks, or large-scale data pipelines enhanced by high-performance networking solutions.

Expertise in optimizing networking parameters, such as bandwidth, latency, or congestion control, for deep learning workloads.

Familiarity with our networking technologies, such as Mellanox InfiniBand, and their integration with deep learning workflows.

Strong understanding of high-performance networking protocols and standards and their application to deep learning.
This position is open to all candidates.
 
8496288
1 day ago
Confidential company
Location: Yokne'am
Job Type: Full Time
We are looking for an AI Test Architect to join the E2E Verification group and profile innovative large-scale distributed training on our AI end-to-end solutions in large-scale supercomputing clusters. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with researchers, developers, and customers, to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, switch, HCA, CPU and GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms.

What you'll be doing:

Profiling, benchmarking, and analyzing deep learning models to identify areas for optimization and improvement in terms of performance, efficiency, and accuracy, with a strong emphasis on networking aspects.

Collaborating closely with data scientists, researchers, and development and automation teams to design and implement scalable training pipelines and frameworks that demonstrate large-scale, high-performance networking capabilities.

Staying up-to-date with the latest advancements in deep learning algorithms, architectures, our GPU technologies, and high-performance networking solutions.

Optimizing deep learning models for performance, memory usage, and power efficiency while maximizing high-performance networking features on our supercomputers.

Providing insights and recommendations based on the analysis of large-scale training results, specifically focusing on networking bottlenecks and optimizations, to improve model outcomes and achieve business objectives.

Collaborating with hardware engineers to guide the development and integration of efficient networking solutions for deep learning, including exploring network architecture optimizations and bringing to bear technologies such as RDMA or InfiniBand.
Requirements:
What we need to see:

B.Sc. in Computer Science, Software Engineering, or equivalent experience.

Strong understanding and practical experience with machine learning algorithms and techniques, with a specialization in deep learning and expertise in high-performance networking.

8+ years of overall experience, with CUDA programming for deep learning frameworks such as TensorFlow and PyTorch, combined with expertise in networking libraries and protocols.

Ability to profile and optimize deep learning workflows, focusing on networking-related bottlenecks and optimizations, to improve overall performance and efficiency.

Exceptional analytical and problem-solving skills, with keen attention to detail, particularly in identifying and resolving networking performance issues.

Excellent communication and collaboration skills, enabling effective teamwork and cooperation.

Familiarity with supercomputers, parallel computing, distributed systems, and high-performance networking technologies like RDMA or InfiniBand.

Ways to stand out from the crowd:

Demonstrated experience in successfully profiling and optimizing large-scale deep learning training on NVIDIA supercomputers, with a significant focus on high-performance networking enhancements.

Experience with distributed deep learning, distributed training frameworks, or large-scale data pipelines enhanced by high-performance networking solutions.

Expertise in optimizing networking parameters, such as bandwidth, latency, or congestion control, for deep learning workloads.

Familiarity with NVIDIA's networking technologies, such as Mellanox InfiniBand, and their integration with deep learning workflows.

Strong understanding of high-performance networking protocols and standards and their application to deep learning.
This position is open to all candidates.
 
8541318
23 hours ago
Location: Yokne'am
Job Type: Full Time
We are looking for a Senior Networking Test Engineer with strong system‑level debugging skills to join our End‑to‑End Verification team. You will work on cutting‑edge Ethernet‑based AI clusters, owning complex issues across hardware, system software and AI workloads. We are widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

What you'll be doing:

Design and review test and product requirements across the Ethernet / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.

Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.

Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.

Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.

Collaborate closely with development teams to debug NCCL, RoCE/RDMA and related networking components using logs, code inspection and targeted experiments.

Define tests and guide the automation team to implement robust suites that produce actionable logs, metrics and traces.

Run Regression, Performance, Functional and Scale testing, analyze results and provide clear, data‑driven reports to stakeholders.

Profile and benchmark deep learning training and inference workloads, correlating model‑level metrics with system and network telemetry to uncover bottlenecks.
Requirements:
What we need to see:

B.A./B.Sc. in Computer Science, Electrical Engineering, or equivalent IT/Network/Systems experience.

5+ years of hands‑on networking or system‑level testing and debugging on Linux.

Strong Linux networking and debugging skills (for example perf, tcpdump, ethtool, iproute2).

Proven production‑grade debugging experience: forming hypotheses, running experiments, and driving issues to root cause under pressure.

Expertise in host‑side NIC validation and tuning (offloads, queues, interrupts, firmware/driver interactions).

Strong knowledge of AI networking libraries (such as NCCL) and protocols (such as RoCE and RDMA), including performance and correctness debugging.

Ability to read and reason about source code (C/C++/Python or similar) and collaborate closely with developers on fixes.

Solid scripting and automation skills with Bash / Python / Ansible for setup, log collection, and experiment orchestration.

Fast learner, familiar with modern AI tools and workflows, able to adapt quickly.

Excellent analytical, problem‑solving and communication skills, with strong ownership and a collaborative mindset.

Ways to stand out from the crowd:

Hands‑on debugging of collective communication libraries (for example NCCL) or large‑scale LLM training / inference clusters.

Experience with large cluster environments (tens to thousands of GPUs or nodes), including incident response and post‑mortem analysis.

Deep expertise in tuning and debugging congestion control and lossless Ethernet for AI workloads (for example DCQCN, ECN, PFC).

Familiarity with NVIDIA networking technologies (for example BlueField / BF3, ConnectX NICs) and their software stack and diagnostics.

Experience debugging issues that span multiple layers (L2/L3, transport, AI frameworks) or contributing to open‑source networking / AI systems.
This position is open to all candidates.
 
8541388
3 days ago
Location: More than one
Job Type: Full Time
We are looking for an outstanding Senior Technical Instructor to join our Education Services team. In this role, you will provide advanced training on our data center, AI, and HPC platforms to customers and partners around the world. You will also contribute to developing training content, including presentation decks, hands-on manuals, and supporting materials. You will help engineers, architects, and operations teams build the skills needed to design, deploy, and operate AI data centers powered by our technology with confidence. As part of this, you will take part in the design of hands-on workshops and lab environments that simulate real systems and data center components, enabling learners to practice on realistic end-to-end scenarios.

What you'll be doing:

Deliver in-person and remote technical training on our data center, AI, and HPC platforms for customers and partners, with up to 25% travel as needed.

Lead lab-based training sessions that walk learners through end-to-end workflows on our GPUs, DGX/GB200 systems, high-speed networking, and AI/HPC software stacks.

Collaborate with internal experts and content developers to build agendas, exercises, and demos that reflect real customer use cases and guidelines.

Serve as a subject-matter expert (SME) for a dedicated production team and learning developers, who build the primary content. Partner with them on slideware, lab guides, videos, simulations, visual demos, and exercises, and ensure technical accuracy and realism.

Act as a technical ambassador for us in front of partner and customer audiences, answering inquiries, facilitating discussions, and guiding learners through troubleshooting and what-if scenarios.
Requirements:
What we need to see:

Bachelor's degree or equivalent experience in computer science, engineering, or a related field.

8+ years of experience in data center or cloud infrastructure, including Linux server administration and deployment of AI / HPC workloads.

Hands-on experience with core infrastructure technologies such as Kubernetes, Docker, monitoring and observability tools (e.g., Grafana), virtualization platforms, and at least one major cloud provider (AWS, Azure, or GCP).

Solid understanding of networking and storage concepts relevant to modern AI/HPC data centers.

Demonstrated experience delivering technical training or workshops, with excellent presentation, facilitation, and English communication skills.

Ability to work effectively in a matrixed organization and collaborate with product, engineering, and enablement teams.

Ways to stand out from the crowd:

Direct hands-on experience with NVIDIA technologies such as DGX systems, GB200 / SuperPOD, InfiniBand networking, NVIDIA AI Enterprise, or related AI/HPC platforms.

Prior experience building or delivering hands-on labs that simulate data center environments, including GPUs, networking, storage, and management tools.

Relevant certifications (e.g., CKA/CKAD, cloud architect certifications, Red Hat, or advanced networking certifications).

Proven track record as a technical instructor for enterprise customers or partners, especially in AI, HPC, or large-scale infrastructure domains.
This position is open to all candidates.
 
8536567
23 hours ago
Confidential company
Location: Ra'anana
Job Type: Full Time
We are seeking an experienced IT/Lab Manager to lead the planning, deployment, and operations of our physical lab environment and IT systems. This role will focus on building and maintaining scalable, reliable, and secure environments to support engineering teams involved in research, quality assurance, validation, and related activities. It will also support internal collaborators. You will have an outstanding opportunity to drive innovation in a multidimensional, technology-focused company that is crafting the future of data-center and lab technologies. If you bring perfection and creative thinking while solving issues as they arise, and enjoy working with distributed teams - your place is with us!

What You'll Be Doing:
Own day-to-day operations, planning, and roadmap for the engineering lab and IT infrastructure (servers, storage, networking, and related services).
Lead and mentor an IT/Lab team, driving guidelines, standards, and a culture of ownership, partnership, and continuous improvement.
Collaborate closely with R&D, QE, Verification, and other engineering teams to design, provision, and maintain environments that meet their performance, reliability, and security needs.
Lead all aspects of running data center and lab operations, including rack layout, cabling, power and cooling, hardware lifecycle, and resource availability.
Lead procurement and vendor management for hardware, software, and services, including evaluation, negotiation, and ongoing relationship management.
Implement and maintain automation for system provisioning, configuration, and operations using tools such as shell/Perl/Ansible.
Design and maintain monitoring, logging, and alerting for servers, network, and storage systems to ensure high availability and rapid incident response.
Investigate and resolve sophisticated infrastructure issues across OS, networking, storage, virtualization, and application layers.
Requirements:
What we need to see:
B.Sc. or BA in Computer Science, Engineering, or a related field, or equivalent practical experience.
At least 10 years of overall experience in IT / systems administration, including extensive hands-on work with Linux/Unix environments.
At least 3 years of experience in a managerial or team-lead position within IT, lab, or infrastructure teams.
Vast experience with Linux/Unix system administration, including installation, configuration, troubleshooting, and performance tuning.
Demonstrable experience collaborating with engineering organizations (R&D, QE, Verification, etc.) and supporting their infrastructure needs.
Solid experience with data center and lab management, including server, network, and storage equipment deployment and lifecycle.
Demonstrated experience in procurement and vendor management for infrastructure hardware and software.
Proficiency in automation and scripting (e.g., shell, Perl, Ansible) for provisioning, configuration, and operational tasks.
Hands-on experience with monitoring and alerting solutions for infrastructure and services.
Strong debugging skills and experience resolving complex, cross-domain technical issues.

Ways To Stand Out From The Crowd:
Experience with Kubernetes (K8s) in on-prem or hybrid environments.
Hands-on work with Slurm, HPC clusters, and large-scale compute environments.
Background in HPC, large-scale Linux clusters, or performance-sensitive engineering environments.
This position is open to all candidates.
 
8541475
08/01/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
We are searching for a strong technical leader to own the backbone of our Networking Research capabilities. We are looking for an Engineering Manager to lead the development of our high-fidelity Network Simulation platform and the extensive on-premise infrastructure that powers it.

In this role, you will lead a team of performance simulation software engineers and DevOps/Infrastructure specialists. You will own the "Simulation-as-a-Service" product, a critical platform used by internal researchers to model next-generation data center architectures. Your mission is to ensure our simulations are accurate, performant, and accessible, while managing the large-scale compute clusters required to run them.

What you'll be doing:

Team Leadership: Manage and mentor a team of C++ software engineers and DevOps infrastructure engineers, fostering a culture of performance, reliability, and code quality.

Product Ownership (Sim-as-a-Service): Treat the internal simulation platform as a product. Work with research partners to define the roadmap, prioritize features, and ensure high availability for users.

High-Performance Simulation: Be responsible for the architecture and optimization of complex network simulation engines (C++ based), ensuring they can scale to model extensive data center topologies with high fidelity.

Infrastructure Management: Own the lifecycle of our on-premise compute clusters and servers. Drive decisions on hardware upgrades, prioritisation, and managing system resources.

DevOps & Automation: Lead the strategy for CI/CD pipelines, automated testing, and containerized deployments to ensure rapid iteration and stability of the simulation platform.

Cross-functional Collaboration: Partner with the AI Agents team to expose simulation APIs, enabling agents to run experiments and gather data autonomously.
Requirements:
What we need to see:

MSc, Ph.D. or equivalent experience in Computer Science, Electrical Engineering, or a related field.

8+ years of hands-on software engineering experience, with a proven track record of leading technical teams in systems or infrastructure domains for 3+ years.

3+ years of managerial experience.

C++ Expertise: Strong background in C++ development for high-performance applications (System-level programming, concurrent programming).

Infrastructure & DevOps: Practical experience managing on-premise servers, Linux environments, and modern DevOps tools (Kubernetes, Slurm, Docker, Ansible).

Operational Rigor: Ability to manage "heavy" operations: ensuring uptime, monitoring system health, and optimizing hardware utilization.

Ways to stand out from the crowd:

Networking Knowledge: Deep understanding of computer networking fundamentals (TCP/IP, Ethernet, InfiniBand, Congestion Control) and data center architectures.

Simulation/Modeling: Experience with discrete event simulation (DES) or modeling complex systems.

HPC Background: Experience working with MPI, CUDA, or other High-Performance Computing frameworks.

Specific Simulators: Familiarity with standard network simulators like OMNeT++, NS-3, or similar proprietary tools.

Hardware Knowledge: Understanding of switch micro-architecture or NIC design is a significant plus.
This position is open to all candidates.
 
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a motivated and experienced Senior Software Engineer to join our Cloud and K8s Group. The successful candidate will possess a strong technical background in low-level systems programming and will excel in developing performant, efficient, and reliable software across multiple operating systems. Expertise in C++ and deep knowledge of Linux, macOS, and Windows internals are essential for this role, as you will be instrumental in building and optimizing our agent.

Key Responsibilities:

Design, implement, and optimize low-level system software components and libraries with a focus on performance and efficiency.
Analyze and debug complex issues related to operating system internals (kernel, drivers, memory management) across Linux, macOS, and Windows platforms.
Develop networking capabilities and optimize networking stack interactions within software modules.
Write clean, maintainable, and well-tested C++ code, while mentoring peers and reviewing their contributions.
Collaborate closely with infrastructure, security, and product teams to design scalable and secure systems.
Contribute to CI/CD pipelines and automation workflows to streamline build, test, and deployment processes.
Develop and maintain scripting tools (e.g., Python, Bash, PowerShell) to support development and operational tasks.
Stay up to date with emerging technologies in systems programming, cybersecurity, and networking to continuously improve our solutions.
Requirements:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Minimum of 5 years of experience in software development with a strong focus on C++ and low-level programming.
Deep understanding of Linux, macOS, and Windows internals including kernel architecture, system calls, process and memory management.
Strong knowledge of networking protocols and experience writing performant and efficient code.
Experience with Golang is an advantage.
Background or interest in cybersecurity is a plus.
Familiarity with .NET development is beneficial.
Experience with CI/CD tools and pipelines (e.g., Jenkins, GitHub Actions) is preferable.
Proficient in scripting languages such as Python, Bash, or PowerShell.
Strong problem-solving skills and ability to work independently and in a team environment.
Excellent communication and collaboration skills.
This position is open to all candidates.
 
11/01/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a Senior Software Engineer to join the Cumulus Linux team! This is an opportunity to be part of the team that develops the Network Operating System powering data centers that are accelerated, disaggregated, and software-defined to meet the exploding growth in AI and high-performance computing. You'll be part of a software development team responsible for defining and implementing core platform services, as well as Reliability, Availability and Serviceability features for Cumulus Linux, the Debian-based operating system for our market-leading Ethernet switches.

What you'll be doing:

Design and develop software for the Cumulus Linux operating system (OS), which runs on our portfolio of data center switches.

Work on bringing up Cumulus Linux on our next-generation switches.

Develop and maintain software in Python, C and Shell for our OS.

Collaborate with product, architecture, and engineering teams to deliver features on Cumulus Linux's roadmap.

Debug and resolve issues reported by test and customer-facing teams.

Work with open source software that is part of our OS and fix issues as and when they are raised.
Requirements:
What we need to see:

BSc in Electrical Engineering or Computer Science (or equivalent experience).

5+ years of proven experience writing enterprise software.

Strong C and Python coding skills.

Previous experience with I2C, PSUs, SMBus, and PHY-layer technologies, and with hardware bring-up.

Good knowledge of Linux systems administration, Linux internals and tools.

Experience using source code management tools, as well as code coverage, unit testing and debugging tools.

Excellent written and verbal communication and interpersonal skills.

Able to work independently with minimal direction.

Ways to stand out from the crowd:

Strong background in Linux systems and Linux kernel networking.

Strong background in debugging kernel and hardware issues.

Familiarity with Data Center Networking technologies.

Exposure to CI/CD tools.
This position is open to all candidates.
 