Senior HPC and AI Cluster Administrator

21 hours ago
Location: Yokne`am
Job Type: Full Time
We are looking for a Senior HPC and AI Cluster Administrator to join the Networking Clusters Solutions HPC/AI infrastructure team. We are building supercomputers and AI clusters based on groundbreaking technologies. We are looking for a system administrator to be a key player in working with the most exciting computing hardware and software and to contribute to the latest breakthroughs in artificial intelligence and GPU computing.
You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms. Does this sound like you? If so, we would love to hear from you!
What you will be doing:
Deploy, manage, and maintain large-scale HPC/AI clusters.
Manage Linux job/workload schedulers and orchestration tools.
Support and maintain continuous integration and delivery pipelines.
Troubleshoot and fix issues bottom-up, from bare metal through the operating system and software stack to the application level.
Support research and development activities and engage in POCs/POVs for future improvements.
Requirements:
What we need to see:
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
5+ years of experience.
Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software.
Experience with job scheduling workloads and orchestration tools such as Slurm and K8s.
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols, e.g. TCP, DHCP, DNS.
Experience with multiple storage solutions such as Lustre, GPFS, ZFS, and XFS. Familiarity with newer and emerging storage technologies.
Python programming and Bash scripting experience; automation and configuration management tools such as Jenkins, Ansible, GitOps.
Knowledge of networking protocols like InfiniBand and Ethernet.
Experience with virtual systems (for example VMware, Hyper-V, KVM).
Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud).
Ways to stand out from the crowd:
Knowledge of CPU and/or GPU architecture.
Knowledge of Kubernetes and container-related microservice technologies.
Experience with GPU-focused hardware/software (DGX, CUDA).
Background with RDMA (InfiniBand or RoCE) fabrics.
Our company has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. We have a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. Our teams are composed of driven, innovative professionals dedicated to pushing the boundaries of technology. We offer highly competitive salaries, an extensive benefits package, and a work environment that promotes diversity, inclusion, and flexibility. As an equal opportunity employer, we are committed to fostering a supportive and empowering workplace for all.
#il-hybrid
This position is open to all candidates.
 
8593421
Similar jobs that may interest you
21 hours ago
Location: Yokne`am
Job Type: Full Time
We are looking for a Data Center Network Deployment Engineer to join the Networking Clusters Solutions HPC/AI infrastructure team. We are building supercomputers and AI clusters based on groundbreaking technologies. We are looking for a network/system engineer to be a key player in working with the most exciting computing hardware and software and to contribute to the latest breakthroughs in artificial intelligence and GPU computing.
You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers to craft improved workflows and develop new, leading differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms. Does this sound like you? If so, we would love to hear from you!
What you'll be doing:
Deploy, manage, and maintain large-scale AI data centers: control, network, and storage stack.
Work with multiple software and hardware teams to optimize the clusters' networking health and performance.
Develop and implement automation scripts for network, compute, and storage operations and deployments.
Support research and development activities and engage in POCs/POVs for future improvements.
Requirements:
What we need to see:
B.Sc. in Engineering or CCNP certificate.
3+ years of proficiency in networking fundamentals, configuring Ethernet switches, understanding the TCP/IP stack, and data center architecture.
Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalls, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols, e.g. TCP, DHCP, DNS.
Proactive individual with the ability to work independently, prioritizing tasks to optimize technology and enhance customer experience.
Provides ad-hoc knowledge transfers, develops handover materials, and offers deployment support for engagements.
Ways to stand out from the crowd:
Combination of interpersonal skills and technical competence.
Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software.
Experience with multiple storage solutions such as Lustre and GPFS, and with newer and emerging storage technologies.
Automation tooling background (Ansible, Salt, Puppet, etc.).
We are widely considered to be one of the technology world's most desirable employers! We have some of the most forward-thinking and hardworking individuals in the world working for us. If you're creative and autonomous, we want to hear from you!
#il-hybrid
This position is open to all candidates.
 
8593381
2 hours ago
Confidential company
Location: Yokne`am
Job Type: Full Time
In this role, you will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments. You will work closely with hardware, networking, firmware, and software teams to collect, analyze, and interpret performance data from live systems. This is a fast-paced R&D environment where system behavior and requirements evolve rapidly, requiring adaptable engineering solutions and strong analytical thinking.
What you'll be doing:
Profile, benchmark, and analyze AI and HPC workloads on GPU and CPU clusters.
Explore performance characteristics of high-performance networking and collective communications (e.g., NCCL, RDMA, MPI, RoCE).
Identify performance bottlenecks across networking, compute, memory, and system architecture.
Develop and enhance performance analysis, benchmarking, and diagnostic tools.
Define performance test plans and establish expectations for new technologies and platforms.
Collaborate across hardware, firmware, networking, systems, and software teams to provide actionable performance insights.
Support telemetry collection and data refinement efforts to enable accurate performance analysis.
Maintain high standards for data quality, reproducibility, and traceability of performance results.
Requirements:
What we need to see:
B.Sc. or M.Sc. in Computer Science, Computer Engineering, Software Engineering, or equivalent experience.
5+ years of experience in performance analysis, systems engineering, or HPC/AI infrastructure.
Demonstrated expertise in performance analysis skills and methodologies.
Hands-on experience with high-performance networking (RDMA, MPI, NCCL, congestion control).
Strong understanding of system performance metrics (latency, throughput, resource utilization).
Exposure to hardware, firmware, or embedded telemetry environments.
Strong analytical, problem-solving, and communication skills.
Ability to work effectively in cross-functional, fast-paced R&D teams.
Ways to stand out from the crowd:
Knowledge of CUDA, NCCL internals, and congestion control algorithms.
Deep system-level understanding of CPU architectures, GPUs, HCAs, memory, and PCIe.
Experience with NVIDIA GPUs, CUDA, and deep learning frameworks such as PyTorch or TensorFlow.
Experience with cloud platforms.
Proficiency in Python; experience with Bash and C/C++ is a plus, as is strong experience working in Linux environments.
This position is open to all candidates.
 
8594112
18/03/2026
Location: Yokne`am
Job Type: Full Time
We are looking for a Senior Networking Test Engineer with strong system-level debugging skills to join our End-to-End Verification team. You will work on cutting-edge Ethernet-based AI clusters, owning complex issues across hardware, system software, and AI workloads. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!

What you'll be doing:

Design and review test and product requirements across the Ethernet / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.

Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.

Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.

Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.

Collaborate closely with development teams to debug NCCL, RoCE/RDMA and related networking components using logs, code inspection and targeted experiments.

Define tests and guide the automation team to implement robust suites that produce actionable logs, metrics and traces.

Run Regression, Performance, Functional and Scale testing, analyze results and provide clear, data‑driven reports to stakeholders.

Profile and benchmark deep learning training and inference workloads, correlating model‑level metrics with system and network telemetry to uncover bottlenecks.
Requirements:
What we need to see:

B.A./B.Sc. in Computer Science, Electrical Engineering, or equivalent IT/Network/Systems experience.

5+ years of hands‑on networking or system‑level testing and debugging on Linux.

Strong Linux networking and debugging skills (for example perf, tcpdump, ethtool, iproute2).

Proven production‑grade debugging experience: forming hypotheses, running experiments, and driving issues to root cause under pressure.

Expertise in host‑side NIC validation and tuning (offloads, queues, interrupts, firmware/driver interactions).

Strong knowledge of AI networking libraries (such as NCCL) and protocols (such as RoCE and RDMA), including performance and correctness debugging.

Ability to read and reason about source code (C/C++/Python or similar) and collaborate closely with developers on fixes.

Solid scripting and automation skills with Bash / Python / Ansible for setup, log collection, and experiment orchestration.

Fast learner, familiar with modern AI tools and workflows, able to adapt quickly.

Excellent analytical, problem‑solving and communication skills, with strong ownership and a collaborative mindset.

Ways to stand out from the crowd:

Hands‑on debugging of collective communication libraries (for example NCCL) or large‑scale LLM training / inference clusters.

Experience with large cluster environments (tens to thousands of GPUs or nodes), including incident response and post‑mortem analysis.

Deep expertise in tuning and debugging congestion control and lossless Ethernet for AI workloads (for example DCQCN, ECN, PFC).

Familiarity with NVIDIA networking technologies (for example BlueField / BF3, ConnectX NICs) and their software stack and diagnostics.

Experience debugging issues that span multiple layers (L2/L3, transport, AI frameworks) or contributing to open‑source networking / AI systems.
This position is open to all candidates.
 
8584095
18/03/2026
Confidential company
Location: Tel Aviv-Yafo and Yokne`am
Job Type: Full Time
We are now looking for an HPC Operations Engineer to join our mission and continue improving our HPC infrastructure. A meaningful part of our strength is our unique and advanced development tools and environments that enable our incredible pace of innovation. We are looking for architects to help us evolve the way our private compute cloud is architected and optimized.

What you'll be doing:

Troubleshoot incoming support requests in a large-scale HPC environment.

Contribute enhancements to existing deployment automation, configuration management, observability, and operational monitoring, and improve day-to-day operations through automation.

Ensure compute servers are running the correct operating system and configuration.

Troubleshoot Complex Issues: Perform comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency.

Collaborate with specialist teams to drive issues to closure.

Collaborate with domain experts to improve how our chip development process utilizes our infrastructure.

Directly contribute to the overall quality and improve time to market for our next generation chips.
Requirements:
What we need to see:

BS in Computer Science or similar degree or equivalent experience

2+ years of experience; proficient in administering CentOS/RHEL Linux distributions.

Understanding of container technologies like Docker.

Proficiency in Python and UNIX scripting languages such as bash.

Excellent problem-solving skills, with the ability to analyze complex systems, identify bottlenecks, and implement scalable solutions.

Excellent communication and teamwork skills, with the ability to work effectively with diverse teams and individuals.

Solid understanding of cluster configuration management tools such as Ansible.

Ways to stand out from the crowd:

Understanding of key Linux technologies such as NFS, automounter, LDAP, DNS, and TCP/IP networking in Red Hat Linux distribution flavors.

Familiarity with job scheduler administration (e.g. IBM Spectrum LSF or SLURM) and experience building/ operating large scale compute infrastructure.

Knowledge of the FlexLM license management system.

Proficiency in Perl for maintaining legacy automation scripts.

Familiarity with High-Speed Networking (InfiniBand, RDMA, RoCE etc.) and fast, distributed storage systems (Lustre, GPFS, etc.).
This position is open to all candidates.
 
8583522
21 hours ago
Location: Yokne`am
Job Type: Full Time
Our company has been defining computer graphics, PC gaming, and accelerated computing for more than 25 years. With an outstanding legacy of innovation, driven by phenomenal technology and extraordinary people, we are looking for a strong technical Dev Platform Engineer to join us in shaping the future of software development. Platform engineers are builders who turn strategy into daily tooling. Their expertise is practical and broad. They are hands-on, producing reliable infrastructure that teams depend on every day.
As a Senior Dev Platform Engineer in the AI-native development team, you will build the infrastructure that enables AI-assisted software development at scale. This includes process workflow automation, multi-agent collaboration platforms, AI-powered review pipelines, and integrations with enterprise tools. You will work closely with development teams to understand friction points and translate them into robust, developer-friendly tooling.
What you'll be doing:
Build and extend process workflow automation tools - managing the pipeline from specification to merge, enforcing review gates per risk level.
Design and implement multi-agent collaboration infrastructure - enabling multiple AI agents and humans to work on the same project with isolation, shared state, and handoff protocols.
Build AI-powered review orchestration - deploying parallel expert reviewers with domain-specific configurations.
Integrate AI development workflows with enterprise tools (issue trackers, source control, CI/CD pipelines) - auto-status updates, AI agent assignment, work breakdown.
Develop Markdown-based documentation workflows with review plugins.
Rapidly iterate on tooling based on direct feedback from development teams.
Requirements:
What we need to see:
Hold a B.Sc. or M.Sc. in Computer Science, Electrical or Computer Engineering from a leading university (or equivalent experience).
7+ years of industry experience in software engineering with a focus on developer tools, infrastructure, or platform engineering.
Strong software engineering fundamentals - can build reliable, well-tested tooling that others depend on.
Experience with developer workflows: Git, CI/CD, code review systems, branch strategies.
Comfortable with AI-assisted development - already uses AI coding assistants in daily work.
Can work across the stack - CLI tools, APIs, webhooks, integrations.
Strong programming skills (Python required; shell scripting, familiarity with web APIs).
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
Experience building MCP servers, AI agent orchestration, or LLM-powered tooling.
Familiarity with enterprise tool APIs (issue trackers, source control platforms, CI systems).
Has contributed to or built developer productivity tools (open-source or internal).
Experience with state machines, pipeline orchestration, or workflow engines.
Knowledge of container orchestration (Docker, Kubernetes) and deployment automation.
We are widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you! We are committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
This position is open to all candidates.
 
8593456
17 hours ago
Location: Yokne`am
Job Type: Full Time
We are looking for a Senior Software Program Manager who will be responsible for software programs and projects. The PM should drive planning and execution of FW/SW projects while aligning with corporate priorities and constraints.
We are a leading supplier of innovative end-to-end InfiniBand and Ethernet connectivity solutions and services for servers and storage. We offer best-in-class solutions that include adapter cards, switches, cables, and software to support networking technologies. Our products optimize data center performance and deliver industry-leading bandwidth and scalability. In addition, we serve a wide range of markets including high-performance computing, enterprise, data centers, cloud computing, big data, and Web 2.0. We are constantly reinventing ourselves to stay ahead of the market and bring groundbreaking products and services to the industry. Our product line focuses on delivering the most optimized Ethernet solutions for industries like media and entertainment, as well as any other industry that can benefit from our datastream and TCP/IP acceleration.
What you'll be doing:
You will manage the networking software programs for NVIDIA next-generation AI data centers.
Coordinate between all project stakeholders, such as marketing, engineering teams in IL and around the world, operations, etc., from initial requirements definition through the architectural stage, execution, and delivery.
Develop and execute feature planning and prioritization of perception capabilities to meet the software programs' needs.
Identify risks, gaps, and bottlenecks in time, and find resolutions with technical leaders and project management.
Work with product managers, architects, and engineers to ensure consistency with company strategy, commitments, and goals.
Requirements:
What we need to see:
B.Sc. or M.Sc. in Computer Science, Electrical Engineering, or a related field.
Expert with software project management methodologies and tools.
8+ years of experience in software project management or leadership.
Experience in software development over hardware/silicon products.
Teammate, independent, responsible, capable of multitasking, with the ability to drive people and tasks.
Excellent verbal and written communication skills with English proficiency.
Ability and willingness to work in a dynamic environment and flexible hours, with teams all over the world.
Ways to stand out from the crowd:
Technical orientation, including the ability to conduct technical discussions.
Experience with tools such as MS Excel, MS Project, Power BI.
Networking background.
Experience coordinating multiple groups.
Familiarity with SW agile concepts.
If you're creative and autonomous, we want to hear from you! NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
This position is open to all candidates.
 
8593858
2 hours ago
Location: Yokne`am
Job Type: Full Time
The company's Networking Advanced Development Software team develops new groundbreaking technologies to enable new market shares for the company and tighten customer relationships. These are emerging technologies in networking and distributed computing for the booming AI factories and data centers. They span areas such as AI neural networks, deep learning, high-performance computing (HPC), storage, cloud, SW-defined networks, network function virtualization, and more. We develop the solutions top-down, all the way from application behavioral analysis to architecture definition and down to the implementation, using our world-leading devices. The development traverses any needed component: application SW, middleware SW, OS kernel subsystems, device drivers, embedded SW (firmware), and CUDA GPU. We collaborate with partners and key customers in the analysis processes and engage with open-source communities, introducing our leading features.
What you'll be doing:
Design and implement solutions throughout all layers, from high-level application, OS, and driver subsystem to firmware.
Work on impactful projects involving state-of-the-art high-performance computing hardware and software.
Provide insight and technical guidance and collaborate with peers from across the company, including software architecture, chip architecture, and engineering departments, to improve our future technology.
Collaborate with our partners and customers.
Requirements:
What we need to see:
B.Sc. in Computer Science, Electrical Engineering, Computer Engineering, or a related field.
5+ overall years of industry experience in system programming or related fields.
Understanding of multi-core hardware, operating systems design, concurrency, virtual memory, caching, interrupts, device drivers, real-time.
Excellent programming skills.
Ability to learn complex concepts in a fast-paced environment.
A teammate with a can-do attitude, high energy, and excellent interpersonal skills.
Ways to stand out from a crowd:
Familiarity with networking protocols.
Hands-on experience with CUDA programming and GPU acceleration.
Hands-on experience with LLM serving frameworks.
Experience with open-source projects (coursework, personal, or contributions).
Working in a fast-paced and dynamic environment.
This position is open to all candidates.
 
8594147
02/03/2026
Location: Yokne`am
Job Type: Full Time
The Networking Advanced Development Software team develops new groundbreaking technologies to enable new market shares for the company and tighten customer relationships. These are emerging technologies in networking and distributed computing for the booming AI factories and data centers. They span areas such as AI neural networks, Deep Learning, High Performance Computing (HPC), Storage, Cloud, SW Defined Network, Network Function Virtualization and more. We develop the solutions top-down, all the way from application behavioral analysis to architecture definition and down to the implementation, using our world-leading devices. The development traverses any needed component: application SW, middleware SW, OS kernel subsystems, device drivers, embedded SW (firmware), and CUDA GPU. We collaborate with partners and key customers in the analysis processes and engage with open-source communities, introducing our leading features.

What you'll be doing:

Design and implement solutions throughout all layers from high level application, OS and driver subsystem to firmware.

Work on impactful projects involving state-of-the-art high-performance computing hardware and software.

Provide insight and technical guidance and collaborate with peers from across the company - including software architecture, chip architecture, and engineering departments to improve our future technology.

Collaborate with our partners and customers.
Requirements:
What we need to see:

B.Sc. in Computer Science, Electrical Engineering, Computer Engineering, or a related field.

5+ overall years of industry experience in system programming or related fields.

Understanding of multi core hardware, operating systems design, concurrency, virtual memory, caching, interrupts, device drivers, real-time

Excellent programming skills.

Ability to learn complex concepts in a fast pace environment.

A teammate with a can-do attitude, high energy and excellent interpersonal skills.

Ways to stand out from a crowd:

Familiarity with networking protocols.

Hands-on experience with CUDA programming and GPU acceleration.

Hands-on experience with LLM serving frameworks.

Experience with open-source projects (coursework, personal, or contributions).

Working in a fast-paced and dynamic environment.
This position is open to all candidates.
 
8566056
4 days ago
Job Type: Full Time
We're looking for a Senior AI/MLOps Engineer to join a group that specializes in Security and Networking, and specifically ML, AI and agent development. As a Senior AI/MLOps Engineer, you'll build and maintain the infrastructure, tools and processes necessary to support the AI lifecycle in a production environment. You will collaborate closely with data scientists, software engineers, security architects and DevOps teams to ensure smooth deployment, modeling and optimization of AI models. This role involves creative problem solving alongside engineering teams, and is pivotal for the continued success of AI networking security.

What you'll be doing:

Developing, improving and optimizing scalable infrastructure for handling and deploying security and networking AI models and agents in production, ensuring high availability, scalability, reproducibility, and performance.

Optimizing AI models and agents for performance, scalability, and resource utilization, considering factors such as latency, efficiency, and cost.

Monitoring and deploying agentic systems, LLMs, and ML models in production.

Designing and implementing frameworks/pipelines for AI training, inference, and experimentation.

Collaborating closely with data scientists, security architects and software engineers to operationalize and deploy AI models and agents, including packaging and integration with existing systems. Participate in developing and reviewing code, design documents, use case reviews, and test plan reviews.

Collaborating with DevOps teams to integrate pipelines and workflows into the CI/CD process, ensuring flawless deployments and rollbacks.

Building and maintaining monitoring and alerting systems to proactively identify and resolve issues relating to quality, performance and infrastructure.

Implementing access controls, authentication mechanisms, and encryption standards for AI models and data.

Documenting guidelines and standard operating procedures for MLOps/AI processes and sharing knowledge with the wider team.

Developing proofs of concept for new features.
Requirements:
What we need to see:

BSc/MSc in CS/CE or related field (or equivalent experience).

Strong background in AI with experience deploying and monitoring AI/ML models, LLMs and agents to production systems at scale, including distributed and multi-node environments - at least 5 years of experience.

Proficiency in programming languages such as Python, Java, or Scala, along with experience in using ML/AI frameworks and libraries (e.g. TensorFlow, PyTorch).

Proficiency in microservices architecture, container orchestration, cloud platforms, and scalable infrastructure for training and inference workloads.

Knowledge of inference optimization techniques.

Understanding of build infrastructure and CI/CD tools and practices (e.g. GitLab, GitHub Actions, Jenkins).

You are detail-oriented and care deeply about robust, well tested, high-performance code in production environments.

You are proactive, take full ownership of your deliverables, have a can-do approach, and excellent communication and collaboration skills, able to work effectively in multifunctional teams.

Ways to stand out from the crowd:

Knowledge of network protocols and Linux internals.

Security and networking background, with knowledge of security protocols, network architectures, firewalls, intrusion detection systems, and other relevant security and networking concepts.

Experience deploying and optimizing generative models and agents.

Knowledge of network security principles and practices.
This position is open to all candidates.
 
8586605
4 days ago
Location: Yokne`am and Tel Hai
Job Type: Full Time
We are looking for a Senior Software Engineer to join the NSV tools (Network Solutions Validation) group. As a senior team member, you will be part of a development effort of high-performing software automation systems for our Data Center environments. You will interact with NIC, OS, Switch, HCA, CPU and GPU compute as well as architects, network engineers, and developers. We drive the data growth of the world's biggest companies. With talented engineers around the globe, the work environment is dynamic, meaningful, and fast-paced. Are you ready for the challenge?

What you'll be doing:

Design and develop an automation platform used to provision, configure, and monitor HPC data centers.

Implement scalable, reliable, and maintainable services that enhance cluster visibility and improve operational efficiency.

Collaborate closely with internal and external stakeholders to understand requirements and deliver robust full-cycle solutions.

Improve stability and performance across the provisioning pipeline through architectural enhancements and code optimizations.

Troubleshoot issues in distributed environments and contribute to system observability and reliability improvements.

Work cross-functionally with architects, DevOps engineers, product managers and stakeholders to ensure high-quality releases.

Participate in code reviews, technical design discussions, and continuous improvement activities within the team.
Requirements:
What we need to see:

B.Sc. in Computer Science, Engineering, or a related field (or equivalent practical experience).

5+ years of strong hands-on experience on Linux-based platforms.

Proficient scripting and automation skills (Bash, Python, Ansible).

Background in DevOps and Network Engineering practices.

Hands-on experience with large-scale network architectures, switches/routers, OVS, SR-IOV, and network operating/management systems.

Networking expertise: Ethernet, VLANs, TCP/UDP/IP, QoS, L2/L3 protocols, BGP, EVPN/VXLAN, and common network topologies.

Practical experience with containers and cloud-native technologies (Docker, Kubernetes) and networking performance.

Experience with version control systems (Git) and CI/CD pipelines.

Independent, fast learner with strong ownership mindset, excellent debugging and problem-solving skills, and effective communication abilities.

Ways to stand out from the crowd:

Experience as a Team Lead/Scrum Master or in a similar leadership role.

Experience in planning, tracking, and delivering projects.

Familiarity with DevOps methodologies and tools (e.g., Jenkins, Ansible).

Hands-on experience with Docker and containerized environments.

Experience with agentic AI development.
This position is open to all candidates.
 
8586566