Senior HPC Performance Engineer

2 days ago
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
conduct in-depth performance characterization and analysis on large multi-gpu and multi-node clusters.
study the interaction of our libraries with all hw (gpu, cpu, networking) and sw components in the stack
evaluate proof-of-concepts, conduct trade-off analysis when multiple solutions are available
triage and root-cause performance issues reported by our customers
collect large volumes of performance data; build tools and infrastructure to visualize and analyze it
collaborate with a very dynamic team across multiple time zones
Requirements:
what we need to see:
m.s. (or equivalent experience) or phd in computer science or a related field, with relevant performance engineering and hpc experience
3+ years of experience with parallel programming and at least one communication runtime (mpi, nccl, ucx, nvshmem)
experience conducting performance benchmarking and triage on large scale hpc clusters
good understanding of computer system architecture, hw-sw interactions and operating systems principles (aka systems software fundamentals)
ability to implement micro-benchmarks in C/C++, and to read and modify the code base when required
ability to debug performance issues across the entire hw/sw stack
proficient in a scripting language, preferably Python
familiar with containers, cloud provisioning and scheduling tools (kubernetes, slurm, ansible, docker)
adaptability and passion to learn new areas and tools
flexibility to work and communicate effectively across different teams and time zones
ways to stand out from the crowd:
practical experience with infiniband/ethernet networks in areas like rdma, topologies, congestion control
experience debugging network issues in large scale deployments
familiarity with cuda programming and/or gpus
experience with deep learning frameworks such as pytorch and tensorflow
This position is open to all candidates.
 
8593744
Similar jobs that may interest you
2 days ago
Location: Tel Aviv-Yafo
Job Type: Full Time
our company is leading the way in groundbreaking developments in artificial intelligence, high performance computing and visualization. the gpu, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.
come work for the team that brought you nccl, nvshmem & gpudirect. our gpu communication libraries are crucial for scaling deep learning and hpc applications! we are looking for a motivated partner enablement engineer to guide our key partners and customers with nccl. most DL/hpc applications run on large clusters with high-speed networking (infiniband, roce, ethernet). this is an outstanding opportunity to get an end-to-end understanding of the ai networking stack. are you ready to contribute to the development of innovative technologies and help realize our vision?
what you will be doing:
engage with our partners and customers to root cause functional and performance issues reported with nccl
conduct performance characterization and analysis of nccl and DL applications on groundbreaking gpu clusters
develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (azure, aws, gcp, etc.)
guide our customers and support teams on hpc knowledge and standard methodologies for running applications on multi-node clusters
document and conduct trainings/webinars for nccl
engage with internal teams in different time zones on networking, gpus, storage, infrastructure and support.
Requirements:
what we need to see:
b.s./m.s. degree in cs/ce or equivalent experience, with 5+ years of relevant experience
experience with parallel programming and at least one communication runtime (mpi, nccl, ucx, nvshmem)
excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
experience working with engineering or academic research community supporting hpc or ai
practical experience with high performance networking: infiniband/roce/ethernet networks, rdma, topologies, congestion control
expert in Linux fundamentals and a scripting language, preferably Python
familiar with containers, cloud provisioning and scheduling tools (docker, docker swarm, kubernetes, slurm, ansible)
adaptability and passion to learn new areas and tools
flexibility to work and communicate effectively across different teams and timezones
ways to stand out from the crowd:
experience conducting performance benchmarking and developing infrastructure on hpc clusters
prior system administration experience, especially for large clusters
experience debugging network configuration issues in large scale deployments
familiarity with cuda programming and/or gpus
good understanding of machine learning concepts and experience with deep learning frameworks such as pytorch and tensorflow
deep understanding of technology and passionate about what you do
This position is open to all candidates.
 
8593743
6 days ago
Location: Tel Aviv-Yafo
Job Type: Full Time
Help build an Always-On, low-overhead GPU profiling service that runs in production, scales across cluster environments, and delivers actionable insights for ML workloads. You will be hands-on delivering our profiling solutions across system software, drivers, and CUDA to make profiling continuously available and reliable.

What you'll be doing:

Develop low-overhead, high-reliability implementations in C/C++, with bounded CPU/memory budgets.

Lead end-to-end feature delivery spanning user-mode components, driver/platform layers, and performance counter/trace providers.

Establish profiling models that integrate with existing ML/AI workflows (e.g., PyTorch/XLA) to turn low-level signals into actionable insights.
Requirements:
What we need to see:

BS or MS degree or equivalent experience in Computer Engineering, Computer Science, or related degree.

5+ years of system-level C/C++ development, including concurrency, memory management, and performance engineering.

Familiarity with system software design, operating systems fundamentals, computer architectures, performance analysis, and delivering production-quality software.

Strong interpersonal, verbal, and written communication; able to influence across organizations and build trust with external collaborators.

Ways to stand out from the crowd:

Extensive experience with profiling/tracing stacks for CPU/GPU (e.g., CUPTI, Nsight, performance counters, event correlation) and debugging highly concurrent systems.

Deep hands-on knowledge of CUDA and GPU architecture, including runtime/driver APIs, CUDA streams/graphs, and kernel behavior.

Track record building continuous, always-on, or multi-client profiling systems designed for predictable overhead at scale.

Hands-on experience tuning ML training/inference loops based on deep profiling analysis, with familiarity in ML ecosystems (e.g., PyTorch, JAX) and correlating application events with GPU metrics to translate data into actionable performance insights (e.g., bottleneck triage, compute vs. memory bound).

Experience with user-mode driver development and integration within platform security and permissions models.
This position is open to all candidates.
 
8586600
1 day ago
Location: Tel Aviv-Yafo
Job Type: Full Time
looking for a strong technical senior architect to join us in shaping the future. senior architects are innovators who can translate business needs into workable technology solutions. their expertise is deep and broad. they are hands on, producing both detailed technical work and high-level architectural designs.
as a senior architect in the ai networking research team, you will explore technological challenges in accelerated networking and in building ai data centers. you will research new transport functions and semantics for optimizing ai workloads, ai systems communication and acceleration, and much more. you will also lead architectural and development efforts across numerous technological fields related to the modern ai data center, such as distributed ai and deep learning solutions, data analytics, high performance computing (hpc), software defined networking (sdn), virtualization, storage, and more.
what you'll be doing:
co-design hardware features (e.g., in gpus, dpus, or interconnects) that accelerate data movement and enable new capabilities for inference and model serving. 
identify and evaluate new technologies, innovations and partner relationships for alignment with our technology roadmap and business value.
lead architecture and design of new technologies and innovations such as runtime systems, communication libraries, ai-specific technologies.
lead proof-of-concept development to evaluate and drive such technologies.
Requirements:
what we need to see:
hold an m.sc. or ph.d. in computer science, electrical or computer engineering from a leading university (or equivalent experience).
5+ years of industry experience (or equivalent) in system architecture, ai systems architecture, scaling of ai, parallelism of ai frameworks, or deep learning training workloads.
experienced in algorithm design, system programming, computer architecture and operating systems.
experienced in virtualization, networking and storage.
deep understanding of performance profiling and optimization techniques, together with defining and using hardware features.
strong programming and software development skills.
ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
ways to stand out from the crowd:
proven research track record.
experience with and passion for system architecture: cpu/gpu/memory/storage/networking.
stellar communication skills.
knowledge in deep learning frameworks and ai communication libraries (nccl, ucx, mpi and equivalents).
deep understanding of inference and training workloads and optimizations, such as prefill/decode, data parallelism, tensor parallelism, fsdp and others.
This position is open to all candidates.
 
8593803
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Required Senior Algo Data Engineer
Realize your potential by joining the leading performance-driven advertising company!
As a Senior Algo Data Engineer in the Infra group, you'll play a vital role in developing, enhancing and maintaining highly scalable Machine-Learning infrastructures and tools.
About Algo platform:
The objective of the algo platform group is to own the existing algo platform (including health, stability, productivity and enablement), to facilitate and be involved in new platform experimentation within the algo craft and lead the platformization of the parts which should graduate into production scale. This includes support of ongoing ML projects while ensuring smooth operations and infrastructure reliability, owning a full set of capabilities, design and planning, implementation and production care.
The group has deep ties with both the algo craft as well as the infra group. The group reports to the infra department and has a dotted line reporting to the algo craft leadership.
The group serves as the professional authority when it comes to ML engineering and ML ops, serves as a focal point in a multidisciplinary team of algorithm researchers, product managers, and engineers and works with the most senior talent within the algo craft in order to achieve ML excellence.
How you'll make an impact:
As a Senior Algo Data Engineer, you'll:
Develop, enhance and maintain highly scalable Machine-Learning infrastructures and tools, including CI/CD, monitoring and alerting and more
Have end to end ownership: Design, develop, deploy, measure and maintain our machine learning platform, ensuring high availability, high scalability and efficient resource utilization
Identify and evaluate new technologies to improve performance, maintainability, and reliability of our machine learning systems
Work in tandem with the engineering-focused and algorithm-focused teams in order to improve our platform and optimize performance
Optimize machine learning systems to scale and utilize modern compute environments (e.g. distributed clusters, CPU and GPU) and continuously seek potential optimization opportunities.
Build and maintain tools for automation, deployment, monitoring, and operations.
Troubleshoot issues in our development, production and test environments
Directly influence the way billions of people discover the internet
Our tech stack:
Java, Python, TensorFlow, Spark, Kafka, Cassandra, HDFS, vespa.ai, ElasticSearch, AirFlow, BigQuery, Google Cloud Platform, Kubernetes, Docker, git and Jenkins.
Requirements:
Experience developing large scale systems. Experience with filesystems, server architectures, distributed systems, SQL and NoSQL. Experience with Spark and Airflow or other orchestration platforms is a big plus.
Highly skilled in software engineering methods, with 5+ years of experience.
Passion for ML engineering and for creating and improving platforms
Experience with designing and supporting ML pipelines and models in production environment
Excellent coding skills - in Java & Python
Experience with TensorFlow - a big plus
Possess strong problem solving and critical thinking skills
BSc in Computer Science or related field.
Proven ability to work effectively and independently across multiple teams and beyond organizational boundaries
Deep understanding of Computer Science fundamentals: object-oriented design, data structures, systems, application programming and multithreaded programming
Strong communication skills to be able to present insights and ideas, and excellent English, required to communicate with our global teams.
Bonus points if you have:
Experience in leading Algorithms projects or teams.
Experience in developing models using deep learning techniques and tools
Experience in developing software within a distributed computation framework.
This position is open to all candidates.
 
8559383
2 days ago
Location: Tel Aviv-Yafo
Job Type: Full Time
we are looking for a 100% hands-on network software engineer to join the block storage group. you will be a member of a team that builds the next generation of block storage capabilities. you will work closely with a variety of teams and architects, including the networking team, and with external customers. you will take part in defining the software architecture and implementation of the most advanced storage services, which will need to meet extreme performance and scalability demands. we have crafted a team of extraordinary people stretching around the globe, whose mission is to push the frontiers of what is possible today and define the platform of tomorrow.
at our company, we work, think and learn as a team. we thrive in a deeply strong environment, and we're passionate about a culture that demands innovation and the highest standards. the rewards are sweet and include collaborating with some of the smartest people in the industry, an aggressive compensation plan that rewards top performers, and the opportunity to work on products that transform the way people work and play.
what you'll be doing:
100% hands-on coding role in C, in both kernel and userspace
research, design, implement and test new and existing advanced networking services and features of our block storage solution, in both host and dpu environments.
acquire understanding of the algorithms, the technicalities and the interaction with other components across our block storage ecosystem.
analyze and solve challenging bugs and customer cases in large scale production systems, identifying issues in our own or inbox kernel modules and often in other components. drive new solutions based on any issues that arise
Requirements:
what we need to see:
b.sc. or m.sc. in computer science, electrical engineering or a related discipline (or equivalent experience).
12+ years of experience as a senior developer, preferably in the domains of storage, networking, and/or operating systems.
strong proficiency in C/C++ programming.
knowledge of networking fundamentals and experience in Linux-based networking environments.
familiarity with rdma technologies, including infiniband, roce, or iwarp, and experience with rdma programming models, control and data paths
comprehension of large, complex systems.
proven professional experience in designing and developing distributed systems; experience in block storage and/or networking systems is an advantage.
ability to work autonomously, with a proactive mindset and perseverance to solve day to day challenges.
ability to quickly adapt to new technology and go deep into new areas
excellent communication skills and a collaborative mindset.
innovative approach, identifying opportunities to improve, accelerate, and reuse existing solutions.
knowledge of cloud computing concepts, including virtualization, scalability, and data management.
ways to stand out from the crowd:
Linux kernel coding experience
Linux kernel internals knowledge, including memory management, scheduling, etc.
This position is open to all candidates.
 
8593625
19/03/2026
Location: Tel Aviv-Yafo and Yokne'am
Job Type: Full Time
We are seeking a dynamic and highly motivated Software Senior Manager to lead our BlueField DPU Platform software team. We are looking for a candidate who can excel in a sophisticated, multidisciplinary environment, take ownership, and drive high-quality software development, including low-level device initialization, Linux OS drivers and kernel configuration, board bring-up and system management. This position offers the opportunity to have a real impact on sophisticated, groundbreaking products, delivered by us and developed by our customers, empowering the most advanced data centers in the world. We believe our most valuable asset is our people and seek the very best to lead our outstanding team. This role requires close collaboration with teams across various fields (SW, HW, QA) to elevate our product to the next level.

What you will be doing:

Mentor and expand your engineering team in the planning and execution of initiatives and projects with top quality and timely results.

Coordinate feature design and implementation as well as issue resolution, as this is a technical leadership role.

Interact with internal and external partners to understand their use cases and requirements. Collaborate with engineering teams, program and product management across the product roadmap.

Continuously review and identify improvement opportunities in established processes, infrastructure, and practices to ensure the teams are implementing in the most efficient and open manner.

Develop a team of engineers who understand the bigger picture, value collaboration, and can take ownership of and implement designs from beginning to end.

Be familiar with the open-source community process to advance industry-standard programming models and platform support while upstreaming and maintaining software into standard software distributions.
Requirements:
What we need to see:

B.Sc. degree or equivalent experience in Computer Science, Computer Engineering, or Electrical Engineering.

12+ overall years of experience in the software industry with specialization in embedded Linux system software stack and Arm preboot development.

4+ years of experience managing managers or senior engineers.

Proven track record of taking several complex software features or products through the full product life cycle.

Strong understanding of computer system architecture, operating systems principles, HW-SW interactions, and performance analysis/optimizations.

Proficient in C, C++ with the technical depth to guide and mentor the team

Experience balancing multiple projects with conflicting priorities.

Flexibility to work and communicate effectively across different teams and time zones.

Ways to stand out from the crowd:

Demonstrated leadership of engineering teams doing embedded Linux and preboot Arm work.

Experience with ARMv8 microarchitecture, ATF, and/or UEFI software.

Knowledge of secure boot flows and/or trusted computing environments is a strong plus.

A good sense of humor is key. We like to have a positive team environment.
This position is open to all candidates.
 
8585256
1 day ago
Location: Tel Aviv-Yafo
Job Type: Full Time
we seek a versatile senior software engineer who is passionate about performance optimization and generative ai. our team builds software solutions that enable efficient inference on the latest and greatest generative ai models. we tackle problems on all levels of the stack, from server-level request batching to gpu kernel fusion, and collaborate with teams across diverse disciplines to push nvidia's hardware to its full potential.
what you'll be doing:
cooperate with research teams to onboard new llms and vlms into nvidia's open-source ai runtimes
optimize inference workloads using sophisticated profiling and simulation tools
build solid, extendable inference software systems, and refine robust apis
implement and debug low-level gpu code to harness the latest hw features
own end-to-end inference acceleration features and work with teams around the world to deliver production-grade products
Requirements:
what we need to see:
b.sc., m.sc. or equivalent experience in Computer Science or computer engineering
5+ years of relevant hands-on software engineering experience
profound knowledge of software design principles
strong proficiency in at least one system and one scripting language
strong grasp of Machine Learning concepts
a people person with excellent communication skills who enjoys collaboration and teamwork.
ways to stand out from the crowd:
familiarity with nvidia's DL software stack, e.g. triton inference server, tensorrt-llm, and model optimizer
proven track record of performance modeling, profiling, debugging, and development in a performance-critical setting with nvidia's accelerators.
familiarity with llm quantization, fine-tuning, and caching algorithms
proficiency in gpu kernel programming (cuda or opencl)
prior experience working on a large software project with 50+ contributors
This position is open to all candidates.
 
8593825
1 day ago
Location: Tel Aviv-Yafo
Job Type: Full Time
we are looking for a 100% hands-on storage services software engineer to join the block storage group. you will be a member of a team that builds the next generation of block storage capabilities. you will work closely with a variety of teams and architects, including the networking team, and with external customers. you will take part in defining the software architecture and implementation of the most advanced storage services, which will need to meet extreme performance and scalability demands. we have crafted a team of extraordinary people stretching around the globe, whose mission is to push the frontiers of what is possible today and define the platform of tomorrow.
we work, think and learn as a team. we thrive in a deeply strong environment, and we're passionate about a culture that demands innovation and the highest standards. the rewards are sweet and include collaborating with some of the smartest people in the industry, an aggressive compensation plan that rewards top performers, and the opportunity to work on products that transform the way people work and play.
what you'll be doing:
100% hands-on coding role in C, in both kernel and userspace
research, design, implement and test new and existing distributed storage services and features of nvidia's block storage solution, in both host and dpu environments.
acquire understanding of the algorithms, the technicalities and the interaction with other components across nvidia's block storage ecosystem.
analyze and solve challenging bugs and customer cases in large scale production systems, identifying issues in our own or inbox kernel modules and often in other components. drive new solutions based on any issues that arise
Requirements:
what we need to see:
b.sc. or m.sc. in computer science, electrical engineering or a related discipline (or equivalent experience).
15+ years of experience as a senior developer, preferably in the domains of storage, networking, and/or operating systems.
strong proficiency in C/C++ programming.
experience with storage protocols and standards, especially nvme
experience with the Linux block subsystem and io stack
proven professional experience in designing and developing distributed systems; experience in block storage and/or networking systems is an advantage.
ability to work autonomously, with a proactive mindset and perseverance to solve day to day challenges.
ability to quickly adapt to new technology and go deep into new areas
excellent communication skills and a collaborative mindset.
innovative approach, identifying opportunities to improve, accelerate, and reuse existing solutions.
knowledge of cloud computing concepts, including virtualization, scalability, and data management.
ways to stand out from the crowd:
Linux kernel coding experience
Linux kernel internals knowledge, including memory management, scheduling, etc.
This position is open to all candidates.
 
8593806
Location: Tel Aviv-Yafo
Job Type: Full Time
about the job
our company's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. our products need to handle information at massive scale, and extend well beyond web search. we're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. as a software engineer, you will work on a specific project critical to our company's needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. we need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.
in this role, you will work with system teams and the cpu architecture team to develop an understanding of the central processing unit (cpu), system on a chip (SoC), performance metrics, benchmarks/measuring tools, and available optimization knobs. you will define methods and technologies to model cpu performance at different accuracy levels, supporting architectural explorations and decision making. you will correlate performance projections with measured post-silicon data.
the ai and infrastructure team is redefining what's possible. we empower our customers, our company's cloud customers, and billions of our company's users worldwide with breakthrough capabilities and insights by delivering ai and infrastructure at unparalleled scale, efficiency, reliability and velocity. we're the driving force behind our company's groundbreaking innovations, empowering the development of our cutting-edge ai models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. from software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our tpus, vertex ai for our company cloud, our company global networking, data center operations, systems research, and much more.
responsibilities
write product or system development code.
design, develop, test, deploy, maintain, and improve central processing unit (cpu) software modeling and other software tools.
manage project priorities, deadlines, and deliverables.
collaborate with hardware and software cpu architecture teams, the SoC performance modeling team, and other company software teams.
Requirements:
minimum qualifications:
bachelor's degree in electrical engineering, computer engineering, Computer Science, or a related field, or equivalent practical experience.
2 years of experience with software development in the C++ programming language, or 1 year of experience with an advanced degree.
preferred qualifications:
master's degree or phd in engineering, computer science, or a related technical field.
2 years of experience with data structures and algorithms.
experience in modern cpu/machine learning (ml) architecture and micro-architecture.
ability to learn coding languages.
excellent object-oriented database design and sql skills.
This position is open to all candidates.
 
8593048
18/03/2026
Confidential company
Location: Tel Aviv-Yafo and Yokne'am
Job Type: Full Time
We are now looking for an HPC Operations Engineer to join our mission and continue improving our HPC infrastructure. A meaningful part of our strength is our unique and advanced development tools and environments that enable our incredible pace of innovation. We are looking for architects to help us evolve the way our private compute cloud is architected and optimized.

What you'll be doing:

Troubleshoot incoming support requests in a large-scale HPC environment.

Contribute enhancements to existing deployment automation, configuration management, observability, and operational monitoring, and improve day-to-day operations through automation.

Ensure compute servers are running the correct operating system and configuration.

Troubleshoot Complex Issues: Perform comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency.

Collaborate with specialist teams to drive issues to closure.

Collaborate with domain experts to improve how our chip development process utilizes our infrastructure.

Directly contribute to the overall quality and improve time to market for our next generation chips.
Requirements:
What we need to see:

BS in Computer Science or similar degree or equivalent experience

2+ years of experience administering CentOS/RHEL Linux distributions.

Understanding of container technologies like Docker.

Proficiency in Python and UNIX scripting languages such as bash.

Excellent problem-solving skills, with the ability to analyze complex systems, identify bottlenecks, and implement scalable solutions.

Excellent communication and teamwork skills, with the ability to work effectively with diverse teams and individuals.

Solid understanding of cluster configuration management tools such as Ansible.

Ways to stand out from the crowd:

Understanding of key Linux technologies such as NFS, automounter, LDAP, DNS, and TCP/IP networking in Red Hat Linux distribution flavors.

Familiarity with job scheduler administration (e.g. IBM Spectrum LSF or SLURM) and experience building/operating large scale compute infrastructure.

Knowledge of the FlexLM license management system.

Proficiency in Perl for maintaining legacy automation scripts.

Familiarity with High-Speed Networking (InfiniBand, RDMA, RoCE etc.) and fast, distributed storage systems (Lustre, GPFS, etc.).
This position is open to all candidates.
 
8583522