Senior Performance Engineer - LLM Inference Frameworks

Posted: 1 hour ago
Location: More than one
Job Type: Full Time
We are hiring exceptional software engineers to build and optimize the core inference infrastructure for large language models. Join the TensorRT‑LLM team - the group defining how generative AI performs at global scale on our GPUs. We're looking for engineers who love squeezing every drop of throughput, memory efficiency, and scalability out of modern model runtimes. Your work will directly shape the frameworks behind state‑of‑the‑art LLM inference used across the company and the AI community. Join us to redefine what "fast" means for LLM inference - building the frameworks that power the next generation of generative AI at scale.
What you'll be doing:
Design, implement, and optimize high‑performance inference pipelines for large language models running on GPUs
Profile and tune model execution across the stack - from scheduler design to kernel fusions and everything in between
Design and experiment with memory management strategies that improve memory bandwidth utilization and cache efficiency
Innovate and implement cutting-edge techniques such as speculative decoding, context caching, and FP8/INT4 quantization to push the boundaries of tokens-per-second-per-watt
Develop and maintain benchmarking and testing systems that quantify latency, utilization, and efficiency.
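The FP8/INT4 quantization mentioned above trades numerical precision for memory footprint. As a hedged illustration only - a toy sketch, not TensorRT-LLM's actual implementation - symmetric per-tensor INT4 quantization can be expressed as:

```python
# Toy symmetric INT4 quantization: map floats to 4-bit signed integers
# [-8, 7] with a single per-tensor scale. Names are illustrative.

def quantize_int4(weights):
    """Quantize a list of floats to 4-bit codes plus a scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_int4(weights)
recovered = dequantize_int4(q, scale)
```

The round-trip error per weight is bounded by half the scale, which is the precision/memory trade-off the listing's "tokens-per-second-per-watt" goal is balancing.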
Requirements:
Bachelor's, Master's, or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or related computing-focused degree (or equivalent experience)
5+ years of relevant software development experience.
Excellent Python programming, software design, and software engineering skills
Experience working with deep learning frameworks like PyTorch and HuggingFace
Experience profiling and debugging performance at all levels - Python runtime, PyTorch internals, and GPU utilization metrics
Awareness of the latest developments in LLM architectures and LLM inference techniques
Proactive and able to work without supervision
Excellent written and oral communication skills in English
Ways to stand out from the crowd:
Contributions to inference frameworks such as TensorRT‑LLM, vLLM, SGLang, or similar systems
Demonstrated expertise in performance modeling, memory optimization, distributed model execution or GPU execution workflows
Hands‑on experience with our profiling tools (Nsight Systems, PyTorch Profiler, custom benchmarking harnesses)
Strong grasp of the trade‑offs shaping inference efficiency: compute vs. memory, scheduling vs. batching, latency vs. throughput.
This position is open to all candidates.
 
Job #8645997
Posted: 4 hours ago
Location: Tel Aviv-Yafo
Job Type: Full Time
We seek a versatile Senior Software Engineer who is passionate about performance optimization and generative AI. Our team brings the latest research in LLM inference - from novel decoding strategies to quantization schemes - into production across our hardware lineup, from large data center servers to powerful edge devices. We work on the most advanced architectures in the field, with a focus on our own.
What you'll be doing:
Implement and optimize inference algorithms for LLM and omnimodal architectures, including hybrid Mamba-Transformer and mixture-of-experts models
Profile inference pipelines using our profiling and simulation tools. Correlate simulation predictions against real hardware across data center and edge devices
Write and tune GPU kernels (CUDA, Triton) for operators like fused MoE layers, SSM state updates, and quantized GEMMs
Solve distributed inference problems: expert parallelism, communication-compute overlap, collective tuning, multi-node deployment
Build production-grade software inside major open-source libraries - vLLM, SGLang, Dynamo, FlashInfer
Own optimization features end-to-end, from scoping through delivery, collaborating with research, product, and engineering teams worldwide.
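Mixture-of-experts models, named in the responsibilities above, route each token to a small subset of experts. A minimal sketch of top-k gating follows - purely illustrative; production routers use learned, batched gating kernels on GPU:

```python
import math

def route_top_k(scores, k=2):
    """Pick the k highest-scoring experts for one token and return
    their indices plus softmax mixing weights over the selected set."""
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Synthetic gate scores for four experts; names/values are assumptions.
gate_scores = [0.10, 0.70, 0.05, 0.15]
experts, weights = route_top_k(gate_scores)
```

Expert parallelism, also listed above, is essentially the distributed version of this dispatch: tokens are shuffled to whichever devices host their selected experts.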
Requirements:
B.Sc., M.Sc., or equivalent experience in Computer Science or Computer Engineering
5+ years of hands-on software engineering experience in performance-critical systems
Solid understanding of deep learning architectures (Transformers, SSMs, MoE)
Experience with systems where hardware constraints matter: GPU programming, memory hierarchy, networking, or distributed computing
Strong software engineering fundamentals: clean design, extensibility, testability. Good judgment about when complexity is warranted
Effective communicator who works well across teams and time zones
Experience optimizing deep learning workloads on our GPUs using roofline models, Nsight/PyTorch profilers and end-to-end traces
Ways to stand out from the crowd:
Contributions to open-source inference runtimes and libraries - vLLM, SGLang, FlashInfer, Dynamo or similar
Hands-on work with LLM quantization (FP8, NVFP4, MXFP8, mixed-precision) and practical understanding of numerical precision tradeoffs
Track record with distributed inference at scale: tensor parallelism, pipeline parallelism, expert parallelism, disaggregation, multi-node orchestration
Deep knowledge of the latest LLM architectural trends: multi-token predictors, sparse hybrid models, attention and state-space mechanisms
Experience with performance modeling and simulation-to-silicon correlation.
This position is open to all candidates.
 
Job #8645501
Posted: 09/04/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
Required Senior ML Engineer - Applied AI Engineering Group
The Dream Job
It starts with you - an engineer driven to build the ML platform that turns research into reliable, production-grade intelligence. You care about reproducibility, low-friction experimentation, and infrastructure that earns the trust of the scientists and researchers who depend on it daily. You'll architect and ship our ML platform - training pipelines, model serving, feature stores, experiment tracking, and compute orchestration - turning models into production capabilities across cloud and on-prem, including air-gapped deployments. A significant part of the platform supports large language models, with unique challenges across training, evaluation, and inference in mission-critical environments.
If you want to make a meaningful impact, join our mission and build the ML platform that drives Sovereign AI products - this role is for you.
The Dream-Maker Responsibilities
Build and operate ML training infrastructure - distributed training pipelines, compute scheduling, and reproducible experiment workflows that data scientists rely on daily.
Own model serving and inference systems - packaging, deployment, autoscaling, A/B testing, canary rollouts, and latency/cost optimization for production models.
Run feature stores, model registries, and dataset versioning - enabling self-serve feature engineering, model lineage, and reproducible experiments across teams.
Build experiment tracking and evaluation infrastructure - automated evals, comparison dashboards, drift detection, and monitoring that give teams visibility into model behavior and performance.
Build and maintain production pipelines for training, fine-tuning workflows, and serving domain models - owning reliability, reproducibility, and scale.
Build and maintain the monitoring and observability layer - model performance tracking, data and prediction drift detection, data quality validation, and alerting.
Improve performance and cost across the ML stack - training throughput, inference latency, batch vs. real-time tradeoffs, and compute cost management.
Ship shared tooling - libraries, templates, CI/CD for models, IaC, and runbooks - while collaborating across Data Platform, AI, Data Science, Engineering, and DevOps. Own architecture, documentation, and operations end-to-end.
Requirements:
5+ years in software engineering, with 2+ years focused on ML infrastructure, MLOps, or data-intensive systems
Engineering craft - Strong Python, distributed systems design, testing, secure coding, API design, CI/CD discipline, and production ownership.
ML platform & serving - Model serving frameworks (e.g., Triton, TorchServe, vLLM, Ray Serve); model packaging, deployment pipelines, and inference optimization
Training infrastructure - Distributed training pipelines (e.g., PyTorch, JAX); experiment orchestration and reproducibility
ML lifecycle tooling - Feature stores, model registries, experiment tracking (e.g., MLflow, Weights & Biases); dataset versioning and lineage
Data pipelines - Building training and inference data pipelines; familiarity with tools like Spark, Airflow/Dagster, and streaming ingestion
Comfortable with AI coding tools like Cursor, Claude Code, or Copilot
Nice to Have:
Experience operating in constrained environments - on-premise, private cloud, or air-gapped deployments
Hands-on experience with simulation environments, synthetic data generation, or reinforcement learning workflows
Platform & infra - Kubernetes, AWS, Terraform or similar IaC, CI/CD, observability, incident response
Hands-on data science or applied ML experience.
This position is open to all candidates.
 
Job #8603632
Posted: 21 hours ago
Location: Yokne`am
Job Type: Full Time
Required Senior Software Engineer, Data Center Workloads - Infrastructure
We are pioneers in innovation, transforming computer graphics, PC gaming, and accelerated computing for over 25 years. Our team is driven by powerful technology and outstanding people who expand the limits of what's achievable. Now, we are unlocking the potential of AI to usher in the next era of computing.
As part of our engineering organization, you will play a key hands-on role in developing and executing software-driven characterization workflows on our rack-scale systems. This role is focused on running AI workloads across the full stack to analyze, characterize, and optimize power, performance, and drive behavior at the system level. This is an opportunity to work at the intersection of software, infrastructure, silicon, and large-scale AI platforms, with direct impact on next-generation systems.
What you'll be doing:
Develop and run software tools, automation, and workloads to characterize power, performance, and drive behavior across our rack-scale systems.
Execute AI and system-level workloads to stress and evaluate behavior across the stack, including GPUs, CPUs, networking, storage, firmware, drivers, and system software.
Build automated frameworks for data collection, telemetry, validation, correlation, and analysis of characterization results.
Investigate system behavior under different workloads and operating conditions to identify bottlenecks, anomalies, and optimization opportunities.
Work closely with hardware, firmware, driver, system software, performance, and validation teams to define characterization methodologies and debug cross-stack issues.
Support bring-up, validation, and readiness activities for new rack-scale platforms and AI infrastructure.
Create clear documentation, test flows, and repeatable processes to improve coverage, efficiency, and reproducibility.
Requirements:
B.Sc. or M.Sc. in Computer Science, Electrical Engineering, or a related field.
5+ years of software engineering experience, preferably in system software, infrastructure, validation, or performance-focused environments.
Strong programming skills in Python and at least one system-level language such as C/C++.
Experience developing automation and test infrastructure for complex hardware/software systems.
Hands-on experience running, debugging, or optimizing AI, HPC, or large-scale system workloads.
Good understanding of system-level architecture, including interactions across hardware, firmware, drivers, operating systems, and application layers.
Experience working in Linux environments and with scripting, telemetry, logging, and data analysis tools.
Strong debugging and problem-solving skills, with the ability to work across multiple engineering disciplines.
Good communication skills and the ability to drive technical work in a fast-paced, cross-functional environment.
Ways to stand out from the crowd:
Experience with NVIDIA platforms, GPU systems, or rack-scale AI infrastructure.
Background in power, thermal, performance, or storage/drive characterization.
Experience with workload automation, cluster orchestration, or lab infrastructure.
Familiarity with AI benchmarks, training/inference workloads, and system stress methodologies.
Experience in post-silicon validation, production testing, or system bring-up.
This position is open to all candidates.
 
Job #8644517
Posted: 09/04/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
Required Senior AI Engineer - Applied AI Engineering Group
The Dream Job
It starts with you - an engineer driven to build the agentic AI platform that turns LLMs into reliable, production-grade capabilities. You care about clean APIs, well-defined service boundaries, and systems that teams can build on with confidence. Dream is AI-first across the board - every team builds and operates agents. You'll architect and ship the platform that makes this possible: agent orchestration frameworks, LLM gateways, evaluation pipelines, tool-calling infrastructure, and retrieval systems. Without this platform, agents don't ship - you own the layer that turns AI research into Sovereign AI products, deployed across cloud and on-prem environments.
If you want to make a meaningful impact, join our mission and build the agentic AI platform that drives Sovereign AI products - this role is for you.
The Dream-Maker Responsibilities
Design and build agentic systems - single and multi-agent workflows with planning, memory, context engineering, and tool use - for both internal automation and product-facing autonomous capabilities operating over long time horizons.
Build and operate the AI platform layer - LLM gateways, prompt management, structured output handling, tool-calling infrastructure, and cost/latency optimization - deployed on Kubernetes, consumed by every team for their agentic work.
Own the agent framework layer - orchestration primitives, execution environments, state management, and sandboxed tool execution - giving every team the building blocks to create and operate their own agents.
Build evaluation infrastructure that gives teams confidence in agent behavior - automated LLM and agent evals for quality, correctness, safety, latency, cost, and regressions, including human-in-the-loop oversight for mission-critical workflows.
Productionize and harden backend services (APIs, gRPC, async workers) that integrate LLMs - with proper error handling, retries, circuit breakers, and high-availability patterns.
Own RAG pipelines and retrieval systems - indexing, chunking, embedding, vector database management, filtering, and relevance tuning for production retrieval.
Optimize performance and cost across the AI stack - model routing, caching, batching, and inference cost management.
Ship shared tooling - libraries, SDKs, agent templates, and documentation - while working closely with ML Platform, Data Platform, DevOps, and other teams across the Applied AI Engineering group. Own architecture, documentation, and operations end-to-end.
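The RAG responsibilities above begin with chunking documents before embedding them. A minimal fixed-size chunker with overlap is sketched below; the sizes and names are illustrative assumptions, not this team's actual pipeline:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks, each sharing
    `overlap` trailing characters with the previous chunk so that
    context spanning a boundary survives retrieval."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size]
            for start in range(0, max(len(text) - overlap, 1), step)]

# 500 characters of synthetic text -> three overlapping chunks.
chunks = chunk_text("a" * 500)
```

Production pipelines typically chunk by tokens or semantic boundaries rather than raw characters, but the overlap trade-off (recall vs. index size) is the same.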
Requirements:
5+ years in backend or distributed systems engineering, with 2+ years focused on production systems that integrate AI/ML models or LLMs.
Engineering craft - Strong Python, Go, or Java, system architecture, API design, testing, and secure coding practices.
Agentic systems - Experience designing and building agent orchestration, tool-use systems, and autonomous workflows; familiarity with frameworks like LangGraph or similar, or having built equivalent from scratch
Backend engineering - Experience building production APIs and services (FastAPI or similar); async programming, service architecture, high-availability, and reliability patterns (retries, circuit breakers, backpressure)
LLM integration - Hands-on experience integrating LLMs via SDKs and APIs; context engineering, structured outputs, tool calling, and model routing
RAG & retrieval - Experience with embedding pipelines, vector databases (e.g., Milvus, Qdrant, Pinecone), chunking strategies, and relevance tuning
Evaluation & observability - Experience designing LLM and agent evals, monitoring AI system quality, and building observability for non-deterministic systems
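Of the reliability patterns required above (retries, circuit breakers, backpressure), retry with exponential backoff is the simplest to sketch. A hedged, minimal version follows - delays are shortened for illustration, and a production variant would add jitter and retry only on transient error types:

```python
import time

def retry(fn, attempts=3, base_delay=0.01):
    """Call fn(), retrying on exception with exponential backoff:
    base_delay, then 2*base_delay, doubling until attempts run out."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

# Hypothetical flaky dependency that succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = retry(flaky)
```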
Nice to Have:
Platform & infra - Kubernetes, AWS, Terraform or similar IaC, CI/CD, container orchestration, deploying and operating production services
Experience with MCP or similar tool-use protocols for agent-to-service communication
Hands-on ML experience.
This position is open to all candidates.
 
Job #8603620
Posted: 2 hours ago
Confidential company
Location: Yokne`am
Job Type: Full Time
We are seeking a highly skilled Senior Performance Engineer to join our Performance and R&D organizations. In this role, you will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments. You will work closely with hardware, networking, firmware, and software teams to collect, analyze, and interpret performance data from live systems. This is a fast-paced R&D environment where system behavior and requirements evolve rapidly, requiring adaptable engineering solutions and strong analytical thinking.
What you'll be doing:
Profile, benchmark, and analyze AI and HPC workloads on GPU and CPU clusters
Explore performance characteristics of high-performance networking and collective communications (e.g., NCCL, RDMA, MPI, RoCE)
Identify performance bottlenecks across networking, compute, memory, and system architecture
Develop and enhance performance analysis, benchmarking, and diagnostic tools
Define performance test plans and establish expectations for new technologies and platforms
Collaborate across hardware, firmware, networking, systems, and software teams to provide actionable performance insights
Support telemetry collection and data refinement efforts to enable accurate performance analysis
Maintain high standards for data quality, reproducibility, and traceability of performance results.
Requirements:
B.Sc. or M.Sc. in Computer Science, Computer Engineering, Software Engineering, or equivalent experience
5+ years of experience in performance analysis, systems engineering, or HPC/AI infrastructure
Demonstrated expertise in performance analysis methodologies
Hands-on experience with high-performance networking (RDMA, MPI, NCCL, congestion control)
Strong understanding of system performance metrics (latency, throughput, resource utilization)
Exposure to hardware, firmware, or embedded telemetry environments
Strong analytical, problem-solving, and communication skills
Ability to work effectively in cross-functional, fast-paced R&D teams
Ways to stand out from the crowd:
Knowledge of CUDA, NCCL internals, and congestion control algorithms
Deep system-level understanding of CPU architectures, GPUs, HCAs, memory, and PCIe
Experience with our GPUs, CUDA, and deep learning frameworks such as PyTorch or TensorFlow
Experience with cloud platforms
Proficiency in Python; experience with Bash and C/C++ is a plus, as is strong experience working in Linux environments.
This position is open to all candidates.
 
Job #8645832
Posted: 1 hour ago
Confidential company
Location: More than one
Job Type: Full Time
We are looking for an enthusiastic software engineer to join our AI networking acceleration team, to work on a groundbreaking open-source library using hardware offloads, GPU kernels, and RDMA network cards. Our product is a performance-oriented, low-level infrastructure crafted to change the way inference works.
We thrive as a team in a deeply collaborative environment, and we're passionate about innovation. The rewards are sweet and include working with some of the brightest people in the industry, an aggressive compensation plan that rewards top performers, and the opportunity to collaborate on products that transform the way people work and play every day.
What you'll be doing:
Developing a highly optimized inference framework running on the world's largest supercomputers and data centers.
The work environment is dynamic and challenging as our employees work on innovative, next-generation products at the forefront of technology in terms of performance, scalability, and features.
Requirements:
B.Sc. or equivalent experience in Computer Science or Software Engineering
At least 5 years of experience in modern C++ / C / Python development
At least 3 years of experience in Linux environment and familiarity with development tools
Deep knowledge of the TCP/IP network stack
Understanding of computer architecture and operating systems concepts
Ways to stand out from the crowd:
Background in Linux internals and low-level software optimizations (benchmarking, bottleneck research, performance tuning)
Experience in programming CUDA kernels is an advantage
Familiarity with ML frameworks and LLMs
Background in parallel programming / high-performance computing / RDMA technology.
This position is open to all candidates.
 
Job #8645963
Posted: 1 day ago
Location: Ra'anana and Yokne`am
Job Type: Full Time and Hybrid work
We are looking for an outstanding Senior Software Engineer to join our Video/Multimedia Architecture & Algorithms (A&A) team - the people who build tomorrow's NVENC and NVDEC, the dedicated video encode and decode engines that power streaming, cloud gaming, video conferencing, and broadcast on every modern NVIDIA GPU. You will be the software-craftsmanship anchor of a small software team inside A&A. You will craft and implement the core components, developed in C++ alongside Python, that support our research and product paths. You will raise the engineering bar across the group and guide research code through to the shipping NVENC/NVDEC SDK.
This is a hybrid role - 4 days per week from the office.
What You'll Be Doing
Work closely with our Architects and Algorithms Engineers to understand the needs and build, implement and/or optimize the most elegant solutions - in modern C++ and Python
Set the bar for what good software means inside A&A: reviewing code, mentoring engineers from non-software backgrounds, and bringing the rest of the group up with you
Implement detailed, focused tweaks into the SDK and the wider video stack. These changes let researchers test new ideas without forking the world. Walk research code through to a shipping NVENC/NVDEC release when needed.
Profile and optimize critical paths in the codec stack; reach for CUDA when CPU-side optimization is not enough
Build and sharpen the small libraries, frameworks and tools the team uses every day, and make sure they are a joy to work with.
Requirements:
What We Need To See
B.Sc. in Computer Science or Electrical/Computer Engineering
8+ years of relevant proven experience (or 5+ years and a relevant M.Sc.)
Proficiency in modern C++ (C++14/17/20) - templates, RAII, concurrency, move semantics, the standard library, the works
Proficiency in Python - idiomatic, performant, well-tested, with a strong sense of what is appropriate for C++ and what suits Python
Strong software design instincts and a real care for code quality: APIs, modularity, testability, clean abstractions, performance, the long tail of maintainability
Experience working on Linux as a development platform - CMake, Git, debuggers, profilers, sanitizers
Experience optimizing algorithmic code with methods such as multi-threading/multi-processing, SIMD, C++, C, etc.
Ways To Stand Out From The Crowd
Familiarity with video compression / codecs (NVENC, NVDEC, FFmpeg, GStreamer, x264/x265, AV1, VVC)
CUDA or GPU programming experience
Experience embedding Python in C++ (pybind11, nanobind) or building Python extensions
Extensive mileage with C++/Python algorithmic frameworks such as OpenCV, NumPy, SciPy, CuPy, matplotlib, TensorFlow, PyTorch, etc.
This position is open to all candidates.
 
Job #8643834
Posted: 20/04/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for a highly skilled Senior Machine Learning Engineer to lead our transition from on-demand, third-party LLM APIs to a fully self-hosted, scalable model ecosystem.
Our core product is an advanced, agentic support chatbot capable of complex reasoning, API tool calling, database lookups, and orchestrating specialized Small Language Models (SLMs) for targeted NLP tasks. As we scale, our current deployment infrastructure (AWS SageMaker) is becoming unsustainable. You will be responsible for architecting, deploying, and optimizing an infrastructure capable of supporting 50 to 100 distinct models ranging from 100M to 70B parameters.
What You'll Do:
Inference Optimization: Deploy and manage large-scale models using high-performance inference engines (like vLLM) to ensure low latency and high throughput for our agentic chatbot.
Agentic Workflows: Develop and refine the chatbot's agentic capabilities, ensuring reliable tool-use, routing, and interactions between massive LLMs and specialized SLMs.
Model Fine-Tuning: Design and execute fine-tuning strategies to improve model accuracy on specific domain tasks and tool-calling execution.
Rigorous Evaluation: Build comprehensive offline and online evaluation frameworks to constantly measure model performance and business impact through structured A/B testing.
Requirements:
Core Engineering & AI Frameworks
Strong proficiency in Python and Bash scripting.
Deep experience with PyTorch and the Hugging Face ecosystem.
Experience using AI coding assistants natively in the terminal, specifically Claude Code, to accelerate development workflows.
LLMs, Inference & Agents
Proven experience deploying models using vLLM, TGI, or similar high-performance inference servers.
Strong fundamental understanding of LLM architectures, attention mechanisms, and generation parameters.
Hands-on experience building Agentic systems (ReAct, function/tool calling, RAG).
Expertise in fine-tuning strategies (e.g., SFT, RLHF, DPO) and parameter-efficient techniques (PEFT/LoRA).
Statistics & Model Evaluation
Offline Metrics: Deep understanding of classification/summarization metrics (Precision, Recall, F1, AUC) and retrieval metrics (MRR, NDCG, Precision/Recall @ k).
Online Metrics & A/B Testing: Strong statistical foundation to design and analyze A/B tests safely, including the use of t-tests, Mann-Whitney U tests, and bootstrapping techniques.
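The bootstrapping technique named above estimates uncertainty in an A/B comparison without distributional assumptions. A minimal stdlib sketch of a percentile bootstrap confidence interval for a difference in means follows; the data and names are synthetic illustrations:

```python
import random

def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b): resample each
    group with replacement, record the difference in means, and take
    the alpha/2 and 1-alpha/2 quantiles of the resampled differences."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic per-session success rates for a variant vs. a control arm.
control = [0.52, 0.48, 0.50, 0.47, 0.55, 0.51, 0.49, 0.53]
variant = [0.58, 0.60, 0.55, 0.57, 0.62, 0.59, 0.56, 0.61]
lo, hi = bootstrap_diff_ci(variant, control)
```

If the interval excludes zero, the variant's lift is unlikely to be resampling noise - the same "ship/don't ship" question a t-test or Mann-Whitney U test answers parametrically.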
Bonus Points
Containerization & Orchestration: Experience with Ray for orchestrating large-scale model deployments across multi-GPU clusters.
Model Quantization: Experience with memory optimization techniques like AWQ, GPTQ, GGUF, or FlashAttention to fit 70B models efficiently onto hardware.
API Development: Proficiency in building robust, asynchronous microservices using FastAPI to serve model requests.
Knowledge of Data Engineering principles: dataset collection, cleaning, processing, and scalable storage.
This position is open to all candidates.
 
Job #8618171
Posted: 22 hours ago
Location: Tel Aviv-Yafo
Job Type: Full Time
We are looking for an exceptional Hardware Simulation Engineer for the ChipSim team. You'll build software systems that validate our next-generation datacenter GPUs before they exist in silicon - enabling firmware, driver, and architecture teams to develop in parallel with chip fabrication. Working closely with hardware architects and cross-functional teams, you'll create simulation platforms for networking features like NVLink and InfiniBand that power the world's largest AI supercomputers.
What You'll Be Doing:
Develop and maintain simulation models for next-generation networking hardware features
Build validation frameworks and test suites for InfiniBand and NVLink protocol implementations
Create automation tools and CI/CD pipelines for regression testing and result analysis
Design developer-friendly simulation environments that enable rapid iteration and debugging
Collaborate with hardware, firmware, and software teams to ensure accurate chip behavior modeling.
Requirements:
Bachelor's Degree or equivalent experience in Computer Science, Computer Engineering, Electrical Engineering, or related field
5+ years of experience with Python and C in systems programming or infrastructure contexts
Strong debugging skills across multiple system layers and processes
Knowledge of Linux systems programming
Creative, motivated, and collaborative team player
Ways to stand out from the crowd:
Experience with networking protocols (InfiniBand, RDMA, NVLink, Ethernet) or distributed systems
Background in hardware/firmware environments or hardware-software co-development
Familiarity with simulation, emulation, or virtualization platforms
Experience with CI methodology & tools (Git, Gerrit, Jenkins, pytest)
Systems-level performance optimization experience.
This position is open to all candidates.
 
Job #8644509
Posted: 4 hours ago
Location: Yokne`am
Job Type: Full Time
Required Senior Software Engineer - Networking
We are building state-of-the-art accelerated computing platforms that know no boundaries. Our next-generation InfiniBand, NVLink, and Ethernet systems will continue to be at the forefront of connecting and powering the world's most advanced AI clusters. We are looking for a highly motivated and experienced senior networking software engineer to join our SAI development team.
This is an outstanding opportunity to join our high-performance, multi-site team, work on some of the most pioneering technologies, and implement and lead cutting-edge networking features for cloud, HPC, and AI networks. We drive the data growth of the world's biggest companies. With talented engineers around the globe, the work environment is dynamic, meaningful, and fast-paced.
What you'll be doing:
Develop first-tier features with groundbreaking multi-protocol networking technology.
Lead features from planning through design and development, until delivery to the customer.
Work closely with other development teams, architecture, and verification to ensure feature delivery on time and with high quality.
Gain deep understanding of our products and technologies.
Requirements:
B.Sc. degree or equivalent experience in Engineering/Computer Science/related field.
At least 5 years' experience in development positions in the industry.
C programming experience is a must; Python programming experience is an advantage
High technical understanding and learning skills - specification, design, programming, integration and debugging abilities
Self-motivated, ability to work with little definition and supervision while multi-tasking and prioritizing across a number of projects and initiatives
Experience with testing methodologies; some tasks will include developing a sophisticated, fully automated testing environment
Excellent English communication and leading skills
Ways to stand out from the crowd:
Experience in Ethernet switching product development; knowledge of routing/bridging protocols
Experience working in a multi-functional team and collaborating with teams at overseas sites.
Linux networking knowledge, TCP/IP stack
This position is open to all candidates.
 
Job #8645463