Our AI Solutions team in Israel is looking for an AI / machine learning / data engineer to join the LLM-based solutions engineering group. The team develops and sustains large language model (LLM)-powered tools that drive root cause analysis (RCA) and intelligent debugging for our current and next-generation hardware platforms. Our work accelerates issue resolution, improves product reliability, and powers the world's most advanced GPU and data center technologies.
Our Hardware Networking division is a global leader in delivering end-to-end accelerated computing and connectivity solutions for high-performance data centers. We design innovative GPUs, networking platforms, and AI software that power modern workloads: from large language models and deep learning to cloud, enterprise, and hyperscale environments. Our products optimize performance and reliability at scale, supporting industries such as AI/ML, cloud, HPC, autonomous vehicles, and media & entertainment. We continually reinvent our platforms to stay ahead of the market and deliver breakthrough solutions that transform how computing and hardware engineering are done.
What you'll be doing:
Build and maintain data pipelines and ETL flows for logs, telemetry, and hardware test data supporting AI/ML workflows.
Prepare, clean, and structure large, complex datasets (structured & unstructured) to train and fine-tune LLMs.
Assist in developing and deploying LLM-based applications for root cause analysis and hardware debugging.
Experiment with prompt engineering, retrieval-augmented generation (RAG), and vector search to integrate knowledge into models.
Collaborate with hardware, reliability, and AI platform teams to embed intelligent debugging tools into our engineering ecosystem.
Monitor and evaluate model performance, ensuring accuracy, scalability, and reliability in production environments.
What we need to see:
B.Sc. or M.Sc. in Computer Science, Electrical/Computer Engineering, Data Science, or a related field (or equivalent practical experience).
2+ years of industry experience in machine learning or data engineering.
Strong programming skills in Python (pandas, NumPy, PyTorch or TensorFlow).
Proficiency with SQL and modern data pipeline tools.
Understanding of deep learning fundamentals and strong interest in LLMs/NLP.
Hands-on experience with Linux environments, version control (Git), and container tools (e.g., Docker).
Strong analytical and problem-solving skills.
Eagerness to learn complex hardware/software systems.
Ways to stand out from the crowd:
Internship or project experience with LLM fine-tuning, prompt engineering, or retrieval-augmented generation.
Exposure to hardware debugging, observability/logging systems, or chip/system reliability analysis.
Experience with vector databases (FAISS, Pinecone, Milvus) or MLOps tools (MLflow, Kubeflow).
Master's degree in a related field (e.g., Computer Science, Electrical Engineering, Data Science), showing advanced theoretical foundation and research exposure.
This position is open to all candidates.