Wanted: Senior Hands-On Gen.AI Infrastructure Expert
Job Description
Join our Tel Aviv Research and Innovation Center, where we pioneer next-generation data center technologies. Our Future-Data-Center-Infrastructure group specializes in co-designing hardware and software architectures that accelerate AI workloads, optimize resource utilization, and reduce operational costs.
Working at the intersection of academic research and practical implementation, our team develops cutting-edge solutions for AI infrastructure challenges, with a particular focus on Generative AI and large language models (LLMs). We pride ourselves on fostering an environment where system-level thinking, innovation, and technical excellence drive our success in reshaping the future of data center technology.
Role Overview
As a Senior Gen.AI Infrastructure Engineer on our AI innovation team, you'll play a crucial role in advancing our LLM infrastructure capabilities, focusing on the research, implementation, and optimization of state-of-the-art solutions for model deployment and serving.
What will you be doing?
Explore new directions and identify opportunities for innovation by tracking academic and industry trends
Design and implement advanced optimization techniques for transformer-based models
Conduct research and stay current with academic developments in LLM optimization
Drive architectural improvements for large-scale model serving systems
Develop and optimize infrastructure components using PyTorch and other frameworks
Implement and optimize model serving solutions using frameworks such as vLLM, TGI, and MindIE
Collaborate with cross-functional teams to improve system architecture and performance
Requirements:
Bachelor's degree or higher in Computer Engineering / Computer Science or equivalent
At least 3 of the following:
o Advanced hands-on proficiency in Python and C/C++
o Deep expertise with PyTorch and other deep learning frameworks
o Comprehensive understanding of transformer architecture (Attention, MLP, KV cache)
o Proven experience with LLM serving frameworks and optimization techniques
o Strong system-level understanding of hardware accelerators
5+ years of relevant engineering experience
Excellence in system design and architecture
Track record of building and optimizing infrastructure components
Demonstrated ability to solve complex technical challenges
Strong research capabilities and quick learning ability
Outstanding collaboration and communication skills; ability to work as part of an international team
Innovative thinking
Ways to stand out from the crowd:
Ph.D. in Computer Science, Machine Learning, or related field
Experience with AI compilers (TVM, TensorRT)
Hands-on experience in LLM training and fine-tuning
Expertise in LLM deployment optimization:
o Parallelization strategies
o Scheduling optimization
o Batching strategies
Experience working with large-scale AI clusters and/or NVIDIA SuperPOD
CUDA or equivalent GPU programming expertise
Advanced knowledge of memory optimization for large-scale models
This position is open to all candidates.