Complex static code analysis to identify possible bottlenecks and time-consuming operations in AI model inference code
Architecture, design and implementation of compilation passes that compile high-level languages to a unique HW target
Collaborate with other development and product teams in our company and in China to ensure the successful implementation and delivery of a solution.
Requirements: What do we want to see?
B.Sc. in Computer Engineering / Computer Science or equivalent
At least 5 years of experience in the design and implementation of SW / SW+HW systems (mainly in C / C++)
Hands-on experience with compiler design, architecture and implementation
At least 3 years of experience using LLVM / MLIR
At least 3 years of proven experience working with GPU instruction set architectures
At least 3 years of proven experience using compilers to optimize AI models to run on GPUs
System-level view, together with a profound understanding of related technologies
Hands-on system design and PoC bring-up experience
Excellent communication skills and the ability to work as part of an international team
Innovative thinking and fast learning skills
Ways to stand out from the crowd:
M.Sc. or Ph.D. degree with expertise in fields related to compilation / static analysis / AI model optimizations
Experience with Triton compilation
Experience working with TorchInductor
Proven experience in optimizing application performance
Proficiency in the C++ programming language
Understanding of multiprocessing and multithreaded code
This position is open to all candidates.