In this role, you will be a key contributor to the design and implementation of the AI Graph Compiler software stack for Neural Processing Units (NPUs). You will take part in defining software architecture, implementing performance-critical components, and enabling efficient execution of advanced neural networks under tight power, memory, and latency constraints.
You will work closely with hardware and system architects and with software and hardware engineers, influencing both software and hardware decisions. You will design and implement major parts of NPU embedded solutions and actively promote AI capabilities to customers.
What you will do:
Own and design key components of the AI Graph Compiler software stack for NPU-based systems.
Optimize inference performance (latency, throughput, memory footprint, power) for edge deployments.
Collaborate on HW-SW co-design, influencing NPU architecture.
Support IP evaluations and silicon bring-up, root-cause complex HW/SW issues, and influence development methodologies.
Mentor junior engineers and contribute to technical best practices.
Requirements:
3 years of experience in building high-quality embedded software using C/C++.
BSc/MSc in Computer Science, Electrical Engineering, or equivalent.
Proven experience developing and maintaining complex embedded systems, including multi-component software stacks, tight HW/SW integration, and system-level debugging.
Experience in designing and implementing software based on product and hardware specifications.
Experience working under tight memory, power, and real-time constraints.
Excellent interpersonal and communication skills, with a proven ability to work well in a team.
This position is open to all candidates.