This is a hands-on senior IC role. You will design subsystems, implement them in modern C++, and be the person who diagnoses the hard, ambiguous problems when a pipeline drops samples, jitters, or starves the GPU.
What you'll do:
- Design and build the real-time data path that ingests data from high-rate sensors/A2Ds.
- Port algorithms from Python to C++.
- Integrate on-device signal processing and inference (GPU/DSP).
- Own subsystems end-to-end: architecture, implementation, profiling, and production support on Embedded Linux (Yocto) and NVIDIA Jetson / GPU edge targets.
- Profile and optimize CPU, memory, cache, and I/O behavior; eliminate jitter, contention, and copies in the critical path.
- Write design docs, defend technical decisions, and collaborate with hardware, DSP/algorithms, and systems teams.
Requirements:
- 7+ years building production embedded/systems software.
- Expert in modern C++ (C++17/20): RAII, move semantics, concurrency, and writing high-performance, memory-safe code.
- Proven track record shipping products end-to-end, from design through production deployment and field hardening.
- Hands-on with real-time, high-throughput data pipelines: low-latency ingest, lock-free/concurrent data structures, zero-copy techniques, and backpressure handling.
- Strong on Embedded Linux, including building and customizing images with Yocto (or an equivalent such as Buildroot).
- Experience with signal processing and with porting algorithms into production C++ (e.g., from MATLAB, Python, or reference implementations into optimized, real-time embedded code).
- Deep, demonstrable problem-solving and debugging skills on complex, ambiguous systems issues (perf, concurrency, timing, memory).
This position is open to all candidates.