We are seeking a technical leader to define, craft, implement, and guide firmware architecture for reliability, availability, serviceability, and power management across next-generation Networking products and platforms. You will take a strong hands-on role, working with hardware, firmware, software, validation, customer engineering, and external partners to build robust, diagnosable, power-efficient systems for large-scale deployments.
What you'll be doing:
Define platform-level firmware architecture for RAS and power management across SoCs, accelerators, DPUs, servers, embedded systems, and data center platforms.
Own error detection, classification, containment, recovery, escalation, and reporting architecture.
Define firmware architecture for power sequencing, power states, reset flows, thermal and power fault handling, idle management, and recovery from power-related failures.
Create firmware specifications for hardware error handling, health monitoring, crash capture, telemetry, diagnostics, debug data, and field serviceability.
Define interfaces and contracts between firmware, hardware, operating systems, BMCs, management controllers, platform software, and cloud/service infrastructure.
Drive architecture reviews, tradeoff discussions, failure-mode analysis, validation strategy, and long-term RAS and power management roadmap planning.
Establish standards for error logs, event schemas, telemetry flows, recovery policies, service diagnostics, and production debug infrastructure.
Guide engineering teams through implementation, validation, silicon bring-up, platform integration, and production deployment of RAS and power management features.
Analyze customer and field failures, identify architectural gaps, and feed lessons learned into future platform designs.
What we need to see:
BS, MS, or PhD in Electrical Engineering, Computer Science, Computer Engineering, or equivalent experience.
7+ years of relevant experience in firmware, platform architecture, embedded systems, or low-level systems software.
Deep understanding of RAS principles, fault modeling, error containment, recovery policies, diagnosability, and serviceability requirements.
Experience architecting firmware for complex hardware platforms such as SoCs, accelerators, DPUs, servers, networking devices, or embedded systems.
Strong knowledge of power management concepts, including power sequencing, reset architecture, thermal and power fault handling, power state transitions, and platform recovery flows.
Familiarity with boot firmware, UEFI/BIOS, BMC, embedded controllers, RTOS, embedded Linux, or platform management stacks.
Strong understanding of hardware/software interfaces, registers, interrupts, telemetry paths, debug infrastructure, and firmware-to-hardware contracts.
Strong programming and debugging fundamentals in languages such as C/C++, Python or Perl scripting, Verilog, and assembly (for example, RISC-V).
Ability to lead cross-functional architecture discussions and drive alignment across hardware, firmware, software, validation, product, and customer-facing teams.
Excellent communication skills, strong technical leadership, and a real passion for working collaboratively.
Ways to stand out from the crowd:
Experience with PCIe AER, CXL RAS, memory RAS, ECC/parity, accelerator RAS, networking RAS, high-availability systems, or large-scale data center platforms.
Knowledge of ACPI, SMBIOS, UEFI, PLDM, MCTP, Redfish, IPMI, or cloud telemetry systems.
Experience with power/thermal fault handling, dynamic power management, platform power sequencing, low-power states, or autonomous recovery mechanisms.
Background in silicon bring-up, platform validation, production diagnostics, or customer failure analysis.
Prior technical leadership experience as a firmware architect, principal engineer, platform lead, or domain owner.
This position is open to all candidates.