The eBPF APM team builds Datadog's zero-instrumentation observability platform, enabling automatic service discovery, Layer 7 protocol classification, traffic decoding, and high-fidelity RED (requests, errors, duration) metrics from both plaintext and TLS-encrypted traffic, without requiring customer code changes.
This work spans kernel-space eBPF, user-space Go services, and large-scale distributed systems, operating reliably across diverse Linux kernels, distributions, runtimes, and real-world production environments. The team tackles challenges in protocol evolution, TLS detection across languages and frameworks, and performance-critical data collection at scale.
We are looking for a Staff Engineer who will act as a technical owner and multiplier: driving architecture, influencing the roadmap, and ensuring our APM platform remains robust, scalable, and easy to adopt as we and our customers grow.
What You'll Do:
Own major technical areas of the zero-instrumentation APM system, from design through long-term evolution and operational maturity.
Define and drive architecture for kernel-level traffic capture, L7 protocol decoding, and metric extraction using eBPF and Go.
Lead high-impact initiatives addressing protocol parsing, TLS visibility, kernel compatibility, and performance at scale.
Set technical direction and standards for reliability, performance, and maintainability across the team.
Partner cross-functionally with Agent, Tracing, Security, Runtime, and Product teams to align on system design and roadmap priorities.
Requirements:
You have deep experience in backend or systems engineering, with strong proficiency in Go and/or C/C++.
You are comfortable operating close to the Linux kernel, with experience in eBPF, networking, observability, or similarly low-level systems.
You consistently think at a system and organizational scale, making thoughtful tradeoffs between performance, correctness, velocity, and long-term sustainability.
You have built, evolved, or operated large-scale production systems in complex and heterogeneous environments.
You demonstrate technical leadership without relying on authority: you influence architecture, unblock teams, and raise the technical bar through collaboration.
You have a strong bias for performance, efficiency, and reliability, especially in resource-constrained or performance-sensitive contexts.
You thrive in ambiguity and take ownership of ill-defined, high-impact problems.
You are excited to leverage AI-assisted development tools to improve productivity, code quality, and system design, or are eager to learn.
This position is open to all candidates.