We are looking for a Platform / DevOps Engineer to design, build, and operate the core infrastructure for a research compute datacenter. This platform supports researchers and physicists by providing scalable compute resources and exposing selected physics experiments to external users as Software-as-a-Service (SaaS).
What You'll Do
Design, maintain, and operate bare-metal Kubernetes clusters used for research and production workloads.
Build and manage declarative GitOps pipelines and workflows using tools such as Argo CD and Argo Workflows.
Develop and maintain Python-based infrastructure automation, backend services, APIs, and internal tooling for Kubernetes-based research platforms.
Administer Linux systems and support core services such as Redis and PostgreSQL.
Implement and evolve networking and security policies, including Cilium-based enforcement.
Collaborate with researchers to expose internal physics experiments to external users as SaaS offerings.
Contribute to internal platforms and, where possible, open-source projects.
Continuously improve reliability, observability, and developer/researcher experience.
Required Skills & Experience
5+ years of experience with Python (automation, tooling, or backend services).
3+ years of hands-on experience maintaining bare-metal Kubernetes clusters.
Practical knowledge of GitOps and DevOps tools, especially Argo CD and Argo Workflows.
3+ years of experience operating Redis and PostgreSQL in production environments.
Solid Linux system administration skills.
Comfortable working in complex, distributed infrastructure environments.
Excellent communication skills for collaborating with cross-disciplinary teams.
Commitment to thorough documentation and knowledge sharing.
Ability to design and implement reusable infrastructure patterns.
What Makes This Role Unique
Direct impact on scientific research and real-world physics experiments
Opportunity to work on non-cloud, high-performance, bare-metal infrastructure
Strong emphasis on open-source technologies and best practices
A mix of deep infrastructure engineering and exposure to user-facing services
Preferred Skills (Nice to Have / Willing to Learn)
Basic knowledge of Cilium networking and policy enforcement and of Tetragon runtime security observability and enforcement (or strong interest in learning them)
Basic web development experience, preferably with Svelte (or interest in learning frontend technologies)
Experience contributing to or maintaining open-source software
Some familiarity with Go (Golang) is a minor plus but entirely optional.
This position is open to all candidates.