We're hiring for a new AI Engineering team in Tel Aviv, and you would be the first infrastructure hire. You will own the platform layer for AI agents the team builds: deployment architecture, observability, and production reliability.
The team's first two projects: an agent that automates internal governance processes (vendor reviews, security questionnaires, tool provisioning), and an agent that helps engineering teams prepare for architecture reviews. Both integrate with external APIs (LLM providers, OneTrust, ServiceNow), handle structured decision logic, and manage sensitive data flows with audit requirements.
Highlights
- Greenfield, but with real constraints. You're building on Azure/AWS with enterprise security requirements. The challenge is designing deployment and observability for LLM-backed services: tracking output quality, cost per invocation, and model drift.
- Enterprise complexity, startup autonomy. The ownership and greenfield environment of a startup, combined with the integration challenges of a Fortune 200: connecting AI services to real enterprise systems.
- More than infrastructure. Your core is SRE, but you'll also write agent code in TypeScript and Python, work with data pipelines, and ship features alongside the team.
What the Work Looks Like
AI Service Infrastructure - Design and maintain deployment and release infrastructure for AI agents. The stack is cloud-native (Azure/AWS), with services that call LLM APIs, connect to enterprise systems, and handle structured data.
Observability & Reliability - Build monitoring and observability for AI services. Make sure model response quality doesn't degrade silently: track error rates, alert on cost spikes, and watch for upstream API and model changes.
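To give a flavor of this responsibility, here is a minimal sketch of per-model cost and latency tracking for LLM calls. All names (the tracker class, the pricing table, the response shape) are hypothetical; real provider SDKs expose token usage and pricing differently.

```python
import time
from dataclasses import dataclass

# Hypothetical per-model pricing (USD per 1K tokens); real rates vary by provider.
PRICE_PER_1K_TOKENS = {"gpt-4o": 0.005}

@dataclass
class InvocationMetrics:
    calls: int = 0
    errors: int = 0
    total_cost_usd: float = 0.0
    total_latency_s: float = 0.0

class LLMCallTracker:
    """Wraps LLM calls to record latency, cost, and error counts per model."""

    def __init__(self) -> None:
        self.metrics: dict[str, InvocationMetrics] = {}

    def invoke(self, model: str, call, *args, **kwargs):
        m = self.metrics.setdefault(model, InvocationMetrics())
        start = time.monotonic()
        try:
            response = call(*args, **kwargs)
        except Exception:
            m.errors += 1
            raise
        finally:
            m.calls += 1
            m.total_latency_s += time.monotonic() - start
        # Assume the response reports a token count; real SDK response objects differ.
        tokens = response.get("total_tokens", 0)
        m.total_cost_usd += tokens / 1000 * PRICE_PER_1K_TOKENS.get(model, 0.0)
        return response

# Usage with a stubbed-out LLM call:
tracker = LLMCallTracker()
fake_call = lambda prompt: {"text": "ok", "total_tokens": 1200}
tracker.invoke("gpt-4o", fake_call, "Summarize this vendor review")
m = tracker.metrics["gpt-4o"]
print(f"calls={m.calls} cost=${m.total_cost_usd:.4f}")  # → calls=1 cost=$0.0060
```

In production this aggregation would feed a metrics backend (Datadog, CloudWatch, etc.) rather than an in-process dict, so cost spikes and error-rate changes can trigger alerts.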
Security & Compliance - These agents handle sensitive workflows with elevated security requirements. You will work with our company's security team on standards, but you own how they're implemented in the infrastructure.
Developer Experience - Create tooling that makes it easy for the team to build, test, and deploy. The patterns you set become the team's defaults.
Requirements
Required:
- 5+ years in SRE, platform engineering, DevOps, or infrastructure roles, with experience owning infrastructure end-to-end
- Strong experience with cloud platforms (Azure or AWS), containerization (Docker, Kubernetes), and CI/CD pipelines
- Infrastructure-as-code experience (Terraform, CDK, or CloudFormation)
- Monitoring and observability (Datadog, Splunk, CloudWatch, or similar)
- Infrastructure fundamentals: Linux, networking, security
- Incident management experience: on-call, production incidents, post-mortems
- Comfortable working independently with broad ownership and high accountability
- Strong written and verbal English for async collaboration with distributed teams
Preferred:
- Experience with AI/ML infrastructure: model serving, LLM API integration, vector databases, or evaluation pipelines
- Comfortable writing production code in TypeScript or Python, not just scripts
- Experience building self-service developer tooling or internal platforms
- Cost optimization for cloud and API-based workloads
- Security engineering experience, especially in enterprise or compliance-heavy environments
This position is open to all candidates.