Principal Machine Learning Engineer (GenAI): Benchmarking & Validation Infrastructure
The Principal Machine Learning Engineer, GenAI, is responsible for the hands-on design, development, and operation of large-scale systems and tools for AI model benchmarking, optimization, and validation.
Unlike traditional ML engineering roles focused mainly on training models, this role centers on building, running, and continuously improving the infrastructure, automation, and services that enable rigorous, repeatable, production-grade model evaluation at scale.
This is a hands-on principal role that combines strategic technical leadership with active engineering execution.
You will own the architecture, implementation, and optimization of benchmarking and validation capabilities across our AI ecosystem. This includes architecting Validation-as-a-Service platforms, delivering high-performance benchmarking pipelines, integrating with leading GenAI frameworks, and setting industry standards for model evaluation quality and reproducibility.
The role demands deep GenAI domain expertise, architectural foresight, and direct coding involvement to ensure evaluation platforms are flexible, extensible, and optimized for real-world, large-scale use.
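To make that scope concrete, the following is a minimal sketch of the kind of self-service validation endpoint such a platform might expose. It is illustrative only: the route, request fields, and stub behavior are assumptions, not an existing internal API.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI(title="Validation-as-a-Service (sketch)")

    class ValidationRequest(BaseModel):
        # Illustrative fields only; a real request would also carry dataset
        # references, scenario configs, and serving-backend selection.
        model_id: str
        dataset: str
        metrics: list[str] = ["latency", "throughput", "accuracy"]

    @app.post("/v1/validate")
    async def validate(req: ValidationRequest) -> dict:
        # In a real platform this would enqueue an evaluation job (for
        # example, an Argo Workflow) and return a handle for polling
        # results; here we return a stub response.
        return {"model_id": req.model_id, "status": "queued"}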
What you will do
Architect and build scalable benchmarking pipelines for LLM performance measurement (latency, throughput, accuracy, cost) across multiple serving backends and hardware types (see the sketch after this list).
Build optimization & profiling tools for inference performance, including GPU utilization, memory footprint, CUDA kernel efficiency, and parallelism strategies.
Develop Validation-as-a-Service platforms with APIs and self-service tools for standardized, on-demand model evaluation.
Integrate and optimize model serving frameworks (vLLM, TGI, LMDeploy, Triton) and API-based serving (OpenAI, Mistral, Anthropic) in production environments.
Establish dataset & scenario management workflows for reproducible, comprehensive evaluation coverage.
Implement observability & diagnostics systems (Prometheus, Grafana) for real-time benchmark and inference performance tracking.
Deploy and manage workloads in Kubernetes (Helm, Argo CD, Argo Workflows) across AWS/GCP GPU clusters.
Lead performance engineering efforts to identify bottlenecks, apply optimizations, and document best practices.
Stay ahead of the GenAI ecosystem by tracking emerging frameworks, benchmarks, and optimization techniques, and integrating them into the platform.
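As a flavor of the benchmarking work in the first responsibility above, here is a minimal latency/throughput probe against an OpenAI-compatible endpoint such as those exposed by vLLM or TGI. The URL, model name, and prompts are placeholders, and a production pipeline would add concurrency, warmup, and percentile reporting.

    import time
    from statistics import mean
    from openai import OpenAI

    # Placeholder endpoint and model; vLLM and TGI both expose an
    # OpenAI-compatible API that this probe can target.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    MODEL = "my-org/my-model"
    PROMPTS = ["Summarize the benchmark results in one sentence."] * 8

    latencies, tokens = [], 0
    start = time.perf_counter()
    for prompt in PROMPTS:
        t0 = time.perf_counter()
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=128,
        )
        latencies.append(time.perf_counter() - t0)
        tokens += resp.usage.completion_tokens
    wall = time.perf_counter() - start

    print(f"mean latency: {mean(latencies):.3f}s")
    print(f"throughput:   {tokens / wall:.1f} output tokens/s")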
Requirements
Advanced Python for ML/GenAI pipelines, backend development, and data processing.
Kubernetes (Deployments, Services, Ingress) with Helm for large-scale distributed workloads.
Deep expertise in LLM serving frameworks (vLLM, TGI, LMDeploy, Triton) and API-based serving (OpenAI, Mistral, Anthropic).
GPU optimization mastery: CUDA, mixed precision, tensor/sequence parallelism, memory optimization, kernel-level profiling.
Design and operation of benchmarking/evaluation pipelines with metrics for accuracy, latency, throughput, cost, and robustness.
Experience with Hugging Face Hub for model/dataset management and integration.
Familiarity with GenAI tools: OpenAI SDK, LangChain, LlamaIndex, Cursor, Copilot.
Argo CD and Argo Workflows for reproducible ML orchestration.
CI/CD (GitHub Actions, Jenkins) for ML workflows.
Cloud expertise (AWS/GCP) for provisioning, running, and optimizing GPU workloads (A100, H100, etc.).
Monitoring and observability (Prometheus, Grafana) and database experience (PostgreSQL, SQLAlchemy).
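For the observability requirement above, a minimal sketch of exporting benchmark metrics with the official prometheus_client library; the metric names and the simulated request are invented for illustration.

    import random
    import time
    from prometheus_client import Counter, Histogram, start_http_server

    # Hypothetical metric names; a real deployment would follow the
    # team's naming conventions and be scraped by Prometheus/Grafana.
    REQUEST_LATENCY = Histogram(
        "benchmark_request_latency_seconds",
        "End-to-end latency of benchmark requests",
    )
    TOKENS_GENERATED = Counter(
        "benchmark_tokens_generated_total",
        "Total output tokens produced during benchmark runs",
    )

    def run_one_request() -> int:
        # Stand-in for a real inference call; returns a token count.
        time.sleep(random.uniform(0.05, 0.2))
        return random.randint(50, 150)

    if __name__ == "__main__":
        start_http_server(9100)  # expose /metrics for Prometheus to scrape
        while True:
            with REQUEST_LATENCY.time():
                TOKENS_GENERATED.inc(run_one_request())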
Nice to Have
Distributed training across multi-node, multi-GPU environments.
Advanced model evaluation: bias/fairness testing, robustness analysis, domain-specific benchmarks.
Experience with OpenShift/RHOAI for enterprise AI workloads.
Benchmarking frameworks: GuideLLM, HELM (Holistic Evaluation of Language Models), Eval Harness (see the sketch after this list).
Security scanning for ML artifacts and containers (Trivy, Grype).
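For the benchmarking frameworks above, a sketch of driving EleutherAI's LM Evaluation Harness (the "Eval Harness") from Python. The model and task are placeholders, and the exact API surface varies between harness releases.

    # Assumes `pip install lm-eval`; the signature follows recent
    # lm-evaluation-harness releases and may differ by version.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",                    # Hugging Face backend
        model_args="pretrained=gpt2",  # placeholder model
        tasks=["hellaswag"],           # illustrative task choice
        num_fewshot=0,
        batch_size=8,
    )
    print(results["results"]["hellaswag"])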
This position is open to all candidates.