The Senior Machine Learning Engineer, GenAI, is responsible for designing, implementing, and operating large-scale systems and tools for AI model benchmarking, optimization, and validation. Unlike a traditional ML Engineer focused primarily on model training, this role centers on building the infrastructure, automation, and services that enable systematic evaluation and performance tuning of LLMs at scale.
This position combines a deep understanding of model serving frameworks, GPU optimization, and benchmarking methodologies with strong software engineering skills to deliver reliable, reproducible, and production-grade evaluation pipelines. The engineer will design and maintain validation-as-a-service platforms that allow internal and external stakeholders to assess models across latency, throughput, accuracy, and cost dimensions, integrating seamlessly with our AI ecosystem and industry-standard GenAI tooling.
A core aspect of this role is creating a robust, extensible benchmarking and validation framework capable of running across diverse inference engines, hardware configurations, and deployment environments, while providing actionable insights for model selection, optimization, and integration.
What you will do:
Benchmarking Platform Development: Design and implement scalable benchmarking pipelines for LLM performance measurement (latency, throughput, accuracy, cost) across multiple serving backends and hardware types.
Optimization Tooling: Build utilities and automation to profile, debug, and optimize inference performance (GPU utilization, memory footprint, CUDA kernels, parallelism strategies).
Validation-as-a-Service: Develop APIs and self-service platforms for model evaluation, enabling teams to run standardized benchmarks on demand.
Serving Integration: Integrate and operate high-performance serving frameworks (vLLM, TGI, LMDeploy, Triton) with cloud-native deployment patterns.
Dataset & Scenario Management: Create reproducible workflows for dataset preparation, augmentation, and scenario-based testing to ensure robust evaluation coverage.
Observability & Diagnostics: Implement real-time monitoring, logging, and metrics dashboards (Prometheus, Grafana) for benchmark and inference performance.
Cloud-Native Orchestration: Deploy and manage benchmarking workloads on Kubernetes (Helm, Argo CD, Argo Workflows) across AWS/GCP GPU clusters.
Integration with GenAI Tooling: Leverage Hugging Face Hub, OpenAI SDK, LangChain, LlamaIndex, and internal frameworks for streamlined evaluation workflows.
Performance Engineering: Identify bottlenecks, apply targeted optimizations, and document best practices for inference scalability.
Ecosystem Leadership: Track emerging frameworks, benchmarks, and optimization techniques to continuously improve the evaluation platform.
Requirements: What you will bring:
Advanced Python for backend development, data processing, and ML/GenAI pipelines.
Kubernetes (Deployments, Services, Ingress) and Helm for large-scale distributed training and inference workloads.
LLM training, fine-tuning, and optimization (PyTorch, DeepSpeed, HF Transformers, LoRA/PEFT).
GPU optimization expertise: CUDA, mixed precision, tensor/sequence parallelism, memory management, and throughput tuning.
High-performance model serving with vLLM, TGI, LMDeploy, Triton, and API-based serving (OpenAI, Mistral, Anthropic).
Benchmarking and evaluation pipelines: dataset preparation, accuracy/latency/throughput measurement, and cost-performance tradeoffs.
Multi-model, multi-engine comparative testing for optimal deployment decisions.
Hugging Face Hub for model/dataset management, including private hosting and pipeline integration.
GenAI development tools: OpenAI SDK, LangChain, LlamaIndex, Cursor, Copilot.
Argo CD & Argo Workflows for reproducible, automated ML pipelines.
CI/CD (GitHub Actions, Jenkins) for ML lifecycle automation.
This position is open to all candidates.