ML Engineer - AI Infra Group

15/01/2026
The employer has marked this position as no longer current
Company name confidential
Job location: Tel Aviv-Yafo
Job type: Full time
Similar jobs that may interest you
11/02/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
Join our company's AI research group, a cross-functional team of ML engineers, researchers and security experts building the next generation of AI-powered security capabilities. Our mission is to leverage large language models to understand code, configuration, and human language at scale, and to turn this understanding into security AI capabilities that will drive our company's future AI security solutions.
We foster a hands-on, research-driven culture where you'll work with large-scale data, modern ML infrastructure, and a global product footprint that impacts over 100,000 organizations worldwide.
Key Responsibilities
Your Impact & Responsibilities
As a Senior ML Research Engineer, you will be responsible for the end-to-end lifecycle of large language models: from data definition and curation, through training and evaluation, to providing robust models that can be consumed by product and platform teams.
Own training and fine-tuning of LLMs / seq2seq models: Design and execute training pipelines for transformer-based models (encoder-decoder, decoder-only, retrieval-augmented, etc.), and fine-tune open-source LLMs on our company-specific data (security content, logs, incidents, customer interactions).
Apply advanced LLM training techniques such as instruction tuning, preference / contrastive learning, LoRA / PEFT, continual pre-training, and domain adaptation where appropriate.
Work deeply with data: define data strategies with product, research and domain experts; build and maintain data pipelines for collecting, cleaning, de-duplicating and labeling large-scale text, code and semi-structured data; and design synthetic data generation and augmentation pipelines.
Build robust evaluation and experimentation frameworks: define offline metrics for LLM quality (task-specific accuracy, calibration, hallucination rate, safety, latency and cost); implement automated evaluation suites (benchmarks, regression tests, red-teaming scenarios); and track model performance over time.
Scale training and inference: use distributed training frameworks (e.g. DeepSpeed, FSDP, tensor/pipeline parallelism) to efficiently train models on multi-GPU / multi-node clusters, and optimize inference performance and cost with techniques such as quantization, distillation and caching.
Collaborate closely with security researchers and data engineers to turn domain knowledge and threat intelligence into high-value training and evaluation data, and to expose your models through well-defined interfaces to downstream product and platform teams.
Requirements:
What You Bring
5+ years of hands-on work in machine learning / deep learning, including 3+ years focused on NLP / language models.
Proven track record of training and fine-tuning transformer-based models (BERT-style, encoder-decoder, or LLMs), not just consuming hosted APIs.
Strong programming skills in Python and at least one major deep learning framework (PyTorch preferred; TensorFlow).
Solid understanding of transformer architectures, attention mechanisms, tokenization, positional encodings, and modern training techniques.
Experience building data pipelines and tools for large-scale text / log / code processing (e.g. Spark, Beam, Dask, or equivalent frameworks).
Practical experience with ML infrastructure, such as experiment tracking (Weights & Biases, MLflow or similar), job orchestration (Airflow, Argo, Kubeflow, SageMaker, etc.), and distributed training on multi-GPU systems.
Strong software engineering practices: version control, code review, testing, CI/CD, and documentation.
Ability to own research and engineering projects end-to-end: from idea, through prototype and controlled experiments, to models ready for integration by product and platform teams.
Good communication skills and the ability to work closely with non-ML stakeholders (security experts, product managers, engineers).
This position is open to all candidates.
 
8541239
22/02/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
We're growing fast, and our team is passionate about pushing AI engineering to new heights - solving complex problems in LLM training, inference optimization, reasoning, and agent orchestration at scale.
About the Role:
As a Machine Learning Engineer, you'll work on cutting-edge code-focused LLMs and AI agent systems that power a next-generation developer platform. You'll be at the center of research, model training, and productionization of intelligent systems that understand software deeply, collaborate with developers, and help automate engineering workflows end-to-end. Your work will immediately impact millions of engineers worldwide.
Responsibilities:
Push LLM Innovation: Research, design, and fine-tune domain-specific LLMs for code generation, refactoring, debugging, and multi-turn reasoning.
Agent-Oriented Development: Build multi-agent coding systems that integrate retrieval-augmented generation (RAG), code execution, testing, and tool use to create autonomous, context-aware coding workflows.
Production-Grade AI: Own the training-to-inference pipeline for large code models, and optimize inference with quantization, distillation, and caching techniques.
Rapid Experimentation: Prototype and validate ideas quickly; leverage reinforcement learning, human feedback, and synthetic data generation to push accuracy and reasoning.
Cross-Functional Collaboration: Partner with product, engineering, and design teams to ship AI-powered features that help developers focus on high-impact work.
Scale the Platform: Contribute to distributed training, scalable serving systems, and GPU/TPU-efficient architectures for ultra-low-latency developer tools.
Requirements:
2+ years of hands-on experience designing, training, and deploying machine-learning models
M.Sc. or higher in Computer Science / Mathematics / Statistics or equivalent from a university, or B.Sc. with strong hands-on ML experience
Practical experience with Natural Language Processing (NLP) and LLMs
Experience with data acquisition, data cleaning, and data pipelines
A passion for building products and helping people, both customers and colleagues
All-around team player, fast, self-learning individual
Nice to have:
3+ years of development experience with a passion for excellence
Experience building AI coding assistants, code reasoning models, or dev-focused LLM agents.
Familiarity with RAG, function-calling, and tool-using LLMs.
Knowledge of model optimizations (quantization, distillation, LoRA, pruning).
Startup or product-driven ML experience, especially in high-scale, latency-sensitive environments.
Contributions to open-source AI or developer tools.
This position is open to all candidates.
 
8556109
25/02/2026
Location: Tel Aviv-Yafo
Job Type: Full Time
It starts with you - a senior ML engineer responsible for building, training, evaluating, and operating machine learning systems in production. The role focuses on data pipelines, model training, experimentation, evaluation, and scalable deployment.
If you want to grow your skills building mission-critical AI products, join our company's mission: this role is for you.
The Responsibilities
Design, train, and evaluate ML models for production use.
Build and maintain data pipelines for training, validation, and inference.
Own experimentation workflows: feature engineering, training runs, and comparison.
Implement model evals, monitoring, and drift detection.
Package and deploy models to production systems.
Optimize training and inference performance, cost, and reliability.
Collaborate with data, platform, and product teams.
Mentor engineers and promote ML engineering best practices.
Requirements:
4+ years software engineering experience with 2+ years applied ML in production.
Strong foundations in machine learning, statistics, and data analysis.
Hands-on experience with model training frameworks (e.g., PyTorch, TensorFlow, JAX).
Experience with distributed training and large-scale datasets.
Experience building data pipelines, feature engineering, and dataset versioning.
Proven experience designing and operating ML evals, experiment tracking, and monitoring.
Familiarity with feature stores, model registries, and ML lifecycle management.
Experience with model serving patterns and production deployment.
Proficiency in Python and strong system design skills.
Experience deploying ML systems on Kubernetes or similar platforms.
Familiarity with GPU acceleration and performance optimization.
This position is open to all candidates.
 
8561447
11/02/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
Join our company's AI research group, a cross-functional team of ML engineers, researchers and security experts building the next generation of AI-powered security capabilities. Our mission is to leverage large language models to understand code, configuration, and human language at scale, and to turn this understanding into security AI capabilities that will drive our company's future security solutions.
We foster a hands-on, research-driven culture where you'll work with large-scale data, modern ML infrastructure, and a global product footprint that impacts over 100,000 organizations worldwide.
Key Responsibilities
Your Impact & Responsibilities
As a Data Engineer - AI Technologies, you will be responsible for building and operating the data foundation that enables our LLM and ML research: from ingestion and augmentation, through labeling and quality control, to efficient data delivery for training and evaluation.
You will:
Own data pipelines for LLM training and evaluation
Design, build and maintain scalable pipelines to ingest, transform and serve large-scale text, log, code and semi-structured data from multiple products and internal systems.
Drive data augmentation and synthetic data generation
Implement and operate pipelines for data augmentation (e.g., prompt-based generation, paraphrasing, negative sampling, multi-positive pairs) in close collaboration with ML Research Engineers.
Build tagging, labeling and annotation workflows
Support human-in-the-loop labeling, active learning loops and semi-automated tagging. Work with domain experts to implement tools, schemas and processes for consistent, high-quality annotations.
Ensure data quality, observability and governance
Define and monitor data quality checks (coverage, drift, anomalies, duplicates, PII), manage dataset versions, and maintain clear documentation and lineage for training and evaluation datasets.
Optimize training data flows for efficiency and cost
Design storage layouts and access patterns that reduce training time and cost (e.g., sharding, caching, streaming). Work with ML engineers to make sure the right data arrives at the right place, in the right format.
Build and maintain data infrastructure for LLM workloads
Work with cloud and platform teams to develop robust, production-grade infrastructure: data lakes / warehouses, feature stores, vector stores, and high-throughput data services used by training jobs and offline evaluation.
Collaborate closely with ML Research Engineers and security experts
Translate modeling and security requirements into concrete data tasks: dataset design, splits, sampling strategies, and evaluation data construction for specific security use cases.
Requirements:
What You Bring
3+ years of hands-on experience as a Data Engineer or ML/Data Engineer, ideally in a product or platform team.
Strong programming skills in Python and experience with at least one additional language commonly used for data / backend (e.g., SQL, Scala, or Java).
Solid experience building ETL / ELT pipelines and batch/stream processing using tools such as Spark, Beam, Flink, Kafka, Airflow, Argo, or similar.
Experience working with cloud data platforms (e.g., AWS, GCP, Azure) and modern data storage technologies (object stores, data warehouses, data lakes).
Good understanding of data modeling, schema design, partitioning strategies and performance optimization for large datasets.
Familiarity with ML / LLM workflows: train/validation/test splits, dataset versioning, and the basics of model training and evaluation (you don't need to be the primary model researcher, but you understand what the models need from the data).
Strong software engineering practices: version control, code review, testing, CI/CD, and documentation.
Ability to work independently and in collaboration with ML engineers, researchers and security experts, and to translate high-level requirements into concrete data engineering tasks.
Nice to Have
This position is open to all candidates.
 
8541065
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
Required Software Engineer - AI Platform
What this role is really about:
You're building our AI platform: the internal system that powers AI capabilities across Product, Customer Success, Sales, Operations, Data, and IT.
This is a full-stack role where you'll own features end-to-end: design React interfaces for AI workflows, build Lambda functions that orchestrate multi-agent processes, integrate with enterprise systems (Salesforce, Workato, Snowflake), and optimize costs and performance at scale. You'll work with cutting-edge AI while building production-grade systems that handle real business operations.
If you want to build something that directly enables business growth, work across the full stack with modern tech, and have ownership over a platform that the entire company depends on, this is your opportunity.
Job responsibilities:
Build core AI platform services - Design and implement agent orchestration, prompt management, RAG, connectors, and evaluation pipelines that power AI experiences across the company.
Develop complex agentic processes - Develop multi-step workflows that coordinate tools and services with proper observability, guardrails, and cost controls (using OpenAI Agent SDK, LangGraph, or a similar framework).
Build LLM evaluation and optimization processes - Develop evaluation harnesses, offline/online experiments, prompt-testing frameworks, and dashboards to balance quality, latency, and spend across all AI services.
Requirements:
5+ years of hands‑on software engineering experience building production systems at scale.
Strong proficiency in Python, with practical knowledge of databases.
Strong grounding in LLM/AI application patterns (RAG, tool use, function calling, guardrails) and vendor APIs (OpenAI or similar).
Experience with vector stores (pgvector, Pinecone, OpenSearch), feature/semantic layers, or retrieval pipelines.
Familiarity with eval frameworks, prompt/version management, offline/online A/B testing, and cost/latency optimization.
Clear written and verbal communication; able to drive alignment with concise design docs and reviews.
Nice to have:
Experience building developer platforms or internal tooling
Hands-on experience with model optimization, fine-tuning, or distillation techniques.
Deep experience with cloud infrastructure (AWS), containers (Docker, Kubernetes), and distributed systems.
Frontend development frameworks such as React.
Background in SaaS/enterprise environments with compliance requirements (SOC2, GDPR).
This position is open to all candidates.
 
8557290
Location: Tel Aviv-Yafo
Job Type: Full Time
Required Machine Learning Engineer II - GenAI Applications
26947
About the team:
This opening is for the GenAI Applications Team within the Data & AI Marketplace department.
The GenAI Applications team is responsible for designing and delivering agentic, ML-powered solutions for some of our most impactful products, including booking search experiences, trip planning, and trip helpfulness. The team builds AI-driven applications and conversational agents, such as chatbots and intelligent assistants, that significantly enhance the end-to-end customer experience.
Role Description:
As a Machine Learning Engineer, you will work closely with experienced engineers and ML scientists to build scalable, production-grade GenAI applications. Your work will focus on designing, training, and deploying ML systems leveraging LLMs, recommendation systems, and agent-based architectures, using state-of-the-art technologies. These solutions will directly power customer-facing experiences and play a key role in shaping the future of AI-driven travel products.
Key Job Responsibilities and Duties:
Deploying machine learning models: In collaboration with scientists, design, develop and deploy scalable machine learning models and algorithms that provide content-related insights and generative AI applications, ensuring scalability, efficiency, and accuracy.
Evaluating possible architecture solutions by taking into account cost, business requirements, emerging technologies, and technology requirements, like latency, throughput, and scale.
Generative AI Development: Contribute to the development of generative models such as GPT (Generative Pre-trained Transformer) variants or similar architectures for creative content generation, Q&A, chatbots, translation or other innovative applications.
Deployment and integration: Work closely with software engineers to integrate machine learning models into production systems. Ensure seamless deployment and efficient model inference in real-time environments. Collaborate with DevOps to implement effective monitoring and maintenance strategies.
Owning a service end to end by actively monitoring application health and performance, setting and monitoring relevant metrics and acting accordingly when violated.
Maintain clean, scalable code, ensuring reproducibility and easy integration of models into production environments, including CI/CD.
Collaborate with multidisciplinary teams: Collaborate with product managers, data scientists, and analysts to understand business requirements and translate them into machine learning solutions.
Requirements:
We are looking for driven MLEs who enjoy solving problems, who initiate solutions and discussions and who believe that any challenge can be scaled with the right mindset and tools.
We have found that people who match the following requirements are the ones who fit us best:
Bachelor's or master's degree in Computer Science, Engineering, Statistics, or a related field.
Minimum of 4 years of experience as a Machine Learning Engineer or a similar role, with a consistent record of successfully delivering ML solutions.
Strong programming skills in languages such as Python and Java.
Experience with cloud frameworks like AWS SageMaker for training, evaluation and serving models using TensorFlow, PyTorch, or scikit-learn.
Experience with big data processing frameworks such as PySpark, Apache Flink, Snowflake or similar frameworks.
Experience with data at scale using MySQL, PySpark, Snowflake and similar frameworks.
Demonstrable experience with MySQL, Cassandra, DynamoDB or similar relational/NoSQL database systems.
Deep understanding of machine learning algorithms, statistical models, and data structures.
Experience in deploying large-scale language models like GPT, BERT, or similar architectures - an advantage.
Proficiency in data manipulation, analysis, and visualization using tools like NumPy, pandas, and matplotlib - an advantage.
This position is open to all candidates.
 
8560104
17/02/2026
Location: Tel Aviv-Yafo
Job Type: Full Time and Hybrid work
Required Machine learning operations engineer
Your Mission:
As an MLOps Engineer, your mission is to design, build, and operate the platforms that power our machine learning and generative AI products, spanning real-time use cases such as large-scale fraud scoring and support for MCP and agentic workflows. You'll create reliable CI/CD for models and agents, robust data/feature pipelines, secure model serving, and comprehensive observability. You will also support our agentic AI ecosystem and Model Context Protocol (MCP) services so that models can safely use tools, data, and actions across.
You will partner closely with Data Scientists, Data/Platform Engineers, Product, and SRE to ensure every model, from classic ML to LLM/RAG agents, moves from prototype to production with strong reliability, governance, cost efficiency, and measurable business impact.
Responsibilities:
Operate & Develop ML/LLM platforms on Kubernetes + cloud (Azure; AWS/GCP ok) with Docker, Terraform, and other relevant tools
Manage object storage, GPUs, and autoscaling for training & low-latency model serving
Manage cloud environment, networking, service mesh, secrets, and policies to meet PCI-DSS and data-residency requirements
Build end-to-end CI/CD for models/agents/MCP tooling (versioning, tests, approvals)
Deliver real-time fraud/risk scoring & agent signals under strict latency SLOs.
Maintain MCP servers/clients: tool/resource definitions, versioning, quotas, isolation, access controls
Integrate agents with microservices, event streams, and rule engines; provide SLAs, tracing, and on-call runbooks
Measure operational metrics of ML/LLM (latency, throughput, cost, tokens, tool success, safety events)
Enforce governance: RBAC/ABAC, row-level security, encryption, PII/secrets management, audit trails.
Partner with DS on packaging (wheels/conda/containers), feature contracts, and reproducible experiments.
Lead incident response and post-mortems.
Drive FinOps: right-sizing, GPU utilization, batching/caching, budget alerts.
Requirements:
4+ years in DevOps/MLOps/Platform roles building and operating production ML systems (batch and real-time)
Strong hands-on with Kubernetes, Docker, Terraform/IaC, and CI/CD
Practical experience with Spark/Databricks and scalable data processing
Proficiency in Python & Bash
Ability to operate DS code and optimize runtime performance.
Experience with model registries (MLflow or similar), experiment tracking, and artifact management.
Production model serving using FastAPI/Ray Serve/Triton/TorchServe, including autoscaling and rollout strategies
Monitoring and tracing with Prometheus/Grafana/OpenTelemetry; alerting tied to SLOs/SLAs
Solid understanding of PCI-DSS/GDPR considerations for data and ML systems
Experience with the Azure cloud environment is a big plus
Operating LLM/agent workloads in production (prompt/config versioning, tool execution reliability, fallback/retry policies)
Building/maintaining RAG stacks (indexing pipelines, vector DBs, retrieval evaluation, hybrid search)
Implementing guardrails (policy checks, content filters, allow/deny lists) and human-in-the-loop workflows
Experience with feature stores - Qwak Feature Store, Feast
A/B testing for models and agents, offline/online evaluation frameworks
Payments/fraud/risk domain experience; integrating ML outputs with rule engines and operational systems - Advantage
Familiarity with Databricks Unity Catalog, dbt, or similar tooling.
This position is open to all candidates.
 
8550121
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We are seeking a senior AI Researcher to join our R&D group and lead the frontier of large-scale LLM optimization. You will focus on maximizing performance, scalability, and efficiency of LLM training and inference across massive GPU clusters, bridging deep learning research, distributed systems design, and hardware-aware optimization.
At our company, we treat AI performance as a systems problem. Just as we reinvented networking through disaggregation and software-defined scale, we're applying the same philosophy to AI infrastructure. Your work will directly influence how large models are deployed, scaled, and optimized across high-density compute environments.
Key Responsibilities
● Conduct cutting-edge research in artificial intelligence and machine learning, from problem formulation to experimental validation.
● Research, design, implement and evaluate novel algorithms, models, optimization strategies and architectures across areas of large-scale LLM training and inference (e.g., tensor/pipeline/expert parallelism, quantization, prefill/decode disaggregation, GPU communication optimization).
● Translate research ideas into working prototypes and production-ready solutions.
● Stay up to date with state-of-the-art research, frameworks, and emerging trends in the AI ecosystem.
● Publish research findings internally and externally (papers, technical reports, blog posts, or patents) and present results to internal and external technical audiences.
● Collaborate closely with engineers, product teams, and other researchers to align research with real-world impact.
● Profile distributed training and inference pipelines, identifying algorithmic, memory, and scheduling inefficiencies, to contribute to technical decision-making and long-term research roadmaps.
● Validate research through measurable impact: higher throughput, better FLOPS utilization, improved convergence efficiency, or reduced compute cost.
Requirements:
● Strong foundation in machine learning, deep learning, and statistical modeling.
● Deep understanding of deep learning internals: transformer architectures, distributed training paradigms, precision scaling, and optimizer behavior.
● Proven hands-on experience training or deploying LLMs on multi-GPU and/or multi-node clusters.
● Ability to read, understand, and critically evaluate academic research papers. Demonstrated ability to translate theoretical ideas into practical, production-level performance improvements.
● Strong problem-solving skills and ability to work independently on open-ended research problems.
● Clear written and verbal communication skills in English.
Optional Qualifications
● MSc or PhD in Computer Science, Electrical Engineering, Mathematics or a related quantitative field.
● Strong mathematical background, including linear algebra, probability, and optimization.
● Strong grasp of parallel and distributed systems principles, including communication collectives, load balancing, and scaling bottlenecks.
● Proficiency with frameworks like DeepSpeed, Megatron-LM, NeMo, vLLM, SGLang, or equivalent large-scale training ecosystems.
● Understanding of CUDA, Triton, or low-level GPU kernel development, and experience profiling large models across multi-node GPU systems.
This position is open to all candidates.
 
8549876
4 hours ago
Job Type: Full Time
We're looking for a Senior AI/MLOps Engineer to join a group that specializes in Security and Networking, and specifically ML, AI and agent development. As a Senior AI/MLOps Engineer, you'll build and maintain the infrastructure, tools and processes necessary to support the AI lifecycle in a production environment. You will collaborate closely with data scientists, software engineers, security architects and DevOps teams to ensure smooth deployment, modeling and optimization of AI models. This role involves creative problem solving alongside engineering teams, and is pivotal for the continued success of AI networking security.

What you'll be doing:

Developing, improving and optimizing scalable infrastructure for handling and deploying security and networking AI models and agents in production, ensuring high availability, scalability, reproducibility, and performance.

Optimizing AI models and agents for performance, scalability, and resource utilization, considering factors such as latency, efficiency, and cost.

Monitoring and deploying agentic systems, LLMs, and ML models in production.

Designing and implementing frameworks/pipelines for AI training, inference, and experimentation.

Collaborating closely with data scientists, security architects and software engineers to operationalize and deploy AI models and agents, including packaging and integration with existing systems. Participating in developing and reviewing code, design documents, use case reviews, and test plan reviews.

Collaborating with DevOps teams to integrate pipelines and workflows into the CI/CD process, ensuring flawless deployments and rollbacks.

Building and maintaining monitoring and alerting systems to proactively identify and resolve issues relating to quality, performance and infrastructure.

Implementing access controls, authentication mechanisms, and encryption standards for AI models and data.

Documenting guidelines and standard operating procedures for MLOps/AI processes and sharing knowledge with the wider team.

Developing proof-of-concepts for new features.
Requirements:
What we need to see:

BSc/MSc in CS/CE or related field (or equivalent experience).

Strong background in AI with at least 5 years of experience deploying and monitoring AI/ML models, LLMs and agents in production systems at scale, including distributed and multi-node environments.

Proficiency in programming languages such as Python, Java, or Scala, along with experience in using ML/AI frameworks and libraries (e.g. TensorFlow, PyTorch).

Proficiency in microservices architecture, container orchestration, cloud platforms, and scalable infrastructure for training and inference workloads.

Knowledge of inference optimization techniques.

Understanding of build infrastructure and CI/CD tools and practices (e.g. GitLab, GitHub Actions, Jenkins).

You are detail-oriented and care deeply about robust, well tested, high-performance code in production environments.

You are proactive, take full ownership of your deliverables, have a can-do approach, and excellent communication and collaboration skills, able to work effectively in multifunctional teams.

Ways to stand out from the crowd:

Knowledge of network protocols and Linux internals.

Security and networking background, with knowledge of security protocols, network architectures, firewalls, intrusion detection systems, and other relevant security and networking concepts.

Experience deploying and optimizing generative models and agents.

Knowledge of network security principles and practices.
This position is open to all candidates.
 
8586605
03/03/2026
Confidential company
Location: Tel Aviv-Yafo
Job Type: Full Time
We aren't building a single, generic chatbot. We are building a Composable AI Microservice Architecture, a swarm of hundreds of hyper-specialized AI services, each meticulously "programmed" to solve small, focused tasks with high precision. This fleet powers Ava, our AI support engine, and a suite of cutting-edge generative tools for travel and expense management.
As a Senior AI Ops / MLOps Engineer, you are the architect of the platform that makes this scale possible. You will move beyond traditional MLOps to manage a "factory" of Language Models. Your challenge is one of orchestration and standardization, ensuring that every service in the swarm meets a rigorous bar for quality, reliability, and cost-efficiency.
What You'll Do:
Orchestrate the AI Fleet: Build and own the runtime environment for 100+ specialized AI services. Manage model routing, context versioning, and standardized memory/history stores.
High-Density Inference Optimization: Design and implement SageMaker Multi-Model Endpoints (MME) and Inference Components to serve multiple tuned SLMs per GPU, maximizing hardware utilization while minimizing latency.
Deterministic Service Excellence: Treat reliability as a layered engineering problem. Build deterministic "shells" around probabilistic LM outputs, prioritizing data-layer validation and strict serialization.
Automated Evaluation & Observability: Implement "LLM-as-a-judge" patterns and automated benchmarking to detect semantic drift and hallucinations across the fleet before they impact the user.
Standardize the Workflow: Obsess over building reusable patterns and Terraform-based infrastructure that eliminate "snowflake" configurations, allowing us to deploy new specialized AI tasks in minutes.
Agency Strategy: Partner with AI Researchers to find the "Goldilocks zone" for agentic autonomy, balancing the flexibility of LLM tool-use with the precision required for production stability.
Requirements:
Experience: 5+ years in SRE, Platform Engineering, or MLOps, with at least 2 years focused on deploying LLMs/SLMs in production environments.
SageMaker Mastery: Deep hands-on expertise with AWS SageMaker, specifically configuring Multi-Model Endpoints (MME), Inference Components, and GPU-backed instances (G5/P4).
SLM Expertise: Proven experience with Small Language Models (e.g., Mistral, Llama 3, Phi) and parameter-efficient fine-tuning (PEFT) deployment strategies like LoRA/QLoRA.
Technical Stack:
* Languages: Strong proficiency in Python and Terraform.
* Orchestration: Experience with Docker, Kubernetes (EKS), or AWS ECS/Fargate.
* Data: Familiarity with Snowflake and Vector Databases.
The "AI Ops" Mindset: You understand that AI at scale is a statistical challenge. You are comfortable debugging issues at the data/serialization layer rather than defaulting to prompt tweaks.
CI/CD & Automation: Experience building robust pipelines (Jenkins, GitHub Actions) for non-deterministic software, including automated "eval" stages.
Education: BS or MS in Computer Science, Engineering, Mathematics, or a related technical field.
This position is open to all candidates.
 
8567347