Join our companys AI research group, a cross-functional team of ML engineers, researchers and security experts building the next generation of AI-powered security capabilities. Our mission is to leverage large language models to understand code, configuration, and human language at scale, and to turn this understanding into security AI capabilities that will drive our companys future security solutions.
We foster a hands-on, research-driven culture where youll work with large-scale data, modern ML infrastructure, and a global product footprint that impacts over 100,000 organizations worldwide.
Key Responsibilities
Your Impact & Responsibilities
As a Data Engineer - AI Technologies, you will be responsible for building and operating the data foundation that enables our LLM and ML research: from ingestion and augmentation, through labeling and quality control, to efficient data delivery for training and evaluation.
You will:
Own data pipelines for LLM training and evaluation
Design, build and maintain scalable pipelines to ingest, transform and serve large-scale text, log, code and semi-structured data from multiple products and internal systems.
Drive data augmentation and synthetic data generation
Implement and operate pipelines for data augmentation (e.g., prompt-based generation, paraphrasing, negative sampling, multi-positive pairs) in close collaboration with ML Research Engineers.
Build tagging, labeling and annotation workflows
Support human-in-the-loop labeling, active learning loops and semi-automated tagging. Work with domain experts to implement tools, schemas and processes for consistent, high-quality annotations.
Ensure data quality, observability and governance
Define and monitor data quality checks (coverage, drift, anomalies, duplicates, PII), manage dataset versions, and maintain clear documentation and lineage for training and evaluation datasets.
Optimize training data flows for efficiency and cost
Design storage layouts and access patterns that reduce training time and cost (e.g., sharding, caching, streaming). Work with ML engineers to make sure the right data arrives at the right place, in the right format.
Build and maintain data infrastructure for LLM workloads
Work with cloud and platform teams to develop robust, production-grade infrastructure: data lakes / warehouses, feature stores, vector stores, and high-throughput data services used by training jobs and offline evaluation.
Collaborate closely with ML Research Engineers and security experts
Translate modeling and security requirements into concrete data tasks: dataset design, splits, sampling strategies, and evaluation data construction for specific security use.
דרישות:
What You Bring
3+ years of hands-on experience as a Data Engineer or ML/Data Engineer, ideally in a product or platform team.
Strong programming skills in Python and experience with at least one additional language commonly used for data / backend (e.g., SQL, Scala, or Java).
Solid experience building ETL / ELT pipelines and batch/stream processing using tools such as Spark, Beam, Flink, Kafka, Airflow, Argo, or similar.
Experience working with cloud data platforms (e.g., AWS, GCP, Azure) and modern data storage technologies (object stores, data warehouses, data lakes).
Good understanding of data modeling, schema design, partitioning strategies and performance optimization for large datasets.
Familiarity with ML / LLM workflows: train/validation/test splits, dataset versioning, and the basics of model training and evaluation (you dont need to be the primary model researcher, but you understand what the models need from the data).
Strong software engineering practices: version control, code review, testing, CI/CD, and documentation.
Ability to work independently and in collaboration with ML engineers, researchers and security experts, and to translate high-level requirements into concrete data engineering tasks.
Nice to Have המשרה מיועדת לנשים ולגברים כאחד.