We are looking for a Senior Data Engineer to join our Data Platform team, focused on building and evolving a secure, enterprise-grade Data Lake that powers large-scale global search, indexing, analytics, and AI-driven capabilities.
In this role, you will design and deliver scalable, compliant, and high-performance data pipelines that ingest, transform, and structure massive volumes of sensitive data to support mission-critical discovery and search workloads.
This position is ideal for a senior engineer who combines deep hands-on data engineering expertise with strong architectural thinking, particularly in regulated and security-sensitive environments. You will work closely with Product, Search, Backend, Security, and Data Science teams to ensure data is searchable, governed, reliable, and compliant by design.
Key Responsibilities:
Enterprise Data Lake Architecture:
Design and evolve a secure, scalable Data Lake architecture on AWS.
Define storage layout, partitioning strategies, and data organization optimized for large-scale search and analytics workloads.
Implement ACID-compliant table formats (e.g., Iceberg) to ensure reliability, consistency, and schema evolution.
Design ingestion patterns (batch and streaming) for high-volume, heterogeneous datasets.
Implement lifecycle management, retention policies, and environment isolation.
Global Search & Indexing Enablement:
Design data pipelines that prepare and structure data for global search and indexing systems.
Optimize data models and transformations to support high-performance search queries and distributed indexing.
Collaborate with search and backend teams to ensure efficient data availability and low-latency access patterns.
Support incremental ingestion, change-data-capture (CDC), and near real-time processing where required.
Ensure traceability and reproducibility of indexed datasets.
Secure & Regulated Data Engineering:
Implement strict access controls (IAM), encryption (at rest and in transit), and auditing mechanisms.
Ensure compliance with enterprise security and regulatory requirements.
Design systems with data lineage, traceability, and audit-readiness in mind.
Partner with Security and Compliance teams to support internal and external audits.
Handle sensitive and regulated datasets with strong governance and segregation controls.
Pipeline Development & Platform Engineering:
Build and maintain high-scale ETL/ELT pipelines using Apache Spark (EMR/Glue) and AWS-native services.
Leverage S3, Athena, Kinesis, Lambda, Step Functions, and EKS to support both batch and streaming workloads.
Implement Infrastructure as Code (Terraform / CDK / SAM) for reproducible environments.
Establish observability, monitoring, and SLA management for mission-critical pipelines.
Continuously optimize performance, scalability, and cost efficiency.
Cross-Functional Collaboration:
Work closely with Product Managers to translate global search and discovery requirements into scalable data solutions.
Collaborate with ML and Data Science teams to enable feature extraction and enrichment pipelines.
Contribute to architecture discussions and promote best practices in enterprise data engineering.
Provide documentation and clear technical artifacts for regulated environments.
Requirements:
Technical Expertise:
Strong hands-on experience with Apache Spark (EMR, Glue, PySpark).
Deep experience with AWS data services: S3, EMR, Glue, Athena, Lambda, Step Functions, Kinesis.
Proven experience designing and operating Data Lakes / Lakehouse architectures (Iceberg preferred).
Experience building scalable batch and streaming pipelines for large datasets.
Strong understanding of distributed systems and data modeling for search/indexing use cases.
Experience implementing secure, compliant data architectures (IAM, encryption, auditing).
Infrastructure as Code experience (Terraform / CDK / SAM).
Strong Python skills (TypeScript is a plus).
Enterprise & Search-Oriented Mindset:

This position is open to women and men alike.