Required: Senior Data Engineer - Big Data
Our Core Data Platform team:
We're a leading force in the ad tech industry, revolutionizing how brands connect with their audiences.
Our platform processes billions of ad impressions daily, generating massive datasets that drive our core business.
We thrive on innovation and seek a Data Engineer to help us build and scale the data infrastructure that powers our insights and analytics.
This is a unique opportunity to work with cutting-edge technologies and make a direct impact on our products.
What will you do?
As a Big Data Engineer, you'll be a key part of our data platform team, responsible for designing, building, and maintaining robust and scalable data pipelines. You'll work closely with data scientists, analysts, and server-side engineers to ensure our data is reliable, accessible, and ready for analysis. Your expertise will be crucial in expanding our data warehouse and data lake capabilities, enabling us to deliver next-generation ad tech solutions.
Your mission will be to:
Develop and Optimize Data Pipelines: Design, build, and maintain ETL/ELT pipelines using Apache Spark to ingest, process, and transform large-scale datasets from various sources.
Manage Cloud Infrastructure: Architect and manage our data infrastructure, primarily on Google Cloud Platform (GCP) or Amazon Web Services (AWS). This includes services and tools such as BigQuery, S3, GCS, EMR, and Airflow.
Enhance Data Storage: Improve and manage our data warehouse and data lake solutions, ensuring data quality, consistency, and accessibility for business intelligence and machine learning applications.
Collaborate and Innovate: Partner with cross-functional teams to understand data needs and implement solutions that support new product features and business initiatives.
Ensure Data Integrity: Implement monitoring, alerting, and logging systems to maintain data pipeline health and ensure data accuracy.
Requirements:
5+ years of data engineering experience, building and operating production data pipelines at scale (TB+ datasets, hourly/daily batch or streaming workloads).
Hands-on production experience with Apache Spark and distributed data processing frameworks such as Flink, Hive, or Trino. Strong understanding of large-scale batch and streaming pipelines, including performance tuning and troubleshooting. Language is not a filter: Scala, Python, or Java are all fine. What matters is that you can debug and ship production Spark code, not which language you write it in.
Production experience building and operating data solutions on GCP or AWS, including cloud-native services such as BigQuery, Dataproc, GCS, S3, EMR, or Redshift. Experience across the full project lifecycle is preferred.
Production experience with Kafka or Kafka-compatible streaming platforms, including the development, operation, and troubleshooting of real-time data pipelines, as well as debugging production incidents involving consumer lag, partition rebalancing, or data loss.
Strong understanding of data warehouse and data lake concepts, including Medallion Architecture (Bronze, Silver, Gold) and data platform best practices.
Nice to have
Production experience with a lakehouse table format (Apache Iceberg or Delta Lake). The specific format matters less than the underlying concepts: you understand how the lake stores data on object storage, how a table format adds transactional semantics on top, and how a data warehouse can read that data. You have dealt with schema evolution and table maintenance.
Please submit your CV in English.
This position is open to all candidates.