We're a leading force in the ad tech industry, revolutionizing how brands connect with their audiences. Our platform processes billions of ad impressions daily, generating massive datasets that drive our core business. We thrive on innovation and seek a Data Engineer to help us build and scale the data infrastructure that powers our insights and analytics. This is a unique opportunity to work with cutting-edge technologies and make a direct impact on our products.
What will you do?
As a Data Engineer, you'll be a key part of our data platform team, responsible for designing, building, and maintaining robust and scalable data pipelines. You'll work closely with data scientists, analysts, and server-side engineers to ensure our data is reliable, accessible, and ready for analysis. Your expertise will be crucial in expanding our data warehouse and data lake capabilities, enabling us to deliver next-generation ad tech solutions.
Your mission will be to:
Develop and Optimize Data Pipelines: Design, build, and maintain ETL/ELT pipelines using Apache Spark to ingest, process, and transform large-scale datasets from various sources.
Manage Cloud Infrastructure: Architect and manage our data infrastructure primarily on Google Cloud Platform (GCP) or Amazon Web Services (AWS). This includes services like BigQuery, S3, GCS, EMR, and Airflow.
Enhance Data Storage: Improve and manage our data warehouse and data lake solutions, ensuring data quality, consistency, and accessibility for business intelligence and machine learning applications.
Collaborate and Innovate: Partner with cross-functional teams to understand data needs and implement solutions that support new product features and business initiatives.
Ensure Data Integrity: Implement monitoring, alerting, and logging systems to maintain data pipeline health and ensure data accuracy.
Requirements:
4-5 years of professional experience in a data engineering or similar role.
Technical skills:
Strong proficiency in Java, Scala, or Python.
Extensive experience with distributed big-data processing frameworks such as Apache Spark, Flink, Hive, or Trino.
Proven experience working with cloud-based data services on GCP or AWS (e.g., BigQuery, S3, GCS, EMR, Dataproc).
Experience with real-time data streaming technologies like Kafka.
Deep understanding of data warehouse and data lake concepts, and of Medallion Architecture best practices (Bronze, Silver, and Gold layers).
Knowledge of Apache Iceberg or Delta Lake.
Solid understanding of infrastructure as code (IaC) using Terraform.
Familiarity with SQL and NoSQL databases.
Experience with pipeline orchestration and scheduling tools (e.g., Airflow).
Good communication skills and the ability to work collaboratively within a team. You are an active listener and a dialogue facilitator; you know how to explain your decisions and enjoy sharing your knowledge.
Nice to have:
Familiarity with containerization (Docker/OrbStack, Kubernetes).
Knowledge of the ad tech ecosystem (e.g., DSPs, SSPs, Ad Exchanges).
Please submit your CV in English.
This position is open to all candidates.