We're seeking a seasoned Data Engineer to build the data infrastructure that fuels our groundbreaking intelligent agent, Apollo. You'll play a crucial role in developing the large-scale, data-intensive systems that power its capabilities.
**What You'll Do:**
- Design and implement massively parallel processing (MPP) solutions for both real-time and batch scenarios.
- Develop real-time stream processing solutions using technologies like Apache Kafka or Amazon Kinesis.
- Build the infrastructure that brings machine learning capabilities to production.
- Orchestrate containerized applications in cloud environments (AWS and GCP).
- Write production-grade Python code and work with various database systems.
- Design and administer cloud-based data warehousing solutions.
- Work with unstructured data and complex data sets, and perform data modeling.
- Collaborate with cross-functional teams to integrate data solutions into our AI systems.
**Who We're Looking For:**
- An experienced Data Engineer with a deep understanding of data modeling and massively parallel processing.
- Someone experienced in bringing machine learning capabilities into large-scale production systems.
- An individual with experience at a cutting-edge startup.
- A passionate builder of data infrastructure for advanced AI systems.
- A team player with excellent collaboration and communication skills.
- Someone with a "can-do" approach to problem-solving.
**Requirements:**
- 3+ years of experience building massively parallel processing solutions (e.g., Spark, Presto).
- 2+ years of experience developing real-time stream processing solutions (e.g., Apache Kafka, Amazon Kinesis).
- 2+ years of experience developing ML infrastructure for production (e.g., Kubeflow, SageMaker, Vertex AI).
- Experience orchestrating containerized applications in AWS and GCP using EKS and GKE, respectively.
- 3+ years of experience writing production-grade Python code.
- Experience working with both relational and non-relational databases.
- 2+ years of experience administering and designing cloud-based data warehousing solutions (e.g., Snowflake, Amazon Redshift).
- 2+ years of experience working with unstructured data, complex data sets, and data modeling.
This position is open to all candidates.