As a Data Scientist at our company, you will play a critical role in developing data-driven solutions that uncover legal insights at scale. You'll work across domains, analyzing large and complex datasets, designing models (including those based on LLMs), and translating data into actionable insights that support our mission of surfacing justice.
You'll take full ownership of your analyses and models, from problem definition to experimentation and evaluation. You'll collaborate closely with engineers, domain experts, and product teams to design robust data science solutions that create real-world impact and inform product decisions.
Responsibilities:
Design and implement data science solutions using LLMs and legal datasets to surface insights and automate tasks.
Lead end-to-end development of ML pipelines, from data exploration and preprocessing to model training, deployment, and monitoring.
Build and maintain production-ready pipelines for experimentation, evaluation, and inference.
Collaborate with legal stakeholders and analysts to define problems, identify opportunities, and translate them into data-driven solutions.
Develop and iterate on LLM-based systems, including prompt engineering.
Write clean, maintainable Python code that meets production standards, including testing and documentation.
Stay curious and proactive by evaluating new tools, approaches, and technologies.
Communicate clearly across technical and non-technical audiences, contributing to a collaborative team culture.
Requirements:
MSc in Computer Science, Data Science, Statistics, or a related quantitative field.
At least 6 years of industry experience applying data science to real-world problems, with a track record of deploying solutions to production.
Proven experience working with LLMs in production environments, including building, evaluating, and integrating them into workflows.
Solid understanding of Natural Language Processing (NLP) and its practical applications in industry settings.
Strong Python programming skills, with experience writing clean, maintainable, production-grade code.
Solid grasp of software engineering fundamentals, including object-oriented programming (OOP), testing practices, version control (Git), and CI/CD workflows.
Excellent communication and collaboration skills, with the ability to clearly convey complex technical ideas to both technical and non-technical stakeholders.
Comfortable working independently and taking full ownership of projects in a fast-paced, cross-functional team environment.
Advantages:
Hands-on experience fine-tuning LLMs (e.g., using custom data or frameworks like Hugging Face, PEFT, LoRA) in production settings.
Familiarity with agentic workflows, including tools like LangGraph, AutoGen, or similar orchestration frameworks.
Experience working with or developing recommendation systems.
Experience in legal tech, NLP, or working with sensitive or domain-specific text data.
Background in applied research, evaluation strategies for LLMs, or prompt optimization at scale.
This position is open to all candidates.