Research Scientist - Audiovisual Understanding, Model Foundations
Team & role
The Core Generative AI team is a unified group of researchers and engineers dedicated to developing the generative foundation models that power LTX Studio, our AI-based video creation platform. Our focus is on building a controllable, state-of-the-art video generative model by combining cutting-edge algorithms with exceptional engineering. This includes enhancing the machine learning components of our sophisticated internal training framework, which is essential to developing advanced models. We specialize in both research and engineering that enable efficient, scalable training and inference, allowing us to deliver state-of-the-art AI-generated video models.
As a Large Scale Video Understanding Research Scientist, you will play a key role in improving video generation quality and efficiency by advancing the video and audio understanding pipelines used for both training data construction and model evaluation. This role demands hands-on work with large-scale Video Language Models (VLLMs), including fine-tuning, post-training, and control, alongside implementing classic computer vision and signal processing algorithms and applying strong research skills. Your expertise in post-training and controlling large-scale foundation models, statistics, implementing complex systems, and eliminating bugs will be crucial, as our video training sets consist of petabytes of data processed across hundreds to thousands of virtual machines.
What you will be doing
Fine-tune and control VLLMs for video and audio understanding.
Design algorithms for balancing, filtering, and curating training and evaluation datasets, informed by model behavior and failure modes.
Implement classic and modern algorithms for processing, clustering, evaluating, and filtering large-scale datasets.
Work within high-performance, scalable distributed systems capable of handling petabytes of data, with attention to throughput, correctness, and reproducibility.
Collaborate with other researchers and product stakeholders to iteratively improve training sets and evaluation protocols through tight feedback loops driven by model performance.
Requirements
Experience training, fine-tuning, or post-training large-scale VLLMs or multimodal foundation models.
Strong software engineering skills, with proficiency in JAX or PyTorch.
Ability to develop and implement computer vision models for data filtering and evaluation.
Understanding of relevant topics in statistics and clustering.
Enthusiasm for delving into system implementations to enhance performance and maintainability.
This role is designed for individuals who are not only technically proficient but also deeply passionate about pushing the boundaries of AI and machine learning through innovative engineering and collaborative research.
This position is open to all candidates.