We are looking for a research intern to join a research project aimed at publication at a top-tier venue. The intern will design and develop novel agentic systems that use large language models and vision-language models to reason over extended video content.
Description
Our team focuses on generative AI applications for video. You'll work alongside fellow researchers and engineers, applying computer vision and agentic systems to build future products.
Responsibilities
Design and implement novel LLM-based agentic systems for long-form video understanding, targeting established academic benchmarks
Collaborate with researchers and engineers on the team to produce a publication-ready contribution
Benchmark against established evaluation suites and iterate toward state-of-the-art results
Requirements
Currently enrolled in a graduate program (M.Sc. or Ph.D.) in Computer Science, Electrical Engineering, or a related field
Publications at top-tier venues (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP)
Strong programming skills in Python and experience with deep learning frameworks (e.g., PyTorch)
Solid foundation in computer vision, natural language processing, or multimodal learning
Proficiency with agentic development tools (e.g., Claude Code)
This position is open to all candidates.