Motional · Posted 2 days ago
$175,000 - $234,000/yr

At Motional, data plays a critical role in fueling our ML-centered autonomous vehicles. Our robotaxi fleet collects petabytes of data on the road every day, and the Data Mining team mines and filters this massive influx of fleet data by developing billion-scale data workflows and state-of-the-art mining algorithms. Through our mining and learning frameworks, we continuously improve the on-road performance of ML products for perception, prediction, and planning with every mile driven. We mine for model errors, anomalies, rare objects, and long-tail driving scenarios across millions of driving hours; these are used for laser-focused ML model training and continuous edge-case validation. We are looking for an engineer to spearhead new mining strategies and workflows and help deliver the high-quality data that improves our core ML products.

Responsibilities

  • Spearhead the development of cutting-edge data products by adapting and extending Vision-Language Models (VLMs) and other multimodal foundation models.
  • Apply advanced techniques such as fine-tuning, retrieval-augmented generation (RAG), in-context learning, continual pre-training, and knowledge distillation.
  • Design and curate high-quality multimodal datasets crucial for training and evaluating multimodal foundation models.
  • Develop innovative strategies for data curation, dataset creation, and synthetic data generation to optimize multimodal foundation models for long-tail event mining (a minimal sketch of this mining pattern follows the list).
  • Drive the in-depth analysis of multimodal foundation models' performance, generalization, and robustness in diverse real-world settings.
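To make the long-tail event mining above concrete, here is a minimal, illustrative sketch of one common pattern: scoring individual camera frames against free-text descriptions of rare events with an off-the-shelf VLM. The checkpoint ("openai/clip-vit-base-patch32"), the query strings, and the dummy frame are assumptions chosen for illustration, not Motional's actual stack or data.

```python
# Illustrative sketch only: zero-shot long-tail frame mining with a public CLIP
# checkpoint. The model name, queries, and dummy frame are assumptions, not
# Motional's production stack.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical free-text descriptions of long-tail events, plus a background class.
queries = [
    "a pedestrian pushing a shopping cart in the road",
    "an overturned vehicle blocking a lane",
    "construction workers redirecting traffic",
    "an ordinary, uneventful road scene",
]

def score_frame(frame: Image.Image) -> torch.Tensor:
    """Return a probability distribution over the query set for one camera frame."""
    inputs = processor(text=queries, images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, len(queries))
    return logits.softmax(dim=-1).squeeze(0)

# Dummy frame so the sketch runs end to end; a real workflow would stream fleet imagery.
frame = Image.new("RGB", (224, 224), color="gray")
for query, prob in zip(queries, score_frame(frame).tolist()):
    print(f"{prob:.3f}  {query}")
```

Frames whose top-scoring query is a rare-event description would be routed into curation and labeling, while the background class filters out ordinary driving.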
Qualifications

  • MS/PhD in Computer Science or a related field, with a strong emphasis on multimodal foundation models.
  • Strong publication record in premier conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR) demonstrating significant contributions to the field of vision-language understanding or multimodal foundation models.
  • Proficiency in Python and deep learning frameworks such as PyTorch, with a demonstrated ability to write clean, efficient, and maintainable code.
  • Experience in the application of Vision-Language Models (VLMs) or other multimodal foundation models to data mining in real-world settings.
  • Experience in production deployment of Vision-Language Models (VLMs) or other multimodal foundation models for real-world applications (e.g., image/video captioning, open-vocabulary image/video searching; see the search sketch after this list).
  • Experience with data from diverse sensor modalities (e.g., camera, lidar, radar).
  • Experience in applied machine learning for autonomous driving.
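As one plausible shape for the open-vocabulary search application mentioned above, here is a hedged sketch: embed images and a free-text query into CLIP's shared space and rank by cosine similarity. Again, the checkpoint, the stand-in images, and the query are illustrative assumptions; a production system would index billions of precomputed embeddings in an approximate-nearest-neighbor store rather than scoring them in memory.

```python
# Illustrative sketch only: open-vocabulary image search with CLIP embeddings.
# The checkpoint, stand-in images, and query are assumptions for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Stand-in "fleet" frames; a production index would hold billions of embeddings.
images = [Image.new("RGB", (224, 224), color=c) for c in ("red", "green", "blue")]

with torch.no_grad():
    # Embed all images once; in practice this runs offline and feeds an ANN index.
    image_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)

    # Embed the free-text query at search time.
    text_emb = model.get_text_features(
        **processor(text=["a red traffic cone on the road"], return_tensors="pt", padding=True)
    )
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Cosine similarity ranks frames against the query; highest score first.
scores = (image_emb @ text_emb.T).squeeze(-1)
print(scores.argsort(descending=True).tolist())
```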
Benefits

  • Medical, dental, and vision insurance.
  • 401k with a company match.
  • Health savings accounts.
  • Life insurance.
  • Pet insurance.