Sr Data Scientist

The Walt Disney Company
San Francisco, CA
$155,400 - $208,400
Hybrid

About The Position

Our R&D teams at Lucasfilm and ILM are seeking a Sr Data Scientist to join a strategic R&D initiative focused on Generative AI. The goal of this project is to develop a robust data curation pipeline that can help identify and leverage our most useful assets and media for technical model training. You will play a critical role in bridging the gap between raw visual data and advanced machine learning applications. You will be responsible for the statistical analysis, sampling strategies, and evaluation metrics required to ensure our training data is diverse, relevant, and optimized for next-generation image and video synthesis.

This role is considered Hybrid, which means the employee will work 2-3 days onsite at our San Francisco location and occasionally from home. This is a 6-month project position.

Requirements

  • 5+ years of experience in a related field
  • Education - Bachelor’s degree in Data Science, Computer Science, or a related field of study, and/or equivalent work experience.
  • Proven background in Data Science with a strong emphasis on Computer Vision, Generative AI, or Deep Learning.
  • Proficiency in statistical analysis and dataset curation (distribution analysis, sampling techniques).
  • Familiarity with standard and novel metrics for evaluating Generative Models (e.g., FID, FVD, or similar).
  • Ability to translate complex statistical insights for engineering partners and non-technical creative leads.

Nice To Haves

  • Master’s Degree preferred
  • Experience working with large-scale unstructured media data is a plus.

Responsibilities

Data Strategy & Diversity Analysis

  • Independently design and implement statistical methods to ensure curated datasets retain representative coverage across various visual attributes, stylistic choices, and subject matter.
  • Develop logic to identify and down-weight low-variance or repetitive data points to maximize training efficiency.
  • Collaborate with key stakeholders on algorithms for de-duplication to automatically eliminate redundant or near-identical assets from the training corpus.

Evaluation Metrics & Quality Assurance

  • Design and lead implementation of automated metrics to assess the quality of generative images and videos.
  • Validate automated quantitative metrics by correlating them against qualitative feedback provided by senior creative stakeholders.
  • Establish success criteria for model fidelity, accuracy, and stylistic consistency.

Pipeline Integration

  • Work closely with the engineering team to integrate data cleaning, normalization, and sampling modules into a scalable automated pipeline.
  • Assist in defining taxonomy and metadata standards to systematically organize unstructured visual assets.

Project Focus & Timeline

This is a fast-paced, 6-month initiative. You will move through rapidly iterating phases:

  • Phase 1: Defining data taxonomy and establishing baseline automated metrics.
  • Phase 2: Refining metrics for temporal consistency and validating against initial model fine-tuning runs.
  • Phase 3: Final validation of metrics and delivery of fully curated, optimized datasets for cold storage.