About The Position

The success of generative AI guardrails is not determined solely by model architecture; it is defined by the quality of the concepts we protect, the robustness of our detection science, and the realism of our evaluation methodology. The Adobe Firefly Applied Science & Machine Learning team is expanding its IP guardrail systems for building safe and compliant image, video, and audio generative models. A central challenge is long-tail IP coverage: foundation models cannot internalize every protected concept. Advancing our guardrails therefore requires principled approaches to multimodal knowledge integration, detection modeling, and real-world evaluation science.

We are seeking a P40 Applied Scientist to drive the scientific development of IP concept modeling, detection strategies, and evaluation frameworks that determine the robustness ceiling of our guardrail systems. This is a research-oriented role focused on advancing multimodal detection and evaluation methodologies at production scale.

Research Areas You Will Drive

Multimodal Concept Modeling & Knowledge Integration

  • Develop principled methods for representing and organizing large-scale IP concept spaces across text, image, and audio.
  • Study how retrieval-augmented generation (RAG), embedding alignment, and structured knowledge can complement multimodal foundation models.
  • Investigate strategies for improving long-tail concept coverage beyond what VLMs inherently encode.
  • Design concept modeling techniques that meaningfully influence downstream guardrail decisions.

Data Acquisition & Curation for Detection

  • Develop scalable approaches for acquiring and curating high-quality multimodal datasets that improve detection coverage for IP-sensitive concepts.
  • Drive long-tail expansion through targeted data collection, web-scale sourcing, and synthetic data generation.
  • Analyze failure cases in generative outputs to inform targeted data acquisition and dataset refinement.
  • Design efficient data curation and labeling strategies that improve the signal quality and robustness of downstream detection systems.

Evaluation Methodology & Benchmark Design

  • Define evaluation frameworks that reflect real-world Firefly usage patterns.
  • Design multimodal benchmark datasets that stress-test guardrails under realistic and adversarial scenarios.
  • Develop metrics that capture over-blocking, under-blocking, semantic similarity, and near-miss generation.
  • Establish statistically rigorous offline and online evaluation strategies that guide research prioritization.
  • Study how evaluation quality constrains and enables system-level progress.

Scientific Iteration Guided by Input

  • Leverage large-scale product feedback signals to identify systematic weaknesses in guardrail behavior.
  • Translate real-world interaction patterns into structured evaluation hypotheses.
  • Build reproducible experimental pipelines that enable continuous scientific iteration.
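To make the research areas above concrete, here is a minimal sketch of an embedding-retrieval guardrail check and the over-/under-blocking metrics it would be evaluated on. Every name, threshold, and data point below is hypothetical; the posting does not describe an actual implementation.

```python
# Illustrative sketch only: a toy embedding-retrieval check against a
# protected-concept index, plus over-/under-blocking rates. All names,
# thresholds, and "embeddings" are hypothetical.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def flags_protected_concept(embedding, concept_index, threshold=0.8):
    """Flag an output whose embedding is near any protected-concept embedding."""
    return any(cosine(embedding, c) >= threshold for c in concept_index.values())

def blocking_rates(predictions, labels):
    """Over-blocking: fraction of benign items flagged.
    Under-blocking: fraction of protected items missed."""
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum(not p and y for p, y in zip(predictions, labels))
    tn = sum(not p and not y for p, y in zip(predictions, labels))
    over = fp / (fp + tn) if fp + tn else 0.0
    under = fn / (fn + tp) if fn + tp else 0.0
    return over, under

# Tiny usage example with 2-d stand-in embeddings.
index = {"concept_a": [1.0, 0.0], "concept_b": [0.0, 1.0]}
preds = [flags_protected_concept(e, index) for e in ([0.9, 0.1], [0.5, 0.5])]
```

A real system would replace the toy cosine scan with an approximate-nearest-neighbor index over learned multimodal embeddings, but the trade-off being measured is the same: lowering the threshold reduces under-blocking at the cost of over-blocking.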

Requirements

  • PhD or MS in Computer Science, Machine Learning, AI, or related field.
  • 5+ years of experience in applied ML, multimodal systems, or evaluation research.
  • Strong understanding of Vision-Language Models, multimodal transformers, and embedding-based retrieval systems.
  • Experience designing and analyzing large-scale benchmarks and evaluation datasets.
  • Solid background in statistical analysis, experimental design, and performance trade-off evaluation.
  • Proficiency in Python and modern ML frameworks (e.g., PyTorch).
  • Ability to reason about long-tail distributions and concept coverage gaps.
  • Experience analyzing complex multimodal system failure modes.
  • Strong intuition for measurement quality and evaluation bias.
  • Comfort operating in ambiguous, research-driven problem spaces.
  • Demonstrated ability to use AI coding tools and AI-assisted workflows to rapidly prototype evaluation frameworks, detection experiments, and data analysis.
  • Ability to scale scientific insight through high-velocity experimentation.
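The statistical requirements above (experimental design, performance trade-off evaluation) can be illustrated with a percentile-bootstrap confidence interval for an offline guardrail metric. This is a generic textbook sketch with made-up data, not a description of Adobe's evaluation stack.

```python
# Generic illustration: percentile-bootstrap confidence interval for an
# offline guardrail metric (here, the mean block rate). The sample data
# and parameters are hypothetical.
import random

def bootstrap_ci(samples, stat, n_boot=2000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    boot = sorted(
        stat([rng.choice(samples) for _ in samples]) for _ in range(n_boot)
    )
    lo = boot[int((alpha / 2) * n_boot)]
    hi = boot[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical offline eval set: 1 = output was blocked by the guardrail.
block_decisions = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_ci(block_decisions, mean)
```

Reporting an interval rather than a point estimate is what lets small metric movements between guardrail variants be distinguished from noise.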

Nice To Haves

  • Experience in safety evaluation, trust & safety systems, or content moderation science.

Responsibilities

  • Lead scientific design of IP concept expansion and detection methodologies.
  • Formulate hypotheses around long-tail coverage, detection robustness, and evaluation gaps.
  • Run rigorous experiments to quantify performance ceilings and identify high-leverage improvements.
  • Partner closely with generative model scientists to ensure alignment between detection, guidance, and evaluation systems.
  • Contribute to intellectual property and potential publications in multimodal learning, evaluation science, or AI safety.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Ph.D. or professional degree
