Staff Generative AI Research Engineer, Multimodal, Agent Modeling - SIML

Apple • Cupertino, CA

About The Position

We are looking for a candidate with a proven track record in applied ML research. Responsibilities in the role include training large-scale multimodal (2D/3D vision-language) models on distributed backends, deploying efficient neural architectures on device and private cloud compute, and addressing emerging safety challenges to make models and agents robust and aligned with human values. A key focus of the position is ensuring real-world quality, with emphasis on model and agent safety, fairness, and robustness. You will collaborate closely with ML researchers, software engineers, and hardware and design teams across multiple disciplines. The core responsibilities include advancing the multimodal capabilities of large language models and strengthening agentic workflows. On the user experience front, the work involves aligning image and video content to the space of LLMs for visual actions and multi-turn interactions, enabling rich, intuitive experiences powered by agentic AI systems.

Requirements

M.S. or PhD in Electrical Engineering, Computer Science, or a related field (mathematics, physics, or computer engineering) with a focus on computer vision and/or machine learning, or comparable professional experience
Strong ML and Generative Modeling fundamentals
Experience with one or more of the following: reinforcement learning, distillation, and pre-training or post-training of multimodal LLMs
Familiarity with distributed training
Proficiency in using ML toolkits, e.g., PyTorch
Awareness of the challenges associated with transitioning a prototype into a final product
Proven record of research innovation and demonstrated leadership in both applied research and development

Nice To Haves

Experience building and deploying AI agents, LLMs for tool use, and multimodal LLMs

Responsibilities

Training large-scale multimodal (2D/3D vision-language) models on distributed backends
Deploying efficient neural architectures on device and private cloud compute
Addressing emerging safety challenges to make models and agents robust and aligned with human values
Ensuring real-world quality, emphasizing model and agent safety, fairness, and robustness
Collaborating closely with ML researchers, software engineers, and hardware and design teams across multiple disciplines
Advancing the multimodal capabilities of large language models and strengthening agentic workflows
Aligning image and video content to the space of LLMs for visual actions and multi-turn interactions, enabling rich, intuitive experiences powered by agentic AI systems
