Training builds capability. Post-training decides what it becomes. This team is rethinking how large multimodal models learn after pre-training, developing post-training and reinforcement learning methods that help models reason, plan, and interact in real time.

Founded by the researchers behind several of the most influential modern AI architectures, this lab is pushing alignment and learning efficiency beyond standard RLHF. They're scaling preference-based training (RLHF, DPO, hybrid feedback loops) to new model types and creating systems that learn from interaction rather than static data.

You'll work at the intersection of post-training, RL, and model architecture: designing reward models, scalable evaluation frameworks, and training strategies that make large-scale learning measurable and reliable. It's applied research with direct impact, supported by serious compute and a tight researcher-to-GPU ratio.

If you want to work where post-training meets architecture, shaping how foundation models learn, reason, and adapt, this is that opportunity. All applicants will receive a response.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed