Training builds capability. Post-training decides what it becomes. This team is rethinking how large multimodal models learn after pre-training, developing post-training and reinforcement learning methods that help models reason, plan, and interact in real time.

Founded by the researchers behind several of the most influential modern AI architectures, this lab is pushing alignment and learning efficiency beyond standard RLHF. They're scaling preference-based training (RLHF, DPO, hybrid feedback loops) to new model types and creating systems that learn from interaction rather than static data.

You'll work at the intersection of post-training, RL, and model architecture: designing reward models, scalable evaluation frameworks, and training strategies that make large-scale learning measurable and reliable. It's applied research with direct impact, supported by serious compute and a tight researcher-to-GPU ratio.

If you want to work where post-training meets architecture, shaping how foundation models learn, reason, and adapt, this is that opportunity. All applicants will receive a response.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed