About The Position

We are seeking a highly skilled and hands-on Machine Learning Engineer specializing in large model post-training and alignment. This role focuses on designing, executing, and optimizing post-training pipelines to improve model performance, controllability, domain adaptation, and reasoning capabilities. You will work across the full lifecycle of post-training—from data strategy and reward modeling to reinforcement learning–based optimization and production-grade inference deployment.

Requirements

  • Bachelor's degree in Computer Science, AI, Machine Learning, or a related field, with at least 8 years of industry experience.
  • Strong hands-on experience across the full post-training pipeline for large models.
  • Deep familiarity with preference learning and alignment techniques, including DPO, GRPO, and RL-based post-training methodologies.
  • Proven experience designing domain-specific data strategies and training methodologies.
  • Experience training and post-training specialized small models from scratch.
  • Solid understanding of reinforcement learning fundamentals and their application to model alignment.
  • Experience deploying models in low-latency production environments using frameworks such as vLLM, SGLang, or similar.

Responsibilities

  • Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.
  • Design and implement advanced training paradigms such as DPO (Direct Preference Optimization) and GRPO (Group Relative Policy Optimization).
  • Develop domain-specific data recipes, curation strategies, and augmentation pipelines to optimize task performance.
  • Train and post-train specialized small models from scratch, including architecture selection, dataset construction, and optimization strategy.
  • Build and refine reward models to support alignment and downstream optimization.
  • Design and implement RLAIF (Reinforcement Learning from AI Feedback) closed-loop systems.
  • Optimize inference efficiency and deploy models using low-latency serving frameworks such as vLLM and SGLang.
  • Evaluate model performance using both automated benchmarks and human/AI feedback loops.
  • Collaborate with research and infrastructure teams to productionize training and deployment workflows.

Benefits

  • Competitive total compensation package
  • Learning and development (L&D) programs and education subsidies to support employees' growth
  • Various team building programs and company events
  • Wellness and meal allowances
  • Comprehensive healthcare schemes for employees and dependants
  • More that we would love to tell you about during the hiring process!
© 2024 Teal Labs, Inc