Help train large language models (LLMs) to write production-grade code across a wide range of programming languages. You will compare and rank multiple code snippets, explaining which is best and why, and you will repair and refactor AI-generated code for correctness, efficiency, and style. You will also feed your ratings, edits, and test results into the RLHF (reinforcement learning from human feedback) pipeline and help keep it running smoothly. As a result, the model learns to propose, critique, and improve code the way you do. The RLHF process in brief: the model generates code ➜ expert engineers rank, edit, and justify ➜ that feedback is converted into reward signals ➜ reinforcement learning tunes the model toward code you’d actually ship.