Research Engineer (Focused on RL)

Firecrawl•San Francisco, CA

2d•$180,000 - $270,000•Remote

About The Position

You'll bring reinforcement learning to Firecrawl's core product, building the training infrastructure, reward pipelines, and fine-tuning systems that make our models meaningfully better at extracting, understanding, and structuring web data. This isn't theoretical RL research. You'll build your own training infra, run fast experiments, ship models to production, and bridge the gap between classical RL approaches and modern LLM agent systems. If you care as much about training throughput as you do about reward design, this is the role.

About Firecrawl

Firecrawl is the easiest way to extract data from the web. Developers use us to reliably convert URLs into LLM-ready markdown or structured data with a single API call. In just a year, we've hit 8 figures in ARR and 90k+ GitHub stars by building the fastest way for developers to get LLM-ready data. We're a small, fast-moving, technical team building essential infrastructure super-intelligence will use to gather data on the web. We ship fast and deep.

Requirements

Someone who builds their own training infra and reward pipelines. You don't wait for an ML platform team to set things up. You build the training loops, reward models, data pipelines, and evaluation frameworks yourself — because you understand that the infra choices directly affect the quality of the results. You've operated GPU clusters, managed training runs, and debugged convergence issues in production.
Can fine-tune models to achieve SOTA results. You've taken models from baseline to best-in-class on tasks that matter. You understand the full fine-tuning lifecycle — data curation, training dynamics, hyperparameter sensitivity, evaluation methodology — and you have the taste to know when a model is actually good versus when the eval is flattering.
Bridges LLM agents and classical RL approaches. You're fluent in both worlds. You understand PPO, RLHF, reward modeling, and policy optimization — and you also understand how modern LLM agents work, where they fail, and how RL techniques can make them better. You see connections between these domains that most people miss.
Runs fast experiments and communicates clearly. You bias toward quick iterations over perfect setups. You'd rather run three rough experiments this week than one polished one next month. And when you have results, you communicate them clearly — to other researchers, to engineers, and to leadership. No one needs to decode your work to understand its impact.
Production-minded. You care about whether your models actually work in production, not just on benchmarks. You've deployed models that serve real traffic and you've made hard tradeoffs between model quality, latency, and cost.
3+ years in applied RL, ML engineering, or model training — with production systems
US Citizenship/Visa required for SF; N/A for Remote

Nice To Haves

RL engineers at AI labs or applied ML teams who've shipped models to production.
Researchers who've done RLHF or reward modeling for LLM systems.
ML engineers who've built training infrastructure at startups and cared as much about the pipeline as the model.
People who've worked at the intersection of RL and language models — whether in academic labs with a production bent or at companies building agent systems.

Responsibilities

Build training infrastructure and reward pipelines from scratch: Design and operate the systems that train and evaluate Firecrawl's models. You'll own the full loop — data collection, reward modeling, training runs, evaluation, and deployment. You build the infra yourself because you're the one who needs it to work.
Fine-tune models to achieve state-of-the-art results: Take foundation models and make them dramatically better at web data extraction, content understanding, and structured output generation. You know how to get from "decent fine-tune" to "best-in-class" and you have the patience and rigor to close that gap.
Bridge LLM agents and classical RL: The most interesting problems at Firecrawl sit at the intersection of modern LLM-based agents and classical RL techniques. You'll design reward signals for agent behaviors, apply RL methods to improve multi-step agent workflows, and figure out where traditional RL approaches outperform prompting — and vice versa.
Run fast experiments and iterate: You design experiments that test meaningful hypotheses, run them quickly, and make decisions based on results. You don't spend weeks on experiment infrastructure before getting a single result. Speed of iteration is a core part of how you work.
Communicate clearly to non-RL people: RL can be opaque. You translate your work into language that engineers, product people, and leadership can understand and act on. You know how to explain why a reward function matters without requiring everyone to read the paper.
Collaborate across the research team: Work closely with the Head of Research and the Search/IR-focused Research Engineer to connect RL improvements with search, ranking, and the broader product strategy.

Benefits

Salary that makes sense — $180,000–$270,000/year, based on impact, not tenure
Own a piece — Up to 0.15% equity in what you're helping build
Generous PTO — 15 days mandatory, anything after 24 days, just ask (holidays excluded); take the time you need to recharge
Parental leave — 12 weeks fully paid, for moms and dads
Wellness stipend — $100/month for the gym, therapy, massages, or whatever keeps you human
Learning & Development — Expense up to $1,000/year toward anything that helps you grow professionally
Team offsites — A change of scenery, minus the trust falls
Sabbatical — 3 paid months off after 4 years, do something fun and new
Full coverage, no red tape — Medical, dental, and vision (100% for employees, 50% for spouse/kids) — no weird loopholes, just care that works
Life & Disability insurance — Employer-paid short-term disability, long-term disability, and life insurance — coverage for life's curveballs
Supplemental options — Optional accident, critical illness, hospital indemnity, and voluntary life insurance for extra peace of mind
Doctegrity telehealth — Talk to a doctor from your couch
401(k) plan — Retirement might be a ways off, but future-you will thank you
Pre-tax benefits — Access to FSAs and commuter benefits (US-only) to help your wallet out a bit
Pet insurance — Because fur babies are family too
SF HQ perks — Snacks, drinks, team lunches, intense ping pong, and peak startup energy
E-Bike transportation — A loaner electric bike to get you around the city, on us