Fluency is looking for a Research Engineer to design experiments, build evaluation infrastructure, and drive model quality for our process conformance, productivity measurement, and AI impact analysis across Fortune 500 organisations.

The Problem Space

You'll be developing the methodology and systems that determine whether our models actually work. Screenshots, OCR text, application metadata, behavioural signals: the inputs are messy and the ground truth is ambiguous. The challenge is building rigorous evaluation frameworks that quantify model performance and identify improvement opportunities.

This means:
- Designing evaluation pipelines that measure accuracy, precision, and recall across classification tasks
- Building ground truth datasets from ambiguous, real-world enterprise data
- Running systematic prompt engineering experiments to optimise LLM performance
- Developing A/B testing infrastructure for model comparison
- Researching novel approaches to process understanding, activity classification, and intent extraction
- Quantifying cost-accuracy tradeoffs across different model architectures and prompting strategies

The playbook doesn't exist. You'll write it.

You'll work directly with founders and our engineering team on technical challenges that span LLM evaluation, experimental design, and applied research.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed