The AI team is a hands-on applied AI group at Weights & Biases that turns frontier research into teachable workflows. We collaborate with leading enterprises and the OSS community. We are the team that took W&B from a few hundred users to millions and made it one of the most beloved tools in the ML community.

This is a senior applied role at the research-to-product boundary. You will design, implement, and evaluate LLM applications and agents using cutting-edge techniques from the latest research, then document and teach them to our community and customers. The focus is application, not novel research: rapid prototyping, careful evaluation, and production-grade reference implementations with clear trade-offs. We prioritize responsible, safe deployment and reproducibility.

About the role:
Ship end-to-end GenAI workflows (prompting → RAG → tools/agents → eval → serve) with reproducible repos, W&B Reports, and dashboards others can run.
Build agentic systems (tool use, function calling, multi-step planners) with MCP servers/clients and secure tool/resource integrations.
Design evaluation harnesses (RAG/agent evals, golden sets, regression tests, telemetry) and drive continuous improvement via offline + online metrics.
Build in public: publish engineering artifacts (code, docs, talks, tutorials) and engage with OSS and customer engineers; turn repeated patterns into reusable templates.
Partner with product/solutions to launch LLM-powered features with clear latency/cost/SLO targets and safety/guardrail checks.
Run growth experiments that track usage of the Weights & Biases suite of products driven by the artifacts you build.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed