Senior Research Engineer

Microsoft•Redmond, WA

About The Position

As a Senior Research Engineer at Microsoft, you will advance Microsoft’s mission to empower every person and every organization to achieve more. You will help build and integrate cutting-edge AI into Microsoft products and services within the Business & Industry Copilot (BIC) group, ensuring solutions are inclusive, ethical, and impactful. This role blends applied research, machine learning engineering, and product innovation. You will lead efforts to ship reliable, production-grade AI systems across the stack, from model development and fine-tuning to performance optimization and deployment.

Mission and Impact

We are in an era of unprecedented AI innovation. As Microsoft leads the way in foundation models, multimodal systems, and AI agents, our goal is to build an open architecture platform where users can interact with tailored AI agents that drive tangible, real-world outcomes. As a Senior Research Engineer, you will:
Bridge the gap between state-of-the-art research and customer-facing features
Drive systems-level innovation across models, infrastructure, and deployment
Champion responsible AI by embedding fairness, safety, privacy, and performance from the ground up

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research)
OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research)
OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research)
OR equivalent experience.
Other Requirements: Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.

Nice To Haves

Master’s degree and 3 or more years in applied ML or AI research and product engineering
OR PhD in a relevant field and 2 or more years with generative AI, LLMs, or related ML algorithms.
Experience across the product lifecycle from ideation to shipping.
Proficiency in Python and at least one deep learning framework such as PyTorch, JAX, or TensorFlow
Experience deploying fine-tuned LLMs or multimodal models in live production environments
Experience shipping and maintaining production AI systems
Experience with Microsoft’s LLMOps stack: Azure AI Foundry, Azure Machine Learning, Semantic Kernel, Azure OpenAI Service, and Azure AI Search for vector/RAG.
Familiarity with responsible AI evaluation frameworks and bias mitigation methods.

Responsibilities

Bringing State-of-the-Art Research to Products
Design and implement AI systems using foundation models, prompt engineering, retrieval-augmented generation, multi-agent architectures, and classic ML
Fine-tune large language models on domain-specific data and evaluate via offline and online methods such as A/B testing, telemetry, and shadow deployments
Build and harden prototypes into production-ready services using robust software engineering and MLOps practices
Drive original research and thought leadership (whitepapers, internal notes, patents); convert insights into shipped capabilities
Research Translation: Continuously review emerging work; identify high-potential methods and adapt them to Microsoft problem spaces
End-to-End System Development
ML Design & Architecture: Own the end-to-end pipeline, from data prep, training, and evaluation through deployment and feedback loops
Identify and resolve model quality gaps, latency issues, and scale bottlenecks using PyTorch or TensorFlow
Operate CI/CD and MLOps workflows including model versioning, retraining, evaluation, and monitoring
Integrate AI components into Microsoft products in close partnership with engineering and product teams
Data-Driven Innovation
Evaluation & Instrumentation: Build robust offline/online evals, experimentation frameworks, and telemetry for model/system performance.
Learning Loop Creation: Operationalize continuous learning from user feedback and system signals; close the loop from experimentation to deployment.
Experimentation & E2E Validation: Design controlled experiments, analyze results, and drive product/model decisions with data.
Develop proofs of concept that validate ideas quickly at realistic scales
Curate high-signal datasets, including synthetic and red-team corpora, and establish labeling protocols and data quality checks tied to evaluation KPIs
Cross-Functional Collaboration
Partner with software engineers, scientists, designers, and product managers to deliver high-impact AI features
Translate research breakthroughs into scalable applications aligned with product priorities
Communicate findings and decisions through internal forums, demos, and documentation
Responsible AI & Ethics
Identify and mitigate risks related to fairness, privacy, safety, security, hallucination, and data leakage
Uphold Microsoft’s Responsible AI principles throughout the lifecycle
Contribute to internal policies, auditing practices, and tools for responsible AI
Operating Altitudes
Paper level (ideas and math): Read, critique, and adapt the latest research; identify gaps; design methods with clear trade-offs and guarantees; communicate complex ideas clearly.
Example: “This objective is brittle under our data regime. Here is a tighter analysis and a revised loss we can test this sprint.”
Code level (implementation): Turn ideas into robust, tested, maintainable modules; integrate with CI/CD; profile and optimize for latency and throughput.
Example: “Refactored the prototype into a reusable PyTorch component, added unit tests and benchmarks, and cut P95 inference latency by 30%.”
Specialty Technical Areas
Large-scale training and fine-tuning of LLMs, vision-language, or multimodal models
Multi-agent systems, dialogue agents, and copilots
Optimization of inference speed, accuracy, reliability, and cost in production
Retrieval systems and hybrid architectures using RAG and vector databases
ML for real-world data constraints such as missing data, noisy labels, and class imbalance