Principal Software Engineer

Microsoft, Redmond, WA

About The Position

We are looking for a Principal Software Engineer to lead the design and development of next-generation agent architectures, model deployment systems, and training infrastructure for large-scale AI systems. In this role, you will partner closely with applied scientists, product teams, and platform engineers to build robust, scalable, production-grade systems that power intelligent, agentic experiences. You will play a critical role in shaping how large language models are trained, deployed, and orchestrated to deliver real-world impact. This is a high-impact, cross-functional role that requires deep technical expertise, strong system design skills, and the ability to drive end-to-end execution across modeling and infrastructure.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

  • Bachelor's degree in Computer Science or a related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.

Nice To Haves

  • 8+ years of experience in software engineering, with a strong focus on distributed systems and large-scale infrastructure
  • Proven experience designing and building production-grade systems for ML/AI or data platforms
  • Solid programming skills in languages such as Python, C++, Java, or similar
  • Experience with model serving, distributed training systems, or large-scale data pipelines
  • Deep understanding of system design, scalability, and reliability principles
  • Ability to work across disciplines and drive execution in ambiguous, fast-moving environments

Responsibilities

  • Lead agent architecture design for LLM-based systems, including multi-agent orchestration, tool use, and planning frameworks
  • Own model deployment infrastructure, enabling reliable, scalable, and low-latency serving of large models across diverse scenarios
  • Drive improvements in model training infrastructure, including data pipelines, training workflows, and evaluation systems
  • Partner with applied scientists to bridge modeling and production, ensuring efficient iteration from research to deployment
  • Design and implement end-to-end systems spanning retrieval, reasoning, execution, and feedback loops
  • Optimize systems for latency, cost, reliability, and quality at scale
  • Establish best practices for experimentation, evaluation, and monitoring of AI systems in production
  • Mentor engineers and contribute to technical strategy and roadmap for AI platform and agent systems