About The Position

CoreAI sits at the center of Microsoft’s mission to redefine how software is built and experienced, providing the foundational platforms, services, and developer experiences that power the next generation of AI-driven applications. As part of CoreAI, the Experimentation Platform (ExP) enables trustworthy, high-scale online experimentation that accelerates product learning and drives progress across Microsoft’s AI ecosystem.

You will play a key role in helping teams ship better AI experiences faster by providing the experimentation capabilities they need to evaluate, refine, and safely deploy new innovations. In this role, you will own and drive the development of critical components in one of the highest-scale experimentation platforms, infrastructure that enables rapid iteration on AI systems and product features. You will design and build services that empower engineers and scientists across the company to measure impact, validate hypotheses, and advance state-of-the-art AI capabilities through rigorous experimentation.

This is a unique opportunity to build systems at scale while deepening your expertise in distributed systems, service reliability, and experimentation methodologies. You will thrive in this role if you enjoy driving technical excellence in distributed systems, mentoring engineers, and building reliable infrastructure that accelerates Microsoft’s progress in AI.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more, and we’re dedicated to this mission across every aspect of our company. Our culture is centered on embracing a growth mindset and encouraging teams and leaders to bring their best each day. Join us and help shape the future of the world.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master’s Degree in Computer Science or related technical field AND 6+ years of technical engineering experience, OR Bachelor’s Degree in Computer Science or related technical field AND 8+ years of technical engineering experience, with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
  • Experience building and operating large-scale distributed systems on cloud platforms (Azure, AWS, GCP), including design, deployment, monitoring, and troubleshooting of production workloads.
  • Experience designing and implementing service architectures with strong emphasis on scalability, reliability, fault tolerance, and cost optimization.
  • Experience using observability tools (logging, metrics, distributed tracing) to diagnose complex service issues and drive systemic reliability improvements.
  • Experience mentoring engineers, driving code reviews, and raising the bar on engineering best practices within a team.
  • Familiarity with experimentation platforms, A/B testing methodologies, and statistical analysis of product metrics.
  • Experience with AI-assisted development workflows and responsible use of AI coding tools in production environments.

Responsibilities

  • Independently leverage AI tools and practices across the software development lifecycle (SDLC), taking responsibility for AI-generated assets and coaching team members to adopt responsible AI-assisted development practices.
  • Lead by example to produce extensible, maintainable, well-tested, secure, and performant code; apply metrics to drive code quality and stability, and continuously improve code performance, testability, and cost-effectiveness across the team.
  • Own and drive the architecture and design of product components, create design specifications, and ensure that the system architecture meets performance, scalability, resiliency, and disaster recovery requirements with minimal technical oversight.
  • Collaborate with partner teams, PMs, and subject matter experts (privacy, security, SRE) to determine customer requirements, incorporate feedback, and deliver scalable, reliable features with proper end-to-end testing.
  • Drive engineering excellence through automation, tooling improvements, security best practices, and deployment infrastructure.
  • Maintain operations of live-site services on a rotational on-call basis, implement solutions to complex live-site issues, conduct and present incident postmortems, and proactively improve troubleshooting guides, telemetry, and monitoring to reduce incident volume.