About The Position

Overview: CoreAI sits at the center of Microsoft’s mission to redefine how software is built and experienced, providing the foundational platforms, services, and developer experiences that power the next generation of AI-driven applications. As part of CoreAI, the Experimentation Platform (ExP) enables trustworthy, high-scale online experimentation that accelerates product learning and drives progress across Microsoft’s AI ecosystem. You will play a pivotal role in shaping the technical direction of systems that help teams ship better AI experiences faster by providing the experimentation capabilities needed to evaluate, refine, and safely deploy new innovations.

In this role, you will lead the architecture and development of one of the highest-scale experimentation platforms - critical infrastructure that enables rapid iteration in AI systems and product features across Microsoft. You will drive the technical vision for services that empower engineers and scientists across the company to measure impact, validate hypotheses, and advance state-of-the-art AI capabilities through rigorous experimentation. This is an opportunity to lead complex, cross-team technical initiatives while shaping the future of distributed systems architecture, service reliability, and experimentation methodologies at Microsoft scale. You will thrive in this role if you are a technical leader who enjoys driving architecture decisions across teams, mentoring senior engineers, and building the reliable infrastructure foundations that accelerate Microsoft’s progress in AI.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more, and we’re dedicated to this mission across every aspect of our company. Our culture is centered on embracing a growth mindset and encouraging teams and leaders to bring their best each day. Join us and help shape the future of the world.

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Extensive experience architecting and operating large-scale distributed systems on cloud platforms (Azure, AWS, GCP), with demonstrated ownership of critical production infrastructure serving millions of users.
  • Track record of designing highly scalable, resilient service architectures with strong emphasis on fault tolerance, disaster recovery, and cost optimization at scale.
  • Deep experience using observability tools (logging, metrics, distributed tracing) to diagnose complex cross-service issues and drive systemic reliability improvements across multiple products.
  • Proven experience mentoring senior engineers, driving technical direction, conducting design reviews, and raising the engineering bar across teams.
  • Experience with experimentation platforms, A/B testing at scale, and statistical methodologies for measuring product impact and driving data-informed ship decisions.
  • Experience leading security hardening efforts, threat modeling, and incident response processes for production systems.
  • Experience championing AI-assisted development workflows and establishing responsible AI coding practices across engineering teams.

Responsibilities

  • Champion and improve AI tools and practices across the software development lifecycle (SDLC), incorporating appropriate controls over AI-generated assets.
  • Lead by example across teams to produce extensible, maintainable, well-tested, secure, and performant code; identify and establish coding best practices, create and apply metrics to drive code quality and stability, and mentor engineers to continuously raise the engineering bar.
  • Own and lead the architecture of complex product solutions, driving design discussions, evaluating new technologies to solve problems, and ensuring system architecture meets performance, scalability, resiliency and disaster recovery requirements.
  • Lead cross-team collaboration to identify dependencies, negotiate delivery schedules, drive alignment across partner teams, and ensure proper end-to-end testing, live-site coverage, scalability and performance before going live.
  • Drive engineering excellence across products; lead efforts targeting zero-touch deployment, production reliability, and security hardening for both protections and detections.
  • Hold accountability as a designated responsible individual (DRI) across products and solutions, mentor engineers on live-site operations, lead incident retrospectives that drive systemic