Principal Data Engineer

Microsoft · Redmond, WA
About The Position

Microsoft is on a mission to reinvent productivity with AI, and the Time + Places (T+P) organization powers the data foundation behind that transformation. We are seeking a Principal Data Engineer to architect and build the next generation of intelligence, analytics, and experimentation infrastructure that fuels Calendar AI, meeting productivity experiences, and the broader M365 ecosystem.

As a Principal Data Engineer, you will design and operate large-scale data pipelines, feature stores, and experimentation systems that support applied science, data science, and engineering teams across Outlook, Teams, and Places. Your work will enable insights into how people manage time, how meetings work, and how AI can help users plan and prepare more effectively. You will lead the technical vision for data engineering across T+P, ensuring our datasets are trustworthy, our pipelines are reliable, our metrics are consistent, and our experimentation frameworks can scale to millions of users.

This is a high-impact leadership role with visibility across M365 Copilot and the central AI ecosystem. If you enjoy solving complex data challenges, building scalable systems, and enabling AI-driven product innovation at massive scale, we’d love to meet you.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years of experience in business analytics, data science, software development, data modeling, or data engineering, OR
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years of experience in business analytics, data science, software development, data modeling, or data engineering, OR
  • Equivalent experience.

Nice To Haves

  • Proficiency in software engineering fundamentals and at least one programming language (Python, Scala, Java, or similar).
  • Experience building and maintaining large-scale ETL/ELT pipelines or distributed data processing systems.
  • Proficiency with SQL and experience working in big-data environments (Spark, Kusto, Hive, or similar frameworks).
  • Experience with cloud services such as Azure Data Lake, Data Factory, Synapse, Databricks, or equivalent.
  • Solid understanding of data modeling, schema design, and pipeline optimization.
  • Experience designing end-to-end data architectures for AI/ML workflows, including training data generation, evaluation datasets, and feature pipelines.
  • Experience with real-time or near-real-time data systems, streaming, or event-driven architectures (Azure Event Hubs, Kafka, etc.).
  • Experience supporting experimentation frameworks and telemetry-based metrics.
  • Familiarity with Responsible AI, data privacy, GDPR, or enterprise compliance considerations.
  • Demonstrated ability to influence cross-functional teams and drive large-scale technical initiatives.
  • Experience mentoring engineers and leading technical design discussions at an org-wide level.

Responsibilities

  • Architect and build large-scale data pipelines across meetings, calendar actions, user behaviors, Teams signals, and Copilot interactions to unlock analytics, AI modeling, and product insights.
  • Design and maintain unified data models and feature stores that support personalization, meeting prep intelligence, time insights, and agentic AI features.
  • Ensure high data quality and reliability through robust validation, monitoring, lineage tracking, schema governance, and automated anomaly detection.
  • Build and operate experimentation and metric infrastructure that supports A/B testing, quasi-experiments, telemetry analysis, and longitudinal user studies.
  • Partner with applied scientists and model engineers to provide high-quality datasets for post-training, RLHF, evaluation, and model diagnostics.
  • Develop scalable, efficient data pipelines using technologies such as Spark, Kusto, Cosmos, Synapse, Data Factory, Delta Lake, or similar distributed systems.
  • Collaborate cross-functionally with PM, Applied Science, and Engineering to translate business needs into data pipelines and scalable production systems.
  • Drive engineering excellence through code reviews, documentation, automation, observability, and best practices in security, compliance, and privacy.
  • Provide technical leadership by mentoring engineers, setting architectural direction, and ensuring long-term maintainability of core data systems.