Member of Technical Staff - Data Platform

Microsoft · Redmond, WA
Hybrid

About The Position

If you are excited by the challenge of designing distributed systems that process petabytes of data for the world's most advanced AI models, this is your team. We are not looking for someone to just write queries or maintain legacy pipelines. We are looking for Systems Builders: engineers who understand the internals of distributed compute, treat data infrastructure as a product, and want to architect the backbone of Microsoft Copilot.

Join us to build the "Paved Road" for AI. You will own the platform that transforms raw, massive-scale signals into the fuel that powers training, inference, and evaluation for millions of users. We need someone who is energized by solving hard problems in stream processing, lakehouse architecture, and developer experience.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

This U.S. position is based in Mountain View, CA or Redmond, WA. By applying, you are required to be local to the San Francisco Bay Area or the Redmond area and to be in the office 3 days a week. Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.

Requirements

  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND 3+ years of experience in business analytics, data science, software development, data modeling, or data engineering; OR Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND 4+ years of experience in business analytics, data science, software development, data modeling, or data engineering; OR equivalent experience.

Nice To Haves

  • Bachelor's or Master's Degree in Computer Science, Software Engineering, or related technical field.
  • 4+ years of experience in Software Engineering or Data Infrastructure.
  • Proficiency in Python, Scala, Java, or Go. You write production-grade application code with unit tests, CI/CD, and modular design.
  • Deep Distributed Systems Knowledge: Demonstrated technical understanding of massive-scale compute engines (e.g., Apache Spark, Flink, Ray, Trino, or Snowflake). You should understand internals like query planning, memory management, and distributed consistency.
  • Experience architecting Lakehouse environments at scale (using Delta Lake, Iceberg, or Hudi).
  • Experience building internal developer platforms or "Data-as-a-Service" APIs.
  • Strong background in streaming technologies (Kafka, Azure EventHubs, Pulsar) and stateful stream processing.
  • Experience with container orchestration (Kubernetes) for deploying data applications.
  • Experience enabling AI/ML workloads (Feature Stores, Vector Databases).

Responsibilities

  • Core Platform Engineering: Design and build the underlying frameworks (based on Spark/Databricks) that allow internal teams to process massive datasets efficiently, abstracting away the complexity of "ETL" into self-service infrastructure.
  • Distributed Systems Architecture: Modernize our data stack by moving from batch-heavy patterns to event-driven architectures, using modern streaming technologies to reduce latency for AI inference.
  • Unstructured AI Data Pipelines: Architect high-throughput pipelines capable of processing complex, non-tabular data (documents, code repositories, chat logs) for LLM pre-training, fine-tuning, and evaluation datasets.
  • AI Feedback Loops: Engineer the high-throughput telemetry systems that capture user interactions with Copilot, creating the critical data loops required for Reinforcement Learning and model evaluation.
  • Infrastructure as Code: Treat the data platform as software. Define and deploy all storage, compute, and networking resources using IaC (Bicep/Terraform) rather than manual configuration.
  • Data Reliability Engineering: Move beyond simple "validation checks" to build automated governance and observability systems that detect anomalies in the data mesh before they impact downstream models.
  • Compute Optimization: Deep-dive into query execution plans and cluster performance. Optimize shuffle operations, partition strategies, and resource allocation to ensure our platform is as cost-efficient as it is fast.