About The Position

As a Service Engineer on the M365 AGC team, you will play a critical role in ensuring the reliability, security, and operational excellence of Microsoft 365 services in air-gapped and sovereign cloud environments. You will develop deep expertise in service and system design, understand dependencies at scale, and coordinate across multiple work streams to drive incident response, mitigation, and continuous improvement.

You will:

  • Provide advanced technical expertise and operational support for M365 workloads in highly secure environments.
  • Lead and coordinate incident response, root cause analysis, and service restoration efforts.
  • Collaborate with engineering, operations, and security teams to drive improvements in service health, telemetry, and automation.
  • Influence and implement best practices for operational excellence, compliance, and customer satisfaction.
  • Communicate impact and status to stakeholders, leadership, and customers with clarity and accountability.

This role empowers you to make a direct impact on the resilience and trustworthiness of Microsoft’s most secure cloud offerings, supporting customers who depend on us for their most critical missions.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Requirements

  • Bachelor's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 2+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls OR equivalent experience.
  • Hands-on experience supporting complex IT environments, with a strong understanding of system and service management challenges.
  • Experience operating in large distributed or air-gapped environments, with a focus on reliability, security, and compliance.
  • Ability to build consensus and influence across teams to achieve common goals.
  • Recent experience with Azure or equivalent hyperscale cloud technologies.
  • U.S. citizenship verification is required due to legal and customer requirements for this role.
  • Ability to meet Microsoft, customer, and/or government security screening requirements, including an active Top-Secret clearance with CI or FSP (if CI, willingness to upgrade to FSP) and successful completion of the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Master's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 3+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls OR Bachelor's Degree in Computer Science, Information Technology, Mechanical Engineering, Electrical Engineering, Aerospace Engineering, Data Science, Cybersecurity, or related field AND 5+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls OR equivalent experience.
  • 1+ year(s) technical experience working with large-scale cloud or distributed systems.

Responsibilities

  • Operate and support Microsoft 365 services in air-gapped and sovereign cloud environments, ensuring high availability and compliance with strict security standards.
  • Respond to incidents during regular on-call rotations by identifying the level of impact, troubleshooting, and deploying appropriate fixes to resolve root cause(s). Share details of incidents and their resolution through postmortem reports.
  • Monitor and act on telemetry data. With minimal guidance, analyze it to identify patterns that reveal errors affecting the system's availability, reliability, and performance. Develop scripts and/or automation for monitoring based on observations and experience.
  • Implement reliable, scalable, and high-performance solutions across teams. Own implementation and rollback plans. With minimal guidance, quantify and ensure the health and compliance of a service.
  • Partner with engineering, program management, and security teams to deliver new features and maintain service parity with public cloud offerings.
  • Engage with internal and external stakeholders to communicate service status, risks, and mitigation plans.
  • Continuously assess and improve operational posture, adopting best practices for service engineering in regulated environments.