Senior Principal Software Engineer — IAG Platform / Reliability & DevOps Engineering

Oracle

About The Position

We are seeking a Senior Principal Software Engineer to provide senior technical leadership for our Identity & Access Governance (IAG) services, initially focused on IAG and evolving into broader cross-organization technical leadership. This is a Software Engineer role first, with strong DevOps/SRE-grade capabilities. You build robust software systems, deliver major features into production, and take full ownership of reliability, operability, and secure-by-default engineering.

You have deep experience building and operating distributed cloud services and understand control plane architecture, service-to-service communication, and production-grade operational design. You will drive the design of major service components, partner closely with Engineering Managers, Architects, and TPMs, and provide direct technical guidance to engineers across levels.

You are as comfortable writing architecture documentation and leading peer reviews as you are prototyping, writing production code, reviewing pull requests, improving build/deploy pipelines, and leading incident response when needed. You balance speed and quality through iteration and leave both systems and teams meaningfully better through automation, instrumentation, and clear engineering standards.

Requirements

BS in Computer Science or related field (MS preferred), or equivalent practical experience
10+ years of software development experience building and operating distributed services in production
Strong proficiency in one or more modern programming languages (e.g., Java, Go, C++, Python) with a proven record of shipping production code
Proven ability to lead design and delivery of major service capabilities from concept through launch and sustained operations
Deep understanding of distributed systems fundamentals (data structures/algorithms, networking, concurrency, failure modes)
Strong knowledge of cloud architecture patterns, including control plane and service-to-service operational design
Demonstrated experience building DevOps capabilities: CI/CD pipelines, automated testing, deployment automation, and infrastructure-as-code
Strong production debugging skills across networking and persistence layers; understanding of databases and distributed persistence (SQL/NoSQL, replication, consistency tradeoffs)
Demonstrated experience leading high-severity incident response as a technical lead/escalation engineer, including rapid diagnosis, mitigation, and post-incident corrective actions
Strong Linux knowledge (or demonstrated ability to learn quickly in Linux-based production environments)
Experience partnering closely with Architects, Principals, Engineering Managers, Product, and Program/TPM leaders to deliver outcomes on time and with high quality

Nice To Haves

Hands-on experience developing and operating services on a public cloud platform (OCI strongly preferred; AWS/Azure also valuable)
Experience with container orchestration and cloud-native patterns (e.g., Kubernetes/OKE or equivalent), service mesh/API gateways, and modern identity/security patterns
Experience operating services across multi-AD/multi-AZ and/or multi-region footprints; strong understanding of regional resiliency strategies
Track record driving reliability programs such as SLO adoption, error budgets, production readiness reviews, game days, and resilience testing
Experience building mature CI/CD pipelines with robust testing and safe deployment strategies (canary/blue-green/progressive delivery)
Experience in regulated/compliance environments (e.g., FedRAMP, PCI DSS, or similar) and supporting audit requirements with strong operational controls
Expertise applying threat modeling or other risk identification techniques and translating findings into practical engineering changes
Ability to obtain and maintain a U.S. Government security clearance (or currently cleared) strongly preferred, for work in regulated environments where applicable

Responsibilities

Lead the architecture and implementation of major capabilities across IAG services and critical platform dependencies, building software that is scalable, secure, and operationally excellent.
Set technical direction on reliability patterns, service maturity, and delivery standards, including SLIs/SLOs, error budgets, safe rollout strategies, backward-compatible changes, operational readiness expectations, and clear ownership boundaries between services.
Improve the end-to-end developer-to-production lifecycle by building and evolving CI/CD pipelines, automated testing and validation, infrastructure-as-code patterns, and deployment strategies (canary and progressive delivery).
Drive observability by design (metrics, logs, traces) and improve alerting quality, runbooks, and on-call effectiveness by reducing toil and ensuring teams have the right signals and tools to operate what they build.
Serve as a technical escalation resource and first responder for emergent operational work. For high-severity or technically complex production issues, lead real-time triage, mitigation, and coordination through stabilization.
Drive root cause analysis and durable remediation—turning incidents into engineering outcomes through fixes, automation, and a reliability backlog that measurably reduces recurrence.
Mentor and enable development teams by helping design operable systems, bootstrap new services, and raise the engineering bar through strong code reviews, reference implementations, and practical coaching.
Support security and compliance needs, including threat modeling, security reviews, and operational controls/audit readiness for regulated environments.