Senior Observability Engineer

Leidos · Alexandria, VA

About The Position

This Department of War enterprise data and analytics program delivers mission-critical capabilities that enable leaders across the Department to make faster, better-informed decisions using trusted data at scale. The Leidos Digital Modernization sector is seeking an experienced Senior Observability Engineer to support the delivery, enhancement, and adoption of enterprise data and analytics products used across multiple DoD organizations. In this role, you will work alongside government partners, engineers, and other industry teammates to translate operational and strategic requirements into scalable, production-ready solutions. You will contribute directly to product planning, execution, and continuous improvement, helping ensure that capabilities are delivered efficiently, aligned to mission priorities, and positioned for sustained success.

This position offers the opportunity to work on a high-visibility enterprise program at the intersection of data, analytics, and emerging AI technologies. Ideal candidates are motivated by mission impact, comfortable operating in complex stakeholder environments, and interested in building deep domain expertise while delivering capabilities with real-world national security outcomes.

Requirements

  • Active Top Secret (TS) clearance with SCI eligibility.
  • Bachelor’s degree in Computer Science, Engineering, Information Systems, or related technical discipline and 8–12 years of relevant experience OR Master’s degree in a related field and 6–10 years of relevant experience.
  • Minimum of 8 years of experience in software engineering, systems engineering, or a related field.
  • Experience in at least 4 of the following:
      ◦ Advanced analytical approaches and large-scale predictive analytics.
      ◦ DevSecOps practices and automated pipelines.
      ◦ Configuration management operations and tools.
      ◦ Implementing enterprise observability solutions (e.g., Splunk, Grafana, Prometheus, ELK/Elastic Stack, Datadog, or similar platforms).
      ◦ Designing logging, metrics, and distributed tracing architectures for cloud-native systems.
      ◦ Supporting containerized environments (e.g., Kubernetes, Docker) and microservices architectures.
      ◦ Conducting root cause analysis and performance optimization in production environments.
  • Proficiency in Agile software development methodologies.
  • Ability to design and implement software testing tools and procedures.
  • Strong understanding of software development, integration, and testing processes.
  • Strong problem-solving skills.

Nice To Haves

  • Active TS/SCI clearance.
  • SAFe Agilist (SA) or related SAFe certification.
  • Experience operating within SAFe or large-scale Agile frameworks supporting enterprise systems.
  • Experience with DoD Security Technical Implementation Guides (STIGs) and container hardening standards.
  • Familiarity with Infrastructure as Code (IaC) and Configuration as Code (CaC) practices.
  • Knowledge of open-standards systems and their application in reducing operating costs and improving network performance.
  • Experience supporting observability across multi-enclave DoD cloud environments.
  • Experience implementing automated anomaly detection and predictive monitoring.
  • Experience defining and tracking SLOs, SLAs, and reliability metrics.
  • Experience supporting enterprise-scale data, analytics, or AI platforms.
  • Experience integrating observability with Zero Trust monitoring and continuous compliance frameworks.
  • Strong communication skills and ability to work collaboratively with cross-functional teams.

Responsibilities

  • Design and implement advanced analytical approaches to monitor the performance and health of the architecture.
  • Identify patterns and conduct large-scale predictive analytics to anticipate and mitigate potential issues.
  • Assist in incident response, conduct root cause analysis, and prescribe solutions to ensure system reliability.
  • Develop and maintain a Software Engineering Plan (SWP) for managing all aspects of the System software lifecycle.
  • Establish and operate a SAFe Agile development process, prioritizing product backlog items and scheduling them into Agile sprint cycles.
  • Maintain and operate the DevSecOps factory, including automated DevSecOps pipelines for software development, integration, testing, authorization, and provisioning.
  • Perform software development and integration for System software, including infrastructure, data tools, customer tools, cybersecurity, and user interfaces.
  • Translate systems engineering and reference architecture designs into software designs, considering business requirements and compliance standards.
  • Plan, design, and implement software testing tools, processes, and procedures, including automated test pipelines for unit, regression, security, integration, compliance, performance, and acceptance testing.
  • Conduct configuration management operations to control, identify, record, and report IT components, versions, and relationships for the System.
  • Establish and maintain development, test, integration, staging, and production environments for supporting the System software lifecycle.
  • Deliver and maintain all software artifacts on Government-owned repositories, including source code, Infrastructure as Code (IaC), Configuration as Code (CaC), and software executables.

Benefits

  • Employment benefits include competitive compensation, Health and Wellness programs, Income Protection, Paid Leave and Retirement.