Observability Operations Engineer

Technologent • Phoenix, AZ

About The Position

We are looking for a Senior Systems Engineer – Observability & Infrastructure to support Linux-based infrastructure and large-scale containerized environments within an enterprise technology ecosystem. This role focuses on platform stability, Elasticsearch administration, Kubernetes operations, and observability maturity across distributed systems. The ideal candidate brings deep systems administration expertise, strong troubleshooting capabilities, and experience managing high-availability environments at scale.

Requirements

Deep knowledge of Linux systems administration
Strong hands-on experience with Docker and Kubernetes in production environments
Experience administering Elasticsearch in enterprise-scale environments
Strong troubleshooting and root cause analysis skills across distributed systems
Solid understanding of networking fundamentals (TCP/IP, DNS, routing, load balancing, firewalls)
Experience supporting ITSM processes and infrastructure lifecycle management

Nice To Haves

Familiarity with observability concepts such as distributed tracing, metrics, monitoring, and logging
Experience managing large-scale Elasticsearch deployments
Knowledge of OpenTelemetry / OpenTracing
Hands-on experience with observability and monitoring tools such as Jaeger, Kibana, Grafana, Prometheus, Splunk, Dynatrace, and Kafka
Experience with Rancher or similar Kubernetes management platforms

Responsibilities

Manage and support Linux-based infrastructure and containerized environments (Docker, Kubernetes)
Administer, scale, and optimize large-scale Elasticsearch clusters, including performance tuning and troubleshooting
Provide end-to-end system administration support across development, staging, and production environments
Perform deep-dive troubleshooting across infrastructure, networking, and observability components
Support ITSM processes, including incident, change, and problem management
Manage hardware and software lifecycle activities
Ensure platform stability, high availability, and performance optimization
Collaborate with platform engineering and SRE teams to enhance observability capabilities
Support deployment, upgrades, and operational governance of monitoring and logging tools
Contribute to automation and continuous operational improvements