Monitoring & Automation Engineer

LingaTech
Harrisburg, PA
Hybrid

About The Position

This role serves as a subject matter expert responsible for enterprise monitoring, observability, and automation initiatives that improve operational visibility, service reliability, and incident response. The position focuses on modernizing monitoring processes through automation, standardized workflows, and strong IT service management practices across hybrid infrastructure environments.

Requirements

  • 5 years of experience in IT infrastructure monitoring, automation, and observability within hybrid environments.
  • Strong proficiency in PowerShell and at least one additional scripting or query language such as Python, SQL, or Bash.
  • Hands-on experience using Azure Monitor, Log Analytics, Ansible, SQL, and KQL for monitoring and analytics.
  • Experience implementing automation solutions using Azure Automation and CI/CD pipelines.
  • Expertise working with enterprise monitoring platforms such as SCOM, SquaredUp, or equivalent tools including Dynatrace, Datadog, or Splunk.
  • Knowledge of API integrations and secure authentication methods.
  • Experience utilizing ServiceNow or similar IT Service Management (ITSM) platforms.

Nice To Haves

  • Microsoft Certified: Azure Administrator Associate or Azure Solutions Architect Expert certification.
  • ITIL 4 Foundation certification or higher.

Responsibilities

  • Drive process and tooling improvements by identifying operational gaps and implementing automation-first solutions to reduce manual effort and enhance service quality.
  • Maintain endpoint monitoring connectivity by managing telemetry ingestion through agents, SNMP, WMI, and APIs, and by securely administering the associated credentials and certificates.
  • Develop, maintain, and organize documentation including runbooks, SOPs, service maps, and workflows within version-controlled repositories.
  • Document incidents and problems using monitoring and observability data, produce post-incident reviews, and maintain a Known Error Database.
  • Collaborate with change, incident, and problem management teams to ensure standardized processes, risk assessments, and communication plans are followed.
  • Monitor resolution performance by tracking SLAs, MTTR, and root cause analysis effectiveness while ensuring corrective actions are validated.
  • Implement standardized communication workflows for operational events and manage stakeholder notifications and self-service subscription options.
  • Ensure alignment with enterprise IT policies by recommending improvements that enhance reliability, security, and cost efficiency.
  • Utilize ServiceNow to create and manage Requests for Change, link risk assessments, and verify post-change monitoring health.
  • Produce SLA reporting and operational metrics related to availability, incidents, and service improvements.
  • Design, test, and maintain disaster recovery plans, including defining RTO/RPO targets and conducting periodic recovery exercises.
  • Maintain technical expertise by staying current on emerging monitoring technologies, tools, and industry best practices.
  • Support continuity operations during critical incidents, including performing assigned duties at alternate operational sites when required.
  • Adhere to ITIL-aligned service management processes and contribute to process maturity initiatives and compliance audits.