Technical Production Support Specialist

Midwest Tape
Holland, OH
Hybrid

About The Position

The Technical Production Support Specialist is responsible for monitoring, triaging, and supporting production systems to ensure availability, stability, and timely incident response. This role focuses on real-time system observation, alert handling, and execution of established operational procedures using defined tooling and runbooks. The specialist serves as frontline support within the QA and Production Support organization and participates in a rotating off-hours monitoring schedule.

Requirements

  • Strong attention to detail and situational awareness.
  • Ability to follow documented procedures consistently and accurately.
  • Clear written and verbal communication skills.
  • Demonstrated proficiency using AI tools to enhance workflow efficiency.
  • Ability to remain calm and effective during time-sensitive production issues.
  • Willingness and ability to work a rotating off-hours monitoring and support schedule.
  • Familiarity with AWS logs, CloudWatch, and basic log analysis.
  • Experience monitoring production systems in a cloud-based environment, preferably AWS.
  • Hands-on experience with Datadog or similar observability and monitoring tools.
  • Experience using Slack or similar collaboration tools for alerting and incident communication.

Nice To Haves

  • 1 to 4 years of experience in QA, production support, application support, operations, or technical support roles.
  • Experience supporting customer-facing or business-critical systems.
  • Bachelor’s degree in computer science or a related field, or equivalent experience.

Responsibilities

  • Production Monitoring and Alert Management
      • Monitor production systems and services using Datadog dashboards, alerts, and monitors.
      • Review AWS application logs and CloudWatch data to identify errors, anomalies, and performance degradation.
      • Monitor Slack alert channels and incident notifications in real time.
      • Acknowledge and respond to alerts according to defined SLAs and operational guidelines.
  • Incident Triage and Escalation
      • Perform initial incident assessment to determine scope, severity, and customer impact.
      • Execute documented runbooks and standard operating procedures.
      • Escalate issues to engineering or platform teams using established escalation paths.
      • Provide clear, accurate status updates during incidents and shift transitions.
  • Operational Support and Validation
      • Conduct basic troubleshooting using monitoring tools, dashboards, and logs.
      • Validate system recovery and confirm alert resolution after incidents.
      • Document incidents, actions taken, and outcomes in ticketing or incident tracking systems.
      • Support post-incident documentation and follow-up activities.
  • Process Adherence and Continuous Improvement
      • Adhere to defined production support processes, monitoring standards, and escalation protocols.
      • Identify recurring issues, alert noise, and monitoring gaps.
      • Provide feedback to QA, engineering, and platform teams to improve alert quality and operational readiness.
      • Contribute to the maintenance and improvement of runbooks and support documentation.
  • On-Call and Coverage Expectations
      • Provide off-hours monitoring support on a rotating schedule for nights, weekends, and holidays.
      • Ensure effective shift handoffs to maintain continuity of production monitoring.
  • Perform a wide range of testing, including functional, regression, integration, smoke, and user acceptance testing (UAT).
  • Recommend improvements to the user experience of the website and mobile applications.

Benefits

  • Medical, dental, & vision insurance
  • 401k + match
  • Profit sharing
  • Paid vacation and personal time
  • Flex time
  • 10 paid holidays
  • Company performance bonus
  • Holiday bonus
  • Paid time to volunteer
  • Training & career development opportunities