Production Support Manager

Capgemini, Dallas, TX

About The Position

The Production Support Manager leads end-to-end operations, system stability, service delivery, and continuous improvement for business‑critical platforms. The role oversees multi-level support teams, drives governance, ensures operational excellence, and manages high-value client and stakeholder relationships.

Requirements

  • Strong knowledge of data engineering platforms: Databricks, PySpark, Airflow, Snowflake.
  • Cloud & streaming familiarity: AWS services, Kafka, SQS.
  • Expertise in monitoring tools: Datadog, Splunk.
  • Strong ITSM capability across ServiceNow, Jira, and Confluence.
  • Excellent stakeholder management with executive presence and mature client-handling skills.
  • Ability to lead high-severity incidents and cross-functional coordination.
  • Strong analytical, planning, reporting, and communication skills.
  • Experience supporting financial services or high-availability platforms.

Nice To Haves

  • Certifications preferred: ITIL, AWS, Databricks, Snowflake, Scrum Master.

Responsibilities

  • Lead and manage L1, L2, and L3 Production Support teams across shifts and domains.
  • Steer daily operational performance, workload allocation, and capability building.
  • Own stakeholder engagement with the client, ensuring transparent communication, timely updates, and expectation management.
  • Lead governance meetings and operational reviews, and provide SLA, KPI, and performance insights.
  • Drive Major Incident Management, including war room leadership and executive communication.
  • Oversee Problem Management including trend analysis, RCA quality, and long-term remediation governance.
  • Own Change Management processes ensuring CAB adherence, risk evaluation, and safe deployments.
  • Provide platform oversight across Databricks, PySpark, Airflow, Snowflake, and SQL ecosystems.
  • Oversee cloud and integrations across AWS services, SQS, and Kafka.
  • Ensure full observability via Datadog and Splunk, including monitoring maturity and alert optimization.
  • Strengthen the ITSM framework using ServiceNow and Jira, and maintain the knowledge base in Confluence.
  • Drive automation, AI/ML-enabled self-healing, and continuous improvement initiatives.
  • Publish executive-level dashboards, RCA packs, compliance reports, and operational insights.
  • Manage risks proactively and establish governance for performance, stability, and capacity.

Benefits

  • Paid time off based on employee grade (A-F), as defined by policy: vacation (12-25 days, depending on grade), company-paid holidays, personal days, and sick leave
  • Medical, dental, and vision coverage (or provincial healthcare coordination in Canada)
  • Retirement savings plans (e.g., 401(k) in the U.S., RRSP in Canada)
  • Life and disability insurance
  • Employee assistance programs
  • Other benefits as provided by local policy and eligibility