Production Support Engineer-Associate

BlackRock
San Francisco, CA
Hybrid

About The Position

We are seeking an Application Production Support Engineer to support and operate critical developer platform services. This role is focused on operations and support; it is not a software development position. The successful candidate will play a key role in ensuring platform stability, rapid incident resolution, and continuous improvement of our support and operational processes.

Requirements

  • 4–5+ years of experience in Production Support, Application Support, DevOps Support, or Platform Operations within a large-scale enterprise environment
  • Strong troubleshooting experience across distributed systems and multi-tier applications
  • Hands-on experience supporting CI/CD platforms (e.g., Jenkins, GitHub Actions, Azure DevOps or similar)
  • Experience with artifact repositories and container registries (e.g., JFrog Artifactory, Azure Container Registry (ACR) or similar)
  • Solid understanding of build and release processes across Java, .NET, or containerized workloads
  • Proficiency in Linux environments and command-line troubleshooting
  • Scripting ability in Python, Bash, or PowerShell to drive automation and operational efficiency
  • Experience working in cloud environments (Azure preferred)
  • Familiarity with monitoring and observability tools (e.g., Splunk, Prometheus, Grafana)
  • Experience with incident management practices, root cause analysis, and post-incident reviews
  • Strong documentation skills with the ability to formalize and standardize operational processes

Nice To Haves

  • Experience driving automation or reliability improvements in a production support or SRE‑adjacent role
  • Familiarity with observability tools, monitoring, and incident management practices
  • Experience coordinating cross‑team initiatives or owning operational improvement projects
  • Background in structured support models, service management, or platform operations

Responsibilities

  • Provide day‑to‑day production support for core developer platform services, including artifact management, CI/CD tooling, build systems, and release pipelines
  • Triage, troubleshoot, and resolve platform incidents and service degradation in partnership with engineering teams
  • Act as a point of escalation for complex platform issues, ensuring timely resolution and clear communication to stakeholders
  • Participate in on‑call or support rotations as required
  • Analyze recurring incidents and operational pain points to identify underlying reliability gaps
  • Drive proactive improvements in automation, monitoring, alerting, and observability to reduce manual effort and incident volume
  • Contribute to post‑incident reviews and root cause analysis, ensuring learnings are captured and actions are tracked to completion
  • Formalize and standardize support processes, runbooks, and operating procedures across developer platform services
  • Improve documentation quality and accessibility to enable faster issue resolution and self-service by engineering teams
  • Design and implement structured support workflows, escalation paths, and service engagement models
  • Own and deliver small operational initiatives end‑to‑end, coordinating across platform, infrastructure, and engineering teams
  • Partner closely with developers to understand usage patterns and operational requirements
  • Contribute to improving overall operational maturity and service quality of the Developer Platform

Benefits

  • Employees are eligible for an annual discretionary bonus and for benefits including healthcare, leave, and retirement benefits
  • Flexible Time Off (FTO)
  • Strong retirement plan
  • Tuition reimbursement
  • Comprehensive healthcare
  • Support for working parents