Sr. Engineering Manager, SRE and DevOps

EverBank, Jacksonville, FL

About The Position

The IT Director directs all processes, operations and teams within the Information Technology department. This job determines the Information Technology needs of an organization and is responsible for overseeing the implementation of solutions to fulfill the organization's information systems requirements.

Key Responsibilities and Duties

Directs the team in successful delivery of Information Technology services for the organization’s hardware, software and network technologies. Manages Information Technology teams and systems to ensure that the area's technology stays up to date by overseeing the implementation of new procedures and techniques related to Information Technology. Oversees the selection of hardware, software and supplies needed by the team. Develops IT team project plans, including resourcing, schedules and budgets. Serves as point of accountability to business stakeholders for IT solutions/services that measurably improve business KPIs. Serves as a liaison and builds bridges among leadership groups within and outside of the IT field. Manages performance of team through regular, timely feedback as well as the formal performance review process to ensure delivery of exceptional products and engagement, motivation, and development of team.

Additional Information

SRE - Key Responsibilities:

Leadership & Strategy: Lead, mentor, and develop a high-performing team of SRE and DevOps engineers. Define and execute the long-term roadmap for observability, reliability, DevOps automation, and low‑code platform adoption. Foster a culture of ownership, engineering excellence, continuous learning, and operational discipline. Collaborate with Engineering, Architecture, Security, Compliance, and Product leadership to align platform goals with business outcomes.

Observability & Monitoring: Own the enterprise observability framework, ensuring full coverage of metrics, logs, traces, events, and user experience monitoring. Implement and optimize APM, distributed tracing, log aggregation systems, synthetics, and infrastructure monitoring tools. Establish and enforce SLOs/SLIs, alerting strategies, dashboards, and runbooks to ensure proactive detection and rapid resolution of issues. Continuously improve system visibility, root‑cause detection speed, and telemetry quality.

DevOps & Automation: Drive the design, evolution, and governance of CI/CD pipelines, build systems, and deployment automation. Champion Infrastructure-as-Code (IaC), GitOps, automated testing, security automation, and zero‑touch deployment practices. Reduce manual toil across engineering teams through workflow automation and standardized tooling. Ensure cloud and on-prem environments follow best practices for scalability, resiliency, performance, and cost efficiency.

Low‑Code PaaS Platform Engineering: Lead the engineering and governance strategy for enterprise Low‑Code PaaS platforms. Partner with business units to enable rapid, governed application development at scale. Implement security, compliance, data governance, and lifecycle management controls for citizen and professional developers. Optimize platform performance, integrations, capacity planning, and operational health.

Site Reliability Engineering (SRE): Lead incident management processes, including response coordination, post-incident reviews, and long‑term remediation. Introduce chaos engineering, performance benchmarking, resilience testing, and failover planning. Create reliability scorecards and KPIs to track and improve service health and operational maturity.

Security, Governance & Compliance: Embed DevSecOps practices across pipelines, infrastructure, and platform operations. Partner with InfoSec to ensure compliance with regulatory, audit, and internal policy requirements. Implement guardrails, access controls, and automation to enforce secure-by-default operating practices.

Cross-Functional Collaboration: Act as the reliability and platform engineering advocate across product and engineering teams. Provide guidance, solutions, and best practices to application teams adopting observability, DevOps, and low‑code capabilities. Communicate roadmaps, risks, achievements, and KPIs to senior technical and business leaders.

Innovation & Continuous Improvement: Identify and adopt emerging technologies that improve reliability, velocity, and platform capabilities. Lead proof-of-concepts and build reusable patterns, templates, and frameworks that accelerate engineering productivity. Continuously evolve tools, processes, and engineering culture to align with modern SRE and DevOps principles.

Requirements

  • 5 years of experience leading engineering teams in a high-availability, large-scale environment
  • Strong experience with cloud platforms (AWS, Azure, or GCP)
  • Expertise in observability tools (Datadog, Splunk, Prometheus, Grafana, etc.)
  • Strong hands-on background with CI/CD pipelines, Git, Infrastructure-as-Code (Terraform, CloudFormation), and container orchestration (Kubernetes)
  • Deep understanding of system resiliency, distributed systems, performance engineering, and incident management
  • Familiarity with Low‑Code PaaS platforms and enterprise governance models
  • Excellent leadership, communication, and stakeholder management skills

Nice To Haves

  • 7+ years of experience as an SME in SRE and DevOps
  • Master's degree in Computer Science, Engineering, or equivalent practical experience
  • Experience leading multi-cloud or hybrid cloud environments.
  • Prior experience building platform engineering or internal developer platform (IDP) capabilities.
  • Certifications such as: AWS/Azure/GCP Architect, Kubernetes CKA/CKAD, ITIL, SRE Foundations, DevOps Institute certifications

Responsibilities

  • Directs the team in successful delivery of Information Technology services for the organization’s hardware, software and network technologies.
  • Manages Information Technology teams and systems to ensure that the area's technology stays up to date by overseeing the implementation of new procedures and techniques related to Information Technology.
  • Oversees the selection of hardware, software and supplies needed by the team.
  • Develops IT team project plans, including resourcing, schedules and budgets.
  • Serves as point of accountability to business stakeholders for IT solutions/services that measurably improve business KPIs.
  • Serves as a liaison and builds bridges among leadership groups within and outside of the IT field.
  • Manages performance of team through regular, timely feedback as well as the formal performance review process to ensure delivery of exceptional products and engagement, motivation, and development of team.
  • Lead, mentor, and develop a high-performing team of SRE and DevOps engineers.
  • Define and execute the long-term roadmap for observability, reliability, DevOps automation, and low‑code platform adoption.
  • Foster a culture of ownership, engineering excellence, continuous learning, and operational discipline.
  • Collaborate with Engineering, Architecture, Security, Compliance, and Product leadership to align platform goals with business outcomes.
  • Own the enterprise observability framework, ensuring full coverage of metrics, logs, traces, events, and user experience monitoring.
  • Implement and optimize APM, distributed tracing, log aggregation systems, synthetics, and infrastructure monitoring tools.
  • Establish and enforce SLOs/SLIs, alerting strategies, dashboards, and runbooks to ensure proactive detection and rapid resolution of issues.
  • Continuously improve system visibility, root‑cause detection speed, and telemetry quality.
  • Drive the design, evolution, and governance of CI/CD pipelines, build systems, and deployment automation.
  • Champion Infrastructure-as-Code (IaC), GitOps, automated testing, security automation, and zero‑touch deployment practices.
  • Reduce manual toil across engineering teams through workflow automation and standardized tooling.
  • Ensure cloud and on-prem environments follow best practices for scalability, resiliency, performance, and cost efficiency.
  • Lead the engineering and governance strategy for enterprise Low‑Code PaaS platforms.
  • Partner with business units to enable rapid, governed application development at scale.
  • Implement security, compliance, data governance, and lifecycle management controls for citizen and professional developers.
  • Optimize platform performance, integrations, capacity planning, and operational health.
  • Lead incident management processes, including response coordination, post-incident reviews, and long‑term remediation.
  • Introduce chaos engineering, performance benchmarking, resilience testing, and failover planning.
  • Create reliability scorecards and KPIs to track and improve service health and operational maturity.
  • Embed DevSecOps practices across pipelines, infrastructure, and platform operations.
  • Partner with InfoSec to ensure compliance with regulatory, audit, and internal policy requirements.
  • Implement guardrails, access controls, and automation to enforce secure-by-default operating practices.
  • Act as the reliability and platform engineering advocate across product and engineering teams.
  • Provide guidance, solutions, and best practices to application teams adopting observability, DevOps, and low‑code capabilities.
  • Communicate roadmaps, risks, achievements, and KPIs to senior technical and business leaders.
  • Identify and adopt emerging technologies that improve reliability, velocity, and platform capabilities.
  • Lead proof-of-concepts and build reusable patterns, templates, and frameworks that accelerate engineering productivity.
  • Continuously evolve tools, processes, and engineering culture to align with modern SRE and DevOps principles.