About The Position

Broadridge Trading & Connectivity Solutions (BTCS) is seeking a highly skilled Manager, Site Reliability Engineering to lead and develop our regional SRE organization across three countries (Philippines, Romania, and France). This role is pivotal in ensuring the reliability, performance, and scalability of our production systems while driving operational excellence and aligning technical operations with business objectives. You will manage a team of SRE professionals, guide them through complex operational challenges, and partner with engineering, product, and infrastructure teams to deliver resilient, automated, and customer-focused services. This role requires both strong technical leadership and the ability to manage people, priorities, and outcomes in a dynamic environment.

Requirements

  • Minimum 4 years of experience managing technical teams in a production environment.
  • 10+ years of experience working with production systems, infrastructure, or software services.
  • Strong experience in FinTech or financial services is required.
  • Solid understanding of Linux systems, monitoring tools (e.g., Nagios), deployment and automation tooling, containers and orchestration, networking fundamentals, and the FIX protocol.
  • Proven ability to lead, motivate, and develop distributed teams.
  • Strong decision-making and problem-solving skills, especially under pressure.
  • Excellent communication skills and the ability to influence across teams.
  • Ability to work autonomously within a managed services model.
  • Fluent in English (spoken and written).
  • Ability to work across time zones and collaborate in a global environment.

Responsibilities

  • Lead, mentor, and develop a distributed team of site reliability engineers.
  • Conduct performance evaluations, training plans, coaching sessions, and career development activities.
  • Foster a culture of collaboration, ownership, continuous improvement, and operational discipline.
  • Adapt team priorities and resourcing to meet operational and business needs.
  • Define and implement best practices in Site Reliability Engineering focused on availability, resilience, and scalability.
  • Oversee incident response processes, ensuring timely communication, strong coordination, and effective remediation.
  • Drive post-incident reviews and ensure preventive measures and long-term fixes are implemented.
  • Partner with engineering, product, and infrastructure leaders to shape and execute the technical roadmap.
  • Ensure systems architecture, automation, and observability solutions support business growth and operational stability.
  • Identify opportunities to improve system reliability through automation, modernization, and tooling.
  • Promote and implement automation across operations, deployments, monitoring, and recovery.
  • Define performance monitoring strategies and ensure visibility into system health and SLAs.
  • Lead adoption of tools that enhance troubleshooting, observability, and diagnostic capabilities.
  • Manage relationships with external service providers and ensure compliance with SLAs.
  • Provide technical guidance to internal teams and stakeholders across the organization.
  • Communicate risks, trends, and operational insights to leadership and business stakeholders.