Engineer Lead, Site Reliability

FIS Global, Atlanta, GA
Hybrid

About The Position

Every day, our teams innovate across the world of finance. We collaborate to work smarter while making a difference. We believe in diversity and inclusivity, giving a voice to everyone on the team. And we celebrate our success together. If you want to make an impact in fintech, we’d like to know: Are you FIS?

NOTE: This position is hybrid (3 days onsite) at our FIS office locations in Jacksonville (FL), Milwaukee (WI), and Atlanta (GA).

About the Team

The team implements and supports Treasury and Payment solutions in public cloud environments (AWS/Azure). Its focus is platform delivery and operational support using Site Reliability Engineering (SRE) principles.

Requirements

  • 5+ years of experience in IT operations, infrastructure management, or related technical roles.
  • Public Cloud (AWS) – Hands-on experience with AWS services for infrastructure and application hosting.
  • Infrastructure as Code (Terraform) – Strong experience in writing and managing Terraform scripts for provisioning cloud resources.
  • Containerization & Orchestration – Kubernetes (EKS) deployment and management experience.
  • Observability & Monitoring – Proficiency with tools like CloudWatch, Grafana, Prometheus, and Splunk for monitoring and alerting.
  • Scripting & Automation – Ability to automate tasks using Python, PowerShell, and Bash.
  • Operating Systems – Solid experience with Windows and Linux environments.
  • DevOps & CI/CD – Working knowledge of DevOps practices and CI/CD pipelines (e.g., Jenkins, GitHub Actions, or similar).
  • IT Operations & Support – Strong troubleshooting skills for production environments, including application and system components.
  • Problem Analysis & Resolution – Skilled in diagnosing and resolving failures in applications and infrastructure.
  • Documentation & Communication – Ability to create technical documentation and communicate effectively with technical and non-technical stakeholders.
  • Excellent Soft Skills – Analytical, decision-making, problem-solving, time management, and customer service skills.

Nice To Haves

  • ServiceNow – Experience using ServiceNow for ticket and incident management.
  • Harness.io – Familiarity with Harness.io for CI/CD deployments.
  • Azure Cloud – Exposure to Microsoft Azure services.
  • Certifications – AWS or Azure certifications.
  • Serverless Computing – AWS Lambda experience.
  • Database Knowledge – PostgreSQL administration or development experience.
  • Domain Knowledge – Understanding of Capital Markets and financial services industry.
  • Event Correlation & Analysis Tools – Experience with IT event correlation and analysis software.
  • Disaster Recovery/Business Continuity – Familiarity with DR/BC planning and support.
  • Leadership & Mentoring – Ability to guide junior technical staff and act as a mentor.

Responsibilities

  • Build software solutions and systems to manage platform infrastructure and applications.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Improve reliability, quality, and time-to-market of our suite of software solutions.
  • Build monitoring that alerts on symptoms rather than on outages.
  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
  • Provide primary operational support and engineering for multiple large, distributed software applications.
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
  • Create sustainable systems and services through automation and uplifts.
  • Balance feature development speed and reliability with well-defined service level objectives.
  • Partner with stakeholders to design and deliver a reliable, scalable, secure, and performant platform.
  • Stay current on technical trends to suggest innovative tools and approaches to problems.
  • Take a proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
  • Identify and resolve problems promptly to meet and improve service levels and standards.

Benefits

  • A voice in the future of fintech
  • Always-on learning and development
  • Collaborative work environment
  • Opportunities to give back
  • Competitive salary and benefits
© 2024 Teal Labs, Inc