Site Reliability Engineer

Luma Financial Technologies, Cincinnati, OH

About The Position

At Luma, our Site Reliability Engineer (SRE) team keeps our platform reliable, secure, and lightning fast. They own everything from AWS infrastructure and Kubernetes clusters to CI/CD pipelines, monitoring, and alerting. If you’re passionate about tackling big challenges, automating at scale, and making systems more resilient, we’d love to have you on the team.

Requirements

  • 5+ years of applicable experience in Site Reliability or Software Development Engineering required
  • You code to solve problems and are comfortable in the following languages: Java, Python, Bash, and Go.
  • You have strong experience with AWS (RDS, CloudFront, IAM, VPCs), Terraform, and Kubernetes.
  • You are resilience-focused, with experience designing and running systems that remain dependable during failures and recover seamlessly.
  • You have hands-on experience improving and operating CI/CD pipelines (e.g., CircleCI, GitHub Actions, or similar) to help teams ship faster with confidence.
  • You stay calm under pressure, bringing incident response expertise and strong root-cause analysis skills.
  • Most importantly, you are a team player who brings clear communication, strong collaboration, and a mindset of continuous improvement.

Nice To Haves

  • Bachelor’s degree in Computer Science, Software Engineering, or a related concentration highly preferred

Responsibilities

  • Collaborate with product engineering teams to design and build the infrastructure their services run on.
  • Keep our Kubernetes clusters on AWS EKS healthy, secure, and ready to scale.
  • Design and deliver resilience strategies that cover multi-region architecture, backups, disaster recovery, and failover.
  • Automate infrastructure with Terraform and other Infrastructure-as-Code practices, reducing manual effort and human error.
  • Help teams ship faster by improving CI/CD pipelines and deployment practices.
  • Monitor performance and reliability using modern observability tools.
  • Support on-call rotations and lead incident response with a focus on long-term fixes.
© 2024 Teal Labs, Inc