Staff SRE Engineer

Flowcode | New York, NY
Posted 1d ago | $260,000 - $290,000 | Hybrid

About The Position

Flowcode is seeking a Staff Site Reliability Engineer (SRE) to lead reliability and infrastructure efforts across our platforms. This role will help shape and drive our infrastructure strategy, operational rigor, and observability while building and maintaining the systems and tooling required to support Flowcode’s continued growth. As a technical leader within our engineering organization, you will grow and operate scalable cloud infrastructure, establish best practices around deployment and reliability, and partner closely with engineering teams to ensure systems are scalable, resilient, and observable. This role combines hands-on engineering with systems and architectural leadership. You will be a pivotal member of our engineering leadership team, leading the charge for reliability and long-term infrastructure growth.

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or Platform Engineering
  • Hands-on experience with Kubernetes and container orchestration
  • Experience building and maintaining CI/CD and deployment pipelines
  • Experience implementing and growing GitOps workflows and tools such as ArgoCD
  • Familiarity with GitHub Actions, ideally in a multi-contributor production pipeline
  • Experience with observability platforms, code quality tools and common security practices
  • Strong scripting or programming skills (Python, Go, or similar)
  • Experience supporting high-scale distributed systems
  • Experience with Infrastructure as Code (Terraform, Pulumi, or CloudFormation)
  • Strong familiarity with core AWS services (EKS, EC2, S3, RDS, etc.)

Nice To Haves

  • Experience designing highly available and multi-region architectures
  • Experience implementing progressive delivery or deployment strategies
  • Experience building internal developer platform tooling

Responsibilities

  • Lead Flowcode’s site reliability engineering strategy and implementation
  • Improve system availability, scalability, and resilience across our platforms
  • Drive operational best practices across our engineering teams
  • Maintain, grow and operate scalable infrastructure on our AWS platform
  • Lead infrastructure best practices for scalability, failover, and disaster recovery
  • Work with critical infrastructure vendors on monitoring, analysis, and security
  • Build and maintain modern deployment and testing pipelines
  • Grow and maintain our GitOps workflows using ArgoCD
  • Enable safe, reliable releases through automated testing and validation
  • Manage monitoring, logging, and alerting systems
  • Improve system visibility through metrics, tracing, and logging
  • Serve as a reliability and infrastructure subject matter expert across engineering
  • Mentor engineers and promote best practices
  • Collaborate with our engineering and data teams to ensure new systems are built for reliability and scale