About The Position

As a Staff Database Reliability Engineer, you will own the performance, reliability, and scaling strategy for the PostgreSQL Aurora infrastructure that powers creative workflows for leading brands. As a technical leader, you will partner closely with backend engineering and leadership to architect scalable solutions, establish best-in-class database operations practices, and build the observability framework that keeps Air running at 99.99%+ uptime.

Requirements

  • 5+ years of experience using PostgreSQL and working with distributed databases.
  • Proficient in PostgreSQL database operations, tuning, and query optimization.
  • Familiar with database observability tooling and establishing best practices around health metrics.
  • Familiar with AWS Aurora or RDS.
  • You like to work in public, own problems end-to-end, and move with intentional speed, so your best ideas ship fast and make a visible dent.
  • Everyone at Air plays to win, says the hard thing, and progresses every day while building genuine relationships.

Responsibilities

  • Ensure Database Reliability & Performance: Own the health, performance, and availability of Air's PostgreSQL Aurora infrastructure.
      • Proactively optimize database parameters, indexes, and query patterns to maintain sub-100ms p95 response times.
      • Uplevel migration practices and tooling to ensure zero-downtime schema changes as the platform scales.
      • Establish and maintain comprehensive backup, recovery, and disaster recovery procedures with documented RTO/RPO targets.
      • Partner with backend engineers to implement database best practices in application code (connection pooling, query optimization, caching strategies).
  • Plan and Execute Long-Term Scaling Strategy: Develop a multi-quarter roadmap to scale Air's database infrastructure to support 10x growth in asset volume and user activity.
      • Collaborate with backend engineers and product leadership to model data growth patterns and anticipate scaling inflection points.
      • Evaluate and implement horizontal scaling strategies (read replicas, sharding, partitioning) aligned with business needs.
      • Continuously assess AWS Aurora capabilities, PostgreSQL ecosystem innovations, and emerging database technologies for strategic advantage.
      • Design and implement database architecture that supports Air's AI-powered features and real-time creative workflows.
  • Build Observability and Data Health Framework: Create comprehensive monitoring, alerting, and reporting systems to maintain database reliability and inform data-driven infrastructure decisions.
      • Implement detailed instrumentation for database performance metrics (query latency, connection pool utilization, replication lag, disk I/O).
      • Build automated alerting for anomalies in query performance, connection patterns, and resource utilization.
      • Create executive-level dashboards showing database health trends, capacity utilization, and cost efficiency.
      • Develop a regular database health review cadence with engineering leadership to surface insights and drive continuous improvement.

Benefits

  • Air provides comprehensive medical, dental, and vision insurance—including dependent coverage.
  • We also offer a generous workplace stipend, professional development reimbursements, and unlimited vacation time.
  • Although we’re an early-stage company, we continually seek opportunities to invest in our employees’ long-term health, wellness, and professional growth.