Sr. Site Reliability Engineer

PitchBook Data
Seattle, WA
$175,000 - $200,000
Onsite

About The Position

As a member of the Product and Engineering team at PitchBook, you will be part of a team of big thinkers, innovators, and problem solvers who strive to deepen the positive impact we have on our customers and our company every day. We value curiosity and the drive to find better ways of doing things. We thrive on customer empathy, which remains our focus when creating excellent customer experiences through product innovation. We know that greatness is achieved through collaboration and diverse points of view, so we work closely with partners around the globe. As a team, we assume positive intent in each other’s words and actions, value constructive discussions, and foster a respectful working environment built on integrity, growth, and business value. We invest heavily in our people, who are eager to learn and constantly improve. Join our team and grow with us!

As a Sr. Site Reliability Engineer (SRE) in PitchBook’s engineering division, you will be creating and evolving systems to automatically run our suite of products and services reliably and consistently. As part of a team of site reliability engineers and platform engineers and in conjunction with group leadership, you will help define service level objectives (SLOs) that determine success and build systems to achieve those objectives. You will utilize your strong background in deploying, managing, and maintaining production systems, working with developers to operate and monitor large-scale services with complex distributed systems and data integrations. You will incorporate observability tools (monitoring, telemetry, tracing, alerting), perform incident management, conduct root cause analyses, eliminate single points of failure, build reliability and redundancy into our infrastructure, establish and test our recoverability, mitigate failures, and do all of these things through automation and tools.

As a Sr. Site Reliability Engineer, you will take independent responsibility for building and managing large subsets of our systems. You will help build our best practices for infrastructure-as-code and your code will exemplify our quality controls. You will mentor and train other Site Reliability Engineers, platform engineers, and software engineers in reliability topics. Your ability to collaborate with colleagues, exhibit poise and adaptability in stressful situations, communicate effectively, and build resilient systems that can be consistently relied upon will be critical to your success. You will solicit feedback, learn constantly, engage others with empathy, and help create a culture of belonging, teamwork, and purpose. If you love building customer-centric solutions, strive for excellence every day, are adaptable and focused, and believe work should be fun, come join us!

Requirements

  • Bachelor's degree in Computer Science, Software Engineering, or a related field (Master's preferred)
  • 5+ years of experience building and maintaining Linux/UNIX-based systems, primarily in cloud environments (preferably GCP & AWS)
  • 5+ years of experience in a Reliability Engineering, DevOps, or infrastructure role, where infrastructure-as-code tools (e.g. Terraform, Puppet, Ansible, Chef) were used as a primary job function
  • 5+ years of experience coding in an object-oriented language, such as Java, Python, Go, or Kotlin
  • 2+ years of experience with containers and orchestration platforms, including Kubernetes and Docker
  • Deep knowledge of infrastructure systems, networking, and security, including in a cloud environment
  • Experience owning operational reliability, scalability, recoverability (backups, disaster recovery, failover), and capacity planning
  • Experience performing operational activities including batch processing, system backups, maintenance, and monitoring, as well as providing first-tier on-call support as part of a 24/7 response team
  • Experience with distributed, scalable microservices and event-driven architectures
  • Experience with data storage, replication, caching, and search technologies, such as PostgreSQL, MySQL, MS SQL Server, Amazon RDS, GCP CloudSQL, Redis, Elasticsearch, and Lucene/Solr
  • Hold at least one professional certification in AWS or GCP (DevOps or SysOps Engineer preferred)
  • Proficiency with the Microsoft Office suite including in-depth knowledge of Outlook, Word, and Excel with the ability to pick up new systems and software easily
  • Must be authorized to work in the United States without the need for visa sponsorship now or in the future

Responsibilities

  • Establish service level objectives (SLOs), error budgets, and service level indicators (SLIs) as success criteria, and ensure that our systems and processes consistently meet or exceed these targets
  • Build recoverability into our services and systems, including disaster recovery (DR), backups/recovery, and incorporation of multi-AZ multi-regionality into cloud constructs
  • Manage connectivity (CIDRs, VPCs, Subnets), latency, and availability across distributed systems
  • Establish clustering and load balancing techniques for high availability and scalability in containerized cloud-native environments
  • Build observability systems and services (monitoring, telemetry, tracing) for reuse in our platform architecture, creating alerting for fault identification and building dashboards for metrics
  • Operate and continuously improve our services’ reliability, scalability, performance, security, and uptime
  • Learn constantly, including about available cloud-managed services (PaaS/SaaS/IaaS), libraries, frameworks, and platforms (commercial and open-source)
  • Participate in the company’s application of Agile, Lean, and principles of fast flow to engineering department efficiency and productivity and own certain tasks in process automation to achieve fluidity
  • Support the vision and values of the company through role modeling and encouraging desired behaviors
  • Participate in various company initiatives and projects as requested

Benefits

  • Comprehensive health benefits
  • Additional medical wellness incentives
  • STD, LTD, AD&D, and life insurance
  • Paid sabbatical program after four years
  • Paid family and paternity leave
  • Annual educational stipend
  • Ability to apply for tuition reimbursement
  • CFA exam stipend
  • Robust training programs on industry and soft skills
  • Employee assistance program
  • Generous allotment of vacation days, sick days, and volunteer days
  • Matching gifts program
  • Employee resource groups
  • Subsidized emergency childcare
  • Dependent Care FSA
  • Company-wide events
  • Employee referral bonus program
  • Quarterly team building events
  • 401k match
  • Shared ownership employee stock program
  • Monthly transportation stipend