Site Reliability Engineer (SRE)

SoftPro, Raleigh, NC
Remote

About The Position

SoftPro is seeking a well-rounded Site Reliability Engineer (SRE) to join our amazing Cloud Operations Team, either in our Raleigh, NC office or as a remote employee. This team supports our solutions using the latest Microsoft technologies and hosting services, including Microsoft Azure, and encourages your design input and creativity. The ideal candidate will develop in-depth knowledge of the platforms that run our solutions. In this position, you will help ensure our SLAs are met through the engineering solutions the team develops.

Requirements

  • Infrastructure as Code: Ability to provision cloud infrastructure, services, and resources using Infrastructure as Code, including developing Terraform modules and using Ansible to define and apply the configuration of deployed resources.
  • Automation: Skills to automate tasks and orchestrate infrastructure using PowerShell, Azure CLI, Ansible, CI/CD, and other languages and tools.
  • Operating Systems: Strong understanding of Linux/Unix command-line usage and administration, as well as Windows Server administration. Experience building, deploying, patching, and managing orchestration platforms (Service Fabric and Kubernetes), VM Scale Sets, and other Azure compute workload types.
  • Proven experience with containerization technologies such as Docker and Kubernetes, and with cloud-native applications.
  • Strong knowledge of distributed computing concepts and microservices.
  • Ability to manage work through ticketing and source control systems (for example, Jira, DevOps, TFS, or Git repos).
  • Familiarity with CI/CD pipeline development and associated tools.
  • Working knowledge of application performance monitoring and alerting platforms (for example, Azure Monitor, Application Insights, and AWS CloudWatch, as well as commercial off-the-shelf solutions).
  • Experience architecting and implementing high availability / disaster recovery (HA/DR) solutions for cloud platforms built on public cloud IaaS and PaaS resources.
  • Databases: Knowledge of, or familiarity with, relational and document database technologies (MS SQL Server, MySQL, MongoDB / Azure Cosmos DB).
  • Experience supporting and configuring reverse proxy technologies such as NGINX and HAProxy.
  • Strong operational knowledge of identity and access management (IAM), authentication and authorization, and role-based access control (RBAC) configuration.
  • Collaboration & Teamwork: Strong customer service skills and the ability to work in a team environment with developers and operations teams. Experience supporting highly available systems and production operations.
  • Communication: Proven ability to clearly and concisely communicate technical information to both technical and non-technical audiences.
  • Incident Management: Ability to think clearly and logically under pressure during system outages or other high-impact events. Able to adapt to a constantly changing landscape of technologies, systems, and environments.
  • Self-driven learner: must exhibit a high level of logical and analytical skill as well as attention to detail.

Nice To Haves

  • An understanding of industry-standard authentication protocols such as OpenID Connect (OIDC), OAuth 2.0, and Security Assertion Markup Language (SAML).
  • Operational experience with, and knowledge of, the OSI model, software-defined networking, public cloud network infrastructure and technologies, network design principles, and network security.
  • Understanding of the software development life cycle (SDLC), object-oriented concepts, and programming languages.
  • Audit & Compliance experience.

Responsibilities

  • Produce and refine observability metrics associated with our platforms and the user experience.
  • Help refine our service level indicator (SLI) monitoring capabilities, with the goal of proactively identifying and resolving issues.
  • Participate in developing and architecting our operational systems and solutions, covering performance, application health, security, and scalability across all cloud-based environments.
  • Develop and maintain our Infrastructure as Code used to build alerting and configure monitoring systems.
  • Build and maintain CI/CD automation pipelines to support the services and systems we provide.
  • Demonstrate your technical curiosity through a drive to understand the services we use in depth.
  • Collaborate with product development and engineering teams in support of future organizational direction.
  • Provide incident response, triaging production issues in our software products, including issues in the underlying cloud infrastructure and service providers.
  • Define, implement and assess system reporting and monitoring needs.
  • Participate in team activities (standups, backlog refinements, blameless postmortems, product demonstrations / reviews).
  • Share your passion for problem solving and continuous improvement with the team.

Benefits

  • Medical, dental, vision, disability, and more
  • 401(k) with company match and Employee Stock Purchase Plan
  • Vacation, holidays, and parental leave