Infrastructure Engineer III

Truckstop
Boise, ID

About The Position

As a member of the Technology Operations Team, the Infrastructure Engineer III works under minimal supervision, handling complex issues and problems and referring only the most complex issues to higher-level staff. The Infrastructure Engineer III possesses comprehensive knowledge of the subject matter, provides leadership, coaching, and/or mentoring to a subordinate group, and may act as a lead or first-level supervisor. The role focuses on operational readiness, automation, instrumentation, security, scalability, and stability for our platforms. The engineer designs and implements improvements to our cloud infrastructure and ensures that our platforms are production ready. This includes rightsizing and designing infrastructure, monitoring service level indicators (SLIs), negotiating service level objectives (SLOs) with Product Owners, and managing alerting. The Infrastructure Engineer III also participates in alert and incident response, troubleshoots production issues, and collaborates with other Technology Operations teams on IT operations projects to improve system resiliency and reliability.

Requirements

  • Bachelor’s degree or equivalent professional experience required; Computer Science or Engineering preferred.
  • Minimum of 5 to 7 years of relevant professional experience.
  • 3 to 5 years of experience with Microsoft Azure and/or Amazon Web Services.
  • Azure/AWS associate or higher-level certification preferred.
  • 3 to 5 years building operational systems for custom software applications.
  • Expert-level experience with container and orchestration tools such as Kubernetes, Azure Kubernetes Service (AKS), and Amazon Elastic Kubernetes Service (EKS).
  • Expert-level experience with the Kafka event bus or an equivalent messaging architecture.
  • Expert-level experience with Ansible or other configuration management tools such as Chef, Packer, or Puppet.
  • Accomplished in continuous integration and delivery services such as GitLab and GitHub Actions.
  • Accomplished in using Terraform and other Infrastructure as Code (IaC) tools.
  • Proficiency with monitoring tools such as New Relic, Azure Insights, and Amazon CloudWatch.
  • Possesses strong scripting skills in PowerShell and Python.
  • Possesses strong skills in managing YAML files and integrating them with Kustomize, Helm charts, etc.
  • Demonstrates Linux-based systems administration skills in a cloud environment.
  • Expertise in managing IAM accounts and policies.
  • Experience with REST and GraphQL.
  • Mastery of effective collaboration in a remote team environment.
  • Is a team player and strong communicator.
  • Projects a positive attitude and works well in a fast-paced environment.
  • Shows proactive problem-solving skills.
  • Exhibits excellent time management and organizational skills.
  • Has a proven history of learning new technologies and welcomes new challenges.
  • Communicates well with all levels of management.
  • Has a genuine interest in helping others succeed.

Nice To Haves

  • Experience developing in .NET or Node.js.

Responsibilities

  • Design and build reference architectures for Infrastructure operations under the direction of the architecture team.
  • Assist architects and designers in standardizing and automating system implementation and configuration.
  • Analyze, execute, and streamline Infrastructure Engineering practices.
  • Partner with network and security teams to implement structure, connectivity, and security of our cloud environments.
  • Perform incident resolution and root cause analysis of critical outages.
  • Implement solutions to systemic failures.
  • Provide alert response support, including after hours.
  • Provide subject matter expertise and confidently resolve complex issues with minimal escalation.
  • Provide technical leadership to the Infrastructure Engineering group.
  • Implement monitoring, develop automated provisioning, and develop self-healing automation.
  • Provide reporting and management information on infrastructure performance and operational KPIs.
  • Document the environments and processes that support our products.
  • Support System Operations with their patching schedules and act as an escalation point.
  • Other duties as assigned.