Senior Kubernetes Platform Engineer

Apex Fintech Solutions, Austin, TX
Hybrid

About The Position

As a Senior Site Reliability Engineer, you’ll play a pivotal role in our platform organization, driving the full lifecycle management of Kubernetes clusters—from design and deployment to maintenance and continuous improvement. You’ll focus on ensuring the reliability, security, and performance of our Kubernetes environments through automation and best practices. You’ll also collaborate closely with our Developer Experience and Release Engineering teams to deliver a reliable continuous delivery system for our application teams.

Requirements

  • Bachelor’s degree in Computer Science or Information Technology (or equivalent work experience) required
  • 5+ years of software development experience (Go, Java, Python, etc.)
  • 2+ years of hands-on Kubernetes experience (GKE, EKS, RKE, etc.)
  • Experience with Infrastructure as Code (IaC) tools and concepts (Terraform, CloudFormation, Pulumi)
  • Experience with CI/CD tools and pipelines (GitHub Actions, ArgoCD, FluxCD), and GitOps practices
  • Proficient in deploying and managing Kubernetes clusters on cloud platforms (Google Cloud, AWS, Azure) and on-premises environments
  • Solid understanding of container technologies (Docker, containerd, etc.) and cloud-native microservice architectures
  • Solid Linux system administration skills
  • Excellent problem-solving and troubleshooting abilities
  • Strong communication and collaboration skills

Responsibilities

  • Tooling Development: Build and enhance automation tools and libraries for generating Kubernetes manifests, eliminating manual errors and improving deployment efficiency.
  • Cluster Management: Deploy, configure, and maintain Kubernetes clusters and supporting infrastructure, ensuring high availability, security, and performance.
  • Monitoring & Troubleshooting: Set up and manage monitoring and alerting systems (Datadog), proactively identify issues, and resolve incidents quickly.
  • Security & Compliance: Implement security best practices, conduct regular audits, and ensure compliance with relevant industry standards and regulations.
  • Documentation: Maintain clear and comprehensive documentation of configurations, procedures, and best practices, and provide training to other teams on platform tools, usage and best practices.
  • Collaboration: Work closely with application teams to streamline and support Kubernetes-based deployments, and partner with other platform teams to solve infrastructure challenges and develop robust solutions.
  • Continuous Improvement: Stay current with cloud-native technologies; proactively improve existing code, processes, and systems, and advocate for enhancements to processes and platform architecture.
  • On-Call Support: Participate in an on-call rotation to respond to and resolve production incidents, maintaining system reliability and minimizing downtime.

Benefits

  • healthcare benefits (medical, dental and vision, EAP)
  • competitive PTO
  • 401k match
  • parental leave
  • HSA contribution match
  • paid subscription to the Calm app
  • generous external learning and tuition reimbursement benefits