Platform Engineer

Allen Control Systems, Austin, TX

About The Position

Allen Control Systems (ACS) is a cutting-edge defense startup founded by two former Navy electrical engineers with a proven track record in robotics and software. We are developing a small, autonomous gun turret that employs advanced computer vision and control systems to precisely target and neutralize small drones and loitering munitions. Our innovative approach requires overcoming significant technical challenges, making this an exciting and dynamic environment for experienced engineers. With an engineering-first culture, ACS values technical excellence and innovation. Backed by our founders’ successful exits from two previous ventures acquired for a combined $180M in 2022, we are committed to ensuring that the groundbreaking technologies we develop have a real-world impact.

We are seeking an experienced Platform Engineer to design, build, and own the infrastructure powering the development of ACS’s autonomous counter-drone systems. You will manage a 130+ GPU bare-metal Kubernetes cluster, own our CI/CD pipelines, and ensure our systems run reliably in both lab and field environments.

Requirements

  • Proficiency in Python programming and Bash scripting.
  • 2+ years of experience in platform engineering, DevOps, or infrastructure engineering, with hands-on experience in production Kubernetes environments.
  • Deep expertise in bare-metal Kubernetes administration, including CNI configuration, storage backends, node management, and cluster upgrades.
  • Hands-on experience with NVIDIA GPU infrastructure, including CUDA, device plugins, GPU scheduling in Kubernetes, and KubeFlow or similar ML orchestration tooling.
  • Strong CI/CD experience including Debian packaging, build automation, artifact management, and pipeline tooling (e.g., GitLab CI, GitHub Actions, Jenkins, or equivalent).
  • Proficiency with observability tooling (e.g., ELK) for log aggregation, metrics, and alerting in distributed Linux environments.
  • Experience building C++ and Python toolchains on Linux using CMake, and familiarity with cross-compilation for ARM targets such as NVIDIA Jetson.
  • Strong Linux systems knowledge (Debian/Ubuntu preferred), including networking, storage, kernel tuning, and security hardening for production environments.

Responsibilities

  • Deploy and operate Kubernetes clusters on bare-metal infrastructure hosting 130+ NVIDIA GPUs, with hybrid burst capability to AWS for scalable compute and storage workloads.
  • Manage NVIDIA GPU clusters for ML training.
  • Own the full CI/CD pipeline from source to deployment, including artifact signing, build automation, and version pinning, ensuring repeatable delivery to cloud and edge targets.
  • Build and maintain the observability stack, including log aggregation, metrics collection, alerting, and dashboards providing real-time visibility into cluster health and system performance.
  • Collaborate with computer vision, robotics, and software engineering teams to build low-friction developer tooling that accelerates iteration on the ACS turret platform.
  • Define and enforce infrastructure-as-code practices using Terraform, Helm, or Ansible across on-prem and cloud deployments.
  • Manage network configuration, storage provisioning, and security hardening for the bare-metal cluster in compliance with applicable defense security requirements.

Benefits

  • Competitive salary
  • Health, Dental, Vision Insurance
  • Paid Time Off