Senior DevOps Engineer

Astera Labs, San Jose, CA

About The Position

Astera Labs (NASDAQ: ALAB) provides rack-scale AI infrastructure through purpose-built connectivity solutions. By collaborating with hyperscalers and ecosystem partners, Astera Labs enables organizations to unlock the full potential of modern AI. Astera Labs’ Intelligent Connectivity Platform integrates CXL®, Ethernet, NVLink, PCIe®, and UALink™ semiconductor-based technologies with the company’s COSMOS software suite to unify diverse components into cohesive, flexible systems that deliver end-to-end scale-up and scale-out connectivity. The company’s custom connectivity solutions business complements its standards-based portfolio, enabling customers to deploy tailored architectures to meet their unique infrastructure requirements. Discover more at www.asteralabs.com.

We are seeking a skilled Senior DevOps Engineer to join our Silicon Engineering Infrastructure team. In this role, you will be instrumental in building, maintaining, and optimizing cloud-based infrastructure that supports our semiconductor design and verification workflows. You will work closely with silicon engineering teams to ensure reliable, scalable, and efficient compute environments.

Requirements

  • 3+ years of hands-on DevOps/Infrastructure engineering experience
  • Strong problem-solving skills with the ability to debug complex system issues
  • Solid operational knowledge of AWS Cloud services, including:
      • EC2 (instance management, AMIs, spot/on-demand strategies)
      • FSx (Lustre/NetApp ONTAP for high-performance storage)
      • VPC, Security Groups, IAM, and networking fundamentals
  • Experience with scripting languages such as Python, Bash, or similar
  • Familiarity with Infrastructure-as-Code tools (Terraform, CloudFormation, or Ansible)
  • Experience with CI/CD pipelines and version control systems (Git)
  • A proactive, self-motivated approach to identifying and solving infrastructure challenges
  • Strong communication skills to collaborate with cross-functional engineering teams
  • Ability to work in a fast-paced environment with competing priorities
  • Passion for automation and continuous improvement

Nice To Haves

  • Experience with AWS ParallelCluster or similar HPC cluster management tools
  • Background in the semiconductor/EDA industry, with an understanding of:
      • EDA tool workflows (simulation, synthesis, place & route, verification)
      • License management and job scheduling (LSF, Slurm, SGE)
      • Debug scenarios specific to silicon design environments
  • Knowledge of container technologies (Docker, Singularity)
  • Experience with monitoring and observability tools (CloudWatch, Prometheus, Grafana)
  • AWS certifications (Solutions Architect, SysOps Administrator) are a plus
  • Experience with hybrid cloud architectures (on-prem + cloud)
  • Familiarity with cost optimization strategies for large-scale cloud deployments
  • Understanding of security best practices in regulated environments
  • Experience with AI-based tools such as Claude Code or Copilot is a plus

Responsibilities

  • Design, deploy, and maintain cloud infrastructure on AWS to support silicon engineering workloads
  • Manage and optimize EC2 instances, FSx file systems, and related AWS services for high-performance computing needs
  • Implement and manage AWS ParallelCluster for provisioning and scaling compute clusters and partitions
  • Troubleshoot and resolve complex infrastructure issues across cloud and on-premises environments
  • Develop automation scripts and Infrastructure-as-Code (IaC) solutions to streamline operations
  • Collaborate with EDA tool administrators and silicon engineers to optimize workflows and resource utilization
  • Monitor system performance, implement alerting, and ensure high availability of critical infrastructure
  • Document processes, runbooks, and best practices for team knowledge sharing