Platform Engineer II – Cloud Backup & Recovery (AWS/GCP)

First American, Santa Ana, CA
Remote

About The Position

The Enterprise Backup & Recovery Platform (EBRP) team designs, builds, and operates secure, resilient, cloud-based backup solutions that protect critical business systems across First American. Our platform supports AWS and GCP environments and leverages logically air-gapped and immutable design principles to ensure recoverability, security, and regulatory compliance.

We are seeking a Platform Engineer II to help evolve and operate our multi-cloud backup platform. This role blends infrastructure engineering, operational reliability, and automation development to deliver secure and scalable backup services across the enterprise. We are open to remote candidates for this role.

Role Overview

This role consists of approximately:

  • 50% Platform Engineering
  • 30% Backup Operations & Reliability
  • 20% Development & Automation

As a Platform Engineer II, you will independently deliver infrastructure solutions of moderate scope, contribute to automation and platform improvements, and support the operational health of production backup systems. You will work under general supervision while collaborating closely with senior engineers and cross-functional teams.

Requirements

  • 2–5 years of directly related experience in cloud infrastructure, platform engineering, or DevOps
  • Experience defining infrastructure and platform capabilities using Infrastructure-as-Code and automation technologies
  • Hands-on experience with AWS required; familiarity with GCP strongly preferred
  • Experience working with CI/CD pipelines and build automation tools
  • Practical experience leveraging AI-assisted development or automation tools to improve engineering productivity and reduce manual effort
  • Experience troubleshooting production cloud systems
  • Bachelor’s degree in Computer Science, Information Technology, or a related field, or an equivalent combination of education, certifications, and experience
  • Proficiency in at least one scripting language (Python, Bash, or similar)
  • Familiarity with multiple IaC tools (Terraform preferred; CloudFormation acceptable)
  • Experience with CI/CD pipelines, automation, and build tools
  • Working knowledge of cloud networking fundamentals (VPCs, subnets, routing, security groups/firewalls)
  • Familiarity with AWS and GCP core services (IAM, object storage, logging/monitoring, compute)
  • Familiarity with LLM-assisted coding tools (e.g., Cursor, Copilot) and models (e.g., Claude, GPT), and with their secure use in dev/CI workflows
  • Understanding of backup and disaster recovery concepts (RPO, RTO, retention policies, immutability)
  • Experience with monitoring and logging systems (CloudWatch, Cloud Logging, or similar)
  • Strong written and verbal communication skills

Responsibilities

  • Design, implement, and maintain cloud infrastructure supporting enterprise backup services across AWS and GCP
  • Define and manage infrastructure using Infrastructure-as-Code (Terraform preferred)
  • Translate engineering requirements into secure, scalable cloud architecture solutions
  • Modify and enhance existing platform components to improve resiliency, performance, and maintainability
  • Build and improve CI/CD pipelines supporting infrastructure and platform deployments
  • Contribute to IAM design, least-privilege access models, and secure cloud architecture patterns
  • Develop detailed technical specifications and documentation for infrastructure implementations
  • Partner with Security, Infrastructure, and Application teams to ensure successful platform integrations
  • Monitor and maintain production backup jobs, replication workflows, and recovery processes
  • Troubleshoot backup failures and restore issues across AWS and GCP environments
  • Perform platform maintenance, installations, upgrades, and lifecycle management activities
  • Participate in incident response, root cause analysis, and corrective action planning
  • Support disaster recovery testing and validation efforts to ensure alignment with RPO/RTO objectives
  • Improve operational runbooks and documentation to enhance reliability and efficiency
  • Participate in on-call support as required by business needs
  • Leverage AI-assisted development tools to reduce operational toil, improve code quality, and automate repetitive engineering tasks
  • Develop automation scripts and tooling (Python, Bash, or similar) to reduce manual operational effort
  • Build and enhance pipeline automation to ensure the quality and reliability of infrastructure changes
  • Create internal tools used by Platform Engineering and partner teams
  • Contribute to automated validation and regression testing for platform updates
  • Document technical designs and implementation details to support knowledge sharing

Benefits

  • First American offers a comprehensive benefits package including medical, dental, vision, 401(k), and PTO/paid sick leave, along with other benefits such as an employee stock purchase plan.