The GCP Platform Engineer at New York Life is responsible for designing, building, and operating secure, compliant, and scalable cloud and AI-enabled platforms on Google Cloud Platform (GCP). This role enables application, data, and analytics teams by providing standardized cloud infrastructure, Kubernetes platforms, and approved Google AI services, while meeting financial services regulatory, security, and resiliency requirements. The engineer partners with the Cloud, Data & AI teams, Information Security, and Risk to ensure AI workloads are deployed with appropriate governance, data controls, and observability.

What You’ll Do:

Enterprise Cloud & AI Platform
Design and maintain enterprise GCP landing zones using Google Cloud Deployment Manager, Terraform, and Cloud Foundation Toolkit, aligned with NYL governance standards.
Build and operate shared cloud services supporting AI and non-AI workloads on GCP components such as Cloud Storage, Cloud Functions, Cloud Run, Cloud Pub/Sub, and Cloud Spanner.
Implement Infrastructure as Code (Terraform) for platform, networking, and AI service enablement.
Support hybrid connectivity and secure data access patterns for AI use cases using Cloud Interconnect and Cloud VPN.

Kubernetes, Containers & AI Workloads
Engineer and operate GKE (Google Kubernetes Engine) clusters for application and AI inference workloads.
Enable containerized AI services and microservices using approved base images from Google Container Registry (GCR) or JFrog Artifact Registry.
Support GPU-enabled workloads where approved.
Implement standardized deployment patterns for AI APIs and services using Helm for Kubernetes deployment management.

Google AI / GenAI Enablement
Enable and operate approved Google AI services, including:
  Vertex AI: model hosting, endpoints, and pipelines (platform enablement only); agentic AI deployments and communication protocols in Vertex AI Agent Builder and Agent Engine.
  Gemini APIs and other managed GenAI services (as approved by NYL governance).
  BigQuery ML and AI-integrated analytics platforms.
Implement secure access controls, networking, and monitoring for AI services using Cloud Identity & Access Management (IAM), VPC Service Controls, and Cloud Monitoring.
Integrate AI platforms with CI/CD pipelines and enterprise SDLC controls using tools such as Harness CI/CD.
Partner with Data & AI teams to operationalize AI workloads safely and compliantly within Google Cloud environments.

DevOps, Automation & MLOps Foundations
Build secure CI/CD pipelines for application and AI workloads using Harness CI/CD.
Support MLOps foundations such as:
  Model deployment automation via Kubeflow, TensorFlow Extended (TFX), Vertex AI Pipelines, and Vertex AI Model Registry.
  Environment promotion and rollback using Terraform.
  Monitoring and logging for AI endpoints, using New Relic for synthetic monitoring and Cloud Logging and Cloud Monitoring for deeper observability and troubleshooting.
Enforce guardrails, approvals, and policy-as-code for AI usage with Cloud Security Command Center, Google Cloud Policy Analyzer, and Open Policy Agent (OPA).

Security, Risk & Compliance
Implement IAM, workload identity, and least-privilege models for AI services using Cloud Identity & Access Management (IAM) and Workload Identity Federation.
Enforce data residency, encryption, and access policies using Cloud Key Management Service (KMS) and Cloud Data Loss Prevention (DLP).
Integrate AI platform telemetry with enterprise logging, monitoring, and SIEM using Cloud Logging, Cloud Monitoring, and New Relic.
Support audits, risk reviews, and regulatory requirements (SOC 2, SOX, data privacy) by leveraging Google Cloud Security Command Center, Cloud Audit Logs, and the Cloud Data Loss Prevention API.

Reliability, Observability & Cost Management
Design platforms for high availability and resilience, including AI services, using GKE, Cloud Spanner, Cloud SQL, and Google Cloud Load Balancing.
Monitor AI workloads for performance, reliability, and cost, using New Relic for synthetic monitoring, Cloud Monitoring and Cloud Trace for performance insight, and Harness CCM for cost.
Optimize cloud and AI service costs with budgets and usage controls, using Google Cloud Billing, budgets, alerts, and Harness CCM.
Participate in incident response and root-cause analysis logged in ServiceNow, and manage incident notifications through PagerDuty.

Collaboration & Governance
Partner with Data & AI, InfoSec, Security, Risk, and Application teams to ensure secure, compliant, and efficient AI platform usage.
Contribute to enterprise standards for cloud and AI platform usage, including Best Practices for GCP and the Google Cloud Architecture Framework.
Provide guidance on responsible AI platform adoption using frameworks such as Google's AI Principles and Fairness Indicators.
Document reference architectures and best practices for GCP AI services, MLOps, and cloud infrastructure.
Job Type
Full-time
Career Level
Executive
Education Level
No Education Listed