Corporate Vice President - Google Cloud Platform Engineer - Enterprise Cloud & AI Platform

New York Life•New York, NY

29d•Hybrid

About The Position

The GCP Platform Engineer at New York Life is responsible for designing, building, and operating secure, compliant, and scalable cloud and AI-enabled platforms on Google Cloud Platform (GCP). This role enables application, data, and analytics teams by providing standardized cloud infrastructure, Kubernetes platforms, and approved Google AI services, while meeting financial services regulatory, security, and resiliency requirements. The engineer partners with the Cloud, Data & AI teams, Information Security, and Risk to ensure AI workloads are deployed with appropriate governance, data controls, and observability.

What You’ll Do spans seven areas, itemized under Responsibilities below: Enterprise Cloud & AI Platform; Kubernetes, Containers & AI Workloads; Google AI / GenAI Enablement; DevOps, Automation & MLOps Foundations; Security, Risk & Compliance; Reliability, Observability & Cost Management; and Collaboration & Governance.

Requirements

5+ years of experience in cloud, platform, or DevOps engineering
Strong hands-on experience with Google Cloud Platform, specifically services such as GKE, BigQuery, Cloud Storage, Cloud Functions, and Vertex AI.
Expertise in Terraform and Infrastructure as Code
Experience operating Kubernetes / GKE in enterprise environments with tools such as kubectl and Helm
Proficiency in scripting with languages like Python, Bash, or Go.
Strong understanding of cloud security, IAM, and networking using VPC, Cloud IAM, and VPC Service Controls.
Experience working in regulated or highly governed environments

Nice To Haves

Experience enabling or operating Google AI services, such as: Vertex AI (endpoints, pipelines, monitoring, agentic AI engine and communication protocols); Gemini APIs or other managed GenAI services; and BigQuery ML and AI-integrated analytics platforms
Familiarity with MLOps concepts (model deployment, versioning, monitoring) using Kubeflow, TensorFlow Extended (TFX), and Vertex AI Pipelines.
Experience supporting AI inference workloads (not necessarily model training) in GKE or Cloud Run
Understanding of Responsible AI, data governance, and model risk controls
GCP certifications like Google Cloud Certified – Professional Cloud Architect, Google Cloud Certified – Professional Cloud DevOps Engineer; AI-related certifications such as Google Cloud Certified – Professional Machine Learning Engineer are a plus

Responsibilities

Design and maintain enterprise GCP landing zones using Google Cloud Deployment Manager, Terraform, and Cloud Foundation Toolkit aligned with NYL governance standards.
Build and operate shared cloud services supporting AI and non-AI workloads on GCP components like Cloud Storage, Cloud Functions, Cloud Run, Cloud Pub/Sub, and Cloud Spanner.
Implement Infrastructure as Code (Terraform) for platform, networking, and AI service enablement
Support hybrid connectivity and secure data access patterns for AI use cases using Cloud Interconnect and Cloud VPN.
Engineer and operate GKE (Google Kubernetes Engine) clusters for application and AI inference workloads
Enable containerized AI services and microservices using approved base images from Google Container Registry (GCR) or JFrog Artifact Registry.
Support GPU-enabled workloads where approved
Implement standardized deployment patterns for AI APIs and services using Helm for Kubernetes deployment management
Enable and operate approved Google AI services, including: Vertex AI (model hosting, endpoints, pipelines – platform enablement only, agentic AI deployments and communication protocols in Vertex AI Agent Builder and Agent Engine); Gemini APIs and other managed GenAI services (as approved by NYL governance); and BigQuery ML and AI-integrated analytics platforms
Implement secure access controls, networking, and monitoring for AI services using Cloud Identity & Access Management (IAM), VPC Service Controls, and Cloud Monitoring.
Integrate AI platforms with CI/CD pipelines and enterprise SDLC controls using tools like Harness CI/CD
Partner with Data & AI teams to operationalize AI workloads safely and compliantly within Google Cloud environments.
Build secure CI/CD pipelines for application and AI workloads using Harness CI/CD
Support MLOps foundations such as: model deployment automation via Kubeflow, TensorFlow Extended (TFX), Vertex AI Pipelines, and Vertex AI Model Registry; environment promotion and rollback using Terraform; and monitoring and logging for AI endpoints using New Relic for synthetic monitoring, and Cloud Logging and Cloud Monitoring for deeper observability and troubleshooting.
Enforce guardrails, approvals, and policy-as-code for AI usage with Cloud Security Command Center, Google Cloud Policy Analyzer, and Open Policy Agent (OPA).
Implement IAM, workload identity, and least-privilege models for AI services using Cloud Identity & Access Management (IAM) and Workload Identity Federation.
Enforce data residency, encryption, and access policies using Cloud Key Management Service (KMS) and Cloud Data Loss Prevention (DLP).
Integrate AI platform telemetry with enterprise logging, monitoring, and SIEM using Cloud Logging, Cloud Monitoring, and New Relic.
Support audits, risk reviews, and regulatory requirements (SOC2, SOX, data privacy) by leveraging Google Cloud Security Command Center, Cloud Audit Logs, and Cloud Data Loss Prevention API.
Design platforms for high availability and resilience, including AI services using GKE, Cloud Spanner, Cloud SQL, and Google Cloud Load Balancing.
Monitor AI workloads for performance, reliability, and cost using New Relic for synthetic monitoring, Cloud Monitoring and Cloud Trace for performance insight, and Harness CCM for cost.
Optimize cloud and AI service costs through budgets and usage controls, using Google Cloud Billing budgets and alerts and Harness CCM.
Participate in incident response and root-cause analysis logged in ServiceNow, and manage incident notifications through PagerDuty.
Partner with Data & AI, InfoSec, Security, Risk, and Application teams to ensure secure, compliant, and efficient AI platform usage.
Contribute to enterprise standards for cloud and AI platform usage including Best Practices for GCP and Google Cloud Architecture Framework.
Provide guidance on responsible AI platform adoption using frameworks like Google's AI Principles and Fairness Indicators.
Document reference architectures and best practices for GCP AI services, MLOps, and cloud infrastructure.

Benefits

We provide a full package of benefits for employees – and have unique offerings for a modern workforce, including leave programs, adoption assistance, and student loan repayment programs.
Based on feedback from our employees, we continue to refine and add benefits to our offering, so that you can flourish both inside and outside of work.
