Cloud Platform Engineer

Ariel Partners
NYC, NY
1d | Onsite

About The Position

SNAP Payment Error Rate (CAP) Reduction Project

The SNAP Payment Error Rate (CAP) reduction initiative is a top-priority, agency-wide strategic effort to address federal oversight findings and avoid financial penalties amounting to millions of dollars. In partnership with McKinsey, the agency is using artificial intelligence, robotic process automation, and advanced analytics to strengthen eligibility, case processing accuracy, and quality control review. The initiative will modernize error detection, introduce proactive prevention capabilities, and improve operational decision-making through data-driven insights. It is expected to reduce payment inaccuracies, accelerate case resolution, improve compliance, and increase public trust in program integrity.

Requirements

  • Minimum 7 years of hands-on AWS experience: EC2, RDS, S3, CloudWatch, CloudTrail, IAM, KMS, AWS Backup, and Lambda.
  • Minimum 7 years of experience in Linux/Unix administration and automation scripting (Bash, Shell, Python).
  • Minimum 7 years of experience with Infrastructure as Code (IaC) and automation tools, including CloudFormation, Terraform, and Ansible, for provisioning and maintaining infrastructure.
  • Minimum 7 years of experience with AWS networking: VPC, subnets, NACLs, security groups, Route 53, and multi-AZ architectures.
  • Minimum 5 years of experience with CI/CD pipelines, Jenkins, and IaC for deploying AI agents and ML models into production, monitoring autonomous workflows, and supporting MLOps using Kubernetes, ECS, or EKS.
  • Minimum 4 years of experience architecting, building, and maintaining scalable data processing workflows using AWS managed services and Python (including PySpark); strong understanding of data architecture and ETL/ELT patterns.
  • Minimum 4 years of experience working with AWS AI/ML services such as SageMaker, Bedrock, and vector databases (OpenSearch).
  • Strong understanding of machine learning algorithms, NLP concepts, and deep learning frameworks such as TensorFlow, PyTorch, or Hugging Face.

Responsibilities

  • Monitor database and system performance using CloudWatch metrics, alarms, and logs; troubleshoot proactively.
  • Develop, deploy, and optimize AI/ML solutions using AWS AI services including SageMaker and Bedrock, supporting model training, inference, and integration into production systems.
  • Automate operational tasks using AWS Lambda, Systems Manager (SSM), and Infrastructure-as-Code tools such as CloudFormation or Terraform.
  • Design, build, and maintain scalable, fault-tolerant data processing and analytics workflows on AWS using services such as API Gateway, S3, EC2, RDS, Lambda, Glue, Athena, DynamoDB, EMR, Kinesis, and DataSync.
  • Design and integrate agentic AI systems, including LLM-based agents, multi-agent workflows, and autonomous orchestration pipelines using frameworks such as LangChain and LangGraph.
  • Implement ETL/ELT pipelines and data architectures that support machine learning, analytics, and intelligent agent-based applications.
  • Support CI/CD pipelines for AI models and data workflows using Jenkins and container-based platforms such as ECS, EKS, or Kubernetes.
  • Apply security best practices across AI and data platforms, including IAM least-privilege access, encryption, audit logging, and compliance controls.
  • Maintain technical documentation for AI architectures, data pipelines, infrastructure configurations, and operational runbooks.