Cloud Data Platform Administrator

Custom Software Systems, Inc.
Washington, DC
Remote

About The Position

Custom Software Systems, Inc. (CSS) is seeking an experienced Cloud Data Platform Administrator to support the deployment, security, and operations of a modern Enterprise Data Platform (EDP) in a secure AWS GovCloud environment. This hands-on role will focus on AWS infrastructure administration, Terraform automation, CI/CD integration, and secure cloud platform operations supporting enterprise analytics and AI/ML workloads. The ideal candidate will have strong experience in cloud infrastructure, DevOps practices, monitoring, governance, and cost optimization, and will work closely with engineering, security, and platform teams to ensure the environment remains secure, scalable, compliant, and highly available. Fully remote candidates will be considered. Hybrid candidates who can come in person up to twice a month at FRB locations in Washington, DC will be given preference.

Requirements

  • US Citizenship or Green Card required.
  • Three (3) years' experience building AWS infrastructure using Terraform.
  • Three (3) years' experience building CI/CD pipelines, preferably using Azure DevOps or GitLab CI/CD practices for promotion across SDLC environments.
  • Minimum of five (5) years of experience with integration, systems analysis, or programming within cloud environments.
  • Minimum of five (5) years of experience developing systems requirements and design specifications.
  • At least seven (7) years’ demonstrated experience in:
      • Developing software according to software development lifecycles (SDLCs), including DevOps, Agile, Lean, Iterative, or Waterfall.
      • Designing, deploying, and migrating secure and maintainable systems for Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) environments.
      • DevOps, CI/CD pipelines, containers, and related best practices for cloud deployment.
  • Experience with Amazon Web Services (AWS), Microsoft Azure, or ServiceNow.
  • Hands-on experience with AWS security and networking services, including PrivateLink, Secrets Manager/Systems Manager integration, CloudWatch/CloudTrail integration, S3 bucket policies, cross-account access patterns, and KMS encryption key management.
  • Cloud platform expertise (AWS): IAM roles/policies, object storage security patterns, networking basics (VPC concepts), logging/monitoring integration.
  • Identity & Access Management proficiency: SSO concepts, SCIM provisioning, group-based RBAC, service principals, and least-privilege patterns.
  • Security fundamentals: secrets management, secure connectivity, audit logging, access monitoring, and evidence-ready operations.
  • Automation skills: IaC using Terraform, CLI, and REST APIs for repeatable configuration and environment promotion.
  • Strong troubleshooting and problem-solving skills; ability to communicate clearly during incidents and changes.
  • Proficient in at least one high-level programming language such as Python, Ruby, or Go.
  • Understanding of and ability to evaluate new technologies for fit in the current infrastructure architecture.
  • Understanding of cloud-based architecture, web servers, caching, application servers, load balancers, and storage.
  • Familiarity with loose coupling, stateless systems, and best practices for designing cloud-ready applications.
  • Understanding of cloud federation technologies such as SAML, OAuth, and OpenID Connect, and how to apply these technologies to enterprise and public-facing applications.
  • Awareness of cloud information security risks and best practices, especially in a highly secure operating environment.
  • Experience transitioning legacy systems to cloud-ready architecture.
  • Experience with route tables, access control lists, firewalls, NAT, HTTP, DNS, IP, and the OSI network model.
  • Familiarity with government cloud deployment regulations and compliance policies such as FedRAMP and FISMA.

Nice To Haves

  • SQL proficiency and data engineering fundamentals for troubleshooting query performance issues, understanding ETL/ELT workflow patterns, and debugging data pipeline failures; basic Python/Scala familiarity for notebook/code issue diagnosis.
  • Experience with compliance and regulatory frameworks (FedRAMP, HIPAA, SOC 2, or similar), including implementation of data residency requirements, retention policies, and audit-ready evidence collection.
  • SLA/SLO management, incident management, and stakeholder communication skills; ability to define platform service levels, produce operational reports, translate technical issues for business stakeholders, and manage vendor relationships (e.g., Databricks account teams).

Responsibilities

  • Implement platform monitoring/alerting, operational dashboards, and health checks; maintain runbooks and operational procedures.
  • Provision and administer AWS GovCloud infrastructure components supporting EDP environments (networking, compute, storage, IAM, logging/monitoring).
  • Implement and maintain standardized “secure-by-default” configurations aligned to agency security requirements (baseline hardening, patching coordination, configuration management).
  • Operate cloud services supporting data and analytics platforms (e.g., storage integrations, encryption/KMS patterns, secure service endpoints, VPC constructs).
  • Establish and maintain operational monitoring/alerting, health checks, runbooks, and incident support in coordination with the platform and security teams.
  • Manage change control for upgrades, feature rollouts, configuration changes, and integration changes; document impacts and rollback plans.
  • Enable and maintain audit logging and access/event visibility; support security reviews and evidence requests.
  • Configure logging and auditability (e.g., CloudTrail/CloudWatch patterns) and support evidence collection for security/compliance activities.
  • Coordinate secure networking patterns (private connectivity, egress controls, firewall/proxy constraints) with network and security stakeholders.
  • Build and manage POC environments (isolated accounts/VPCs where applicable), ensuring repeatability, cost controls, and safe teardown.
  • Coordinate secure connectivity and guardrails with cloud/network teams: private connectivity patterns, egress controls, firewall/proxy needs.
  • Implement cost guardrails: cluster policies, auto-termination, scheduling, workload sizing standards, and capacity planning.
  • Produce usage/cost insights and optimization recommendations; address waste drivers (idle compute, oversized clusters, inefficient jobs).
  • Automate administration and configuration using APIs/CLI/IaC (e.g., Terraform) to reduce manual drift and improve repeatability.
  • Maintain platform documentation: configuration baselines, security/governance standards, onboarding guides, and troubleshooting references.
  • Manage third-party integrations and ecosystem connectivity, including BI tool integrations (e.g., Power BI), and external metadata catalog integrations.
  • Conduct capacity planning and scalability analysis, including forecasting concurrent user/workload growth, platform scaling strategies, and proactive resource allocation during peak usage periods.
  • Facilitate user onboarding and enablement, including new user/team onboarding procedures, training coordination, workspace access provisioning, and creation of self-service documentation/guides.

Benefits

  • Health insurance plans
  • Health Savings Account (HSA)
  • Dental
  • Vision
  • Long-term disability
  • Short-term disability
  • Basic term life insurance
  • Supplemental term life insurance for employees, spouses, and dependents
  • Simple IRA
  • Parking/Commuting expense reimbursement
  • Training/Education