Senior Cloud Systems Engineer (Databricks Administrator)

Providge Consulting
Washington, DC (Hybrid)

About The Position

Our client is seeking a Senior Cloud Systems Engineer to serve as the hands-on Databricks Administrator supporting the Enterprise Data Platform (EDP). This role is responsible for the operational management, security configuration, governance enforcement, and cost optimization of the Databricks environment. The engineer will ensure the platform is compliant, reliable, and scalable to support secure analytics, machine learning, and AI workloads. The position requires strong experience administering Databricks, implementing governance through Unity Catalog, integrating with cloud infrastructure, and automating platform configuration through Infrastructure-as-Code and CI/CD practices.

Requirements

  • Bachelor’s Degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience.
  • 7+ years of experience in cloud infrastructure, data platform administration, or enterprise platform operations.
  • 3+ years of hands-on experience administering Databricks environments.
  • Hands-on experience managing Databricks workspaces, clusters, compute policies, SQL warehouses, runtime versions, jobs, and repositories.
  • Experience administering Unity Catalog including metastores, catalogs, schemas, permissions, service principals, and storage access.
  • Strong knowledge of identity and access management including SSO, SCIM provisioning, and role-based access control.
  • Experience implementing platform security including secrets management, audit logging, and secure connectivity.
  • Experience with automation tools such as Terraform, REST APIs, or CLI-based configuration management.
  • Experience implementing CI/CD pipelines for notebooks, jobs, and configuration promotion across environments.
  • Understanding of lakehouse architecture concepts including Delta Lake and compute-storage separation.
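As a concrete illustration of the compute-policy and cost-control experience listed above, the sketch below assembles a minimal Databricks cluster-policy definition (the JSON shape used by cluster policies, with "fixed", "range", and "forbidden" rule types). The specific limits, the node type, and the forbidden path are illustrative assumptions, not values from this posting.

```python
import json


def build_cluster_policy() -> str:
    """Return a minimal Databricks cluster-policy definition as JSON.

    The rule shapes ("fixed", "range", "forbidden") follow the cluster-policy
    definition format; the particular limits here are illustrative assumptions
    for a cost-controlled team policy, not a recommended standard.
    """
    definition = {
        # Force auto-termination so idle clusters shut down after 30 minutes.
        "autotermination_minutes": {"type": "fixed", "value": 30},
        # Cap autoscaling to limit spend.
        "autoscale.max_workers": {"type": "range", "maxValue": 8},
        # Pin workers to one approved (hypothetical) instance type.
        "node_type_id": {"type": "fixed", "value": "i3.xlarge"},
        # Disallow DBFS-hosted init scripts.
        "init_scripts.*.dbfs.destination": {"type": "forbidden"},
    }
    return json.dumps(definition, indent=2)


print(build_cluster_policy())
```

A definition like this would typically be applied through Terraform's Databricks provider or the REST API, which is the kind of Infrastructure-as-Code workflow the requirements describe.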

Nice To Haves

  • Experience working with AWS cloud services, including IAM roles, S3 storage security, and basic networking concepts.
  • Working knowledge of SQL and data engineering concepts, with basic familiarity in Python or Scala for troubleshooting.
  • Experience supporting environments with security or compliance requirements such as FedRAMP, HIPAA, or SOC2.
  • Familiarity with cloud cost optimization (FinOps) and platform performance monitoring.
  • Experience managing service operations including incident management and operational reporting.
  • Relevant certifications such as Databricks Platform Administrator, Databricks Data Engineer, or AWS Solutions Architect are a plus.

Responsibilities

  • Administer Databricks accounts and workspaces across SDLC environments.
  • Standardize configuration, naming conventions, and operational practices.
  • Configure and maintain clusters, compute policies, SQL warehouses, runtime versions, libraries, jobs, repositories, and workspace settings.
  • Monitor platform health through operational dashboards, alerts, and monitoring tools.
  • Maintain operational documentation, runbooks, and platform procedures.
  • Implement and enforce least-privilege access controls across platform resources.
  • Manage identity integrations including SSO, SCIM provisioning, and role-based access control.
  • Administer service principals and group-based access permissions.
  • Enable audit logging and support security monitoring and compliance reviews.
  • Implement secure secrets management and connectivity patterns.
  • Administer Unity Catalog including metastores, catalogs, schemas, and tables.
  • Manage data ownership, permission grants, and governance policies.
  • Configure and maintain external locations and storage credentials.
  • Support data classification, tagging, and lineage integrations with governance teams.
  • Coordinate with cloud and network teams to establish secure connectivity patterns.
  • Implement storage access controls and secure object storage integrations.
  • Support cloud logging, monitoring, and security integration with enterprise platforms.
  • Automate platform configuration and administration using APIs, CLI tools, and Infrastructure-as-Code frameworks.
  • Implement CI/CD pipelines for deploying jobs, notebooks, and configurations across environments.
  • Implement Databricks Asset Bundles (DABs) for standardized deployment workflows.
  • Reduce configuration drift through automated deployment processes.
  • Implement cost control policies such as cluster policies and auto-termination rules.
  • Analyze usage metrics and provide recommendations to improve cost efficiency.
  • Monitor and optimize SQL warehouse performance and cluster autoscaling.
  • Implement Delta Lake optimization strategies including OPTIMIZE, VACUUM, and Z-ordering.
  • Administer Delta Live Tables pipelines and support data engineering teams.
  • Monitor pipeline health and address job failures or performance issues.
  • Support integrations with business intelligence tools and metadata catalog systems.
  • Assist with troubleshooting data pipeline and query performance issues.
  • Maintain platform configuration documentation and governance standards.
  • Develop onboarding materials and self-service guides for platform users.
  • Support user onboarding and workspace access provisioning.
  • Provide guidance to platform users and development teams on best practices.
  • Conduct capacity planning and forecast resource usage based on platform growth.
  • Monitor concurrent workloads and resource allocation.
  • Recommend scaling strategies to support increased platform usage.
  • Ensure platform stability during peak usage periods.
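The Unity Catalog governance duties above (managing catalogs, schemas, permission grants, and group-based access) can be sketched as least-privilege GRANT statements issued to a group. The catalog, schema, and group names below are hypothetical examples, not taken from this posting.

```python
from typing import List


def least_privilege_grants(catalog: str, schema: str, group: str,
                           read_only: bool = True) -> List[str]:
    """Build Unity Catalog GRANT statements for one group on one schema.

    Grants the minimum privileges needed to query tables in a single schema
    (plus MODIFY when read_only=False). All names are caller-supplied
    illustrative examples.
    """
    grants = [
        # USE privileges make the containers visible to the group...
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`;",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{group}`;",
        # ...SELECT lets it read the tables inside them.
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO `{group}`;",
    ]
    if not read_only:
        grants.append(
            f"GRANT MODIFY ON SCHEMA {catalog}.{schema} TO `{group}`;")
    return grants


for stmt in least_privilege_grants("edp_prod", "finance", "analysts"):
    print(stmt)
```

Granting to groups rather than individual users, as sketched here, keeps access reviewable and aligns with the SCIM-provisioned, role-based model the posting describes.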