Big Data Engineer (Cloudera, AWS)

Elite Technical - Herndon, VA
Remote

About The Position

Elite Technical is seeking a Cloudera Big Data Engineer with AWS Cloud experience to support our customer's production support environment. Although the position is primarily remote, monthly meetings occur in Reston, VA and/or Washington, DC. This is a 40-hour, Monday through Friday position; however, as with all production support environments, weekend releases and late-night production troubleshooting efforts should be expected. Responsibilities and requirements are detailed below.

Requirements

  • 8+ years of experience supporting Cloudera applications running in the AWS Cloud
  • Experience supporting cloud applications with MongoDB
  • Experience with cloud disaster recovery
  • Apache Kafka: strong administration and troubleshooting skills, including Kafka Streams API stream processing with KStreams and KTables
  • Kafka integration with MQ
  • Kafka broker management
  • Topic and offset management
  • Apache NiFi: administration, including flow management, registry server, management controller, and service management
  • NiFi integration with Kafka, HBase, and Solr
  • Flume: integration with Kafka, NiFi, and IBM MQ
  • HBase: administration, database management, and troubleshooting
  • Solr: administration, including managing logging levels, shards, and high availability

Nice To Haves

  • Experience working with the Kafka ecosystem (Kafka brokers, Kafka Connect, ZooKeeper) in production

Responsibilities

  • Ensure Cloudera installation and configuration is at optimal specifications (CDP, CDSW, Hive, Spark, NiFi).
  • Perform critical data migrations from CDH to CDP.
  • Design and implement big data pipelines and automated data flows using Python/R and NiFi.
  • Assist and provide expertise as it pertains to automating the entire project lifecycle.
  • Perform incremental updates and upgrades to the Cloudera environment.
  • Assist with new use cases (e.g., analytics/ML, data science, data ingest and processing) and infrastructure (including new cluster deployments, cluster migration, expansion, major upgrades, COOP/DR, and security).
  • Assist in testing, governance, data quality, training, and documentation efforts.
  • Move data and use YARN to allocate resources and schedule jobs.
  • Manage job workflows with Oozie and Hue.
  • Implement comprehensive security policies across the Hadoop cluster using Ranger.
  • Configure and manage Cloudera Data Science Workbench using Cloudera Manager.
  • Troubleshoot potential issues with Kerberos, TLS/SSL, Models, and Experiments, as well as other workload issues that data scientists might encounter once the application is running.
  • Support the Big Data / Hadoop databases throughout the development and production lifecycle.
  • Troubleshoot and resolve database integrity, performance, blocking and deadlocking, replication, log shipping, connectivity, and security issues, including performance tuning and query optimization, using monitoring and troubleshooting tools.
  • Create, test, and implement scripting for automation support.
  • Implement and support streaming technologies such as Kafka, Spark, and Kudu.


What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No education listed
  • Number of Employees: 101-250 employees
