What You’ll Do

- Design and build Spark ETL pipelines on the AWS data platform (an illustrative sketch follows the requirements list below).
- Collaborate with cross-functional teams, such as data scientists, fraud, marketing, and other business stakeholders, to understand their data needs and deliver reliable solutions.
- Optimize data infrastructure: design and maintain robust data infrastructure using modern data platform architecture.
- Ensure data quality and reliability.
- Innovate and follow best practices.
- Ensure operational excellence of the data platform, including monitoring, incident response, performance optimization, and continuous improvement.

Who We’re Looking For (“Must Haves”)

- Professional experience in data warehousing, data architecture, and/or data engineering environments, especially with Spark, Hadoop, Hive, and similar tools, along with a solid understanding of streaming pipelines.
- Proficiency in at least one high-level programming language (Scala, Java, Python, or equivalent).
- A good understanding of databases.
- You have built large-scale data products and understand the tradeoffs involved in building them.
- You have a deep understanding of system design, data structures, and algorithms.
- You have excellent knowledge of distributed computing frameworks such as Hadoop MapReduce and Spark.
- You have strong knowledge of the following AWS services: EMR, S3, Lambda, Redshift, etc.
- You have a strong understanding of data quality and governance.
- You are a team player and a self-driven, highly motivated individual who loves to learn new things.
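For a sense of the day-to-day work, here is a minimal sketch of the kind of Spark ETL pipeline described above: extract raw events from S3, apply a simple data-quality gate, aggregate, and load the result back to S3 for downstream consumers. All bucket names, paths, column names, and thresholds are hypothetical illustrations, not this team's actual pipeline.

```python
# Minimal PySpark ETL sketch. Paths, columns, and the 95% quality
# threshold are hypothetical; "s3://" URIs assume running on EMR (EMRFS).
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("example-daily-events-etl")  # hypothetical job name
    .getOrCreate()
)

# Extract: raw JSON events landed in S3 (hypothetical path and schema).
raw = spark.read.json("s3://example-raw-bucket/events/")

# Transform: drop malformed rows and deduplicate on a hypothetical event_id.
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .dropDuplicates(["event_id"])
)

# Data-quality gate: fail fast if too many rows were dropped.
raw_count, clean_count = raw.count(), clean.count()
if raw_count > 0 and clean_count / raw_count < 0.95:
    raise ValueError("Data-quality gate failed: >5% of rows were malformed")

# Aggregate: daily event counts per user (hypothetical columns).
daily = (
    clean.groupBy(F.to_date("event_ts").alias("event_date"), "user_id")
         .agg(F.count("*").alias("event_count"))
)

# Load: partitioned Parquet back to S3, e.g. for a Redshift COPY
# or a Spectrum external table downstream.
(
    daily.write.mode("overwrite")
         .partitionBy("event_date")
         .parquet("s3://example-curated-bucket/daily_user_events/")
)

spark.stop()
```

In practice a job like this would run on EMR on a schedule, with the quality gate, row counts, and runtime feeding the monitoring and incident-response processes mentioned above.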
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed