Senior Software Engineer - Distributed Data Systems

Databricks • Mountain View, CA

About The Position

At Databricks, we are passionate about enabling data teams to solve the world's toughest problems, from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers and customer-obsessed, we leap at every opportunity to solve technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started.

Modern data analysis employs sophisticated methods such as machine learning that go well beyond the roll-up and drill-down capabilities of traditional SQL query engines. As a software engineer on the Runtime team at Databricks, you will build the next generation of distributed data storage and processing systems that can outperform specialized SQL query engines in relational query performance, yet provide the expressiveness and programming abstractions to support diverse workloads ranging from ETL to data science.

Requirements

BS (or higher) in Computer Science or a related technical field, or equivalent practical experience.
Comfortable working towards a multi-year vision with incremental deliverables.
Motivated by delivering customer value and impact.
5+ years of production-level experience in Java, Scala, or C++.
Strong foundation in algorithms and data structures and their real-world use cases.
Experience with distributed systems, databases, and big data systems (Apache Spark, Hadoop).

Responsibilities

Develop the de facto open source standard framework for big data (Apache Spark).
Provide reliable, high-performance services and client libraries for storing and accessing very large amounts of data on cloud storage backends, e.g., AWS S3 and Azure Blob Storage.
Build a storage management system (Delta Lake) that combines the scale and cost-efficiency of data lakes with the performance and reliability of a data warehouse.
Simplify real-world data engineering architecture with higher-level abstractions and guarantees, including ACID transactions and time travel.
Orchestrate and operate tens of thousands of data pipelines with the Delta Pipelines project.
Build the next-generation query optimizer and execution engine that is fast, tuning-free, scalable, and robust.

What This Job Offers

Industry

Professional, Scientific, and Technical Services

Education Level

Bachelor's degree

Number of Employees

5,001-10,000 employees
