Staff Software Engineer - Distributed Data Systems

Databricks • Posted 18 days ago

$192,000 - $260,000/Yr

Full-time • Senior

Mountain View, CA

Professional, Scientific, and Technical Services

About the position

At Databricks, we are obsessed with enabling data teams to solve the world's toughest problems, from security threat detection to cancer drug development. We do this by building and running the world's best data and AI infrastructure platform, so our customers can focus on the high-value challenges that are central to their own missions.

Founded in 2013 by the original creators of Apache Spark™, Databricks has grown from a tiny corner office in Berkeley, California to a global organization with over 1,000 employees. Thousands of organizations, from small to Fortune 100, trust Databricks with their mission-critical workloads, making us one of the fastest-growing SaaS companies in the world.

Our engineering teams build highly technical products that fulfill real, important needs in the world. We constantly push the boundaries of data and AI technology, while simultaneously operating with the resilience, security, and scale that are critical to making customers successful on our platform. We develop and operate one of the largest-scale software platforms. The fleet consists of millions of virtual machines, generating terabytes of logs and processing exabytes of data per day. At our scale, we regularly observe cloud hardware, network, and operating system faults, and our software must gracefully shield our customers from any of these faults.

Modern data analysis employs sophisticated methods such as machine learning that go well beyond the roll-up and drill-down capabilities of traditional SQL query engines. As a software engineer on the Runtime team at Databricks, you will be building the next-generation distributed data storage and processing systems that can outperform specialized SQL query engines in relational query performance, yet provide the expressiveness and programming abstractions to support diverse workloads ranging from ETL to data science.

Responsibilities

Develop the de facto open source standard framework for big data (Apache Spark™).
Deliver reliable and high-performance services and client libraries for storing and accessing humongous amounts of data on cloud storage backends, e.g., AWS S3, Azure Blob Storage.
Create a storage management system that combines the scale and cost-efficiency of data lakes with the performance and reliability of a data warehouse, including features like ACID transactions and time travel.
Simplify the complexity of real-world data engineering architecture through higher-level abstractions.
Orchestrate and operate tens of thousands of data pipelines with a focus on deployment, testing, and upgrading.
Build the next-generation query optimizer and execution engine that is fast, tuning-free, scalable, and robust.

Requirements

BS in Computer Science or a related technical field, or equivalent practical experience.
8+ years of production-level experience in Java, Scala, or C++.
Strong foundation in algorithms and data structures and their real-world use cases.
Experience with distributed systems, databases, and big data systems (Apache Spark™, Hadoop).

Nice-to-haves

MS or PhD in databases or distributed systems.
Comfortable working towards a multi-year vision with incremental deliverables.
Driven by delivering customer value and impact.

Benefits

Comprehensive benefits and perks that meet the needs of all employees.
Eligibility for annual performance bonus.
Equity options.
