Data Platform Engineer

Kepler
New York City, NY

About The Position

We're building the ground truth platform for AI. Generic tools hallucinate data, confabulate reports, and don't show their work. We made accuracy the only possible outcome: every answer traces to its source, every calculation is reproducible, every insight is defensible. We're starting in finance and building the foundational data layer for anywhere decisions depend on trustworthy data.

Kepler was founded by two Palantir veterans (20 years combined) who built core parts of Gotham and Foundry, created Palantir Quiver (the analytics engine behind $100M+ deals with BP and Airbus), led major DoD projects, and served as Head of Business Engineering at Citadel. Kepler is backed by founders of OpenAI, Facebook AI, MotherDuck, dbt, Outerbounds, and others.

The Role

You'll architect the foundational data platform that powers Kepler's AI research experience. Financial data is fragmented, messy, and comes in every format imaginable: SEC filings, earnings transcripts, market data feeds, research reports, live audio, internal documents. You'll own the architecture that ingests, structures, and unifies all of it into a single coherent system where every answer traces back to its source.

This is a greenfield build. You'll define the storage technologies, search and retrieval systems, indexing strategies, and observability tools that become the foundation for everything we do. You'll drive technical direction, mentor engineers, and make architectural decisions that shape the platform for years to come. This role is for engineers who want to build the data infrastructure for the AI era, not another dashboard or data warehouse.

Within your first 90 days, you will:

  • Own and ship a major data pipeline end-to-end
  • Make foundational technology decisions that shape platform architecture
  • Build ingestion systems that power real financial research workflows
  • Establish data engineering patterns and best practices for the team

Requirements

  • 10+ years of data engineering experience building enterprise data platforms from scratch
  • Data architecture: Proven track record designing and scaling ingestion, storage, transformation, and retrieval systems
  • Diverse data types: Deep experience with structured, unstructured, and semi-structured data. Bonus if you've worked with document processing, audio, or financial data
  • Modern data stack: Strong opinions about storage technologies, indexing strategies, orchestration tools, and observability
  • AI infrastructure: Curiosity about vector databases, embedding pipelines, and retrieval systems. You don't need to be an ML engineer, but you want to work at the intersection
  • Technical leadership: Experience driving architectural decisions and mentoring engineers
  • Practices: Git workflows, CI/CD, automated testing, data quality frameworks
  • Systems thinker who cares about how ingestion affects transformation, how transformation affects governance, and how governance affects what's possible downstream
  • Strong communicator who can articulate technical trade-offs to engineering and business stakeholders
  • Thrives in fast-paced environments with high ownership

Nice To Haves

  • Financial services experience preferred but not required

Responsibilities

  • Architect the data platform: Define storage technologies, indexing strategies, search and retrieval systems, and observability tools from first principles.
  • Drive technical direction and make high-stakes architecture decisions.
  • Build ingestion pipelines: Design systems that ingest data from dozens of heterogeneous sources: SEC filings, earnings transcripts, market data, research reports, live audio, internal documents. Structured, unstructured, and everything in between.
  • Build semantic layers: Create the mapping between raw data and precise definitions that powers our platform. Normalize entities across sources, resolve ambiguity, and ensure the same concept means the same thing everywhere.
  • Build for AI and analytics: Infrastructure that serves both traditional query performance and AI-native requirements: document processing, embedding pipelines, vector search, retrieval systems that pull the right context from millions of documents in milliseconds.
  • Build provenance systems: Every number traces to a source document, section, and disclosure. Full lineage that satisfies institutional compliance and makes our AI trustworthy.
  • Own data quality: Observability, monitoring, validation, and governance. Set the standard for data reliability across the platform.
  • Mentor and grow the team: Code reviews, architectural guidance, and technical mentorship for engineers.
  • Ship with production excellence: Comprehensive testing, monitoring, deployment pipelines. Set the bar for engineering quality.

Benefits

  • Comprehensive medical, dental, and vision insurance plus 401(k) for employees and dependents
  • Automatic coverage for basic life, AD&D, and disability insurance
  • Daily lunch in office
  • Development environment budget - latest MacBook Pro, multiple monitors, ergonomic setup, and any development tools you need
  • Unlimited PTO policy
  • "Build anything" budget - dedicated funding for whatever tools, libraries, datasets, or infrastructure you need to solve technical challenges, no questions asked
  • Learning budget - attend any conference, course, or program that makes you better at what we're building