Principal Engineer, Data Intelligence & Retrieval

ProRata.ai • Bellevue, WA

Onsite

About The Position

We are seeking a Principal Engineer to lead the development of our high-scale retrieval systems and the construction of our Retrieval-Augmented Generation (RAG) pipeline. This role is critical in bridging the gap between massive, licensed datasets and real-time generative inference. You will be the primary architect of our RAG pipeline, focusing on the processing, chunking, and indexing of millions of documents to power both semantic and full-text discovery.

Requirements

15+ years in engineering, including at least 8 years architecting large-scale distributed systems.
Proven track record building commercial-grade software in Python and in either Go or Rust.
Expert-level mastery of architecting and scaling high-throughput indexing pipelines using Elasticsearch, OpenSearch, or distributed vector databases. Proven ability to design sophisticated query and indexing strategies that balance semantic richness with sub-second retrieval latency.
Deep hands-on experience optimizing large-scale search architectures for millions of documents, utilizing both traditional inverted indices and modern vector stores. Demonstrated success in implementing advanced caching and indexing optimizations to minimize generative costs while maximizing retrieval relevance.
Expertise in both SQL and NoSQL databases (e.g., MongoDB, ClickHouse, Postgres) and big data processing tools.
Excellent knowledge of algorithms, data structures, graph theory, and modern distributed application principles (REST API design, scaling, capacity sizing).
MS in Computer Science or Engineering.

Responsibilities

Design and implement the end-to-end RAG construction pipeline, ensuring high-performance ingestion and transformation of diverse datasets in near real-time.
Develop and optimize hybrid retrieval strategies that combine the precision of full-text search with the contextual depth of semantic (vector) search.
Own the 'document-to-chunk' lifecycle. Implement advanced strategies for chunking, metadata enrichment, and quality filtering to ensure the most relevant context is fed into generative models.
Architect systems to handle jobs across millions of documents while optimizing indices for sub-second latency and high-throughput serving.
Recommend and implement optimizations for GPU/CPU performance, concurrency, and memory management to minimize serving costs and maximize ROI.
Act as a technical expert and influencer, guiding the team in software design and applying strong diagnostic skills to complex distributed system issues.