Senior Software Engineer, Ingestion Team

Pryon
Washington, DC (Remote)

About The Position

We’re a team of AI, technology, and language experts whose DNA lives in Alexa, Siri, Watson, and virtually every human language technology product on the market. Now we’re building an industry-leading knowledge management and Retrieval-Augmented Generation (RAG) platform. Our proprietary, cutting-edge natural language processing capabilities transform unstructured data into meaningful experiences that increase productivity with unmatched accuracy and speed.

The Opportunity

The Ingestion team is responsible for everything that happens between content arriving from a connector and that content being ready for search and retrieval. This means document processing pipelines that handle parsing, text extraction, chunking, metadata enrichment, embedding generation, and index population across every file format and content type our customers throw at us.

We’re in the middle of a significant architectural evolution: migrating from a legacy pipeline to a modern, workflow-orchestrated architecture with cleanly separated processing stages (intake, transformation, enrichment, and indexing). The team is also actively designing the next iteration of the pipeline to push further on throughput and resilience. This is real systems engineering. The problems are about scale, reliability, and the messy realities of processing millions of documents with wildly different structures.

Requirements

  • 5+ years of software engineering experience, with meaningful time on data processing pipelines, ETL systems, or similar infrastructure
  • Strong proficiency in Python and/or Go
  • Experience with workflow orchestration tools — Temporal, Airflow, Prefect, Step Functions, or similar
  • Understanding of distributed systems patterns: queues, workers, backpressure, idempotency, retry strategies
  • Hands-on experience with Kubernetes, Docker, Terraform, and Helm
  • Familiarity with message brokers and event streaming (Kafka, RabbitMQ, SQS, or similar)
  • Comfort working across cloud providers (AWS, Azure, GCP)

Responsibilities

  • Design and build pipeline stages for our modern ingestion architecture, from document intake through embedding generation and index writing
  • Contribute to the design of next-generation pipeline architecture as the system evolves
  • Improve system stability and scale: identify bottlenecks, reduce failure rates, and build observability into every stage
  • Work with workflow orchestration tools to manage complex, multi-step document processing with retry logic, error handling, and state management
  • Handle the realities of document diversity: PDFs, HTML, Office formats, images, and structured and semi-structured data, all flowing through the same pipeline
  • Collaborate with the Connectors team (upstream) and Retrieval team (downstream) to ensure data flows cleanly across system boundaries
  • Participate in the ongoing migration from legacy systems, balancing new development with operational stability

Benefits

  • Remote-first organization
  • 100% company-paid health, dental, and vision benefits for you and your dependents
  • Life Insurance, Short-term and Long-term Disability
  • 401k
  • Unlimited PTO