About The Position

This role offers the opportunity to design and scale enterprise-grade document intelligence systems powered by self-hosted large language models. You will architect and operate production-ready OCR and LLM pipelines capable of processing complex, long-form documents with deterministic, auditable outputs. As a hands-on technical leader, you will own infrastructure decisions, optimize GPU-backed inference environments, and ensure system reliability at scale. The position combines deep AI engineering with rigorous backend architecture, with a focus on performance, governance, and measurable correctness. Working remotely in a highly collaborative environment, you will partner closely with AI leadership while maintaining end-to-end ownership of mission-critical systems.

Requirements

  • 6+ years of backend engineering experience, primarily in Python, building scalable production systems.
  • Proven experience developing OCR-based document intelligence solutions for large, long-form PDFs (100+ pages).
  • Hands-on experience deploying and managing open-source LLMs (e.g., LLaMA, Qwen, Mistral) using vLLM or Hugging Face TGI.
  • Experience operating GPU-backed inference infrastructure with performance optimization and cost-efficiency strategies.
  • Strong expertise in deterministic validation systems, including schema enforcement and rule-based governance layers.
  • Excellent debugging, systems thinking, and architectural decision-making skills.
  • Ability to clearly communicate technical trade-offs and business impact to both technical and non-technical stakeholders.
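To make the "deterministic validation systems" requirement above concrete, here is a minimal Python sketch of a schema-enforcement plus rule-based governance layer. The field names (`invoice_number`, `total_amount`, `page_count`) and the rules themselves are illustrative assumptions, not taken from the posting:

```python
# Minimal sketch of a deterministic validation layer for extracted
# document fields. Schema, field names, and rules are illustrative.

from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    valid: bool
    errors: list = field(default_factory=list)


# Schema enforcement: each extracted field must exist with this type.
SCHEMA = {
    "invoice_number": str,
    "total_amount": float,
    "page_count": int,
}

# Rule-based governance checks, evaluated deterministically in order.
RULES = [
    ("total_amount must be non-negative", lambda d: d["total_amount"] >= 0),
    ("page_count must be positive", lambda d: d["page_count"] > 0),
]


def validate(record: dict) -> ValidationResult:
    errors = []
    for key, expected in SCHEMA.items():
        if key not in record:
            errors.append(f"missing field: {key}")
        elif not isinstance(record[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    # Rules run only after the schema passes, so each rule can assume
    # the declared types are present.
    if not errors:
        for message, check in RULES:
            if not check(record):
                errors.append(message)
    return ValidationResult(valid=not errors, errors=errors)
```

Failed records would feed the kind of automated exception routing mentioned under Responsibilities, since every rejection carries a machine-readable error list.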

Nice To Haves

  • Experience with layout-aware models such as LayoutLM or DocFormer.
  • Background in regulated industry environments such as finance or healthcare.
  • Exposure to document-intensive workflows such as underwriting or claims processing.

Responsibilities

  • Architecting and implementing end-to-end OCR-heavy pipelines for long-form document processing, including PDF ingestion, layout-aware parsing, segmentation, and metadata tracking.
  • Designing scalable systems capable of handling 200+ page documents with high concurrency, performance consistency, and operational stability.
  • Integrating and optimizing OCR engines (e.g., Tesseract, PaddleOCR) and layout-aware or vision-language models for structured data extraction.
  • Building deterministic validation frameworks, including schema enforcement, rule-based checks, invariant validation, and automated exception routing.
  • Deploying and managing self-hosted LLM infrastructure using tools such as vLLM and Hugging Face TGI, including GPU-backed inference services.
  • Optimizing inference workloads through batching strategies, KV cache tuning, context window management, and cost-efficiency improvements.
  • Implementing robust observability systems with structured logging, tracing, monitoring, and automated recovery mechanisms.
  • Ensuring auditability, traceability, reproducibility, and governance standards for compliance-driven environments.
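As one illustration of the observability responsibility above, the sketch below shows structured JSON logging with a per-document trace identifier, using only the Python standard library. The logger name and context fields (`trace_id`, `doc_id`, `stage`) are assumptions for the example, not requirements from the posting:

```python
# Illustrative structured-logging sketch for a document pipeline stage.
# Every log line is a single JSON object carrying trace context, which
# supports the traceability and auditability goals described above.

import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "ts": record.created,
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Merge structured context passed via `extra={"ctx": {...}}`.
        payload.update(getattr(record, "ctx", {}))
        return json.dumps(payload)


def get_pipeline_logger():
    logger = logging.getLogger("doc_pipeline")
    if not logger.handlers:  # avoid duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


logger = get_pipeline_logger()
trace_id = str(uuid.uuid4())  # one trace id per document run
logger.info(
    "ocr stage complete",
    extra={"ctx": {"trace_id": trace_id, "doc_id": "doc-123",
                   "stage": "ocr", "pages": 214}},
)
```

Because each line is self-describing JSON, downstream tracing and automated-recovery tooling can filter by `trace_id` or `stage` without parsing free-form text.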