About The Position

This role offers the opportunity to design and scale enterprise-grade document intelligence systems powered by self-hosted large language models. You will architect and operate production-ready OCR and LLM pipelines capable of processing complex, long-form documents with deterministic, auditable outputs. As a hands-on technical leader, you will own infrastructure decisions, optimize GPU-backed inference environments, and ensure system reliability at scale. The position combines deep AI engineering with rigorous backend architecture, with a focus on performance, governance, and measurable correctness. Working remotely in a highly collaborative environment, you will partner closely with AI leadership while maintaining end-to-end ownership of mission-critical systems.

Requirements

  • 6+ years of backend engineering experience, primarily in Python, building scalable production systems.
  • Proven experience developing OCR-based document intelligence solutions for large, long-form PDFs (100+ pages).
  • Hands-on experience deploying and managing open-source LLMs (e.g., LLaMA, Qwen, Mistral) using vLLM or Hugging Face TGI.
  • Experience operating GPU-backed inference infrastructure with performance optimization and cost-efficiency strategies.
  • Strong expertise in deterministic validation systems, including schema enforcement and rule-based governance layers.
  • Excellent debugging, systems thinking, and architectural decision-making skills.
  • Ability to clearly communicate technical trade-offs and business impact to both technical and non-technical stakeholders.
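To make the "deterministic validation systems" requirement above concrete, here is a minimal Python sketch of a schema-enforcement plus rule-based governance layer. The field names (`invoice_number`, `total_amount`, `page_count`) and the rules themselves are illustrative assumptions, not taken from the posting:

```python
# Minimal sketch of a deterministic validation layer for extracted
# document fields. Schema, field names, and rules are illustrative.

from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    valid: bool
    errors: list = field(default_factory=list)


# Schema enforcement: each extracted field must exist with this type.
SCHEMA = {
    "invoice_number": str,
    "total_amount": float,
    "page_count": int,
}

# Rule-based governance checks, evaluated deterministically in order.
RULES = [
    ("total_amount must be non-negative", lambda d: d["total_amount"] >= 0),
    ("page_count must be positive", lambda d: d["page_count"] > 0),
]


def validate(record: dict) -> ValidationResult:
    errors = []
    for key, expected in SCHEMA.items():
        if key not in record:
            errors.append(f"missing field: {key}")
        elif not isinstance(record[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    # Rules run only after the schema passes, so each rule can assume
    # the declared types are present.
    if not errors:
        for message, check in RULES:
            if not check(record):
                errors.append(message)
    return ValidationResult(valid=not errors, errors=errors)
```

Failed records would feed the kind of automated exception routing mentioned under Responsibilities, since every rejection carries a machine-readable error list.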

Nice To Haves

  • Experience with layout-aware models such as LayoutLM or DocFormer.
  • Background in regulated industry environments such as finance or healthcare.
  • Exposure to document-intensive workflows such as underwriting or claims processing.

Responsibilities

  • Architecting and implementing end-to-end OCR-heavy pipelines for long-form document processing, including PDF ingestion, layout-aware parsing, segmentation, and metadata tracking.
  • Designing scalable systems capable of handling 200+ page documents with high concurrency, performance consistency, and operational stability.
  • Integrating and optimizing OCR engines (e.g., Tesseract, PaddleOCR) and layout-aware or vision-language models for structured data extraction.
  • Building deterministic validation frameworks, including schema enforcement, rule-based checks, invariant validation, and automated exception routing.
  • Deploying and managing self-hosted LLM infrastructure using tools such as vLLM and Hugging Face TGI, including GPU-backed inference services.
  • Optimizing inference workloads through batching strategies, KV cache tuning, context window management, and cost-efficiency improvements.
  • Implementing robust observability systems with structured logging, tracing, monitoring, and automated recovery mechanisms.
  • Ensuring auditability, traceability, reproducibility, and governance standards for compliance-driven environments.
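As one illustration of the observability responsibility above, the sketch below shows structured JSON logging with a per-document trace identifier, using only the Python standard library. The logger name and context fields (`trace_id`, `doc_id`, `stage`) are assumptions for the example, not requirements from the posting:

```python
# Illustrative structured-logging sketch for a document pipeline stage.
# Every log line is a single JSON object carrying trace context, which
# supports the traceability and auditability goals described above.

import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "ts": record.created,
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Merge structured context passed via `extra={"ctx": {...}}`.
        payload.update(getattr(record, "ctx", {}))
        return json.dumps(payload)


def get_pipeline_logger():
    logger = logging.getLogger("doc_pipeline")
    if not logger.handlers:  # avoid duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


logger = get_pipeline_logger()
trace_id = str(uuid.uuid4())  # one trace id per document run
logger.info(
    "ocr stage complete",
    extra={"ctx": {"trace_id": trace_id, "doc_id": "doc-123",
                   "stage": "ocr", "pages": 214}},
)
```

Because each line is self-describing JSON, downstream tracing and automated-recovery tooling can filter by `trace_id` or `stage` without parsing free-form text.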