LLMOps Engineer

Steampunk | McLean, VA
$115,000 - $145,000

About The Position

We are looking for an experienced LLMOps Engineer to design, implement, and maintain production-grade large language model (LLM) pipelines, deployment architectures, and monitoring systems across enterprise environments. The LLMOps Engineer will play a critical role in operationalizing generative AI capabilities, ensuring that LLM-based applications are scalable, secure, reliable, and compliant with emerging AI risk and governance frameworks. This role spans the full spectrum of model deployment, orchestration, evaluation, and optimization. You will contribute to the growth of our AI & Data Exploitation Practice!

Requirements

  • Ability to hold a position of public trust with the U.S. government.
  • Master's Degree (related program) and 7 years of relevant experience; OR Bachelor's Degree (related program) and 10 years of relevant experience; OR No degree and 16 years of relevant experience
  • Possess at least one professional certification relevant to the technical service provided.
  • Maintain a certification relevant to the product being deployed and/or maintained.
  • 5+ years of experience in software engineering, data engineering, MLOps, or cloud engineering, with 2+ years focused specifically on LLM or GenAI operations.
  • Strong experience deploying models using frameworks such as Hugging Face Transformers, vLLM, TensorRT-LLM, or similar.
  • Proficiency in Python and operational tooling such as FastAPI, PyTorch, LangChain, LlamaIndex, and vector databases (FAISS, Milvus, Pinecone, or similar).
  • Advanced knowledge of cloud platforms (AWS, Azure, GCP), including model hosting, distributed compute, and secure networking patterns.
  • Hands-on experience building CI/CD pipelines, automated testing frameworks, and environment provisioning for AI/ML workloads.
  • Experience with Docker, Kubernetes, and infrastructure-as-code (Terraform, CloudFormation).
  • Familiarity with MLSecOps, AI governance, model hardening, prompt injection defenses, and content safety monitoring.
  • Strong understanding of logging, observability, and performance profiling for high-throughput LLM inference systems.
  • Excellent written and verbal communication skills, with the ability to explain trade-offs and architectural decisions to technical and non-technical stakeholders.
  • Demonstrated ability to balance long-term platform thinking with hands-on operations and rapid problem solving.
  • Experience working in agile teams and using modern project management tools.

Responsibilities

  • Architect and maintain scalable LLM and RAG pipelines, including model hosting, inference optimization, retrieval layers, and context management frameworks.
  • Lead the design and implementation of secure GenAI infrastructure across cloud environments, ensuring reliability, performance, and cost efficiency.
  • Build and manage automated evaluation systems that assess LLM output quality, safety, latency, and adherence to AI governance requirements.
  • Develop CI/CD workflows tailored for LLM- and GenAI-based applications, including dataset versioning, model lineage, and automated testing of prompt and model behaviors.
  • Collaborate with AI Product Engineers and Data Scientists to productionize LLM-based prototypes into enterprise-grade, maintainable systems.
  • Integrate vector databases, model gateways, content filters, and guardrail frameworks into end-to-end LLM solutions.
  • Implement observability and monitoring solutions that track performance metrics, hallucination rates, cost profiles, and user interaction patterns.
  • Lead troubleshooting and root-cause analysis for issues related to LLM deployment, inference performance, or pipeline reliability.
  • Stay current with emerging LLM architectures, inference optimizations, fine-tuning techniques, and relevant MLSecOps patterns.
  • Ensure compliance with data privacy, ethical AI, and AI-governance frameworks throughout pipeline design and operations.
  • Mentor junior engineers and contribute to Steampunk’s AI engineering best practices, tooling, and reusable infrastructure patterns.