DevOps/Platform Engineer II

Pacific Northwest National Laboratory · Richland, WA
Posted 6 days ago · Onsite

About The Position

At PNNL, our core capabilities are divided among major departments that we refer to as Directorates, each focused on a specific area of scientific research or other function and each with its own leadership team and dedicated budget. Our Science & Technology directorates include National Security, Earth and Biological Sciences, Physical and Computational Sciences, and Energy and Environment. In addition, the Environmental Molecular Sciences Laboratory, a Department of Energy Office of Science user facility, is housed on the PNNL campus.

The National Security Directorate (NSD) drives science-based, mission-focused solutions to take on complex, real-world threats to our nation and the world. The AI and Data Analytics Division, part of NSD, combines deep domain expertise with creative integration of advanced hardware and software to deliver computational solutions to complex data and analytic challenges. Working in multidisciplinary teams, we connect foundational research to engineering to operations, providing the tools to innovate quickly and field results faster. Our strengths span the data analytics lifecycle, from data acquisition and management to analysis and decision support.

We are seeking a DevOps/Platform Engineer to join PNNL's AI engineering team, contributing to innovative systems spanning agentic AI platforms, large-scale data orchestration, and real-time intelligence processing. This is an excellent opportunity for early- to mid-career developers to apply their software engineering skills to meaningful national security challenges while growing their expertise in AI/ML systems, cloud infrastructure, and distributed computing.

Who You Are

You're a motivated software engineer with foundational experience in building production systems and a strong desire to grow your expertise in AI/ML and scalable infrastructure. You're comfortable working both independently on defined tasks and collaboratively on larger initiatives. You're eager to learn new technologies, apply software engineering best practices, and contribute to mission-critical systems while building your professional network and technical reputation.

What You'll Build

AI Systems & Platforms

  • Develop components of agentic AI systems and LLM-based applications
  • Implement features using frameworks like LangChain, LlamaIndex, or similar tools
  • Build and maintain ML pipelines, data preprocessing workflows, and model deployment infrastructure
  • Create utilities and tools that support AI/ML development and operations
  • Work with multi-modal data including text, structured data, and sensor information

Data Pipelines & Infrastructure

  • Build data pipelines for large-scale ETL, transformation, and analytics workflows
  • Implement streaming data processors and event-driven components
  • Develop microservices and APIs within distributed architectures handling high-throughput workloads
  • Deploy containerized applications using Docker and Kubernetes
  • Contribute to CI/CD pipelines and automated testing frameworks

Mission-Critical Production Systems

  • Write clean, well-tested code following established best practices
  • Implement monitoring, logging, and observability for applications
  • Build developer tooling and documentation to support team productivity
  • Contribute to system performance optimization and debugging efforts
  • Support deployments in cloud and secure environments

Technical Leadership

  • Work on small tasks and project elements, progressing to independent ownership
  • Collaborate with cross-functional teams including data scientists, researchers, and senior engineers
  • Participate in code reviews, design discussions, and technical planning
  • Mentor junior staff and students when opportunities arise
  • Contribute technical content to proposals and project documentation
  • Present your work at team meetings and technical forums

Requirements

  • Working proficiency in Python with foundational knowledge of at least one additional language (Bash, Go, C#, JavaScript/TypeScript) for scripting and automation tasks
  • Understanding of Infrastructure as Code principles with exposure to tools like Terraform, CloudFormation, or Ansible and ability to write basic infrastructure configurations
  • Familiarity with version control workflows (Git) including branching, commits, pull requests, and collaborative development practices with willingness to learn CI/CD pipeline concepts and contribute to build automation
  • Eagerness to learn and apply AI-assisted development tools (e.g., GitHub Copilot, Claude, ChatGPT) to accelerate learning, generate infrastructure code, troubleshoot issues, and improve automation script quality
  • Foundational knowledge of machine learning concepts including model training, evaluation, and deployment with exposure to frameworks (PyTorch, TensorFlow, scikit-learn)
  • Basic understanding of the ML lifecycle and MLOps principles including experiment tracking, model versioning, and monitoring with willingness to learn tools like MLflow, Weights & Biases, or Kubeflow
  • Exposure to or willingness to learn about ML model serving, inference APIs, and supporting infrastructure for training and deployment pipelines
  • Interest in supporting LLM applications, agent-based frameworks, and ML workloads on cloud platforms or Kubernetes with eagerness to grow expertise through hands-on projects
  • Basic knowledge of cloud computing principles and familiarity with services within AWS, Azure, or GCP (compute, storage, networking, IAM)
  • Exposure to containerization with Docker and foundational understanding of container orchestration concepts (Kubernetes) with willingness to learn pod management, deployments, and services
  • Understanding of basic networking concepts including DNS, load balancing, and firewalls with awareness of RESTful API principles and microservice architecture patterns
  • Familiarity with monitoring and logging tools (CloudWatch, Prometheus, Grafana, ELK Stack) and willingness to learn observability practices
  • Awareness of cloud-native data pipeline concepts and ETL/ELT principles with exposure to services like AWS S3, Lambda, Glue, or equivalent Azure/GCP services
  • Basic knowledge of cloud-based data storage systems (S3, PostgreSQL, MongoDB) and understanding of differences between relational and NoSQL databases
  • Foundational understanding of distributed computing and streaming concepts with exposure to frameworks like Spark, Kafka, or Ray through coursework or personal projects
  • Knowledge of common data formats (JSON, CSV, Parquet, Avro) with basic understanding of schema design, data validation, and data quality considerations
  • Ability to collaborate effectively within DevOps, platform engineering, and cross-functional teams while actively seeking mentorship and learning opportunities
  • Developing communication skills to document infrastructure configurations, write clear runbooks, and articulate technical challenges through team discussions and written documentation
  • Enthusiastic participation in code reviews and infrastructure design discussions with openness to constructive feedback and eagerness to learn best practices
  • Demonstrated ability to incorporate feedback, learn from operational incidents, and continuously improve through peer collaboration, self-study, and hands-on experience
  • PhD -OR- MS/MA -OR- BS/BA and 2 years of relevant experience
  • U.S. Citizenship
  • Applicants selected will be subject to a Federal background investigation and must meet eligibility requirements for access to classified matter in accordance with 10 CFR 710, Appendix B.
  • All Security Clearance positions are Testing Designated Positions, which means that the applicant selected for hire is subject to pre-employment drug testing, and post-employment random drug testing. In addition, applicants must be able to demonstrate non-use of illegal drugs, including marijuana, for the 12 consecutive months preceding completion of the requisite Questionnaire for National Security Positions (QNSP).
  • Applicants will be considered ineligible for security clearance processing by the U.S. Department of Energy if non-use of illegal drugs, including marijuana, for 12 months cannot be demonstrated.
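To give a concrete sense of the level expected by the scripting, data-format, and data-validation requirements above, here is a hypothetical Python sketch of a small automation task (the field names `sensor_id`, `timestamp`, and `value`, and the validation rule, are invented for illustration and are not part of the role description):

```python
import csv
import io
import json

# Hypothetical schema for illustration only; real field names and
# validation rules would come from the project at hand.
REQUIRED_FIELDS = {"sensor_id", "timestamp", "value"}


def validate_record(record: dict) -> bool:
    """Check that a record has the required fields and a numeric value."""
    return REQUIRED_FIELDS <= record.keys() and isinstance(
        record["value"], (int, float)
    )


def jsonl_to_csv(jsonl_text: str) -> str:
    """Convert newline-delimited JSON records to CSV, skipping invalid rows."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    valid = [r for r in rows if validate_record(r)]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=sorted(REQUIRED_FIELDS))
    writer.writeheader()
    for r in valid:
        writer.writerow({k: r[k] for k in REQUIRED_FIELDS})
    return out.getvalue()
```

A candidate comfortable writing and explaining a utility like this, including why invalid rows are filtered rather than allowed to fail downstream, is at the level the posting describes.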

Nice To Haves

  • Degree in computer science, software engineering, or related technical field
  • Exposure to infrastructure automation, deployment pipelines, or cloud platform management through coursework, personal projects, labs, or internship experience
  • Basic scripting or programming experience with Python, Bash, or similar languages demonstrated through academic projects or personal automation initiatives
  • Experience with containerization (Docker) through personal projects, coursework, or labs with interest in learning Kubernetes
  • Strong problem-solving abilities demonstrated through technical challenges, troubleshooting exercises, or course projects
  • Active engagement in learning cloud technologies, automation, MLOps, or modern infrastructure practices (e.g., coursework, certifications, or technical projects)
  • Demonstrated commitment to professional growth in platform or DevOps engineering through mentorship, training, or technical skill development
  • Participation in relevant communities, online courses (Coursera, Udemy, A Cloud Guru), or technical forums demonstrating commitment to continuous learning

Responsibilities

  • Develop components of agentic AI systems and LLM-based applications
  • Implement features using frameworks like LangChain, LlamaIndex, or similar tools
  • Build and maintain ML pipelines, data preprocessing workflows, and model deployment infrastructure
  • Create utilities and tools that support AI/ML development and operations
  • Work with multi-modal data including text, structured data, and sensor information
  • Build data pipelines for large-scale ETL, transformation, and analytics workflows
  • Implement streaming data processors and event-driven components
  • Develop microservices and APIs within distributed architectures handling high-throughput workloads
  • Deploy containerized applications using Docker and Kubernetes
  • Contribute to CI/CD pipelines and automated testing frameworks
  • Write clean, well-tested code following established best practices
  • Implement monitoring, logging, and observability for applications
  • Build developer tooling and documentation to support team productivity
  • Contribute to system performance optimization and debugging efforts
  • Support deployments in cloud and secure environments
  • Work on small tasks and project elements, progressing to independent ownership
  • Collaborate with cross-functional teams including data scientists, researchers, and senior engineers
  • Participate in code reviews, design discussions, and technical planning
  • Mentor junior staff and students when opportunities arise
  • Contribute technical content to proposals and project documentation
  • Present your work at team meetings and technical forums
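To make the "streaming data processors and event-driven components" responsibility concrete, here is a minimal hypothetical sketch of a generator-based stream transform; the moving-average computation and window size are illustrative assumptions, not a description of PNNL's actual systems:

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average(events: Iterable[float], window: int = 3) -> Iterator[float]:
    """Emit the mean of the most recent `window` readings per incoming event.

    Illustrative only: production streaming work in this role would more
    likely use a framework such as Kafka or Spark, but the same idea of
    incrementally transforming an unbounded event sequence applies.
    """
    buf = deque(maxlen=window)  # sliding window; old readings fall off
    for value in events:
        buf.append(value)
        yield sum(buf) / len(buf)
```

Because it is a generator, this processes events one at a time without buffering the whole stream, which is the essential property of the streaming components the role involves.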

Benefits

  • Employees and their families are offered medical insurance, dental insurance, vision insurance, robust telehealth care options, several mental health benefits, free wellness coaching, health savings account, flexible spending accounts, basic life insurance, disability insurance, employee assistance program, business travel insurance, tuition assistance, relocation, backup childcare, legal benefits, supplemental parental bonding leave, surrogacy and adoption assistance, and fertility support.
  • Employees are automatically enrolled in our company-funded pension plan and may enroll in our 401(k) savings plan with company match.
  • Employees may accrue up to 120 vacation hours per year and may receive ten paid holidays per year.