About The Position

Costco IT is responsible for the technical future of Costco Wholesale, the third largest retailer in the world with wholesale operations in fourteen countries. Despite our size and explosive international expansion, we continue to provide a family-oriented, employee-centric atmosphere in which our employees thrive and succeed. This is an environment unlike anything in the high-tech world, and the secret of Costco’s success is its culture. The value Costco puts on its employees is well documented in articles from a variety of publishers including Bloomberg and Forbes. Our employees and our members come FIRST. Costco is well known for its generosity and community service and has won many awards for its philanthropy. The company joins with its employees to take an active role in volunteering by sponsoring many opportunities to help others.

Come join the Costco Wholesale IT family. Costco IT is a dynamic, fast-paced environment, working through exciting transformation efforts. We are building the next-generation retail environment where you will be surrounded by dedicated and highly professional employees.

Software Engineers perform development work across the technology stack (both front-end and back-end expertise). They are versatile in how they can add value, demonstrating the ability to manage the completion of projects that involve databases, back-end services, or the development of front-end applications. They should be able to demonstrate a strong understanding of emerging technologies to support the development of new solutions. Software Engineers understand the full technology stack and underlying applications, services, and databases in order to ensure optimal performance.

The AI Software Engineer is responsible for the hands-on development, maintenance, and expansion of our enterprise AI Platform as a Service. The candidate in this role will be a key contributor to building the 'Operating System for Intelligence,' developing the shared services, standardized interfaces, and agentic orchestration modules that enable the organization to deploy semantic discovery, conversational intelligence, and autonomous agents. This role will move beyond building isolated applications to engineering a scalable, multi-tenant execution architecture using a managed Agent Runtime Environment. The mission of this role is to maintain the platform's core infrastructure while building specialized supervisor agents and standardized toolsets that serve as the modular building blocks for enterprise-wide automation.

If you want to be a part of one of the BEST companies in the world to work for, simply apply and let your career be reimagined.

Requirements

  • 5+ years of experience in back-end software development with a focus on API design and microservices.
  • 5+ years of experience with API development, with an emphasis on security and performance.
  • 5+ years of experience with microservice-based debugging and performance testing.
  • 5+ years of experience developing within an agile methodology.
  • 1+ years of hands-on experience building with LLM APIs, function calling, or orchestration frameworks.
  • Expert-level experience architecting and maintaining containerized autonomous workloads across elastic container orchestration and scalable serverless runtimes, with a focus on high-availability event-driven architectures.
  • Extensive experience managing high-scale semantic indices and engineered multimodal data pipelines that unify structured relational repositories with unstructured knowledge for agentic grounding.
  • Expert-level proficiency in asynchronous orchestration frameworks utilizing type-safe schema validation and high-performance API gateways. Complementary mastery of statically typed systems for engineering highly concurrent, low-latency agentic back-end infrastructure.
  • Expert mastery of stateful graph orchestration and multi-agent coordination frameworks, with a proven ability to design complex cyclic reasoning loops and automated task delegation systems.
  • Proficiency in architecting semantic retrieval layers, attribute-aware discovery, and stateful persistence systems to provide high-fidelity long-term context for autonomous agents.
  • Deep understanding of MCP, A2A, REST/gRPC APIs, OAuth2 security, and function calling mechanics.
  • Experience with Infrastructure as Code and CI/CD for prompt engineering and model deployment.
  • Experience leading technical workstreams, translating business problems into AI-native architectures.
  • Strong verbal and written communication skills, with the ability to communicate with both technical and business audiences.
  • Ability to work under pressure in a crisis, with a strong sense of urgency.
  • Responsible, conscientious, organized, self-motivated, and able to work with limited supervision.
  • Detail-oriented, with strong problem-solving skills and the ability to analyze potential future issues.
  • Able to support off-hours work as required, including weekends, holidays, and 24/7 on-call responsibilities on a rotational basis.

Nice To Haves

  • Bachelor’s degree in Computer Science, Software Engineering, or a related technical field.
  • Master’s degree or PhD with a focus on Distributed Systems, AI Orchestration, or Machine Learning.
  • Google Cloud Professional Data Engineer, Google Cloud Professional Cloud Architect, or any Agentic AI Specialty Certification focusing on Multi-Agent Systems and Autonomous Reasoning.
  • 3+ years of experience with distributed cache technologies.
  • Experience with deploying and configuring Cloud Platform resources.
  • Experience working in a retail ecommerce environment.
  • Proficient in Google Workspace applications, including Sheets, Docs, Slides, and Gmail.

Responsibilities

  • Develops the conceptual systems architecture design and the supporting technologies needed to enable new and/or enhanced functionality within a given product/application, applying principles that promote availability, reusability, interoperability, and security into the design framework.
  • Identifies deficiencies within a product/application’s code base and identifies opportunities to improve overall code quality.
  • Collaborates with team members (e.g., Systems Architects, Systems Analysts) to define project specifications and release documentation for all phases of the product development cycle from product definition to design, through implementation.
  • Conducts peer code reviews for the software changes made by other Engineers within a team.
  • Maintains and evolves core AI platform services, such as shared memory banks and session management. This includes managing the central reasoning engine.
  • Architects and optimizes modular agentic framework templates using standardized orchestration SDKs to manage complex, stateful workflows.
  • Creates and maintains a library of standardized capability servers using the Model Context Protocol (MCP), enabling agents to securely orchestrate tasks across enterprise data platforms, customer relationship management systems, and distributed microservices.
  • Integrates the AI Platform as a Service with existing enterprise infrastructure. This includes developing code for authentication systems, logging pipelines, and CI/CD workflows.
  • Manages and optimizes the platform’s semantic retrieval architecture, ensuring high-speed, low-latency access to grounded enterprise knowledge through a unified discovery and retrieval engine.
  • Engineers and optimizes discovery agents that leverage semantic retrieval and neural search platforms to ingest, validate, and cite evidence from unified multimodal lakehouses.
  • Establishes Agent Identity (IAM) protocols to ensure every autonomous action is authenticated, authorized, and logged under a secure service principal.
  • Implements and scales automated AI performance benchmarking systems to continuously monitor mission success rates and proactively identify regressions in reasoning, safety guardrails, or autonomous decision-making.
  • Serves as the primary responder for systemic platform anomalies, utilizing distributed traceability and aggregated telemetry to diagnose and resolve bottlenecks within complex multi-agent reasoning chains.
  • Maintains internal documentation and SDKs that empower other software teams to onboard their use cases onto the AI Platform as a Service.
  • Ensures the longevity, scalability, and quality of our systems through continuous improvement, comprehensive documentation, meticulous profiling, and significant performance enhancements.

Benefits

  • We offer eligible employees a comprehensive package of benefits including paid time off, health benefits (medical, dental, vision, hearing aid, pharmacy, behavioral health, and employee assistance), a health care reimbursement account, a dependent care assistance plan, short-term and long-term disability insurance, AD&D insurance, life insurance, a 401(k), and a stock purchase plan.