Sr. Software Engineer – Platform Performance & Resilience (AI-Enabled)

Toshiba Global Commerce Solutions - External
Durham, NC

About The Position

Toshiba Global Commerce Solutions is seeking a Senior Software Engineer – Platform Performance & Resilience to play a key role in engineering performance, resilience, and observability across a three‑tier distributed architecture spanning edge devices, in‑store servers, and cloud services. The role uses AI‑enabled automation to validate and enforce production‑grade reliability, with the goal of delivering measurable system stability at retail scale. The position sits at the intersection of distributed systems architecture, performance engineering, reliability validation, and intelligent automation.

Requirements

  • 4–6+ years of professional software engineering experience.
  • Strong proficiency in Node.js and Java.
  • Proven experience in performance engineering, reliability engineering, or distributed systems architecture.
  • Demonstrated experience designing systems with deterministic timeouts, retry/backoff strategies, circuit breakers, and concurrency controls.
  • Experience modeling multi‑tier systems (edge, middleware, cloud).
  • Solid understanding of SLOs, SLIs, and non‑functional validation.
  • Experience deploying services in Kubernetes‑based cloud environments.
  • Strong debugging and profiling skills for distributed systems.

Nice To Haves

  • Experience building automated resilience or fault‑injection systems.
  • Familiarity with event‑driven architectures (Kafka, Pub/Sub, MQ).
  • Experience implementing structured observability frameworks.
  • Exposure to AI‑enabled automation or workflow orchestration.
  • Experience optimizing systems in intermittently connected environments.

Responsibilities

  • Architect Reliability Across Edge–Store–Cloud
      • Design and implement platform mechanisms that ensure transaction integrity and availability across POS terminals, store middleware, and cloud services.
      • Define and validate failure‑mode strategies for intermittent connectivity, tier isolation, data replay, and synchronization conflicts.
      • Engineer patterns that prevent cascading failures and support graceful degradation under real‑world load.
  • Engineer Performance at Retail Scale
      • Define latency budgets and performance envelopes across all tiers.
      • Build systems that measure and validate throughput, concurrency limits, and resource saturation.
      • Collaborate with development teams to eliminate bottlenecks before production.
  • Build Automated Resilience Validation
      • Develop AI‑enabled systems that automatically generate and execute performance and resilience validation scenarios.
      • Integrate non‑functional quality gates into CI/CD workflows.
      • Continuously evaluate timeout, retry, circuit breaker, and backoff strategies under stress.
  • Elevate Observability & Signal Quality
      • Architect structured telemetry across edge, store, and cloud tiers.
      • Ensure end‑to‑end transaction traceability.
      • Improve root‑cause detection by strengthening monitoring signal‑to‑noise ratio.
  • Own Engineering Outcomes End‑to‑End
      • Produce technical designs and failure‑mode analyses.
      • Implement and deploy platform components in Node.js and companion services in Java.
      • Drive production‑readiness improvements based on performance data.

Benefits

  • Group health coverage (medical, dental, & vision)
  • Employee Assistance Programs
  • Pre-tax spending accounts
  • 401(k) plan (with company match)
  • Company-provided life insurance
  • Pet insurance
  • Employee discounts
  • Generous paid holiday schedule, paid vacation & sick/personal days