DevOps / Site Reliability Engineer

General Matter•Los Angeles, CA

$100,000 - $200,000

About The Position

About the Company

General Matter is enriching uranium in America. Our mission is to restore our country’s ability to make nuclear fuel. Our fuel will help power AI, manufacturing, and other critical industries. It will power our next generation of reactors. Ultimately, it will power our national ambitions. We were incubated by Founders Fund, like Anduril and Palantir before us, and we are backed by top-tier investors. Our lean, world-class team of engineers and operators is applying a first-principles approach to solving the problem of nuclear fuel production. We are a mission-driven company with a culture of urgency, accountability, and transparency.

About This Role

As a General Matter Embedded Software Engineer, you will develop performant, safe, and high-quality software to directly control our systems. Your code will be responsible for commanding actuators and processing high-speed signals in applications where safety and accuracy are exceedingly important. You will work closely with cross-functional teams, including electrical engineers, software engineers, chemical engineers, manufacturing engineers, nuclear engineers, materials scientists, and physicists. If you seek high impact and are excited by fast-paced, intense, Skunkworks-style projects, we encourage you to reach out to join our team.

DevOps / Site Reliability Engineer

We are seeking a highly capable DevOps / Site Reliability Engineer to help build and operate the software systems underpinning uranium enrichment R&D and production infrastructure. This role is foundational to our reliability, safety, and developer velocity. You will be responsible for designing and maintaining observability, alerting, and developer productivity systems, and for ensuring that critical internal and production services are correctly instrumented and monitored. We are only interested in candidates with strong fundamentals, sound judgment, and the ability to operate with rigor in a production environment where failures matter.

Requirements

Strong fundamentals in web service development and distributed systems
Solid understanding of networking concepts, DNS, TLS/certificate management, and HTTP
Experience operating and debugging production systems
Familiarity with observability tools (metrics, logging, alerting) and incident response
Ability to write clear, maintainable code and automation scripts
Demonstrated ownership, attention to detail, and sound technical judgment
Ability to work extended hours and weekends as necessary

Nice To Haves

Experience with modern observability stacks (e.g., Prometheus, Grafana, OpenTelemetry, Datadog)
Hands-on experience with cloud infrastructure and infrastructure-as-code
Exposure to CI/CD pipelines and developer tooling at scale
Experience supporting safety-critical or high-reliability systems
Strong debugging skills across application, OS, and network boundaries
Prior on-call experience in a production environment

Responsibilities

Design, implement, and maintain observability and alerting systems across critical services and infrastructure
Ensure all production and internal services are properly instrumented with metrics, logs, and traces
Own and maintain developer productivity tools, CI/CD systems, and internal platforms
Participate in an on-call rotation and respond to production incidents with urgency and discipline
Lead incident reviews and drive long-term reliability improvements
Automate operational workflows to reduce manual toil and improve system resilience