Senior Software Engineer, ML Infrastructure

Gridmatic, Cupertino, CA
$210,000 - $267,000

About The Position

Gridmatic is a high-growth startup and a new kind of energy company, delivering affordable, clean power by optimizing renewable energy and grid-scale batteries. With offices in the Bay Area and Houston, we bring together Silicon Valley–style innovation with deep, hands-on expertise in real-world power markets and energy retail.

As solar and wind become the fastest-growing sources of electricity, variability from weather and grid conditions makes energy prices more volatile. Gridmatic tackles this challenge with industry-leading forecasting and optimization, and gives our team the opportunity to work on problems that truly matter.

Forecasting and trading energy are the foundation of what we do. We ingest large-scale data (weather, prices, load, and grid conditions) to build probabilistic machine learning forecasts that drive real operational decisions. Our work directly determines when power is bought, stored, or deployed, turning uncertainty into value for customers and the grid.

Our impact is measurable. Gridmatic is the most profitable participant in ERCOT’s wholesale market and operates the top-performing battery asset in CAISO. Profitable without venture capital, we offer a collaborative, low-ego environment where rigorous thinking, autonomy, and continuous learning are core to how we work.

We’re looking for a strong backend engineer to work closely with our ML engineers and help speed up their work through better infrastructure and tooling.

Requirements

  • Strong backend software engineer who has worked with ML engineers and has helped solve their problems
  • Strong distributed systems and infrastructure skills: comfortable standing up services in AWS/GCP, scaling and debugging Kubernetes services, writing Terraform, and working with orchestration tools like Flyte, Airflow, or Temporal
  • Strong software engineering skills: able to write easy-to-extend, well-tested code
  • Experience working with large-scale data and making good choices on data storage and schema design (relational databases, data warehouses, object storage, timeseries data)

Responsibilities

  • Scaling model evaluation to handle large timeseries data
  • Measuring and improving utilization of GPUs
  • Automating staging and deployment of ML models
  • Moving complex workflows to orchestration tools like Airflow/Flyte
  • Improving Python monorepo tooling (code sharing, Docker, CI/CD)