Senior Software Engineer, Observability

Crusoe
San Francisco, CA
$172,000 - $209,000

About The Position

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability. Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About This Role

We’re seeking a Senior Software Engineer to play a key role on our Observability team within the Cloud Infrastructure organization. This team owns the real-time observability platforms that underpin visibility, reliability, and operational insight across our cloud and data center infrastructure.

Requirements

  • 5+ years of experience in software or systems engineering.
  • Proficiency in Java, Go, or Python for writing production-level code.
  • Practical experience managing Kubernetes clusters in a production environment.
  • Experience deploying and managing services using Helm and YAML-based configurations.
  • Ability to troubleshoot and resolve issues within distributed system architectures.
  • Experience participating in an on-call rotation for business-critical systems.

Nice To Haves

  • Experience with common observability tools such as Prometheus, Grafana, Loki, ClickHouse, or Elasticsearch.
  • Familiarity with Kafka or similar message queuing systems.
  • Experience using Terraform for infrastructure provisioning.
  • Knowledge of OpenTelemetry standards.
  • Familiarity with GPU-based infrastructure or machine learning workloads.

Responsibilities

  • Maintain and manage core observability tools, including platforms for metrics, events, logs, and tracing.
  • Develop and operate data pipelines to move telemetry data from various sources to backend storage.
  • Manage large-scale data ingestion and storage requirements for high-volume environments.
  • Perform regular updates and software enhancements to ensure system stability and security.
  • Participate in a standard on-call rotation to address production issues and perform root cause analysis.
  • Work with other engineering teams to implement monitoring best practices and standardized tooling.
  • Contribute to the long-term technical roadmap for the company's internal infrastructure.

Benefits

  • Industry-competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit: $300/month