About The Position

Building the Future of Crypto

Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.

What makes us different? Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.

Before you apply, please read the Kraken Culture page to learn more about our internal culture, values, and mission. We also expect candidates to familiarize themselves with the Kraken app. Learn how to create a Kraken account here.

As a fully remote company, we have Krakenites in 70+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Desktop, Wallet, and Kraken Futures.

Become a Krakenite and build the future of crypto!

Proof of work

The team

Join our Global Operational Resiliency (OpR) team and help ensure the security, availability, and durability of one of the world's leading cryptocurrency exchanges! The OpR team is at the forefront of operational excellence, driving initiatives that safeguard our platform and support its continued growth. A dynamic group of specialists with a passion for resilience, the team thrives on collaboration across diverse business units. Together, we tackle all aspects of Incident Management and Change Management while continuously enhancing monitoring, alerting, procedural documentation, and recovery capabilities.

The team plays a key role in strengthening Kraken’s Business Continuity and Disaster Recovery posture, ensuring that critical services can be sustained and recovered during disruptive events. With a focus on innovation, adaptability, and a commitment to excellence, the OpR team fosters an environment of shared learning, constructive feedback, and strategic improvements. As we look ahead, we aim to refine our processes and tools, ensuring we remain at the cutting edge of operational resiliency in the ever-evolving crypto landscape.

The opportunity

Contribute to maintaining our position as one of the most secure and resilient crypto exchanges by ensuring our services are operational around the clock.

Requirements

  • 5+ years of experience as an incident responder, major incident manager, or disaster recovery specialist
  • Hands-on experience designing, maintaining, or executing business continuity plans and disaster recovery strategies
  • Experience running or supporting disaster recovery tests, simulations, or tabletop exercises, including documenting outcomes and follow-up actions
  • Understanding of recovery objectives (RTO/RPO) and how they apply to modern, distributed technology environments
  • Exceptionally organized and highly responsive, capable of managing operations within a technical environment that demands continuous availability
  • Experienced in leading discussions effectively during high-stakes technical bridge calls involving numerous technical stakeholders at all levels, adjusting communication styles as needed
  • You have a keen eye for detail
  • You have a solid understanding of the software development lifecycle, including the significance of testing and strategies for rollback
  • You are a quick learner with a natural aptitude for grasping complex technical solutions and articulating the impact of incidents in terms that are relevant to both IT and business considerations
  • You have prior experience with Atlassian Tools (Confluence/Jira)

Responsibilities

  • Participate in on-call duties as an Incident Manager during US business hours (2PM to 8PM UTC), taking accountability for incident responses, escalations, stakeholder coordination, and maintaining accurate records until resolution
  • Work with stakeholders to identify the root cause of incidents, assist in postmortem activities, and agree on follow-up actions
  • Help establish and continuously improve Business Continuity and Disaster Recovery (BCP/DR) capabilities, ensuring critical services can be recovered within defined recovery objectives
  • Plan, coordinate, and facilitate disaster recovery exercises and tabletop scenarios, working closely with technology and business stakeholders to validate recovery strategies
  • Develop, maintain, and test business continuity plans and disaster recovery documentation, ensuring alignment with regulatory expectations and operational realities
  • Partner with engineering and business teams to track remediation actions arising from incident postmortems and DR exercises
  • Inform automation efforts to further enhance monitoring and alerting capabilities