ByteDance
Posted 5 days ago
Seattle, WA
Publishing Industries

About the position

ByteDance is seeking Software Engineers to join the Serverless Compute Infrastructure team, which is part of the core Technical Infrastructure engineering team. This team is responsible for innovating, designing, and implementing a Kubernetes-like cloud-native orchestration and cluster management system to host various online and offline workloads across ByteDance's business lines. The role involves working on microservices, big data processing, machine learning, LLM training and inference, distributed storage services, and edge computing platforms. The team aims to deliver a world-class serverless compute system with high agility, high scalability, high availability, and extreme performance assurance. As new AI/ML workloads emerge globally, the team is focused on accelerating innovation in next-gen Cloud-Native Infrastructure and Orchestration frameworks to enable cost-efficient and easy-to-use AI/ML solutions. This position offers opportunities to learn cutting-edge cloud-native and Kubernetes technologies in a hyper-scale environment and to collaborate with innovative engineers across diverse and inclusive cultures.

Responsibilities

  • Build a cutting-edge application orchestration framework to host various types of production workloads, e.g., service management, big data jobs, distributed machine learning systems, distributed storage services, edge computing, and public cloud.
  • Build complex container-based cluster management systems to manage our hyper-scale resources and workloads with extremely high performance, scalability, and resilience.
  • Design and build a flexible, unified, and distributed resource and task scheduling framework to meet various new application requirements.
  • Design and build cluster federation, scaling, and co-location solutions to optimize resource utilization in multi-cloud environments.
  • Design, architect, and implement next-gen Cloud-Native Infrastructure to enable cost-efficient, easy-to-use, and secure ML platforms that reduce time-to-market (TTM) for ByteDance customers and support the latest ML workloads, including but not limited to LLM training and inference.
  • Write high-quality, product-level code that is easy to maintain and test.
  • Keep up with the state of the art in the open-source and research communities in AI/ML, LLMs, and systems, and implement and extend best practices.