ByteDance (posted 17 days ago)
San Jose, CA
Publishing Industries

About the position

ByteDance is seeking Software Engineers to join the Serverless Compute Infrastructure team, which is part of the core Technical Infrastructure engineering team. This team is responsible for innovating, designing, and implementing a Kubernetes-like cloud-native orchestration and cluster management system to host various online and offline workloads across ByteDance's business lines. The workloads include microservices, big data processing, machine learning, LLM training and inference, distributed storage services, and edge computing platforms. The role offers opportunities to work in a hyper-scale environment, learn cutting-edge cloud-native and Kubernetes technologies, and collaborate with innovative engineers from diverse backgrounds. The team aims to grow compute infrastructure in overseas regions, including North America, Europe, and Asia Pacific, ensuring global scaling and optimization of infrastructure.

Responsibilities

  • Build a cutting-edge application orchestration framework to host various types of production workloads, e.g., service management, big data jobs, distributed machine learning systems, distributed storage services, edge computing, and public cloud.
  • Build complex container-based cluster management to manage our hyper-scale resources and workloads, with extremely high performance, scalability, and resilience.
  • Design and build a flexible, unified, and distributed resource and task scheduling framework to meet various new application requirements.
  • Design and build cluster federation, scaling, and co-location solutions to optimize resource utilization in multi-cloud environments.
  • Design, architect, and implement next-gen cloud-native infrastructure to enable cost-efficient, easy-to-use, and secure ML platforms for the latest ML workloads, including but not limited to LLM training and inference, reducing time-to-market (TTM) for ByteDance customers.
  • Write high-quality, production-level code that is easy to maintain and test.
  • Keep up with the state of the art in the open-source and research communities in AI/ML, LLMs, and systems, and implement and extend best practices.