About The Position

The Siri Speech team is looking for exceptional individuals to extend the core technology that lets Siri understand, learn, and remember. You will be part of a cross-functional team of software engineers and data and machine learning engineers and scientists, and you will have a large impact on the Siri product. This is a rare opportunity to apply distributed data engineering techniques at the intersection of areas such as speech recognition, natural language processing, and dialogue management.

Requirements

  • Deep expertise in Python software development, CI/CD, unit and integration testing
  • Experience with distributed data processing tools and frameworks (Beam, Spark, Dask, Ray)
  • Strong software engineering abilities in Python

Nice To Haves

  • M.S. or Ph.D. degree in Computer Science, or equivalent experience
  • Strong data engineering background in speech, language, text, or dialogue processing
  • Speech and/or Machine Learning experience a plus
  • Real passion for building research demo prototypes of data solutions and turning them into production-quality designs and implementations
  • Strong interpersonal skills to work well with engineering teams
  • Excellent problem-solving and critical-thinking skills
  • Ability to work in a fast-paced environment with rapidly changing priorities
  • Passion for building extraordinary products and experiences for our users

Responsibilities

  • Implement backend tools for Speech data warehouses including cataloging the entire collection of Speech Data
  • Automate speech data annotation that runs on a self-serve platform
  • Deploy and implement LLM-based chatbots to make the unified speech warehouse queryable and actionable (such as derived dataset creation) via natural language
  • Automate onboarding of new speech datasets from various sources onto a unified speech warehouse for easier discoverability and inclusion in training and evaluation of Siri
  • Collaborate with other Data and infrastructure teams across Apple to implement querying and speech dataset creation improvements