About The Position

Come be a part of Red Hat's charge to democratize AI with open source! Red Hat's Global Engineering Team is looking for a Senior Machine Learning Engineer to join our newly formed AI Engineering organization. This role sits within the AI Innovation team, which conducts customer- and science-driven research to drive innovation for Red Hat's customers. The team follows a pattern of "research → open-source software → product" in its engineering work.

This role focuses on building the core logic and enhancements for our model fine-tuning and post-training libraries. You will work directly with research scientists and open source AI communities to build and improve implementations of novel training methods, ranging from SFT, continual learning, and offline preference tuning to online reinforcement learning methods like GRPO and RLHF. You will develop working relationships across multiple teams, contributing to both upstream open source projects and our internal Training Hub.

The ideal candidate is a highly collaborative individual with a passion for working on complex ML projects in an open organization where contributions are valued and expected from all levels. As this is a fast-moving area of opportunity for Red Hat, the ability to communicate productively and effectively with team members, stakeholders, and Red Hat leadership is critical. Success in this role means delivering robust, scalable training libraries that bridge cutting-edge research with production needs.

This position reports directly to the Manager of AI Innovation. It may require occasional travel to our Boston, MA office, multiple times per quarter, to collaborate in person. Successful applicants must reside in a state where Red Hat is registered to do business.

Requirements

  • Bachelor's degree in computer science or equivalent.
  • 3+ years of experience in Python development.
  • Significant background in AI/ML projects or coursework (neural networks, deep learning, language models, reinforcement learning).
  • Experience in research engineering, machine learning engineering, or applied ML roles.
  • Strong experience with common model architecture development and adapter frameworks (e.g., PyTorch, Transformers, PEFT).
  • Familiarity with distributed training frameworks (e.g., FSDP, DeepSpeed) and inference runtimes (e.g., vLLM).
  • Experience in open-source projects and collaborative development workflows.
  • Existing background in software development or engineering, building robust and consumable libraries and implementations.
  • Experience with unit testing, integration testing, and performance testing.
  • Strong self-motivation and organizational skills.
  • Excellent written and verbal communication skills.
  • Positive attitude and willingness to share ideas openly.

Nice To Haves

  • Master's or PhD in Machine Learning (ML) / Natural Language Processing (NLP).
  • Experience with MLOps and deployment systems (e.g., Kubeflow, MLflow, Kubernetes, CI/CD pipelines).
  • Experience writing functional, end-to-end or coverage tests in Python.
  • Experience with GitHub Actions, GitHub automation, or CI/CD practices.
  • Experience reading, writing, publishing, or implementing research papers.
  • Experience with Red Hat products.
  • Experience with large language models.

Responsibilities

  • Develop core libraries for various model post-training methods and innovations.
  • Work directly on upstream, open source projects and engage with community needs and contributions.
  • Contribute to core post-training algorithm research and engineering, introducing new methods both to community efforts and our own Training Hub.
  • Understand and adapt novel architectures and techniques to work with various post-training algorithms, across distributed training frameworks.
  • Optimize, enhance, and improve robustness and usability of both existing and in-flight projects, working closely with researchers to validate prototype logic.
  • Maintain and expand the library feature set, and address core algorithm bugs and blockers.
  • Work closely with software engineers on interface and testing designs.
  • Participate in code reviews and collaborate on best practices within the engineering team.
  • Document system designs, processes, and model performance for transparency and future reference.
  • Report on project status, challenges, and results to stakeholders.

Benefits

  • Comprehensive medical, dental, and vision coverage
  • Flexible Spending Account - healthcare and dependent care
  • Health Savings Account - high deductible medical plan
  • Retirement 401(k) with employer match
  • Paid time off and holidays
  • Paid parental leave plans for all new parents
  • Leave benefits including disability, paid family medical leave, and paid military leave
  • Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!