Software Development Engineer

Advanced Micro Devices, Inc
Seattle, WA
Hybrid

About The Position

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE: Do you want to develop communication libraries to enable high performance computing and machine learning workloads at Exascale? AMD is searching for talented and motivated mathematicians, scientists and engineers to develop GPU libraries as part of the AMD Radeon Open Ecosystem (ROCm).

THE PERSON: You are accustomed to working in a dynamic, geographically distributed agile team, where partnership and collaboration are paramount. You possess excellent written and verbal communication skills, strong attention to detail, and the ability to express your work in a clear, cohesive fashion. You are results-oriented and accustomed to tight deadlines and changing priorities. Most importantly, you are constantly thinking of ways to improve the performance of software and hardware.

THE ROLE: In this role, you will provide our development team with quality support for a library enabling GPU and multicore operations powering AI, LLM, and deep learning applications. You will be responsible for developing and executing comprehensive test strategies for our open-source, C++-based library, leveraging your expertise in test automation, continuous integration, and quality assurance processes. You will work closely with developers to ensure the stability, reliability, and performance of the library through both automated and hands-on testing.

THE PERSON: We are seeking a talented and motivated developer with an eye for quality to join our team. If you're passionate about high-quality code and test-driven development, this is an excellent opportunity to make a significant impact.

Requirements

  • Strong background developing applications and libraries in C, C++, and Python
  • GPU software development using HIP, CUDA, or OpenCL
  • Experience with communication middleware
  • Experience with data transfer technologies, such as RDMA, InfiniBand, and libfabric
  • Understanding of CPU and GPU architectures and low-level optimization techniques including assembly programming and/or vectorization
  • Parallel programming experience using OpenMP and MPI
  • In-depth knowledge of best practices in software development, including testing, profiling, debugging, documentation, version control, issue tracking, and planning
  • Contributions to open source libraries and applications
  • Proven experience as an SDET or in a similar role with a focus on C++ development and testing.
  • Strong experience with GoogleTest (gtest) for unit and integration testing in C++ environments.
  • Hands-on experience with Jenkins for automating test execution and integrating tests into the CI/CD pipeline.
  • In-depth knowledge of software testing methodologies, frameworks, and tools for automated testing.
  • Proficiency in C++ programming, with a strong understanding of memory management, data structures, and algorithms.
  • Experience working in Linux-based environments for development and testing.
  • Familiarity with Docker and containerization technologies for managing test environments and ensuring consistent test execution.
  • Familiarity with version control systems like Git, as well as development tools and practices used in open-source communities.
  • Solid understanding of performance testing, including profiling, benchmarking, and analyzing results.
  • Excellent problem-solving skills and a proactive approach to testing and debugging.
  • Strong written and verbal communication skills with the ability to collaborate effectively with both technical and non-technical teams.

Responsibilities

  • Support AMD’s RCCL, an open-source, GPU-accelerated collective communication middleware, and related technologies
  • Design, implement, and test algorithms for multi-GPU and multi-node communication libraries.
  • Benchmark, profile and optimize code to maximize throughput on single-GPU, multi-GPU and clustered systems
  • Deliver high-quality code and documentation following best practices for open source software development
  • Work with key technical experts across AMD and with our partners and customers to improve ROCm applications, libraries, and tools
  • Test Automation Development: Design, implement, and maintain automated test suites using Google Test (gtest) for an open-source, C++-based library.
  • CI/CD Integration: Integrate test automation frameworks into the Jenkins pipeline, ensuring seamless execution of tests and rapid feedback for developers.
  • Performance Testing: Conduct performance testing to ensure the library meets its performance benchmarks and scales as needed. Investigate performance regressions and help establish baseline performance tests.
  • Bug Detection & Reporting: Identify, isolate, and report defects found during testing and work with developers to prioritize and resolve issues.
  • Continuous Improvement: Continuously improve the test infrastructure and methodologies, proposing tools or techniques that can improve the testability of the codebase.
  • Collaboration & Documentation: Work with cross-functional teams, document test results, and assist in creating user-friendly reports that communicate the quality status of the project.
  • Test Planning: Collaborate with developers and product teams to define test strategies, test cases, and acceptance criteria for new features and enhancements in the library.
  • Code Coverage: Develop and analyze code coverage solutions, identify gaps, and drive improvements in both test coverage and quality.

Benefits

  • AMD benefits at a glance.