About The Position

AWS Agentic AI is seeking a world-class Science leader with deep expertise in deep learning to help build industry-leading Agentic AI solutions spanning models, systems, and applications. Building on our proven track record with frontier agents like our DevOps Agent and Kiro Autonomous Coding Agent, the Agentic AI organization at AWS is tackling high-risk, high-reward projects grounded in real-world challenges across cloud observability and security. Our research agenda centers on three transformative areas that push the boundaries of what AI agents can accomplish in production environments.

First, we're developing Site Reliability Engineering Autonomous Agents that can automatically detect, diagnose, and resolve incidents in production systems. This work advances the state of the art in multi-step planning, reasoning, and the integration of domain-specific knowledge into agent architectures.

Second, we're building Proactive Code Repair Agents that leverage diverse signals (including code, logs, runtime data, and telemetry) to identify and fix issues, and even to detect problems before they manifest. These agents represent a fundamental shift from reactive to anticipatory software reliability.

Third, we're creating Next-Generation Timeseries Foundation Models that enable advanced forecasting, anomaly detection, and multi-modal telemetry analysis across logs, metrics, and traces. These models serve as the cognitive foundation for our agents, enabling them to natively understand and reason about complex telemetry data at scale.

This role offers the opportunity to shape the future of autonomous systems in cloud computing, working at the intersection of cutting-edge research and high-impact production applications that serve millions of customers worldwide.

Requirements

  • PhD or Master's degree in CS, CE, ML, or a related field
  • Experience programming in Java, C++, Python, or a related language
  • Experience in any of the following areas: AI (NLP or multi-modality models), deep learning, LLM training, graph learning, software engineering

Nice To Haves

  • 15+ years of relevant, broad research experience after a PhD or equivalent
  • First-author publications at Tier-1 ML conferences such as NeurIPS, ICLR, ICML, or CVPR

Responsibilities

  • Develop breakthroughs in agentic architecture across areas such as autonomy, self-validation, continuous learning, and timeseries foundation models.
  • Mentor and guide junior scientists.
  • Develop the technology strategy and roadmap supporting product development.
  • Develop prototypes for key technologies, oversee their productization, and critically assess them.
  • Write technical reports summarizing R&D progress and communicate that progress to stakeholders.

Benefits

  • Health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, Flexible Spending Accounts, adoption and surrogacy reimbursement coverage)
  • 401(k) matching
  • Paid time off
  • Parental leave

What This Job Offers

  • Job Type: Full-time
  • Career Level: Principal
  • Education Level: Ph.D. or professional degree
