Sr Advanced Software Engineer - (DevOps, SRE & AI)

Honeywell
Atlanta, GA (Hybrid)

About The Position

We are seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in DevOps and SRE practices, with at least 5 years of hands-on experience designing, implementing, and maintaining scalable, reliable, and secure infrastructure for cloud-native applications. You will report directly to the Sr Software Engineering Manager and work out of our Atlanta, GA location on a hybrid schedule. For the first 90 days, new hires must be prepared to work fully onsite, Monday through Friday.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
  • 5+ years of software engineering experience, with 3+ years in ML Ops, agentic AI, Databricks, data lake, or cloud platforms.
  • Minimum 5 years of experience in DevOps, SRE, or related roles.
  • Strong expertise in cloud platforms (GCP, AWS, Azure).
  • Proficient in infrastructure as code (Terraform, Ansible, CloudFormation).
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Deep understanding of CI/CD pipelines and automation tools (Jenkins, GitHub Actions, GitLab CI).
  • Strong scripting skills (Python, Bash, Go, etc.).
  • Experience with monitoring, logging, and alerting tools (Prometheus, Grafana, ELK, Datadog).
  • Experience with security best practices and compliance.

Nice To Haves

  • Certifications in cloud platforms or DevOps/SRE.
  • Experience in high-availability, distributed systems.
  • Familiarity with database management and performance tuning.
  • Experience with configuration management tools.
  • Excellent troubleshooting and problem-solving skills.
  • Strong communication and collaboration skills.

Responsibilities

  • Design, build, and maintain scalable infrastructure on cloud platforms (GCP, AWS, Azure).
  • Develop and implement CI/CD pipelines for automated deployment and testing.
  • Monitor, troubleshoot, and optimize system performance, reliability, and availability.
  • Lead incident response, root cause analysis, and post-mortem reviews.
  • Implement and manage infrastructure as code (IaC) using tools such as Terraform, Ansible, or CloudFormation.
  • Develop and maintain observability solutions (monitoring, logging, alerting) using tools like Prometheus, Grafana, ELK, Datadog, etc.
  • Collaborate with development teams to ensure best practices in application reliability, scalability, and security.
  • Automate operational tasks and improve system efficiency through scripting and tooling.
  • Mentor and guide junior engineers in SRE and DevOps practices.
  • Ensure compliance with security standards and participate in audits.

Benefits

  • In addition to a performance-driven salary, cutting-edge work, and the chance to develop solutions side by side with dedicated experts in their fields, Honeywell employees are eligible for a comprehensive benefits package.
  • This package includes employer-subsidized Medical, Dental, Vision, and Life Insurance; Short-Term and Long-Term Disability; 401(k) match; Flexible Spending Accounts; Health Savings Accounts; EAP; Educational Assistance; Parental Leave; Paid Time Off (for vacation, personal business, sick time, and parental leave); and 12 Paid Holidays.
  • For more information: https://benefits.honeywell.com/