Sr Advanced Software Engineer - (DevOps, SRE & AI)

Honeywell • Atlanta, GA

2d • Hybrid

About The Position

We are seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in DevOps and SRE practices, with at least 5 years of hands-on experience designing, implementing, and maintaining scalable, reliable, and secure infrastructure for cloud-native applications. You will report directly to the Sr Software Engineering Manager and work out of our Atlanta, GA location on a hybrid schedule. For the first 90 days, new hires must be prepared to work 100% onsite, Monday through Friday.

Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
5+ years of software engineering experience, with 3+ years in ML Ops, agentic AI, Databricks, data lake, or cloud platforms.
Minimum 5 years of experience in DevOps, SRE, or related roles.
Strong expertise in cloud platforms (GCP, AWS, Azure).
Proficient in infrastructure as code (Terraform, Ansible, CloudFormation).
Experience with containerization and orchestration (Docker, Kubernetes).
Deep understanding of CI/CD pipelines and automation tools (Jenkins, GitHub Actions, GitLab CI).
Strong scripting skills (Python, Bash, Go, etc.).
Experience with monitoring, logging, and alerting tools (Prometheus, Grafana, ELK, Datadog).
Experience with security best practices and compliance.

Nice To Haves

Certifications in cloud platforms or DevOps/SRE.
Experience in high-availability, distributed systems.
Familiarity with database management and performance tuning.
Experience with configuration management tools.
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration skills.

Responsibilities

Design, build, and maintain scalable infrastructure on cloud platforms (GCP, AWS, Azure).
Develop and implement CI/CD pipelines for automated deployment and testing.
Monitor, troubleshoot, and optimize system performance, reliability, and availability.
Lead incident response, root cause analysis, and post-mortem reviews.
Implement and manage infrastructure as code (IaC) using tools such as Terraform, Ansible, or CloudFormation.
Develop and maintain observability solutions (monitoring, logging, alerting) using tools like Prometheus, Grafana, ELK, Datadog, etc.
Collaborate with development teams to ensure best practices in application reliability, scalability, and security.
Automate operational tasks and improve system efficiency through scripting and tooling.
Mentor and guide junior engineers in SRE and DevOps practices.
Ensure compliance with security standards and participate in audits.

Benefits

In addition to a performance-driven salary, cutting-edge work, and developing solutions side-by-side with dedicated experts in their fields, Honeywell employees are eligible for a comprehensive benefits package.
This package includes employer-subsidized Medical, Dental, Vision, and Life Insurance; Short-Term and Long-Term Disability; 401(k) match; Flexible Spending Accounts; Health Savings Accounts; EAP; Educational Assistance; Parental Leave; Paid Time Off (for vacation, personal business, sick time, and parental leave); and 12 Paid Holidays.
For more information: https://benefits.honeywell.com/
