About The Position

Duties: Troubleshoot, maintain, identify, escalate, and resolve application issues. Enable telemetry and alerts for complex enterprise applications for proactive monitoring. Develop tools and accelerators to reduce toil and improve processes. Ensure that production changes are made in light of best practices, lifecycle methodology, and overall risk. Partner with multiple teams on application performance or functional issues, troubleshooting, infrastructure service support, and change management. Collaborate with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt. Apply site reliability principles and practices every day and drive their adoption across multiple applications. Lead a team of Site Reliability Engineers (SREs), overseeing the product portfolio focused on reliability, observability, production support, and application maintenance.

Requirements

  • Bachelor's degree in Information Technology, Computer Science, or related field of study plus 5 years of experience in the job offered or as Lead Site Reliability Engineer, Infrastructure Engineer, IT Project Manager, Test Lead, IT Consultant, or related occupation
  • Troubleshooting application issues with distributed IT infrastructure
  • Implementing end-to-end monitoring using AppDynamics and Dynatrace
  • Implementing automated solutions using Python scripts
  • Analyzing non-functional requirements to identify target response times as Service Level Objectives (SLOs)
  • Log Analysis and Visualization using Splunk
  • Performance Testing using Performance Center, JMeter, and BlazeMeter
  • Performance Testing results analysis and reporting
  • Defect tracking and analysis using Quality Center
  • Amazon Web Services (AWS) platform

Responsibilities

  • Troubleshoot, maintain, identify, escalate, and resolve application issues
  • Enable telemetry and alerts for complex enterprise applications for proactive monitoring
  • Develop tools and accelerators to reduce toil and improve processes
  • Ensure that production changes are made in light of best practices, lifecycle methodology, and overall risk
  • Partner with multiple teams on application performance or functional issues, troubleshooting, infrastructure service support, and change management
  • Collaborate with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
  • Apply site reliability principles and practices every day and drive their adoption across multiple applications
  • Lead a team of Site Reliability Engineers (SREs), overseeing the product portfolio focused on reliability, observability, production support, and application maintenance

Benefits

  • competitive total rewards package, including base salary determined by role, experience, skill set, and location
  • commission-based pay and/or discretionary incentive compensation for those in eligible roles, paid in the form of cash and/or forfeitable equity and awarded in recognition of individual achievements and contributions
  • comprehensive health care coverage
  • on-site health and wellness centers
  • a retirement savings plan
  • backup childcare
  • tuition reimbursement
  • mental health support
  • financial coaching