Data Center Reliability Engineer

Oracle
Abilene, TX
Onsite

About The Position

As a Reliability Engineer, you will apply data-driven analysis and engineering problem-solving to improve availability and reduce risk across mission-critical facility systems. You will identify failure patterns early, drive corrective actions, and build tooling and metrics that improve reliability at scale. You will manage ongoing critical environment maintenance by completing standard diagnostics and repairs and resolving issues. You will also manage incidents that impact services and conduct root cause analysis to mitigate recurrence and improve system resilience. In collaboration with other teams, you will conduct reviews and assessments of prospective build sites to evaluate their suitability for data center builds. You will support and validate on-site data center operations as they relate to electrical or mechanical infrastructure, and coordinate with internal and external project team members to deliver specific aspects of data centers, or parts of data centers, for Oracle.

Requirements

  • Experience in reliability or systems analysis in data centers or other uptime-critical environments (utilities, telecom, manufacturing).
  • Engineering degree or equivalent applied experience; hands-on comfort with data and tooling is required.
  • Strong analytical and visualization skills; disciplined technical documentation.
  • Able to influence outcomes through evidence, clarity, and structured thinking.

Responsibilities

  • Monitor and analyze operational telemetry, alarms, and performance trends to identify emerging risks and reliability degradation.
  • Define and track reliability KPIs; deliver concise analysis and recommendations that drive operational and engineering decisions.
  • Develop and maintain analytics and reporting tools using Python, SQL, and/or DCIM/BMS/SCADA data sources.
  • Support and/or lead RCAs and corrective action tracking for recurring or high-impact issues, ensuring follow-through and verification.
  • Partner with operations and engineering teams to improve preventive strategies, automation opportunities, and compliance execution.
  • Contribute to reliability standards and documentation that improve repeatability across sites.

Benefits

  • Medical, dental, and vision insurance, including expert medical opinion
  • Short-term and long-term disability
  • Life insurance and AD&D
  • Supplemental life insurance (Employee/Spouse/Child)
  • Health care and dependent care Flexible Spending Accounts
  • Pre-tax commuter and parking benefits
  • 401(k) Savings and Investment Plan with company match
  • Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
  • 11 paid holidays
  • Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
  • Paid parental leave
  • Adoption assistance
  • Employee Stock Purchase Plan
  • Financial planning and group legal
  • Voluntary benefits including auto, homeowner and pet insurance