Staff Software Engineer - Reliability Engineer (Remote)

The Home Depot

Atlanta, GA
Full Time
Senior
about 1 month ago

Job Description

About the Role

The Staff Reliability Engineer leads a team of engineers focused on ensuring reliability, availability, and performance for the Customer Order Management Operations team. They will build and grow technical and leadership skills while creating, deploying, and supporting production systems. The role involves assisting with tool selection, configuration, security, resilience, performance tuning, and production monitoring, as well as contributing to foundational infrastructure and system documentation.

Key Responsibilities

  • Develop, test, deploy, and maintain software with a clear understanding of its value; achieve results even under tough circumstances.
  • Create test suites (functional, destructive, etc.) to enable successful, rapid deployment of code to production.
  • Seek opportunities for growth and learning through formal and informal channels, learning from both successes and failures.
  • Collaborate with the Product Team to ensure user stories are developer-ready, easy to understand, and testable.
  • Communicate effectively with diverse groups and adapt approaches to meet shifting demands.
  • Support and assist product and engineering teams by fielding questions and providing guidance on modern software development frameworks.
  • Help grow junior engineers by providing guidance and leading technical discussions.
  • Identify gaps within the team and suggest improvements to enhance productivity.

Requirements

  • Must be eighteen years of age or older and legally permitted to work in the United States.
  • Minimum of 3 years of relevant work experience.
  • Experience with infrastructure automation tools such as Terraform, Ansible, or Chef.
  • Experience architecting solutions in Google Cloud, AWS, or similar cloud platforms.
  • Experience with monitoring and observability tools like Prometheus, Grafana, or Datadog.
  • Extensive experience and competence in core Reliability Engineering Principles and Practices.
  • Familiarity with Unix and Windows operating systems.
  • Experience with security frameworks for user and service authorization and authentication.
  • Experience in creating and executing unit, functional, destructive, and performance tests.
  • Experience with modern debugging and root cause analysis techniques in distributed systems.
  • Experience with version control systems.
  • Experience designing systems for High Availability, Disaster Recovery, Performance, Efficiency, and Security.
  • Operational support experience focused on system reliability.
  • Ability to lead teams and share knowledge across engineering functions.
  • Experience creating Standard Operating Procedures (SOPs) and collaborating with Principal Engineering.

Nice to Have

  • Extensive experience with infrastructure automation tools such as Terraform, Ansible, or Chef.
  • Experience architecting solutions in Google Cloud, AWS, or similar cloud platforms.
  • Experience with monitoring and observability tools like Prometheus, Grafana, or Datadog.
  • Experience and competence in core Reliability Engineering Principles and Practices.
  • Familiarity with both Unix and Windows operating systems.
  • Experience with security frameworks for user and service authorization and authentication.
  • Experience in creating and executing various types of tests (unit, functional, destructive, performance).
  • Experience with modern debugging and root cause analysis techniques in distributed systems.
  • Experience with version control systems.
  • Experience designing systems for High Availability, Disaster Recovery, Performance, Efficiency, and Security.
  • Operational support experience with a focus on system reliability.
  • Ability to lead teams and share knowledge across engineering functions.
  • Experience creating SOPs and collaborating with Principal Engineering.

Qualifications

  • The knowledge, skills and abilities typically acquired through the completion of a bachelor's degree program or equivalent degree in a related field.

Benefits & Perks

  • No travel required.

Working at The Home Depot

The role involves working in a dynamic team environment with engineers of all experience levels who support each other's growth. The position emphasizes collaboration, continuous learning, technical excellence, and fostering stability and continuous improvement across engineering functions.

Job Details

Posted At: Jun 25, 2025
Job Category: QA Engineering
Salary: Competitive salary
Job Type: Full Time
Work Mode: Remote
Experience: Senior

About The Home Depot

Website: homedepot.com
Company Size: 10000+ employees
Location: Atlanta, GA
Industry: Home Centers
