Advanced Site Reliability/DevOps Engineer - (Multiple positions available)

Kroger

Cincinnati, OH
Full Time
Senior
about 1 month ago

Job Description

About the Role

Specialize in developing scalable methods for building, deploying, and supporting cloud, on-prem, and store-focused enterprise services and systems. Work closely with Software Engineers to deploy and operate solutions, automate and streamline processes, build and maintain deployment tools, monitor the platform, and troubleshoot and resolve issues in all environments, while guiding and mentoring other members of the team. Design and build infrastructure and systems that provide high levels of scalability, reliability, and performance for Kroger's stack, while balancing security, maintainability, and operational excellence. Work with the engineering team to continuously implement and improve reliable, fast build environments for DEV and QA, provide timely build status updates, and automate as much as possible to improve efficiency and quality. Promote innovation, outside-the-box thinking, teamwork, and self-organization. Ensure traceability, observability, and retrievability of system behavior. Build logging, monitoring, and alerting systems to identify bottlenecks and assist with debugging, analysis, and optimization in cloud, on-prem, and store environments. Improve operational efficiency through automation and the deployment or development of new tools. Experiment with and recommend new technologies that simplify or improve Kroger's stack. Craft solid, clearly explained designs, playbooks, and documentation for consumption by teammates and the larger engineering organization. Participate in an off-hours on-call rotation, and perform periodic off-hours work during maintenance windows. Duties may be located at any Kroger Co. office throughout the U.S. Telecommuting from a home office is authorized pursuant to company policy.

Key Responsibilities

  • Develop scalable methods for building, deploying, and supporting enterprise services and systems in cloud, on-prem, and store environments.
  • Work closely with Software Engineers to deploy and operate solutions, automate processes, and build deployment tools.
  • Monitor platform performance and troubleshoot issues across all environments.
  • Design and build infrastructure and systems that ensure high scalability, reliability, and performance.
  • Implement and improve build environments for development and QA, automate processes, and provide build status updates.
  • Build logging, monitoring, and alerting systems to identify bottlenecks and assist with debugging and optimization.
  • Promote innovation and recommend new technologies to improve the stack.
  • Create documentation, playbooks, and designs for team and organizational use.
  • Participate in off-hours on-call rotation and perform maintenance work during scheduled windows.

Requirements

  • Bachelor's degree in Computer Science or a closely related STEM field plus at least 6 years of experience in cloud Site Reliability Engineering, DevOps, or Infrastructure.
  • OR, a Master's degree in Computer Science or a closely related STEM field plus at least 3 years of experience in cloud Site Reliability Engineering, DevOps, or Infrastructure.
  • 3+ years of experience with message technologies such as Kafka, RabbitMQ, or SQS.
  • 3+ years of experience with infrastructure software tools such as Ansible or Terraform.
  • 3+ years of experience with containerization tools such as Docker or Kubernetes.
  • 3+ years of experience with CI/CD tools such as Jenkins, Spinnaker, Azure DevOps, or TeamCity.
  • 3+ years of experience managing system observability using ELK, Datadog, New Relic, Azure Monitor, or Grafana.
  • 2+ years of experience implementing automation and monitoring using shell scripting and related tools.
  • Experience supporting high-volume web server stacks, Azure/GCP PaaS, and networking, and provisioning native Managed Apps and CI/CD pipelines.
  • Experience supporting omni-channel experiences (preferred but not required).

Qualifications

  • Experience with cloud platforms such as Azure or Google Cloud Platform (GCP).
  • Experience with high-volume web server stacks and networking.
  • Experience with provisioning native Managed Apps and CI/CD pipelines.

Working at Kroger

Kroger encourages innovation, outside-the-box thinking, teamwork, and self-organization. The company values operational excellence, automation, continuous improvement, and clear documentation, and supports remote work and flexible scheduling within company policy.

Job Details

Posted At: Jun 14, 2025
Job Category: DevOps
Salary: Competitive salary
Job Type: Full Time
Work Mode: Hybrid
Experience: Senior

About Kroger

Website

kroger.com

Company Size

10000+ employees

Location

Cincinnati, OH

Industry

Supermarkets and Other Grocery (except Convenience) Stores
