AIOps Engineer

Spar Information Systems

Frisco, TX
Full Time
Senior
23 days ago

Job Description

About the Role

The AIOps Engineer is responsible for integrating machine learning and advanced analytics into existing monitoring and logging systems. This role leverages artificial intelligence to automate routine operational tasks, detect anomalies proactively, and implement self-healing frameworks to enhance the stability and performance of the infrastructure. The ideal candidate will be proactive in identifying gaps, creating strategic roadmaps, and implementing phased improvements to achieve operational excellence.

Key Responsibilities

  • Apply machine learning algorithms to existing operational data (logs, metrics, events) to predict system failures and proactively address potential incidents.
  • Implement automation for routine DevOps practices including automated scaling, resource optimization, and controlled restarts.
  • Develop and maintain self-healing systems to reduce manual intervention and enhance system reliability.
  • Build anomaly detection models to quickly identify and address unusual operational patterns.
  • Collaborate closely with SREs, developers, and infrastructure teams to continuously enhance the operational stability and performance of the system.
  • Deliver insights and recommend improvements through visualizations and reports built on AI-driven analytics.
  • Create a phased roadmap to incrementally enhance operational capabilities and align with strategic business goals.

Requirements

  • Strong experience with AI/ML frameworks and tools (e.g., TensorFlow, PyTorch, scikit-learn).
  • Proficiency in data processing and analytics tools (e.g., Splunk, Prometheus, Grafana, ELK stack).
  • Solid background in scripting and automation (Python, Bash, Ansible, etc.).
  • Experience with cloud environments and infrastructure automation.
  • Proven track record in implementing proactive monitoring, anomaly detection, and self-healing techniques.
  • Excellent analytical, problem-solving, and strategic planning skills.
  • Strong communication skills and the ability to effectively collaborate across teams.

Nice to Have

  • Background in DevOps/Site Reliability Engineering.
  • Familiarity with containerization and orchestration platforms (Kubernetes, Docker).
  • Experience in building scalable, distributed systems.

Job Details

Posted At: Jul 14, 2025
Job Category: DevOps
Salary: Competitive salary
Job Type: Full Time
Experience: Senior

About Spar Information Systems

Website: sparinfosys.com
Location: Frisco, TX
Industry: Computer Systems Design Services
