
Senior DevOps Engineer

Together AI

San Francisco, CA
Full Time
Senior
160k-230k

Job Description

About the Role

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

We are hiring a talented Senior DevOps Engineer to develop the software and processes for orchestrating AI workloads across large fleets of distributed GPU hardware. In this role, you'll be part of a cloud engineering organization that aims to automate everything and to build failure-resistant, horizontally scalable cloud infrastructure for GPU-resident applications. As a Senior DevOps Engineer, you'll build a deep understanding of Together AI's services and use that knowledge to optimize and evolve our infrastructure's reliability, availability, serviceability, and profitability.

The best applicants for this role are deeply technical, enthusiastic, great collaborators, and intrinsically motivated to deliver high-quality infrastructure.

Key Responsibilities

  • Introduce tools to facilitate greater automation and operability of services
  • Design, build, and maintain CI/CD infrastructure
  • Architect, deploy, and scale observability infrastructure
  • Create runtime tools/processes that optimize cloud triaging and limit downtime
  • Define best practices to make our systems and services measurable
  • Work closely with internal teams to ensure best practices are appropriately applied
  • Build tools to help engineering and research teams measure and improve their velocity
  • Analyze and decompose complex software systems
  • Collaborate with and influence others to improve the overall design

Requirements

  • At least 5 years of relevant experience in DevOps, cloud computing, data center operations, and Linux systems administration
  • Programming experience in at least one of the following languages: Go, Python, Java, or C++
  • Experience designing and building advanced CI/CD pipeline frameworks
  • Experience with cloud computing toolsets such as Terraform, Vault, and Packer
  • Experience with configuration management tools such as Ansible, Pulumi, Chef, and Puppet
  • Experience with Kubernetes and containerization
  • Strong sense of ownership and desire to build great tools for others

Qualifications

  • Experience in DevOps, cloud computing, data center operations, and Linux systems administration

Benefits & Perks

  • Competitive compensation
  • Startup equity
  • Health insurance
  • Other competitive benefits
  • The US base salary range for this full-time position is $160,000 to $230,000, plus equity and benefits

Working at Together AI

Together AI is a research-driven artificial intelligence company committed to open and transparent AI systems, innovation, and societal benefit. The company values passionate researchers and engineers working collaboratively to build next-generation AI infrastructure.


Job Details

Posted At: Jun 19, 2025
Job Category: DevOps

About Together AI

Website

together.ai

Company Size

51-100 employees

Location

San Francisco, CA
