Senior Staff Software Engineer - Site Reliability Engineering

Ridgeline

San Ramon, CA
Full Time
Senior
200k-250k
15 days ago

Job Description

About the Role

The Senior Staff Software Engineer - Site Reliability Engineering role at Ridgeline is a strategic, hands-on position responsible for scaling reliability across the company's cloud-native platform. The engineer will design and improve systems like Health Manager, Incident Command, and observability infrastructure, while driving FinOps tooling and AI-assisted automation to reduce operational burden and surface critical insights. This role is central to delivering high-performance, zero-downtime services and empowering product, infrastructure, and customer-facing teams to operate faster without sacrificing reliability. Ridgeline is an industry cloud platform for investment management, founded by Dave Duffield, focused on security, agility, and usability, with offices in Reno, New York, Lake Tahoe, and the Bay Area.

Key Responsibilities

  • Build and evolve systems like Health Manager, Incident Command, and observability platforms that support zero-downtime deployments and operational readiness
  • Partner with development and infrastructure teams to embed reliability into services and processes
  • Participate in the SRE on-call rotation and lead incident response as needed
  • Design metrics, tooling, and workflows that enable zero-downtime deployments, fast detection, and proactive issue resolution
  • Develop and maintain FinOps tooling to drive cost visibility, usage transparency, and financially informed engineering decisions
  • Lead incident triage and retrospectives with a blameless, data-driven approach
  • Define observability signals that make system health visible, actionable, and reliable
  • Write production-quality code and ship real improvements—measured by impact, not just effort
  • Drive initiatives that reduce risk, increase visibility, or improve operational resilience across services
  • Foster an outcomes-focused team culture through honest communication, clarity, and accountability
  • Think creatively, own problems, seek solutions, and communicate clearly along the way
  • Contribute to a collaborative environment rooted in learning, teaching, and transparency

Requirements

  • 10+ years in a software engineering position or similar function, with experience operating large-scale, mission-critical systems
  • Proficiency in one or more of: Kotlin, Java, JavaScript, Python
  • Experience with observability platforms (e.g., Datadog, Prometheus) and monitoring best practices
  • Strong familiarity with infrastructure-as-code tools (e.g., Terraform, CDKTF) and CI/CD systems
  • Experience leading or participating in incident response and service ownership
  • Experience deploying, monitoring, and maintaining multi-tenant architectures
  • Ability to work effectively across teams and communicate technical concepts with clarity
  • Strong written and verbal communication skills, especially in facilitating incident response and working sessions with service teams
  • Comfortable navigating ambiguity and working toward measurable outcomes
  • Proven ability to balance individual contribution with cross-functional impact
  • Willingness to learn about cutting-edge technologies and cultivate expertise in a business domain/problem space
  • An aptitude for problem solving and effective communication
  • Serious interest in having fun at work

Nice to Have

  • Experience or interest in FinOps, cost-aware system design, or cloud usage optimization
  • Familiarity with AI-assisted tooling or workflows

Qualifications

  • Experience operating large-scale, mission-critical systems
  • Proficiency in programming languages such as Kotlin, Java, JavaScript, or Python

Benefits & Perks

  • Targeted cash compensation of $200,000-$250,000, final amount based on experience
  • Participation in the Company Stock Plan
  • Unlimited vacation
  • Educational and wellness reimbursements
  • $0 cost employee insurance plans
  • Opportunities for career advancement and making a meaningful impact

Working at Ridgeline

Ridgeline is a community-minded, discrimination-free, equal opportunity workplace that values security, agility, and usability. It fosters a fast-growing, people-first environment recognized for innovation and workplace excellence. The company emphasizes honest communication, transparency, collaboration, learning, and fun at work.

Job Details

Posted At: Jul 10, 2025
Job Category: DevOps
Salary: 200k-250k
Job Type: Full Time
Work Mode: Hybrid
Experience: Senior

About Ridgeline

Website: ridgelineapps.com
Company Size: 251-500 employees
Location: San Ramon, CA
Industry: Software Publishers
