Senior Staff Software Engineer - Site Reliability Engineering

Ridgeline

New York, NY
Full Time
Senior
200k-250k
1 day ago

Job Description

About the Role

Senior Staff Software Engineer - Site Reliability Engineering at Ridgeline is a strategic, hands-on role responsible for scaling reliability across a cloud-native platform. The engineer will design and improve systems like Health Manager, Incident Command, and observability infrastructure, while driving FinOps tooling and AI-assisted automation to reduce operational burden and surface critical insights. This role is central to delivering high-performance, zero-downtime services and empowering product, infrastructure, and customer-facing teams to operate faster without sacrificing reliability. The position requires work authorization in the United States without employer sponsorship.

Key Responsibilities

  • Build and evolve systems like Health Manager, Incident Command, and observability platforms that support zero-downtime deployments and operational readiness
  • Partner with development and infrastructure teams to embed reliability into services and processes
  • Participate in the SRE on-call rotation and lead incident response as needed
  • Design metrics, tooling, and workflows that enable zero-downtime deployments, fast detection, and proactive issue resolution
  • Develop and maintain FinOps tooling to drive cost visibility, usage transparency, and financially informed engineering decisions
  • Lead incident triage and retrospectives with a blameless, data-driven approach
  • Define observability signals that make system health visible, actionable, and reliable
  • Write production-quality code and ship real improvements, measured by impact, not just effort
  • Drive initiatives that reduce risk, increase visibility, or improve operational resilience across services
  • Foster an outcomes-focused team culture through honest communication, clarity, and accountability
  • Think creatively, own problems, seek solutions, and communicate clearly along the way
  • Contribute to a collaborative environment rooted in learning, teaching, and transparency

Requirements

  • 10+ years in a software engineering position or similar, with experience operating large-scale, mission-critical systems
  • Proficiency in one or more of: Kotlin, Java, JavaScript, Python
  • Experience with observability platforms (e.g., Datadog, Prometheus) and monitoring best practices
  • Strong familiarity with infrastructure-as-code tools (e.g., Terraform, CDKTF) and CI/CD systems
  • Experience leading or participating in incident response and service ownership
  • Experience deploying, monitoring, and maintaining multi-tenant architectures
  • Ability to work effectively across teams and communicate technical concepts with clarity
  • Strong written and verbal communication skills, especially in facilitating incident response and working sessions with service teams
  • Comfortable navigating ambiguity and working toward measurable outcomes
  • Proven ability to balance individual contribution with cross-functional impact
  • Willingness to learn about cutting-edge technologies and cultivate expertise in a business domain/problem space
  • An aptitude for problem solving and effective communication
  • Serious interest in having fun at work

Nice to Have

  • Experience or interest in FinOps, cost-aware system design, or cloud usage optimization
  • Familiarity with AI-assisted tooling or workflows

Qualifications

  • Experience operating large-scale, mission-critical systems
  • Proficiency in programming languages such as Kotlin, Java, JavaScript, or Python

Benefits & Perks

  • Targeted cash compensation of $200,000-$250,000, final amount based on experience
  • Participation in the Company Stock Plan
  • Unlimited vacation
  • Educational and wellness reimbursements
  • $0-cost employee insurance plans
  • Opportunities for career advancement and making a meaningful impact

Working at Ridgeline

Ridgeline is a community-minded, discrimination-free, equal opportunity workplace. It values security, agility, usability, and innovation, with a focus on building a modern platform in the public cloud for the investment management industry. The company emphasizes a fast-growing, people-first environment recognized for its innovation, workplace culture, and commitment to transparency, learning, and collaboration.

Job Details

Posted At: Aug 3, 2025
Job Category: DevOps
Salary: 200k-250k
Job Type: Full Time
Work Mode: Hybrid
Experience: Senior

About Ridgeline

Website

ridgelineapps.com

Company Size

251-500 employees

Location

New York, NY

Industry

Software Publishers
