
Senior Full-Stack Software Engineer

Nvidia

Santa Clara, CA
Full Time
Senior
184k-357k
18 days ago

Job Description

About the Role

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today, we're at the forefront of AI innovation, powering breakthroughs in research, autonomous vehicles, robotics, and more. The DGX Cloud team builds and operates the AI infrastructure that fuels this progress. We're looking for a Senior Full-Stack Software Engineer to join our DGX Cloud AI Infrastructure team and help deliver the next-generation user experience for NVIDIA's GPU clusters and AI infrastructure. In this role, you'll design and build a unified, self-service portal that serves as the front door to our AI compute platform, enabling researchers to efficiently manage, monitor, and optimize their use of GPU clusters.

Key Responsibilities

  • Design, develop, and deploy full-stack web applications to support large-scale AI infrastructure operations and workflows
  • Collaborate with AI and ML research teams to identify pain points and deliver tools that accelerate their work
  • Develop APIs, backend services, and UIs to improve visibility, observability, and control over large-scale GPU clusters
  • Develop backend services to manage job schedulers and cluster operations
  • Define and track metrics that measure efficiency, resiliency, and developer productivity across the platform
  • Drive engineering excellence in testing, CI/CD, code quality, and performance
  • Lead architectural discussions and mentor junior engineers on design and implementation
  • Stay ahead of AI/ML infrastructure trends and drive adoption of best practices within the team

Requirements

  • 8+ years of experience developing software infrastructure for large-scale AI systems
  • Bachelor's degree or higher in Computer Science or a related technical field (or equivalent experience)
  • Proficiency in full-stack development with JavaScript (Vue or React), Node.js, Python, and/or Golang, plus scripting languages
  • Experience with distributed systems and cloud-native technologies (Docker, Kubernetes, microservices)
  • Familiarity with observability stacks: ELK, OpenSearch, Prometheus, Grafana, or Loki
  • Strong debugging and root cause analysis skills across application and infrastructure layers
  • Experience with large-scale AI training, inference, or data infrastructure services
  • Excellent communication, collaboration, and problem-solving skills, and a growth mindset

Nice to Have

  • Experience building developer platforms or self-service internal infrastructure tools for efficiency, resiliency, or observability
  • Hands-on experience as a Machine Learning Engineer (MLE) or deep familiarity with DL frameworks (e.g., PyTorch, TensorFlow, JAX, Ray)
  • Hands-on experience operating at datacenter scale, including GPU cluster debugging and root cause analysis
  • Experience with MongoDB, Hadoop, or Spark

Qualifications

  • Educational background in Computer Science or a related field, or equivalent experience

Benefits & Perks

  • Eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/)
  • Base salary ranges from 184,000 USD to 356,500 USD, determined by location, experience, and pay for similar roles

Working at Nvidia

At NVIDIA, you'll be immersed in a diverse, supportive environment where you're empowered to do your best work. The company is committed to fostering a diverse work environment and is proud to be an equal opportunity employer, valuing diversity in current and future employees.


Job Details

Posted At: Jul 10, 2025
Salary: 184k-357k
Job Type: Full Time
Experience: Senior

About Nvidia

Website: nvidia.com
Location: Santa Clara, CA
Industry: Semiconductor and Related Device Manufacturing
