Meta
In this role, you will be a member of the Network AI Software team, part of the larger DC networking organization. The team develops and owns the software stack around collective communication libraries across Meta. At a high level, the team aims to enable Meta-wide ML products and innovations to leverage our large-scale training and inference fleet through an observable, reliable, and high-performance distributed AI communication stack. Currently, one of the team's focus areas is building customized features, software benchmarks, performance tuners, and software stacks around PyTorch to improve full-stack distributed ML reliability and performance (e.g., large-scale GenAI/LLM training) from the trainer down to the network communication layer. We are seeking leaders to work on GenAI/LLM scaling reliability and performance.
Website
meta.com
Company Size
10000+ employees
Location
New York, NY
Industry
Media Streaming Distribution Services, Social Networks, and Other Media Networks and Content Providers