Member of Technical Staff, Data Infra

Congregation Ohr Tzafon

San Francisco, CA
Full Time
Senior
150k-425k
10 days ago

Job Description

About the Role

Tzafon is a research firm building scalable compute systems and advancing machine intelligence, with offices in San Francisco and Tel Aviv. We recently raised $9.7M in pre-seed funding to expand the frontiers of machine intelligence. Our team consists of engineers and scientists with deep backgrounds in ML infrastructure and research, including IOI and IMO medalists, PhDs, and alumni of leading tech companies. We train models and build infrastructure for swarms of agents to automate work across real-world environments.

In this role, you will work closely with researchers to collect and prepare data for training foundation models, develop the data engine that powers our models, and ensure data quality and diversity.

Key Responsibilities

  • Build and maintain scalable data pipelines for training and fine-tuning LLMs and agent models
  • Create and optimize distributed computing systems for processing web-scale datasets
  • Clean, deduplicate, normalize, and cluster diverse datasets across structured and unstructured sources
  • Design robust pipelines using tools like Spark, BigQuery, dbt, and Airflow
  • Collaborate with researchers and engineers to develop reproducible dataset curation workflows
  • Monitor data quality and build tools for versioning, observability, and auditing
  • Help define what “great data” looks like for real-world intelligent agents
  • Develop and maintain core processing primitives (e.g., tokenization, deduplication, chunking) with a focus on scalability

Requirements

  • 3+ years of full-time experience as a data engineer
  • 6+ years of software engineering experience overall, including data engineering
  • Proficiency in Python, Scala, or Java
  • Solid understanding of Spark and ability to write, debug, and optimize Spark code
  • Familiarity with GCP, BigQuery, dbt, Trino, Hex, and other cloud-based data and analytics platforms
  • Experience with ML datasets and data preparation for model training

Nice to Have

  • Experience designing and implementing distributed computing architecture for web-scale data processing
  • Building scalable infrastructure for model training data preparation
  • Creating comprehensive monitoring and alerting systems
  • Optimizing tokenization infrastructure for improved throughput
  • Developing fault-tolerant distributed processing systems
  • Implementing new infrastructure components based on research requirements
  • Building automated testing frameworks for distributed systems

Benefits & Perks

  • Full medical, dental, and vision coverage
  • 401(k) retirement plan
  • Offices in San Francisco and Tel Aviv
  • Early-stage equity in a future-defining company
  • Visa sponsorship (with efforts to assist successful candidates)
  • Referral bonus of $20,000 for successful hires

Working at Congregation Ohr Tzafon

We are a fast-moving research team dedicated to shaping the quality of intelligence from the ground up, with a focus on innovation, collaboration, and advancing machine learning infrastructure.

Job Details

Posted At: Jul 14, 2025
Job Category: Data Engineering
Salary: 150k-425k
Job Type: Full Time
Work Mode: Onsite
Experience: Senior

About Congregation Ohr Tzafon

Website: congregationohrtzafon.org
Location: San Francisco, CA
Industry: Religious Organizations
