Senior Data Engineer - Data Impact & Governance

The University of Texas M.D. Anderson Cancer Center

Houston, TX
Full Time
Senior
123k-185k
12 days ago

Job Description

About the Role

The University of Texas M.D. Anderson Cancer Center is dedicated to eliminating cancer through outstanding programs that integrate patient care, research, prevention, and education. The Senior Data Engineer leads enterprise-scale AI/ML data engineering efforts, architects and optimizes data pipelines, and collaborates with healthcare stakeholders to integrate advanced AI/ML capabilities into healthcare environments. The position focuses on turning data into lasting impact for patients by supporting AI applications in clinical and business operations.

Key Responsibilities

  • Design, implement, and maintain scalable batch and streaming data pipelines that support ML training, deployment, inference, and monitoring, using Azure, Dataiku, and open-source tools.
  • Deploy and manage raw data, feature, and vector stores to enable fast, reliable access to feature data for production AI/ML systems.
  • Use Infrastructure as Code (IaC) and CI/CD workflows to automate infrastructure and pipeline deployments, improving reliability and efficiency across environments.
  • Implement validation, lineage, anomaly detection, and drift monitoring to ensure data quality, freshness, accuracy, and compliance.
  • Enforce security and compliance measures such as encryption, RBAC, tokenization, and audit logging to meet HIPAA/HITRUST standards while enabling scalable AI operations.
  • Partner with healthcare data engineers, ML engineers, data scientists, product teams, and application owners to deliver scalable AI solutions, providing mentorship and fostering best practices.
  • Own and operate pipelines and infrastructure end-to-end, including monitoring, alerting, incident management, and continuous improvement.

Requirements

  • Bachelor's degree in a relevant field.
  • Five years of relevant information technology experience (or three years with a master's degree).
  • Proficiency in Python and SQL for large-scale data engineering, and Spark for distributed processing.
  • Experience with Azure cloud services (Fabric OneLake, Synapse, Blob Storage) and on-premises databases (RDBMS, Neo4j, MongoDB).
  • Knowledge of pipeline orchestration tools such as Airflow and Dataiku, and high-volume processing with Spark/Fabric.
  • Experience with feature and vector stores such as Feast, Pinecone, pgvector, and Azure Feature Store.
  • Familiarity with data integration and streaming technologies such as CDC, Kafka, Event Hubs, HL7, FHIR, and DICOM.
  • Deployment automation skills using Terraform, Bicep, Helm, GitHub Actions, and Azure DevOps.
  • Monitoring and observability expertise including data validation, lineage, anomaly detection, and pipeline monitoring.
  • Understanding of security and compliance practices including encryption, RBAC, audit logging, HIPAA/HITRUST.
  • Experience with exploratory and descriptive analytics, data profiling, statistical inference, and trend analysis.
  • Deep understanding of healthcare datasets and standards (HL7, FHIR, DICOM), with experience in extraction, normalization, and de-identification for analytics.
  • Strong communication skills for collaborating with research scientists, ML engineers, and stakeholders, and for documenting processes.

Nice to Have

  • Master's level degree.
  • Epic Data Model certification (Clinical, Access, or Revenue) within 180 days.
  • Azure Data Engineer Associate (DP-203).
  • Epic Cogito certification.
  • HIPAA Privacy & Security Certification.
  • HL7/FHIR Certification.
  • Experience in a Senior Data Scientist role in the AI/ML space.
  • Knowledge of data privacy, security, and HIPAA compliance in healthcare.

Qualifications

  • Bachelor's degree required.
  • Master's degree preferred.
  • Minimum of five years of relevant IT experience; three years if holding a master's degree.

Benefits & Perks

  • Salary range from USD 123,000 to USD 185,000 depending on experience.
  • Full-time, regular employment with a daytime work schedule.
  • Remote work within Texas.
  • Referral bonus and relocation assistance available.
  • Pivotal position with opportunities for impact in healthcare AI.

Working at The University of Texas M.D. Anderson Cancer Center

The University of Texas M.D. Anderson Cancer Center values diversity and is committed to providing equal employment opportunities regardless of race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, veteran status, genetic information, or other protected categories. The organization fosters a high-performance, learning-oriented culture with a focus on innovation in healthcare and research.

Job Details

Posted At: Jul 14, 2025
Job Category: Data Engineering
Salary: 123k-185k
Job Type: Full Time
Work Mode: Remote
Experience: Senior

About The University of Texas M.D. Anderson Cancer Center

Website

mdanderson.org

Company Size

10000+ employees

Location

Houston, TX

Industry

Offices of Physicians
