Lead Data Engineer at McKesson

Job Description

About the Role

McKesson is an impact-driven, Fortune 10 company that touches virtually every aspect of healthcare. We are known for delivering insights, products, and services that make quality care more accessible and affordable. Here, we focus on the health, happiness, and well-being of you and those we serve - we care. We foster a culture where you can grow, make an impact, and are empowered to bring new ideas. Together, we thrive as we shape the future of health for patients, our communities, and our people. If you want to be part of tomorrow's health today, we want to hear from you.

Key Responsibilities

Lead the design and development of enterprise data assets, including data models, feature stores, and analytical datasets, using the Azure modern data stack
Architect scalable data pipelines and ETL/ELT processes leveraging Azure Data Factory, Azure Synapse Analytics, and Azure Data Lake Storage
Implement advanced data processing solutions using Apache Spark on Azure Databricks for large-scale data transformation and analytics
Develop reusable data frameworks and libraries in Python to accelerate data asset creation and ensure consistency across the organization
Establish data asset governance including versioning, lineage tracking, and quality monitoring to ensure enterprise-grade reliability
Lead and mentor a team of 8-12 data engineers focused on data asset development and optimization
Provide technical guidance on complex data engineering challenges, architectural decisions, and best practices
Foster a collaborative development environment emphasizing code quality, testing, and continuous improvement
Drive knowledge sharing initiatives and technical training to elevate team capabilities in modern data engineering practices
Collaborate with cross-functional teams including Data Science, Analytics, and Business Intelligence to deliver integrated data solutions
Optimize Azure Synapse Analytics workflows for high-performance data processing and analytical workloads
Implement efficient data storage strategies using Azure Data Lake Storage Gen2 with appropriate partitioning and compression techniques
Leverage Azure Data Factory for orchestrating complex data workflows and managing data pipeline dependencies
Utilize Azure Databricks for advanced Spark-based data processing, machine learning pipelines, and real-time analytics
Integrate with Azure services including Cosmos DB, Event Hubs, and Service Bus for comprehensive data ecosystem solutions
Develop sophisticated data processing applications using Python with an emphasis on performance, scalability, and maintainability
Implement advanced Spark programming techniques including RDD operations, the DataFrame API, and Spark SQL for optimal data processing
Leverage PySpark for large-scale data transformations, aggregations, and complex analytical computations
Utilize Spark Streaming for real-time data processing and event-driven analytics solutions
Implement Delta Lake patterns for reliable data lakes with ACID transactions and time travel capabilities
Establish comprehensive data quality frameworks including validation, profiling, and anomaly detection
Implement performance monitoring and optimization strategies for data pipelines and processing workflows
Design and implement data testing strategies including unit testing, integration testing, and data validation
Optimize Spark jobs for cost efficiency and performance including cluster sizing, caching strategies, and partition optimization
Maintain data asset documentation, metadata management, and knowledge transfer processes

Requirements

Degree or equivalent, typically with 10+ years of relevant experience; fewer years if holding a relevant Master's or Doctorate qualification
Expert-level proficiency in Python programming including advanced libraries (Pandas, NumPy, Scikit-learn, PyTorch/TensorFlow)
Deep expertise in Apache Spark ecosystem including Spark Core, Spark SQL, PySpark, and Spark Streaming
Extensive experience with Microsoft Azure data services including Azure Data Factory, Azure Synapse Analytics, Azure Data Lake Storage, and Azure Databricks
Strong background in SQL and database technologies including data modeling, query optimization, and performance tuning
Proficiency in version control systems (Git), CI/CD pipelines, and infrastructure-as-code practices
Proven experience leading technical teams of 5+ data engineers with a focus on mentorship and skill development
Strong project management skills with the ability to deliver complex data engineering projects on time and within scope
Advanced Python programming with a focus on data processing, analysis, and pipeline development
Experience with batch processing, real-time data processing, streaming analytics, and event-driven architectures
Knowledge of data governance, data quality, and metadata management best practices

Nice to Have

Azure certifications including Azure Data Engineer Associate or Azure Solutions Architect Expert
Databricks certifications (Spark Developer, Data Engineer Professional)
Experience with additional Azure services including Azure Machine Learning, Azure Cognitive Services, and Azure Functions
Knowledge of container technologies (Docker, Kubernetes) and serverless computing patterns
Understanding of data security, privacy, and compliance requirements in enterprise environments
Deep understanding of data architecture patterns including data lakes, data warehouses, and modern data platform design
Knowledge of async programming, multiprocessing, and performance optimization techniques
Familiarity with testing frameworks (pytest, unittest) and code quality tools (Black, Flake8, MyPy)
Advanced Azure Synapse Analytics usage including dedicated SQL pools, serverless SQL, and Spark pools

Benefits & Perks

Competitive compensation package including base salary, performance bonus, and equity participation
Comprehensive benefits including health, dental, vision, and retirement planning
Professional development opportunities including training, certifications, and conference attendance
Opportunity to work with cutting-edge data technologies at Fortune 10 scale
Collaborative culture emphasizing technical excellence, innovation, and continuous learning
Clear career advancement path within our growing data engineering organization
