Allata - Data Engineer (Databricks + Python + Azure)
Requirements
• Current knowledge of and experience using modern data tools (Databricks, Fivetran, Data Fabric, and others); core experience with data architecture, data integration, data warehousing, and ETL/ELT processes.
• Applied experience developing and deploying custom .whl packages and/or in-session notebook scripts for execution across parallel executor and worker nodes.
• Applied experience in SQL, stored procedures, and PySpark, depending on area of data platform specialization.
• Strong knowledge of cloud and hybrid relational database systems such as MS SQL Server, PostgreSQL, Oracle, Azure SQL, AWS RDS, Aurora, or a comparable engine.
• Strong experience with batch and streaming data processing techniques and file compaction strategies.
• Strong hands-on experience with Databricks in Azure environments.
• Advanced proficiency in Python and PySpark for distributed data processing.
• Experience building and optimizing data pipelines in Azure (Azure Data Factory, Azure SQL, Data Lake Storage, etc.).
• Solid understanding of data warehousing, data lakehouse concepts, and ETL/ELT frameworks (a minimal PySpark sketch follows this list).
• Experience working with relational databases such as SQL Server, PostgreSQL, Oracle, or similar.
• Knowledge of batch and streaming data processing patterns.
• Experience working with large, complex datasets in cloud-based distributed environments.
• Strong analytical and problem-solving skills.
• Ability to work effectively in cross-functional and distributed teams.
• Clear communication skills, with the ability to explain technical concepts to non-technical stakeholders.
• Proactive mindset with a strong sense of ownership.
• Commitment to delivering high-quality, reliable data solutions.
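Illustrative note: the lakehouse/Medallion bullet above is easiest to picture with a short example. The snippet below is a minimal, hypothetical sketch of a bronze-to-silver promotion in PySpark on Databricks; the storage paths, dataset, and column names are assumptions made for illustration only and are not part of this posting.

# Minimal sketch of a bronze -> silver promotion on Databricks (PySpark).
# All paths, dataset, and column names below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Bronze: land raw records as-is (schema-on-read), e.g. from an ADLS container.
bronze_df = (
    spark.read.format("json")
    .load("abfss://raw@examplelake.dfs.core.windows.net/claims/")  # hypothetical path
)

# Silver: clean, type, and deduplicate before exposing data to downstream consumers.
silver_df = (
    bronze_df
    .withColumn("claim_date", F.to_date("claim_date"))            # enforce types
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["claim_id"])                                  # idempotent re-runs
    .filter(F.col("claim_id").isNotNull())                         # basic quality gate
)

# Delta is the default table format on Databricks and is assumed available here.
silver_df.write.format("delta").mode("overwrite").save(
    "abfss://silver@examplelake.dfs.core.windows.net/claims/"
)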
Responsibilities
• Design, develop, and maintain scalable data pipelines using Databricks (PySpark) and Python.
• Build and optimize ETL/ELT processes within Azure cloud environments.
• Implement data models following modern Data Lakehouse principles (e.g., Medallion architecture).
• Ensure data quality, consistency, and performance across ingestion, staging, and curated layers (see the quality-gate sketch after this list).
• Collaborate with data architects, analysts, and business stakeholders to translate healthcare data requirements into technical solutions.
• Develop reusable data transformation logic and modular processing components.
• Support deployment processes following CI/CD and DevOps best practices.
• Monitor and optimize data workflows for performance, scalability, and reliability.
• Contribute to data governance, security, and compliance practices relevant to healthcare environments.
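Illustrative note: the data-quality responsibility above can be pictured as a simple gate that runs before data is promoted to the curated layer. The function below is a hypothetical sketch; the assert_quality name, the claim_id key, and the thresholds are assumptions, not a prescribed implementation.

# Hypothetical quality gate: fail the job before data reaches the curated layer.
from pyspark.sql import DataFrame, functions as F

def assert_quality(df: DataFrame, key_col: str, max_null_ratio: float = 0.01) -> DataFrame:
    """Raise if the key column is missing too often or contains duplicates."""
    total = df.count()
    nulls = df.filter(F.col(key_col).isNull()).count()
    distinct = df.filter(F.col(key_col).isNotNull()).select(key_col).distinct().count()

    if total == 0:
        raise ValueError("No rows to promote; upstream ingestion may have failed.")
    if nulls / total > max_null_ratio:
        raise ValueError(f"{key_col} null ratio {nulls / total:.2%} exceeds threshold.")
    if distinct < total - nulls:
        raise ValueError(f"Duplicate {key_col} values detected before curated load.")
    return df

# Example use inside a pipeline (silver_df from the earlier sketch):
# curated_df = assert_quality(silver_df, "claim_id")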