Responsibilities:
- Build data ingestion pipelines using Python/Spark (see the sketch after this list).
- Analyze and troubleshoot data quality issues and data discrepancies in the source systems, and provide data profiling results.
- Conduct data cleanup and address data quality (DQ) issues ahead of data ingestion.
- Interpret business questions and translate them into technical requirements and the underlying data requirements.
- Work effectively with the QA and UAT teams on code testing and migrate code across regions.
- Apply hands-on PySpark experience to build data pipelines in Azure environments.
- Understand design documents and independently build data pipelines from the defined source-to-target mappings.
- Bring good exposure to RDBMS and be able to convert complex stored procedure and SQL trigger logic to PySpark on the cloud platform.
- Be open to learning new technologies and implementing solutions quickly on the cloud platform.
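
A minimal PySpark ingestion sketch in the spirit of the responsibilities above: read a source table over JDBC, apply a simple DQ filter, re-express a piece of stored-procedure-style logic in the DataFrame API, and land the result in Azure Data Lake Storage. The connection URL, table, columns, and storage path are illustrative placeholders, not part of this posting.

```python
# Minimal ingestion sketch: extract over JDBC, apply DQ and derived-column
# logic, load to ADLS Gen2. All names below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_ingestion").getOrCreate()

# Extract: pull the source table from an RDBMS (the source-to-target
# mapping would normally define the table and column list).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")  # placeholder
    .option("dbtable", "dbo.orders")                                  # placeholder
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

# Transform: stored-procedure-style logic re-expressed as DataFrame ops,
# e.g. reject rows failing a basic DQ rule and derive order_status.
clean = (
    orders
    .filter(F.col("order_id").isNotNull())  # DQ: reject null keys
    .withColumn(
        "order_status",
        F.when(F.col("shipped_date").isNull(), F.lit("OPEN"))
         .otherwise(F.lit("SHIPPED")),
    )
)

# Load: write to a curated zone in ADLS Gen2 (abfss path is a placeholder).
clean.write.mode("overwrite").parquet(
    "abfss://curated@<storageaccount>.dfs.core.windows.net/orders/"
)
```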
Requirements:
- At least 4 years of experience in Azure development.
- Good understanding of Spark architecture with Databricks and Spark SQL (see the Spark SQL sketch after this list).
- PySpark exposure, or the intent to learn it.
- Experience building ETL/data warehouse transformation processes.
- Very good knowledge of SQL.
- Hands-on experience designing and delivering solutions using Azure Data Lake.
- Direct experience building data pipelines using Azure Data Factory.
- Experience working with structured and unstructured data.
- Experience working in an Agile environment (preferably Scrum).
- Experience in cloud-based solutions.
- Knowledge of data management principles.
- Excellent communication and interpersonal skills.
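
A minimal Spark SQL sketch, as one might run it in a Databricks notebook: expose a DataFrame as a temporary view and aggregate it with SQL. The table, columns, and storage path are again illustrative placeholders.

```python
# Minimal Spark SQL sketch: the same curated data queried through a
# temporary view. Table/column names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders_sparksql").getOrCreate()

# Load the curated data (placeholder path) and expose it to SQL.
orders = spark.read.parquet(
    "abfss://curated@<storageaccount>.dfs.core.windows.net/orders/"
)
orders.createOrReplaceTempView("orders")

# A warehouse-style aggregation of the kind often migrated from
# stored-procedure logic into Spark SQL.
daily_summary = spark.sql("""
    SELECT order_date,
           order_status,
           COUNT(*)         AS order_count,
           SUM(order_total) AS total_amount
    FROM orders
    GROUP BY order_date, order_status
""")
daily_summary.show()
```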