Data Processing Pipeline Deployment
This section provides an overview of generating a new data processing pipeline workload and deploying it into an Ensono Stacks Data Platform using the Datastacks CLI (an example invocation is shown at the end of this section).
This guide assumes the following are in place:
- A deployed Ensono Stacks Data Platform
- Development environment set up, with the Datastacks CLI available (a quick check follows this list)
- Deployed shared resources
- Data ingested into the bronze layer of the data lake
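As a quick sanity check that the development environment is ready, you can confirm the CLI responds. This is a minimal sketch, assuming Datastacks is installed in your active Python environment (for example via the project's dependency tooling):

```bash
# Verify the Datastacks CLI is installed and on the PATH
# before attempting to generate a workload.
datastacks --help
```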
This process will deploy the following resources into the project:
- Azure Data Factory Pipeline resource (defined in Terraform / ARM)
- Boilerplate script for performing data processing activities using PySpark (Python)
- Azure DevOps CI/CD pipeline (YAML)
- (optional) Spark job and config file for data quality tests (Python)
- Template unit tests (Python)
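The sketch below shows what generating a data processing workload typically looks like. It is illustrative only: the `generate processing` subcommand, the `--config` option, and the `--data-quality` flag are assumptions based on the Datastacks CLI's general usage pattern, and the config path is a placeholder; consult `datastacks --help` for the authoritative syntax in your version.

```bash
# Generate a new data processing pipeline from a YAML config file.
# The optional --data-quality flag also generates the Spark job and
# config file for data quality tests listed above.
datastacks generate processing --config="path/to/my_config.yaml" --data-quality
```

Once generated, the files can be reviewed, committed, and deployed to the platform through the Azure DevOps CI/CD pipeline included in the output.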