Data workloads generated in Ensono Stacks feature a framework for testing. Automated tests are essential at every stage of a data pipeline for reasons including:

  • Early detection of issues: Catch errors and issues early in the data pipeline development process.
  • Regression testing: As new code is added, ensure that changes do not introduce regressions or break existing functionality.
  • Support for CI/CD: Running automated tests is an essential part of the CI/CD process.
  • Collaboration: A shared set of expectations and a common framework for testing can help collaboration between developers and end users.

In addition to the automated tests below, Ensono Stacks also has a framework for data quality testing.

Unit tests

Unit tests are implemented throughout the Ensono Stacks solution and are recommended for all data workloads. Python code is tested using pytest.

Within a data workload generated through Datastacks, a placeholder for unit tests is found under tests/unit. These tests will be executed as part of the workload's Azure DevOps CI/CD pipeline.
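As an illustration of what a test under tests/unit might look like, the sketch below tests a small date-window filter. The function filter_by_window and its row format are hypothetical and are not part of the Ensono Stacks codebase. In a real workload the function would be imported from the workload's own modules rather than defined alongside the test.

```python
from datetime import date


def filter_by_window(rows, window_start, window_end):
    """Hypothetical workload helper: keep rows whose 'date' is inside the window (inclusive)."""
    return [row for row in rows if window_start <= row["date"] <= window_end]


def test_filter_by_window_keeps_only_rows_inside_window():
    rows = [
        {"id": 1, "date": date(2009, 12, 31)},
        {"id": 2, "date": date(2010, 1, 1)},
        {"id": 3, "date": date(2010, 1, 31)},
        {"id": 4, "date": date(2010, 2, 1)},
    ]
    result = filter_by_window(rows, date(2010, 1, 1), date(2010, 1, 31))
    # Both window boundaries are inclusive, so ids 2 and 3 are kept.
    assert [row["id"] for row in result] == [2, 3]


def test_filter_by_window_returns_empty_list_for_no_rows():
    assert filter_by_window([], date(2010, 1, 1), date(2010, 1, 31)) == []
```

pytest collects functions whose names start with test_ and reports any failing assert, so no additional boilerplate is needed for a simple case like this.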

End-to-end tests

Template end-to-end tests are implemented for data workloads in Ensono Stacks. These tests are written in a behaviour-driven development (BDD) style using behave.

This framework allows test cases to be written in a natural language style. Data workloads generated through Datastacks will have placeholder end-to-end BDD tests under tests/end_to_end. The .feature file located here contains the test cases in BDD style. The tests provided perform a number of actions to prepare a test scenario, execute the workload, check the results, and clean up.

Example content of a .feature file is shown below:

Feature: Azure Data Ingest
  I want to ingest data
  so that it is available in Azure data lake storage

  Scenario Outline: Data Factory Ingest SQL Database into ADLS
    Given the ADF pipeline ingest_azure_sql_example has been triggered with <parameters>
    And I poll the pipeline every 10 seconds until it has completed
    Then the ADF pipeline ingest_azure_sql_example has finished with state Succeeded
    And the ADF pipeline ingest_azure_sql_example completed in less than 900 seconds
    And the files <output_files> are present in the ADLS container raw in the directory ingest_azure_sql_example

    Examples: Output files
      | parameters | output_files |
      | {"window_start" : "2010-01-01", "window_end": "2010-01-31"} | ["movies.keywords", "movies.keywords_dq", "movies.links", "movies.movies_metadata", "movies.movies_metadata_dq", "movies.ratings_small"] |

The tests will be executed as part of the workload's Azure DevOps CI/CD pipeline. You may extend or change the test cases depending on the particular requirements of your workload.
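When extending the test cases, each line of a scenario is backed by a Python step function. The sketch below shows one way the logic behind a step such as "I poll the pipeline every 10 seconds until it has completed" could be written. It is not the implementation shipped with Ensono Stacks: poll_until_complete, the get_status callable, and the set of terminal states are all assumptions made for illustration. The clock and sleep functions are injectable so that the logic can be tested without waiting.

```python
import time

# Assumed set of states that end polling; adjust to match the service being polled.
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


def poll_until_complete(
    get_status,
    interval_seconds=10,
    timeout_seconds=900,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call get_status() repeatedly until it returns a terminal state.

    get_status is a hypothetical zero-argument callable that returns the
    current pipeline run status as a string. Returns (status, elapsed_seconds).
    Raises TimeoutError if no terminal state is seen within timeout_seconds.
    """
    start = clock()
    while True:
        status = get_status()
        elapsed = clock() - start
        if status in TERMINAL_STATES:
            return status, elapsed
        if elapsed >= timeout_seconds:
            raise TimeoutError(
                f"Pipeline still in state {status!r} after {elapsed:.0f} seconds"
            )
        sleep(interval_seconds)


if __name__ == "__main__":
    # Simulated run: two in-progress polls followed by success, without real waiting.
    statuses = iter(["InProgress", "InProgress", "Succeeded"])
    now = [0.0]

    def fake_sleep(seconds):
        now[0] += seconds

    print(poll_until_complete(lambda: next(statuses), sleep=fake_sleep, clock=lambda: now[0]))
```

A behave step function could call a helper like this and store the returned status and elapsed time on the behave context, so that later steps can assert on the final state and the run duration.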

Code quality and linting

Code quality and linting checks are also included as part of the workload's CI/CD pipeline. This ensures consistent standards are maintained across the project.

Pre-commit is used to assist developers in this process. The .pre-commit-config.yaml file in the project defines the checks and standards used.