Data Engineering Overview

Ensono Stacks accelerates the generation of production-ready data engineering workloads and pipelines for a data lakehouse. New data engineering workloads can be generated through the Datastacks CLI, and a range of example data workloads and pipelines is also provided. These workloads cover every stage: ingesting data from source, applying data transformations across the data lake layers, and ultimately enabling end-user data visualisation and analytics.

Lakehouse approach

The Ensono Stacks Data Platform is based upon a Lakehouse architecture. This approach combines the benefits of data warehouses and data lakes to provide a platform that is fully scalable, flexible and performant, along with governance and management capabilities. It supports a broad range of use cases, from complex machine learning to standard BI reporting and analysis.

Medallion architecture

The data lake structure in Ensono Stacks is based upon the medallion architecture design pattern. The default data lake deployed through Ensono Stacks contains the following data layers:

Data lake layer	Description	Default container name	Data format	Stacks workload type
Bronze	The initial landing area, where data is stored as extracted from the source, prior to any transformations.	raw	Parquet	Data ingest
Silver	The data has been cleansed, validated and stored in an optimal format to support downstream analytic use cases.	staging	Delta	Data processing
Gold	Reliable data entities prepared for specific use cases. These typically combine and aggregate datasets from the silver layer.	conformance	Delta	Data processing
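
To make the layer-to-container mapping concrete, the default layout in the table above can be expressed as configuration. This is a minimal, illustrative sketch only: it assumes the data lake is an Azure Data Lake Storage Gen2 account addressed with `abfss://` URIs, and `LakeLayer`, `layer_by_name`, `dataset_uri`, the storage account name and the dataset folder naming are all hypothetical, not part of Ensono Stacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeLayer:
    """One medallion layer, mirroring a row of the default layout table."""
    name: str
    container: str
    data_format: str
    workload_type: str


# Default layers in processing order (values taken from the layout table).
DEFAULT_LAYERS = (
    LakeLayer("Bronze", "raw", "Parquet", "Data ingest"),
    LakeLayer("Silver", "staging", "Delta", "Data processing"),
    LakeLayer("Gold", "conformance", "Delta", "Data processing"),
)


def layer_by_name(name: str) -> LakeLayer:
    """Look up a layer case-insensitively, e.g. 'silver'."""
    for layer in DEFAULT_LAYERS:
        if layer.name.lower() == name.lower():
            return layer
    raise KeyError(f"Unknown data lake layer: {name!r}")


def dataset_uri(layer_name: str, dataset: str, storage_account: str) -> str:
    """Build an abfss URI for a dataset folder in the given layer.

    Assumption: the lake is ADLS Gen2; `storage_account` and the
    `dataset` folder name are hypothetical placeholders.
    """
    layer = layer_by_name(layer_name)
    return (
        f"abfss://{layer.container}@{storage_account}"
        f".dfs.core.windows.net/{dataset}"
    )
```

For example, `dataset_uri("silver", "movies", "mydatalake")` returns `abfss://staging@mydatalake.dfs.core.windows.net/movies`. Keeping the layer definitions in one place means container names and formats are not hard-coded separately in each workload.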

Fabric lakehouse

Microsoft Fabric can be used as the lakehouse layer in an Ensono Stacks Data Platform. Fabric provides an all-in-one analytics experience, with a range of tools for data analysts, data engineers and data scientists. It also integrates closely with Power BI. Full details on getting a Fabric lakehouse up and running with your Ensono Stacks Data Platform are in the getting started section.

Stacks Data utilities

The deployed platform uses the Stacks Data Python library, which provides a range of utilities to streamline the development and deployment of production-ready data pipelines and workloads. Central to this is the Datastacks CLI, which enables automatic generation of new data workloads.
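
The general idea of generating workloads from configuration can be illustrated with a small template-rendering sketch. This is not the Datastacks CLI or its template format: `generate_workload`, `PIPELINE_TEMPLATE` and the config keys are hypothetical, chosen only to show how one configuration input can drive consistent, repeatable workload scaffolding.

```python
from string import Template

# Hypothetical template for a single workload definition file.
# A full generator would render several files; this sketch renders one.
PIPELINE_TEMPLATE = Template(
    "name: ingest_${dataset_name}\n"
    "source: ${source_type}\n"
    "target_container: ${target_container}\n"
)

# Keys the hypothetical template needs in order to render.
REQUIRED_KEYS = ("dataset_name", "source_type", "target_container")


def generate_workload(config: dict) -> str:
    """Render a workload definition from a config dict.

    Raises ValueError if a required key is missing, so configuration
    mistakes surface at generation time rather than at deployment.
    """
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing config keys: {', '.join(missing)}")
    return PIPELINE_TEMPLATE.substitute(config)
```

Calling `generate_workload({"dataset_name": "movies", "source_type": "azure_sql", "target_container": "raw"})` renders a three-line definition named `ingest_movies`. Validating the config before rendering is a design choice that keeps every generated workload structurally consistent.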

Quality assurance

Quality and reliability of data workloads are at the core of the Ensono Stacks Data Platform. Both are ensured through frameworks for data quality and automated testing.
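
As an illustration of what a rule-based data quality check does, the sketch below applies named rules to each row of a dataset and reports how many rows fail each rule. It is a generic, hypothetical example and not the Stacks data quality framework; `run_checks`, `CheckResult`, the rule names and the `rating` column are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class CheckResult:
    """Outcome of one data quality rule applied to a dataset."""
    rule: str
    passed: bool
    failing_rows: int


def run_checks(
    rows: List[Dict[str, Any]],
    rules: Dict[str, Callable[[Dict[str, Any]], bool]],
) -> List[CheckResult]:
    """Apply each named rule to every row and count the rows that fail it."""
    results = []
    for name, rule in rules.items():
        failures = sum(1 for row in rows if not rule(row))
        results.append(CheckResult(name, failures == 0, failures))
    return results


# Hypothetical rules for a cleansed (silver) dataset.
RULES = {
    "id_not_null": lambda row: row.get("id") is not None,
    "rating_in_range": lambda row: (
        row.get("rating") is not None and 0 <= row["rating"] <= 10
    ),
}
```

Running `run_checks([{"id": 1, "rating": 7.5}, {"id": None, "rating": 11}], RULES)` reports one failing row for each rule. How failures are handled, for example logging them or blocking promotion to the next layer, is a pipeline design choice that this sketch leaves open.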

Sample dataset

The example workloads included in the solution are based upon a sample Azure SQL data source and dataset. This data source can optionally be deployed as part of the data platform, so that the example workloads can be fully demonstrated and tested.
