Data Engineering Overview

Ensono Stacks accelerates the generation of production-ready data engineering workloads and pipelines for a data lakehouse. New data engineering workloads can be generated through the Datastacks CLI, and a range of example data workloads and pipelines is also provided. These workloads cover every stage: ingesting data from source, applying transformations across data lake layers, and ultimately enabling end-user data visualisations and analytics.

Lakehouse approach

The Ensono Stacks Data Platform is based upon a lakehouse architecture. This approach combines the benefits of data warehouses and data lakes to provide a platform that is scalable, flexible and performant, with governance and management capabilities. It supports use cases ranging from complex machine learning to standard BI reporting and analysis.

Medallion architecture

The data lake structure in Ensono Stacks is based upon the medallion architecture design pattern. The default data lake deployed through Ensono Stacks contains the following data layers:

| Data lake layer | Description | Default container name | Data format | Stacks workload type |
| --- | --- | --- | --- | --- |
| Bronze | The initial landing area where data is stored as per its original source, prior to any transformations. | raw | Parquet | Data ingest |
| Silver | The data has been cleansed, validated and stored in an optimal format to support downstream analytic use-cases. | staging | Delta | Data processing |
| Gold | Reliable data entities prepared for specific use-cases. These typically combine and aggregate datasets from the silver layer. | conformance | Delta | Data processing |
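
As an illustration only, the default layers listed above could be captured in code as a small lookup structure. This is a minimal sketch and not part of the Stacks Data library: the names `LakeLayer`, `LAYERS` and `layer_for` are hypothetical, and only the container names, data formats and workload types are taken from the defaults described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeLayer:
    """One medallion layer and its default settings (hypothetical helper type)."""
    name: str
    container: str
    data_format: str
    workload_type: str


# Default layers as described in this overview, ordered from source to consumption.
LAYERS = (
    LakeLayer("bronze", "raw", "Parquet", "Data ingest"),
    LakeLayer("silver", "staging", "Delta", "Data processing"),
    LakeLayer("gold", "conformance", "Delta", "Data processing"),
)


def layer_for(name: str) -> LakeLayer:
    """Look up a layer by name, ignoring case; raise KeyError if it is unknown."""
    for layer in LAYERS:
        if layer.name == name.lower():
            return layer
    raise KeyError(f"unknown data lake layer: {name!r}")


if __name__ == "__main__":
    for layer in LAYERS:
        print(f"{layer.name}: container={layer.container}, format={layer.data_format}")
```

Keeping the mapping in one place means a workload can refer to a layer by name rather than hard-coding a container name, which helps if a deployment overrides the defaults.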

Fabric lakehouse

Microsoft Fabric can be used as the lakehouse layer in an Ensono Stacks Data Platform. Fabric provides an all-in-one analytics experience, with a range of tools for data analysts, data engineers and data scientists. It also closely integrates with Power BI. Full details on getting a Fabric lakehouse up and running with your Ensono Stacks Data Platform are in the getting started section.

Stacks Data utilities

The deployed platform uses the Stacks Data Python library, which provides a range of utilities to simplify developing and deploying production-ready data pipelines and workloads. Central to this is the Datastacks CLI, which enables automatic generation of new data workloads.

Quality assurance

The quality and reliability of data workloads are at the core of the Ensono Stacks Data Platform. They are supported through frameworks for data quality and automated testing.

Sample dataset

The example workloads included in the solution are based upon a sample Azure SQL data source and dataset. This data source can optionally be deployed as part of the data platform, so that the example workloads can be fully demonstrated and tested.