Data Engineering Overview
Ensono Stacks accelerates the generation of production-ready data engineering workloads and pipelines for a data lakehouse. New data engineering workloads can be generated through the Datastacks CLI, and a range of example data workloads and pipelines is also provided. These workloads cover all stages, from ingesting data from source, through applying data transformations across data lake layers, to enabling end-user data visualisations and analytics.
Lakehouse approach
The Ensono Stacks Data Platform is based upon a Lakehouse architecture. This approach combines the benefits of data warehouses and data lakes to provide a platform that is scalable, flexible and performant, with governance and management capabilities built in. It can support a wide range of use cases, from complex machine learning to standard BI reporting and analysis.
Medallion architecture
The data lake structure in Ensono Stacks is based upon the medallion architecture design pattern. The default data lake deployed through Ensono Stacks contains the following data layers:
Data lake layer | Description | Default container name | Data format | Stacks workload type |
---|---|---|---|---|
Bronze | The initial landing area where data is stored as per its original source, prior to any transformations. | raw | Parquet | Data ingest |
Silver | The data has been cleansed, validated and stored in an optimal format to support downstream analytic use-cases. | staging | Delta | Data processing |
Gold | Reliable data entities prepared for specific use-cases. These typically combine and aggregate datasets from the silver layer. | conformance | Delta | Data processing |
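
The layer table above can be expressed as a small lookup, which is sometimes handy when a workload needs to resolve where a dataset lives. The following is a minimal sketch and not part of the Stacks Data library: the `LakeLayer` class, the `layer_uri` helper and the `examplelake` storage account name are all hypothetical, and it assumes the containers sit in an Azure Data Lake Storage Gen2 account addressed with `abfss://` URIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeLayer:
    """One row of the medallion layer table (hypothetical helper type)."""
    name: str         # medallion layer name
    container: str    # default container name
    data_format: str  # storage format used in the layer
    workload: str     # Stacks workload type that writes to the layer


# Values copied from the table above.
LAYERS = {
    "bronze": LakeLayer("Bronze", "raw", "Parquet", "Data ingest"),
    "silver": LakeLayer("Silver", "staging", "Delta", "Data processing"),
    "gold": LakeLayer("Gold", "conformance", "Delta", "Data processing"),
}


def layer_uri(layer: str, storage_account: str, path: str) -> str:
    """Build an abfss URI for a dataset in the given layer.

    Assumes ADLS Gen2; the storage account name is supplied by the caller.
    """
    info = LAYERS[layer.lower()]
    clean_path = path.strip("/")
    return (
        f"abfss://{info.container}@{storage_account}"
        f".dfs.core.windows.net/{clean_path}"
    )


if __name__ == "__main__":
    # "examplelake" and "sales/orders" are placeholder values.
    print(layer_uri("silver", "examplelake", "sales/orders"))
    # prints abfss://staging@examplelake.dfs.core.windows.net/sales/orders
```

Keeping the container names and formats in one place means a change to the default layout only needs to be made once, rather than in every workload that reads or writes the lake.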
Fabric lakehouse
Microsoft Fabric can be used as the lakehouse layer in an Ensono Stacks Data Platform. Fabric provides an all-in-one analytics experience, with a range of tools for data analysts, data engineers and data scientists. It also closely integrates with Power BI. Full details on getting a Fabric lakehouse up and running with your Ensono Stacks Data Platform are in the getting started section.
Stacks Data utilities
The deployed platform utilises the Stacks Data Python library, which provides a range of utilities to support the development and deployment of production-ready data pipelines and workloads. Central to this is the Datastacks CLI, which enables automatic generation of new data workloads.
Quality assurance
Quality and reliability of data workloads are at the core of the Ensono Stacks Data Platform. They are supported by frameworks for data quality and automated testing.
Sample dataset
The example workloads included in the solution are based upon an example Azure SQL data source and dataset. This data source may optionally be deployed as part of the data platform, to allow full demonstration and testing of these example workloads.