
Prerequisites

Local development

The following tools are recommended for local development of the Ensono Stacks data solution:

| Tool | Notes |
|---|---|
| Python 3.9 - 3.11 | Python 3.12+ is not currently supported. You may wish to use a utility such as pyenv to manage your local Python versions. |
| Poetry | Used for Python dependency management in Stacks. |
| (Windows users) A Linux distribution, e.g. WSL | A Unix-based environment (e.g. macOS, Linux, or WSL for Windows users) is recommended for developing the solution. |
| Java 8/11/17 runtime | Optional: Java is required to develop and run tests using PySpark locally; see the Spark documentation. |
| Azure CLI | Optional: the Azure CLI lets you interact with Azure resources locally, including running end-to-end tests. |
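
As a quick sanity check before installing dependencies with Poetry, a short script can confirm that the active interpreter falls within the supported 3.9 to 3.11 range. This is a minimal sketch; the function names are hypothetical and not part of Stacks.

```python
import sys

# Supported range from the table above: Python 3.9 to 3.11 inclusive.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version) -> bool:
    """Return True if the (major, minor) part of `version` is in the supported range."""
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


def check_current_interpreter() -> bool:
    """Report whether the running interpreter is supported (hypothetical helper)."""
    current = sys.version_info[:2]
    ok = is_supported_python(current)
    status = "supported" if ok else "not supported (use 3.9 - 3.11, e.g. via pyenv)"
    print(f"Python {current[0]}.{current[1]} is {status}.")
    return ok


if __name__ == "__main__":
    check_current_interpreter()
```
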

See the development quickstart for further details on getting started with developing the solution.

Git repository

A remote Git repository is required for storing and managing a data project's code. This can be in either GitHub or Azure DevOps. When scaffolding a new data project, you will need the HTTPS URL of the repo.
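
Because scaffolding expects the HTTPS URL of the repository rather than an SSH remote, it can help to check the URL first. The sketch below uses a hypothetical `repo_host` helper and placeholder URLs; the Azure DevOps host patterns it accepts (`dev.azure.com` and the older `*.visualstudio.com`) are assumptions about typical remote URLs, not rules enforced by the Ensono Stacks CLI.

```python
from urllib.parse import urlparse


def repo_host(url: str) -> str:
    """Classify an HTTPS Git remote URL as 'github' or 'azure-devops' (hypothetical helper).

    Raises ValueError for non-HTTPS URLs (e.g. SSH remotes such as
    git@github.com:org/repo.git) or for unrecognised hosts.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"Expected an HTTPS URL, got: {url!r}")
    # .hostname is lower-cased and excludes any "user@" prefix.
    host = parsed.hostname or ""
    if host == "github.com":
        return "github"
    if host == "dev.azure.com" or host.endswith(".visualstudio.com"):
        return "azure-devops"
    raise ValueError(f"Unrecognised Git host: {host!r}")
```

For example, `repo_host("https://github.com/my-org/my-data-project.git")` returns `"github"`, while an SSH remote is rejected.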

While Ensono Stacks supports storing code in both GitHub and Azure DevOps, it does not currently support CI/CD pipelines using GitHub Actions. Requirements for Azure DevOps are detailed in the CI/CD - Azure DevOps section below.

The examples and quickstart documentation assume that main is the primary branch in the repo.

Azure subscription

To deploy an Ensono Stacks Data Platform into Azure, you will need:

  • One or more Azure subscriptions to deploy the solution into
  • An Azure service principal (application), which must have Contributor access in order to deploy and configure all required resources in the target subscription(s)

Terraform state storage

Ensono Stacks deploys Azure resources using Terraform. Within your Azure subscription, you must provision a storage container to hold Terraform state data. The details of this storage are required when you first scaffold the project with the Ensono Stacks CLI, so once you have provisioned the storage container, make a note of the following:

  • Storage account name
  • Resource group name
  • Container name
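
The three values above map onto the settings of Terraform's `azurerm` backend. The sketch below, using a hypothetical `render_backend_config` helper, checks the names against Azure's naming rules for storage accounts and blob containers and renders them as a partial backend configuration. The state blob name (`key`) and the example names are assumptions; the Ensono Stacks CLI collects these details itself during scaffolding, and this is not how it does so internally.

```python
import re

# Azure naming rules (assumed here as the validation to apply):
# storage account names are 3-24 lowercase letters or digits;
# container names are 3-63 lowercase letters, digits or single hyphens,
# starting and ending with a letter or digit.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"(?=.{3,63}\Z)[a-z0-9]+(?:-[a-z0-9]+)*")


def render_backend_config(storage_account: str, resource_group: str,
                          container: str, key: str = "terraform.tfstate") -> str:
    """Render azurerm backend settings as HCL lines (hypothetical helper).

    `key` (the state blob name) defaults to an illustrative value, not a Stacks setting.
    """
    if not STORAGE_ACCOUNT_RE.fullmatch(storage_account):
        raise ValueError(f"Invalid storage account name: {storage_account!r}")
    if not CONTAINER_RE.fullmatch(container):
        raise ValueError(f"Invalid container name: {container!r}")
    if not resource_group:
        raise ValueError("Resource group name must not be empty")
    settings = {
        "resource_group_name": resource_group,
        "storage_account_name": storage_account,
        "container_name": container,
        "key": key,
    }
    return "\n".join(f'{name} = "{value}"' for name, value in settings.items()) + "\n"
```

Saved to a file such as `backend.hcl`, output like this could be passed to `terraform init -backend-config=backend.hcl`.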

CI/CD - Azure DevOps

CI/CD processes within the Ensono Stacks Data Platform are currently designed to run in Azure DevOps Pipelines[1], so you must create a project in Azure DevOps.

Azure Pipelines variable groups

Our blueprint solution expects the following variable groups to exist in your Azure DevOps project's Pipelines Library:

  • amido-stacks-de-pipeline-env
  • amido-stacks-euw-de-env-network
  • stacks-credentials-env-kv

In each name, env is either nonprod or prod.
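
Because each group name is a template over env, the full set of groups to create can be enumerated. A minimal sketch with a hypothetical `variable_group_names` helper, which also allows leaving out the network group, since that group is only needed for private networking:

```python
# Group name templates from the list above; "{env}" is replaced per environment.
GROUP_TEMPLATES = (
    "amido-stacks-de-pipeline-{env}",
    "amido-stacks-euw-de-{env}-network",
    "stacks-credentials-{env}-kv",
)
ENVIRONMENTS = ("nonprod", "prod")


def variable_group_names(envs=ENVIRONMENTS, include_network=True):
    """Expand the templates into concrete variable group names (hypothetical helper)."""
    names = []
    for env in envs:
        for template in GROUP_TEMPLATES:
            # The network group is only needed when deploying into a private network.
            if not include_network and template.endswith("-network"):
                continue
            names.append(template.format(env=env))
    return names


if __name__ == "__main__":
    for name in variable_group_names():
        print(name)
```
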

The tables below list the variables required in each group and when each one is needed. Variables fall into one of two categories: 'Project start', required at the very outset of the project, and 'After core infra', required once the core infrastructure has been deployed.

Networking variables

The variables under amido-stacks-euw-de-env-network are only required if you want to provision the infrastructure within a private network.

amido-stacks-de-pipeline-env

| Variable name | When needed | Description |
|---|---|---|
| ADLS_DataLake_URL | After core infra | Azure Data Lake Storage Gen2 URL |
| blob_adls_storage | After core infra | Azure Data Lake Storage Gen2 name |
| blob_configStorage | After core infra | Blob storage name |
| Blob_ConfigStore_serviceEndpoint | After core infra | Blob service URL |
| databricksHost | After core infra | Databricks URL |
| databricksWorkspaceResourceId | After core infra | Databricks workspace resource ID |
| datafactoryname | After core infra | Azure Data Factory name |
| github_token | After core infra | GitHub personal access token (PAT); see below for more details |
| integration_runtime_name | After core infra | Azure Data Factory integration runtime name |
| KeyVault_baseURL | After core infra | Key Vault URI |
| keyvault_name | After core infra | Key Vault name |
| location | Project start | Azure region |
| resource_group | Project start | Name of the resource group |
| sql_connection | After core infra | Connection string for the Azure SQL database |

amido-stacks-euw-de-env-network

| Variable name | When needed | Description |
|---|---|---|
| databricks_private_subnet_name | Project start | Name of the private Databricks subnet |
| databricks_public_subnet_name | Project start | Name of the public Databricks subnet |
| pe_resource_group_name | Project start | Name of the resource group to provision the private VNet into |
| pe_subnet_name | Project start | Name of the subnet to provision private endpoints into |
| pe_subnet_prefix | Project start | Subnet CIDR, e.g. ["10.3.1.0/24"] |
| pe_vnet_name | Project start | Private VNet name |
| private_subnet_prefix | Project start | Subnet CIDR, e.g. ["10.3.4.0/24"] |
| public_subnet_prefix | Project start | Subnet CIDR, e.g. ["10.3.3.0/24"] |

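
The prefix variables above take a JSON-style list of CIDR blocks. Before saving them, you may want to confirm that each value parses and that the subnets do not overlap. The sketch below uses Python's standard `ipaddress` and `json` modules; the helper names and the extra variable in the usage example are hypothetical, and treating the values as JSON lists is an assumption based on the examples in the table.

```python
import ipaddress
import json
from itertools import combinations


def parse_prefix_list(value: str):
    """Parse a value such as '["10.3.1.0/24"]' into network objects.

    ip_network raises ValueError for malformed CIDRs or ones with host bits set.
    """
    return [ipaddress.ip_network(cidr) for cidr in json.loads(value)]


def find_overlaps(variables: dict):
    """Return (name_a, name_b) pairs of variables whose subnet prefixes overlap."""
    networks = {name: parse_prefix_list(value) for name, value in variables.items()}
    overlaps = []
    for (name_a, nets_a), (name_b, nets_b) in combinations(networks.items(), 2):
        if any(a.overlaps(b) for a in nets_a for b in nets_b):
            overlaps.append((name_a, name_b))
    return overlaps


if __name__ == "__main__":
    example = {
        "pe_subnet_prefix": '["10.3.1.0/24"]',
        "private_subnet_prefix": '["10.3.4.0/24"]',
        "public_subnet_prefix": '["10.3.3.0/24"]',
    }
    print(find_overlaps(example) or "No overlapping subnets")
```
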
stacks-credentials-env-kv

| Variable name | When needed | Description |
|---|---|---|
| azure-client-id | Project start | Application (client) ID of the Azure Active Directory application |
| azure-client-secret | Project start | Service principal client secret |
| azure-subscription-id | Project start | Azure subscription ID |
| azure-tenant-id | Project start | Directory (tenant) ID of the Azure Active Directory application |
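
To work out which variables to populate at each stage, the "When needed" columns can be captured as data and filtered by phase. A minimal sketch with a hypothetical `variables_needed` helper; the group templates, variable names, and phases are transcribed from the tables above.

```python
# Phase labels as used in the "When needed" columns.
PROJECT_START = "Project start"
AFTER_CORE_INFRA = "After core infra"

# Variable group template -> {variable name: phase}, transcribed from the tables.
VARIABLES = {
    "amido-stacks-de-pipeline-{env}": {
        "ADLS_DataLake_URL": AFTER_CORE_INFRA,
        "blob_adls_storage": AFTER_CORE_INFRA,
        "blob_configStorage": AFTER_CORE_INFRA,
        "Blob_ConfigStore_serviceEndpoint": AFTER_CORE_INFRA,
        "databricksHost": AFTER_CORE_INFRA,
        "databricksWorkspaceResourceId": AFTER_CORE_INFRA,
        "datafactoryname": AFTER_CORE_INFRA,
        "github_token": AFTER_CORE_INFRA,
        "integration_runtime_name": AFTER_CORE_INFRA,
        "KeyVault_baseURL": AFTER_CORE_INFRA,
        "keyvault_name": AFTER_CORE_INFRA,
        "location": PROJECT_START,
        "resource_group": PROJECT_START,
        "sql_connection": AFTER_CORE_INFRA,
    },
    "amido-stacks-euw-de-{env}-network": {
        name: PROJECT_START
        for name in (
            "databricks_private_subnet_name",
            "databricks_public_subnet_name",
            "pe_resource_group_name",
            "pe_subnet_name",
            "pe_subnet_prefix",
            "pe_vnet_name",
            "private_subnet_prefix",
            "public_subnet_prefix",
        )
    },
    "stacks-credentials-{env}-kv": {
        name: PROJECT_START
        for name in (
            "azure-client-id",
            "azure-client-secret",
            "azure-subscription-id",
            "azure-tenant-id",
        )
    },
}


def variables_needed(phase, env, include_network=True):
    """Return {group name: sorted variable names} needed in the given phase."""
    result = {}
    for template, variables in VARIABLES.items():
        # The network group is only required for private networking.
        if not include_network and template.endswith("-network"):
            continue
        names = sorted(name for name, when in variables.items() if when == phase)
        if names:
            result[template.format(env=env)] = names
    return result
```

For example, `variables_needed(PROJECT_START, "nonprod", include_network=False)` lists only `location`, `resource_group`, and the four credential variables.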

GitHub PAT

The pipelines use the Azure DevOps task UsePythonVersion@0, which installs a specific version of Python on the build agent. If the requested Python version is not already present on the agent, the task downloads it from GitHub Actions. This download should be authenticated with a GitHub personal access token (PAT); otherwise you may hit GitHub's limit on anonymous downloads. You can create a token by following this guide. The token does not need any permissions, because GitHub only needs to read your public profile.
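
To see the difference a token makes, you can query GitHub's `/rate_limit` REST endpoint with and without it; unauthenticated requests share a much lower hourly quota than authenticated ones. This sketch uses only the standard library and reports the general REST API quota, on the assumption that this is the limit the UsePythonVersion@0 download counts against. The helper names are hypothetical.

```python
import json
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def build_rate_limit_request(token=None):
    """Build a GET request for GitHub's /rate_limit endpoint, optionally with a PAT."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(RATE_LIMIT_URL, headers=headers)


def core_rate_limit(token=None):
    """Return (limit, remaining) for the core REST API quota (makes a network call)."""
    with urllib.request.urlopen(build_rate_limit_request(token), timeout=10) as response:
        data = json.load(response)
    core = data["resources"]["core"]
    return core["limit"], core["remaining"]
```

For example, `core_rate_limit()` reports the anonymous quota for your IP address, while `core_rate_limit(token)` reports the quota for the account that owns the token.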

Azure Pipelines Service Connections

Service Connections are used in Azure DevOps Pipelines to connect to external services such as Azure and GitHub. You must create the following Service Connections:

| Name | When needed | Description |
|---|---|---|
| Stacks.Pipeline.Builds | Project start | The Service Connection to Azure. The service principal or managed identity used to create the connection must have Contributor access to the Azure subscription. See here for more information. |
| GitHubReleases | Project start | The Service Connection to GitHub for releases. The access token used to create the connection must have read/write access to the GitHub repository. See here for more information. |

Footnotes

  1. More general information on using Azure Pipelines in Stacks is also available.