Microsoft Azure Data Lake Storage connection
Last updated: Nov 27, 2024

To access your data in Microsoft Azure Data Lake Storage, create a connection asset for it.

Azure Data Lake Storage (ADLS) is a scalable data storage and analytics service that is hosted in Azure, Microsoft's public cloud. The Microsoft Azure Data Lake Storage connection supports access to both Gen1 and Gen2 Azure Data Lake Storage repositories.

Create a connection to Microsoft Azure Data Lake Storage

To create the connection asset, you need these connection details:

Note: Prerequisite for Entra ID authentication:

Microsoft Entra ID is a cloud-based identity and access management service. To obtain connection values for the Entra ID authentication method, sign in to the Microsoft Azure portal and go to your storage account. For information about Microsoft Entra ID, see What is Microsoft Entra ID?.

Entra ID client secret credential

  • Tenant ID: The Microsoft Entra tenant ID. To find the Tenant ID, go to Microsoft Entra ID > Properties and scroll down to the Tenant ID field. For more information, see How to find your Microsoft Entra tenant ID.
  • Client ID: The client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the Client ID for your application, select Microsoft Entra ID. From App registrations, select your application. Click Copy to copy the Client ID of your application. For more information, see Register a Microsoft Entra app and create a service principal.
  • Client secret: The authentication key that is associated with the client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the Client secret for your application, select Microsoft Entra ID. From App registrations, select your application. Go to Certificates & secrets > Client secrets. Click Copy to copy the existing Client secret or click New client secret to create a new Client secret and copy it. For more information, see Register a Microsoft Entra app and create a service principal.
  • Storage account URL: The URL of the Azure storage account that contains your data.
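For context, a client ID and client secret are typically used with the OAuth 2.0 client credentials grant of the Microsoft identity platform. The following Python sketch builds, but does not send, such a token request from the Tenant ID, Client ID, and Client secret values above. It assumes the v2.0 token endpoint and the Azure Storage scope https://storage.azure.com/.default; the function name is hypothetical, and this is not necessarily how the connection requests tokens internally.

```python
import urllib.parse
import urllib.request

# Assumption: Microsoft identity platform v2.0 token endpoint; the tenant ID goes in the path.
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

# Assumption: Azure Storage scope; ".default" asks for the permissions already granted to the app.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def build_client_secret_token_request(tenant_id: str, client_id: str,
                                      client_secret: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a client credentials token request."""
    # The token endpoint expects a form-encoded POST body.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": STORAGE_SCOPE,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL.format(tenant_id=urllib.parse.quote(tenant_id, safe="")),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

To actually request a token, pass the result to urllib.request.urlopen(); on success, the JSON response contains an access_token field.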

Entra ID username password credential

  • Client ID: The client ID for authorizing access to Microsoft Azure Data Lake Storage. To find the Client ID for your application, select Microsoft Entra ID. From App registrations, select your application. Click Copy to copy the Client ID of your application. For more information, see Register a Microsoft Entra app and create a service principal.
  • Username and Password: The username and password for the Microsoft Azure Data Lake Storage account. The account must be able to access the files without multi-factor authentication.
  • Storage account URL: The URL of the Azure storage account that contains your data.
  • WebHDFS URL: The WebHDFS URL for accessing HDFS.
    To connect to ADLS Gen2, use the format https://<account-name>.dfs.core.windows.net/<file-system>, where <account-name> is the name that you used when you created the ADLS instance and <file-system> is the name of the container that you created. For more information, see the Microsoft Data Lake Storage Gen2 documentation.
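The Gen2 URL format above can be assembled and sanity-checked with a short Python sketch. The helper name is hypothetical, and the validation rules are Azure's published naming rules as we understand them: storage account names have 3 to 24 lowercase letters and digits, and container (file system) names have 3 to 63 lowercase letters, digits, and single hyphens, starting and ending with a letter or digit.

```python
import re

# Assumption: Azure storage account naming rule (3-24 lowercase letters or digits).
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")

# Assumption: container naming rule. The first character is a letter or digit; every
# later character is a letter, a digit, or a hyphen that is followed by a letter or
# digit, which forbids consecutive and trailing hyphens. Total length is 3-63.
FILE_SYSTEM_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def gen2_webhdfs_url(account_name: str, file_system: str) -> str:
    """Hypothetical helper: return https://<account-name>.dfs.core.windows.net/<file-system>."""
    if not ACCOUNT_RE.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    if not FILE_SYSTEM_RE.fullmatch(file_system):
        raise ValueError(f"invalid file system (container) name: {file_system!r}")
    return f"https://{account_name}.dfs.core.windows.net/{file_system}"
```

For example, an account named mystorageacct with a container named raw-data yields https://mystorageacct.dfs.core.windows.net/raw-data.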

  • Tenant ID: The Azure Active Directory (now Microsoft Entra ID) tenant ID.
  • Client ID: The client ID for authorizing access to Microsoft Azure Data Lake Storage.
  • Client secret: The authentication key that is associated with the client ID for authorizing access to Microsoft Azure Data Lake Storage.

Select Server proxy to access the Azure Data Lake Storage data source through a proxy server. Depending on its setup, a proxy server can provide load balancing, increased security, and privacy. The proxy server settings are independent of the authentication credentials and the personal or shared credentials selection.

  • Proxy host: The proxy URL, for example, https://proxy.example.com.
  • Proxy port number: The port number to connect to the proxy server, for example, 8080 or 8443.
  • Proxy protocol: Optional. Select HTTP or HTTPS.
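To illustrate how the three proxy settings fit together, the following Python sketch combines a proxy host, port number, and optional protocol into one proxy URL and routes standard-library HTTP traffic through it. The function names are hypothetical, and the precedence rule (an explicit protocol wins over a scheme in the host, with HTTP as the fallback) is an assumption, not documented behavior of the connection.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlsplit


def proxy_url(host: str, port: int, protocol: Optional[str] = None) -> str:
    """Hypothetical helper: combine Proxy host, Proxy port number, and Proxy protocol."""
    # Accept either "proxy.example.com" or "https://proxy.example.com".
    parts = urlsplit(host if "://" in host else f"//{host}")
    if not parts.hostname:
        raise ValueError(f"no host name in {host!r}")
    # Assumption: explicit protocol first, then the scheme in the host, then HTTP.
    scheme = (protocol or parts.scheme or "http").lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported proxy protocol: {scheme!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"{scheme}://{parts.hostname}:{port}"


def build_proxy_opener(url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both http:// and https:// requests through the proxy."""
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": url, "https": url})
    )
```

Keeping the host, port, and protocol separate until the last step mirrors the separate fields in the connection form and makes each value easy to validate on its own.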

Private connectivity: To connect to a data source that is not exposed to the internet (for example, one behind a firewall), you must set up a secure connection.

Choose the method for creating a connection based on where you are in the platform:

  • In a project: Click Assets > New asset > Connect to a data source. See Adding a connection to a project.
  • In a catalog: Click Add to catalog > Connection. See Adding a connection asset to a catalog.
  • In a deployment space: Click Import assets > Data access > Connection. See Adding data assets to a deployment space.
  • In the Platform assets catalog: Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use Microsoft Azure Data Lake Storage connections in the following workspaces and tools:

Projects

  • Data quality rules (IBM Knowledge Catalog)
  • DataStage (DataStage service). See Connecting to a data source in DataStage.
  • Decision Optimization (watsonx.ai Studio and watsonx.ai Runtime)
  • Metadata enrichment (IBM Knowledge Catalog)
  • Metadata import (IBM Knowledge Catalog)
  • SPSS Modeler (watsonx.ai Studio)

Catalogs

  • Platform assets catalog
  • Other catalogs (IBM Knowledge Catalog)

Azure Data Lake Storage authentication setup

To set up authentication, you need a tenant ID, client (or application) ID, and client secret.
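Before you enter these values in the connection form, a quick local check can catch copy-and-paste mistakes. The following Python sketch is hypothetical (the class and function names are not part of any product API) and assumes that the tenant ID and client ID are the GUIDs shown in the Azure portal; a tenant referenced by domain name would fail this check.

```python
import uuid
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntraClientSecretSettings:
    """Hypothetical container for the three values needed for authentication."""
    tenant_id: str
    client_id: str
    client_secret: str


def validate_settings(settings: EntraClientSecretSettings) -> List[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    # Assumption: both IDs are GUIDs, as displayed in the Azure portal.
    for field in ("tenant_id", "client_id"):
        value = getattr(settings, field)
        try:
            uuid.UUID(value)
        except ValueError:
            problems.append(f"{field} is not a GUID: {value!r}")
    # The secret itself cannot be verified locally, only checked for being non-blank.
    if not settings.client_secret.strip():
        problems.append("client_secret is empty")
    return problems
```

This check only confirms the shape of the values; whether the app registration actually has access to the storage account is verified only when the connection is tested.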

Supported file types

The Microsoft Azure Data Lake Storage connection supports these file types: Avro, CSV, Delimited text, Excel, JSON, ORC, Parquet, SAS, SAV, SHP, and XML.

Table formats

In addition to flat files, the Microsoft Azure Data Lake Storage connection supports these Data Lake table formats: Delta Lake and Iceberg.

Learn more

Azure Data Lake

Parent topic: Supported connections