Microsoft Azure Databricks connection

Last updated: Apr 15, 2025

To access your data in Microsoft Azure Databricks, create a connection asset for it.

Databricks is a big data analytics tool that is based on Apache Spark.

Supported Databricks Runtime versions

The Microsoft Azure Databricks connection runs on the Azure Cloud runtimes.

Create a connection to Microsoft Azure Databricks

To create the connection asset, enter the connection details and select an authentication method.

Connection details

  • Hostname or IP address of the database
  • Port number of the database
  • HTTP path: Path of the endpoint for which the server is configured in HTTP transport mode.

Credentials

Choose an authentication method:

  • Entra ID token
Note: Prerequisite for Entra ID authentication:

Microsoft Entra ID is a cloud-based identity and access management service. To obtain connection values for the Entra ID authentication method, sign in to the Microsoft Azure portal. For information about Microsoft Entra ID, see What is Microsoft Entra ID? and Get Microsoft Entra ID tokens for service principals.

  • Service principal credentials
    Client ID and client secret of the service principal.
Note: Prerequisite for service principal authentication:

A service principal is a credential created for Microsoft Azure Databricks that is used by automated tools, jobs, and applications. For more information, see Manage service principals. To create a service principal, see Use a service principal to authenticate with Azure Databricks.

  • Username and password
    Username and password for accessing the database.

Choose the method for creating a connection based on where you are in the platform

In a project
Click Assets > New asset > Connect to a data source. See Adding a connection to a project.


In a deployment space
Click Import assets > Data access > Connection. See Adding data assets to a deployment space.

In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Microsoft Azure Databricks setup

Get started: Account and workspace setup

Running SQL statements

To ensure that your SQL statements run correctly, refer to the Azure Databricks SQL language reference for the correct syntax.

Configuring lineage metadata import for Microsoft Azure Databricks

When you create a metadata import for the Microsoft Azure Databricks connection, you can set options specific to this data source, and define the scope of data for which lineage is generated. For details about metadata import, see Designing metadata imports.

To import lineage metadata for Microsoft Azure Databricks, complete these steps:

  1. Create a data source definition. Select Microsoft Azure Databricks as the data source type.
  2. Create a connection to the data source in a project.
  3. Create a metadata import. Learn more about options that are specific to Microsoft Azure Databricks data source:
    • When you define a scope, you can analyze the entire data source or use the include and exclude options to define the exact catalogs and schemas that you want to be analyzed. See Include and exclude lists.
    • Optionally, you can provide external input in the form of a .zip file. You add this file in the Add inputs from file field. The file must have a supported structure. See External inputs.
    • Specify advanced import options.

Include and exclude lists

You can include or exclude assets up to the schema level. Provide catalogs and schemas in the format catalog/schema. Each part is evaluated as a regular expression. Assets that are added to the data source later are also included or excluded if they match the conditions that are specified in the lists. Example values:

  • myCatalog/: all schemas in myCatalog
  • myCatalog/.*: all schemas in myCatalog
  • myCatalog3/mySchema1: mySchema1 from myCatalog3
  • myCatalog4/mySchema[1-5]: any schema in myCatalog4 with a name that starts with mySchema and ends with a digit between 1 and 5
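As a sketch of the matching semantics described above, the following Python snippet evaluates each part of an include or exclude entry as a regular expression against a catalog/schema path. The helper name matches_scope and the sample values are illustrative only; they are not part of the product.

```python
import re

def matches_scope(pattern: str, asset_path: str) -> bool:
    """Check whether an asset path (catalog/schema) matches an
    include/exclude entry. Each part of the entry is evaluated as a
    regular expression against the corresponding path segment.
    An empty segment (for example, the trailing part of "myCatalog/")
    matches anything."""
    pattern_parts = pattern.split("/")
    asset_parts = asset_path.split("/")
    for pat, part in zip(pattern_parts, asset_parts):
        if pat and not re.fullmatch(pat, part):
            return False
    return True

# "myCatalog4/mySchema[1-5]" matches schemas mySchema1 through mySchema5
print(matches_scope("myCatalog4/mySchema[1-5]", "myCatalog4/mySchema3"))  # True
print(matches_scope("myCatalog4/mySchema[1-5]", "myCatalog4/mySchema7"))  # False
# "myCatalog/" matches all schemas in myCatalog
print(matches_scope("myCatalog/", "myCatalog/anySchema"))  # True
```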

External inputs

If you use external Microsoft Azure Databricks SQL script archives, you can add them in a .zip file as an external input. Organize the .zip file as a dll folder with subfolders or archives that represent the workspace structure. The .zip file can have the following structure:

<dll>
    <catalog_name_folder>
        <schema_name_folder>
            <tables>
                <table_name.sql>
            <views>
                <view_name.sql>
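An archive with this layout can be assembled with standard tooling. In the following sketch, the catalog, schema, and object names (main, sales, orders, big_orders) and the SQL contents are placeholders, not values the product expects.

```python
import zipfile

# Placeholder DDL scripts laid out as dll/<catalog>/<schema>/tables|views/<name>.sql
entries = {
    "dll/main/sales/tables/orders.sql":
        "CREATE TABLE orders (id INT, amount DECIMAL(10, 2));",
    "dll/main/sales/views/big_orders.sql":
        "CREATE VIEW big_orders AS SELECT * FROM orders WHERE amount > 100;",
}

# Write each script into the archive at its workspace-structured path
with zipfile.ZipFile("external_inputs.zip", "w") as zf:
    for path, ddl in entries.items():
        zf.writestr(path, ddl)

# Verify the resulting layout
with zipfile.ZipFile("external_inputs.zip") as zf:
    print(zf.namelist())
```

The resulting external_inputs.zip can then be supplied in the Add inputs from file field of the metadata import.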

Advanced import options

Display table lineage
Generate edges between tables for which the column-level lineage information was not found.


Parent topic: Supported connections