Microsoft Azure Databricks connection
To access your data in Microsoft Azure Databricks, create a connection asset for it.
Databricks is a big data analytics tool that is based on Apache Spark.
Supported Databricks Runtime versions
The Microsoft Azure Databricks connection runs on the Azure Cloud runtimes.
Create a connection to Microsoft Azure Databricks
To create the connection asset, enter the connection details and select an authentication method.
Connection details
- Hostname or IP address of the database
- Port number of the database
- HTTP path: Path of the endpoint for which the server is configured in HTTP transport mode.
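The connection details above map directly onto a Databricks JDBC URL. A minimal sketch of assembling one (the property names follow the current Databricks JDBC driver conventions; the hostname and HTTP path are placeholder values, not real endpoints):

```python
# Sketch: assembling a Databricks JDBC URL from the connection details above.
# Hostname, port, and HTTP path are placeholders; verify property names
# against the JDBC driver documentation for your driver version.
def build_jdbc_url(hostname: str, port: int, http_path: str) -> str:
    return (
        f"jdbc:databricks://{hostname}:{port}/default;"
        f"transportMode=http;ssl=1;"
        f"httpPath={http_path}"
    )

url = build_jdbc_url(
    "adb-1234567890123456.7.azuredatabricks.net",
    443,
    "/sql/1.0/warehouses/abc123def456",
)
print(url)
```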
Credentials
Choose an authentication method:
- Entra ID token
Microsoft Entra ID is a cloud-based identity and access management service. To obtain connection values for the Entra ID authentication method, sign in to the Microsoft Azure portal. For information about Microsoft Entra ID, see What is Microsoft Entra ID? and Get Microsoft Entra ID tokens for service principals.
- Service principal credentials
Client ID and client secret of the service principal.
A service principal is a credential created for Microsoft Azure Databricks that is used by automated tools, jobs, and applications. For more information, see Manage service principals. To create a service principal, see Use a service principal to authenticate with Azure Databricks.
- Username and password
Username and password for accessing the database.
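For the Entra ID and service principal methods, an access token is typically obtained through the OAuth 2.0 client-credentials flow against the Microsoft identity platform. The sketch below only assembles the token request; the tenant ID, client ID, and secret are placeholders, and the scope GUID shown is the Azure Databricks resource ID documented by Microsoft:

```python
from urllib.parse import urlencode

# Sketch: building an OAuth 2.0 client-credentials token request for a
# Microsoft Entra ID service principal. IDs and secret are placeholders;
# the scope GUID is the documented Azure Databricks resource ID.
tenant_id = "00000000-0000-0000-0000-000000000000"   # placeholder tenant
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

body = urlencode({
    "grant_type": "client_credentials",
    "client_id": "11111111-1111-1111-1111-111111111111",  # placeholder
    "client_secret": "<client-secret>",                   # placeholder
    "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
})
# POST `body` to `token_url` with an HTTP client to receive the access token.
print(token_url)
```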
Choose the method for creating a connection based on where you are in the platform:
- In a project
- Click Assets > New asset > Connect to a data source. See Adding a connection to a project.
- In a deployment space
- Click Import assets > Data access > Connection. See Adding data assets to a deployment space.
- In the Platform assets catalog
- Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Microsoft Azure Databricks setup
Get started: Account and workspace setup
Running SQL statements
To ensure that your SQL statements run correctly, refer to the Azure Databricks SQL language reference for the correct syntax.
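For example, Databricks SQL uses Delta as its table format; an illustrative statement pair (catalog, schema, table, and column names are placeholders):

```sql
-- Illustrative only: all names below are placeholders.
CREATE TABLE main.sales.orders (
  order_id   INT,
  amount     DECIMAL(10, 2),
  order_date DATE
) USING DELTA;

SELECT order_date, SUM(amount) AS total
FROM main.sales.orders
GROUP BY order_date;
```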
Configuring lineage metadata import for Microsoft Azure Databricks
When you create a metadata import for the Microsoft Azure Databricks connection, you can set options specific to this data source, and define the scope of data for which lineage is generated. For details about metadata import, see Designing metadata imports.
To import lineage metadata for Microsoft Azure Databricks, complete these steps:
- Create a data source definition. Select Microsoft Azure Databricks as the data source type.
- Create a connection to the data source in a project.
- Create a metadata import. Learn more about options that are specific to the Microsoft Azure Databricks data source:
- When you define a scope, you can analyze the entire data source or use the include and exclude options to define the exact catalogs and schemas that you want to be analyzed. See Include and exclude lists.
- Optionally, you can provide external input in the form of a .zip file. You add this file in the Add inputs from file field. The file must have a supported structure. See External inputs.
- Specify advanced import options.
Include and exclude lists
You can include or exclude assets up to the schema level. Provide catalogs and schemas in the format catalog/schema. Each part is evaluated as a regular expression. Assets that are added to the data source later are also included or excluded if they match the conditions specified in the lists. Example values:
- myCatalog/ : all schemas in myCatalog
- myCatalog/.* : all schemas in myCatalog
- myCatalog3/mySchema1 : mySchema1 from myCatalog3
- myCatalog4/mySchema[1-5] : any schema in myCatalog4 with a name that starts with mySchema and ends with a digit between 1 and 5
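The catalog/schema patterns can be checked programmatically. A minimal sketch of how an include or exclude entry might be evaluated, assuming full-match regular-expression semantics for each part (the matches helper is illustrative, not a product API):

```python
import re

# Sketch: evaluating an include/exclude entry of the form "catalog/schema",
# where each part is a regular expression. Full-match semantics are an
# assumption; the names below are the example values from this section.
def matches(pattern: str, catalog: str, schema: str) -> bool:
    cat_re, _, schema_re = pattern.partition("/")
    if not schema_re:          # a trailing "/" covers every schema
        schema_re = ".*"
    return (re.fullmatch(cat_re, catalog) is not None
            and re.fullmatch(schema_re, schema) is not None)

print(matches("myCatalog/.*", "myCatalog", "anySchema"))               # True
print(matches("myCatalog4/mySchema[1-5]", "myCatalog4", "mySchema3"))  # True
print(matches("myCatalog4/mySchema[1-5]", "myCatalog4", "mySchema9"))  # False
```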
External inputs
If you use external Microsoft Azure Databricks DDL scripts, you can add them in a .zip file as an external input. Organize the structure of the .zip file as the ddl folder with subfolders, or archives, that represent the workspace structure. The .zip file can have the following structure:
<ddl>
  <catalog_name_folder>
    <schema_name_folder>
      <tables>
        <table_name.sql>
      <views>
        <view_name.sql>
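A .zip file with this layout can be produced with any archiving tool; a minimal sketch with Python's standard zipfile module, assuming a top-level ddl folder and placeholder catalog, schema, table, and view names:

```python
import io
import zipfile

# Sketch: creating an external-input .zip with the structure described above.
# Catalog, schema, table, and view names are placeholders.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ddl/myCatalog/mySchema/tables/orders.sql",
                "CREATE TABLE orders (id INT);")
    zf.writestr("ddl/myCatalog/mySchema/views/recent_orders.sql",
                "CREATE VIEW recent_orders AS SELECT * FROM orders;")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)
```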
Advanced import options
- Display table lineage
- Generate edges between tables for which the column-level lineage information was not found.
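As a toy illustration of this option (the data structures here are hypothetical, not a product API): a table-level edge is emitted only for data flows where no column-level lineage was found.

```python
# Toy illustration of the "Display table lineage" option: emit a
# table-to-table edge only for flows with no column-level lineage.
# The flow records and names below are hypothetical.
flows = [
    {"source": "src.orders", "target": "tgt.orders",
     "columns": [("id", "id"), ("amount", "amount")]},   # column lineage known
    {"source": "src.staging", "target": "tgt.report",
     "columns": []},                                     # column lineage missing
]

table_edges = [(f["source"], f["target"]) for f in flows if not f["columns"]]
print(table_edges)  # edges added only where column lineage is missing
```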
Learn more
Parent topic: Supported connections