Connecting to Spark SQL

Last updated: Mar 17, 2025

Spark SQL provides a programming interface for working with structured data by using SQL, DataFrames, and Datasets. It supports both batch and streaming processing.

To create a Spark SQL connection in Data Virtualization, you must provide specific connection information. For more information, see Data sources in object storage in Data Virtualization.

Before you begin

You need the following details to create this connection:

Hostname
Port number
Target database
Username and password
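
Before you open the connection form, it can help to confirm that the hostname and port number are reachable from your network. The following Python sketch is not part of Data Virtualization. The function name is_reachable is hypothetical, and a successful TCP connection shows only that something is listening at that address, not that it is a Spark SQL service or that your credentials are valid. Reachability from your workstation can also differ from reachability from the Data Virtualization service.

```python
import socket


def is_reachable(hostname: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to hostname:port can be opened."""
    try:
        # create_connection resolves the hostname and attempts a TCP handshake.
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, calling is_reachable("spark.example.com", 10000) with your own hostname and port number in place of these placeholder values returns True or False, depending on whether the endpoint accepts the connection.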

Procedure

To connect to Spark SQL in Data Virtualization, follow these steps.

On the navigation menu, click Data > Data virtualization. The Data sources page appears.
Click Add connection > New connection to view a list of data sources.
Select the Spark SQL data source connection.
Enter the connection name and description.
Enter the hostname, port number, target database, and authentication credentials (username and password) for the connection.
Spark SQL supports two authentication options for the connection:
- Authenticate by using your username and password credentials.
- Authenticate by using Kerberos with Service Principal Name (SPN), user principal, and keytab.
  
  Note:
  To use the Kerberos authentication method, you must first configure Kerberos authentication in Data Virtualization. For more information, see Kerberos authentication on Cloud for Data Virtualization.
If the connection requires a custom SSL certificate, enter the certificate in the SSL certificate field.
Click Create to add the connection to the data source environment.
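
The values that the steps above ask for depend on the authentication option that you choose. As a rough illustration only, the following Python sketch checks a set of connection details for missing values before you enter them in the form. The class SparkSqlConnectionDetails, the function missing_fields, the field names, and the labels "password" and "kerberos" are all hypothetical and are not part of any Data Virtualization API. The sample hostname, port, and principal are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SparkSqlConnectionDetails:
    """Hypothetical container for the values entered in the connection form."""
    hostname: str
    port: int
    database: str
    auth_method: str  # "password" or "kerberos" (labels chosen for this sketch)
    username: Optional[str] = None
    password: Optional[str] = None
    service_principal_name: Optional[str] = None
    user_principal: Optional[str] = None
    keytab: Optional[str] = None


# Values that each authentication option needs, as listed in the procedure.
REQUIRED_BY_METHOD = {
    "password": ["username", "password"],
    "kerberos": ["service_principal_name", "user_principal", "keytab"],
}


def missing_fields(details: SparkSqlConnectionDetails) -> List[str]:
    """Return the names of required fields that are empty or unset."""
    if details.auth_method not in REQUIRED_BY_METHOD:
        raise ValueError(f"Unknown authentication method: {details.auth_method!r}")
    common = ["hostname", "port", "database"]
    required = common + REQUIRED_BY_METHOD[details.auth_method]
    return [name for name in required if not getattr(details, name)]


# Usage with placeholder values: a Kerberos setup that lacks two values.
details = SparkSqlConnectionDetails(
    hostname="spark.example.com",
    port=10000,
    database="default",
    auth_method="kerberos",
    user_principal="analyst@EXAMPLE.COM",
)
print(missing_fields(details))  # prints ['service_principal_name', 'keytab']
```

Keeping the per-method requirements in one dictionary makes it easy to see at a glance which values to gather before you start the procedure.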