Connecting to Spark SQL
Last updated: Mar 17, 2025
Spark SQL provides a programming interface for working with structured data by using SQL, data frames, and data sets. Spark SQL supports batch and streaming processing for optimized performance.
To create a Spark SQL connection in Data Virtualization, you must supply specific connection details. For more information, see Data sources in object storage in Data Virtualization.
Before you begin
You need the following details for this connection:
- Hostname
- Port number
- Target database
- Username and password
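The details listed above can be gathered and sanity-checked before you open the connection form. The following Python sketch is illustrative only: the `SparkSQLConnectionDetails` class, its method names, and the example values are hypothetical and are not part of Data Virtualization. The `jdbc:hive2://` URL shape is the HiveServer2-style format used by the Spark Thrift server, and it is an assumption that your environment follows it; the Data Virtualization connector may build its own URL internally.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class SparkSQLConnectionDetails:
    """Hypothetical container for the connection details listed above."""

    hostname: str
    port: int
    database: str
    username: str
    password: str

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means nothing is missing."""
        problems = []
        if not self.hostname.strip():
            problems.append("hostname is empty")
        if not 1 <= self.port <= 65535:
            problems.append(f"port {self.port} is outside 1-65535")
        if not self.database.strip():
            problems.append("target database is empty")
        if not self.username or not self.password:
            problems.append("username and password are both required")
        return problems

    def hive2_style_url(self) -> str:
        # Assumption: HiveServer2-style URL, as exposed by the Spark Thrift server.
        return f"jdbc:hive2://{self.hostname}:{self.port}/{quote(self.database)}"


# Placeholder values; replace them with the details for your own cluster.
details = SparkSQLConnectionDetails(
    hostname="spark.example.com",
    port=10000,
    database="default",
    username="dv_user",
    password="secret",
)
print(details.validate())
print(details.hive2_style_url())
```

Running the check first makes it easier to spot a missing value before you enter the details in the connection form.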
Procedure
To connect to Spark SQL in Data Virtualization, follow these steps.
-
On the navigation menu, click the menu item that opens the Data sources page.
-
On the Data sources page, click the control that opens the list of data sources.
-
Select the Spark SQL data source connection.
-
Enter the connection name and description.
-
Enter the hostname, port number, target database, and authentication credentials (username and password) for the connection.
-
Choose one of the two authentication methods that are available for a Spark SQL connection:
-
Authenticate by using your username and password credentials.
-
Authenticate by using Kerberos with Service Principal Name (SPN), user principal, and keytab.
Note: To use the Kerberos authentication method, you must first configure Kerberos authentication in Data Virtualization. See Kerberos authentication on Cloud for Data Virtualization for more information.
-
If the connection requires a custom SSL certificate, enter the certificate in the SSL certificate field.
-
Click Create to add the connection to the data source environment.
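Each of the two authentication options in the procedure needs a different set of values: username and password for the first, and Service Principal Name, user principal, and keytab for Kerberos. The Python sketch below shows one way to check that the chosen option is complete before you fill in the form. The `SparkSQLAuth` class, the `"password"` and `"kerberos"` labels, the field names, and the sample values are all hypothetical; they do not correspond to any Data Virtualization API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Fields each option needs, following the two options described in the procedure.
# The option labels and field names are illustrative assumptions.
REQUIRED_FIELDS = {
    "password": ("username", "password"),
    "kerberos": ("service_principal_name", "user_principal", "keytab_path"),
}


@dataclass
class SparkSQLAuth:
    """Hypothetical record of the authentication values for one connection."""

    method: str
    username: Optional[str] = None
    password: Optional[str] = None
    service_principal_name: Optional[str] = None
    user_principal: Optional[str] = None
    keytab_path: Optional[str] = None


def missing_auth_fields(auth: SparkSQLAuth) -> list[str]:
    """Return the names of required fields that are unset for the chosen method."""
    if auth.method not in REQUIRED_FIELDS:
        raise ValueError(f"unknown authentication method: {auth.method!r}")
    return [name for name in REQUIRED_FIELDS[auth.method] if not getattr(auth, name)]


# Placeholder values only.
password_auth = SparkSQLAuth("password", username="dv_user", password="secret")
kerberos_auth = SparkSQLAuth("kerberos", user_principal="dv_user@EXAMPLE.COM")

print(missing_auth_fields(password_auth))
print(missing_auth_fields(kerberos_auth))
```

Keeping the required fields in a single mapping makes the difference between the two options explicit and keeps the check in one place.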
Results
You can now use your Spark SQL database as a data source in Data Virtualization.