Connecting to Spark SQL
Spark SQL provides a programming interface for working with structured data by using SQL, DataFrames, and Datasets. It supports both batch and streaming processing.
To create a Spark SQL connection in Data Virtualization, you must provide specific connection details. For more information, see Data sources in object storage in Data Virtualization.
Before you begin
Gather the following connection details:
- Hostname
- Port number
- Target database
- Username and password
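Before you open the connection form, it can help to confirm that the details you collected are well-formed. The following is a minimal Python sketch and is not part of Data Virtualization: the `SparkSqlDetails` class, the `check_details` and `is_reachable` helpers, and the sample values are hypothetical names chosen for illustration.

```python
import socket
from dataclasses import dataclass


@dataclass
class SparkSqlDetails:
    """Hypothetical container for the connection details listed above."""
    hostname: str
    port: int
    database: str
    username: str
    password: str


def check_details(d: SparkSqlDetails) -> list[str]:
    """Return a list of problems; an empty list means the values look well-formed."""
    problems = []
    if not d.hostname.strip():
        problems.append("hostname is empty")
    if not 1 <= d.port <= 65535:
        problems.append("port must be between 1 and 65535")
    if not d.database.strip():
        problems.append("target database is empty")
    if not d.username or not d.password:
        problems.append("username and password are both required")
    return problems


def is_reachable(hostname: str, port: int, timeout: float = 3.0) -> bool:
    """Try a plain TCP connection; this checks reachability only, not authentication."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder values for illustration only; substitute your own.
details = SparkSqlDetails("spark.example.com", 10000, "default", "dv_user", "secret")
print(check_details(details))
```

Calling `is_reachable(details.hostname, details.port)` from a machine on the same network as the service gives a rough indication of whether a firewall or a wrong port number is likely to block the connection. It does not verify the credentials or the target database.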
Procedure
To connect to Spark SQL in Data Virtualization, follow these steps.
- From the navigation menu, open the Data sources page.
- Choose to add a new connection to view a list of data sources.
- Select the Spark SQL data source connection.
- Enter the connection name and description.
- Enter the hostname, port number, target database, and authentication credentials (username and password) for the connection.
- Choose one of the two authentication options that Spark SQL provides for the connection:
  - Authenticate by using your username and password credentials.
  - Authenticate by using Kerberos with a Service Principal Name (SPN), user principal, and keytab.
    Note: To use the Kerberos authentication method, you must first configure Kerberos authentication in Data Virtualization. For more information, see Kerberos authentication on Cloud for Data Virtualization.
- If the connection requires a custom SSL certificate, enter the certificate in the SSL certificate field.
- Click Create to add the connection to the data source environment.
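The choice between the two authentication options in the procedure can be sketched as a small check on which set of fields is complete. This is an illustrative Python sketch only: the function name `pick_auth_method` and its parameter names are hypothetical and do not correspond to any Data Virtualization API. It also assumes that only one method is supplied at a time.

```python
from typing import Optional


def pick_auth_method(
    username: Optional[str] = None,
    password: Optional[str] = None,
    spn: Optional[str] = None,
    user_principal: Optional[str] = None,
    keytab_path: Optional[str] = None,
) -> str:
    """Return "password" or "kerberos" depending on which set of fields is complete."""
    has_password = bool(username and password)
    has_kerberos = bool(spn and user_principal and keytab_path)
    # Assumption: the two methods are alternatives, so supplying both is treated as an error.
    if has_password and has_kerberos:
        raise ValueError("supply either username/password or Kerberos fields, not both")
    if has_password:
        return "password"
    if has_kerberos:
        return "kerberos"
    raise ValueError("credentials are incomplete for both authentication options")


# Placeholder values for illustration only.
print(pick_auth_method(username="dv_user", password="secret"))  # prints "password"
```

A Kerberos selection needs all three values (SPN, user principal, and keytab). The sketch treats a partial set as incomplete and does not guess a fallback.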