Analytics Engine HDFS connection
Use the Analytics Engine HDFS connection to connect to IBM Analytics Engine with the WebHDFS API.
IBM Analytics Engine is a Hadoop and Spark service on IBM Cloud that provides an environment to develop and deploy advanced analytics applications. Data is stored in IBM Cloud Object Storage (COS). The Analytics Engine service starts clusters of compute nodes when needed. Analytics Engine HDFS was formerly known as "IBM BigInsights on Cloud."
Create a connection to IBM Analytics Engine
To create the connection asset, you need these connection details:
- WebHDFS URL: Required.
- Username: Required.
- Password
- SSL certificate, if required by the Apache Hive server
Select Use Home As Root to use the user's home directory as the root for browsing.
Private connectivity: To connect to a database that is not exposed to the internet (for example, one behind a firewall), you must set up a secure connection.
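As an illustration of how the connection details above relate to the WebHDFS REST API, the following minimal sketch builds a request URL of the form <WebHDFS URL>/<path>?op=<operation>. The function name webhdfs_url, the example host, and the /user/<username> home directory layout (the usual HDFS default) are assumptions for illustration, not details confirmed by this page.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, path, op, username=None, home_as_root=False):
    """Build a WebHDFS REST URL for operation `op` on `path`.

    base_url is assumed to be the WebHDFS URL from the connection
    details, ending in .../webhdfs/v1.
    """
    if home_as_root and username:
        # Assumption: the home directory is /user/<username>,
        # which is the usual HDFS default.
        path = f"/user/{username}/{path.lstrip('/')}"
    if not path.startswith("/"):
        path = "/" + path
    # quote() leaves "/" intact and escapes spaces and other unsafe characters.
    return f"{base_url.rstrip('/')}{quote(path)}?{urlencode({'op': op})}"


if __name__ == "__main__":
    # Hypothetical host and gateway path, shown only as an example.
    base = "https://example.com:8443/gateway/default/webhdfs/v1"
    print(webhdfs_url(base, "/data", "LISTSTATUS"))
    print(webhdfs_url(base, "data", "LISTSTATUS", username="alice", home_as_root=True))
```

LISTSTATUS is a standard WebHDFS operation issued as an HTTP GET. The resulting URL could be passed to any HTTP client together with the username and password from the connection details, assuming the gateway accepts basic authentication.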
Hive properties
The Hive properties apply only when you use the Analytics Engine HDFS connection as a target (to write data). If you specify Hive properties and write a file to the target HDFS, a Hive connection is established and creates a Hive table for that file. To browse the Hive tables of Analytics Engine, use the Apache Hive connection.
- Hive host: The hostname or IP address of the Apache Hive server.
- Hive database: The database in Apache Hive.
- Hive port number: The port number of the Apache Hive server. The default is 10000.
- Hive HTTP path: The path of the endpoint, such as gateway/default/hive, when the Apache Hive server is configured for HTTP transport mode.
- Hive user
- Hive password
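To show how the Hive properties listed above fit together, the sketch below assembles them into a HiveServer2 JDBC-style URL. The jdbc:hive2 URL syntax with transportMode and httpPath is the standard Apache Hive convention. Whether this connection builds such a URL internally is an assumption, and the function name hive_jdbc_url and the example host are hypothetical.

```python
def hive_jdbc_url(host, database="default", port=10000, http_path=None, ssl=False):
    """Assemble a HiveServer2 JDBC URL from the Hive properties.

    port defaults to 10000, the default named in the property list.
    http_path is only relevant when the server uses HTTP transport mode.
    """
    params = []
    if ssl:
        params.append("ssl=true")
    if http_path:
        # HTTP transport mode needs both the mode and the endpoint path.
        params.append("transportMode=http")
        params.append(f"httpPath={http_path}")
    suffix = "".join(";" + p for p in params)
    return f"jdbc:hive2://{host}:{port}/{database}{suffix}"


if __name__ == "__main__":
    # Hypothetical host name, shown only as an example.
    print(hive_jdbc_url("hive.example.com"))
    print(hive_jdbc_url("hive.example.com", http_path="gateway/default/hive", ssl=True))
```

The Hive user and Hive password are deliberately left out of the URL; a JDBC client would normally receive them as separate credentials.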
Choose the method for creating a connection based on where you are in the platform
- In a project: Click New asset > Connection. See Adding a connection to a project.
- In a catalog: Click Add to catalog > Connection. See Adding a connection asset to a catalog.
- In a deployment space: Click Add to space > Connection. See Adding data assets to a deployment space.
- In the Platform assets catalog: Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Where you can use this connection
You can use Analytics Engine HDFS connections in the following workspaces and tools:
Projects
- Data Refinery (Watson Studio or Watson Knowledge Catalog)
- Metadata enrichment (Watson Knowledge Catalog)
- Metadata import (Watson Knowledge Catalog)
- SPSS Modeler (Watson Studio)
Catalogs
- Platform assets catalog
- Other catalogs (Watson Knowledge Catalog)
Supported file types
The Analytics Engine HDFS connection supports these file types: Avro, CSV, Delimited text, Excel, JSON, ORC, Parquet, SAS, SAV, SHP, and XML.