Accessing data sources by using remote connectors in Watson Query
Use remote connectors in Watson Query, along with IBM® Cloud Secure Gateway, to access data sources and files that are located in protected networks.
- Access remote data sources or services
- Remote connectors provide access to data sources or other data services that are not directly accessible from the Cloud Pak for Data cluster. Additionally, remote connectors facilitate data source discovery with remote port scanning. For more information, see Discovering remote data sources.
- Access data stored in files
- You can access file data, in formats such as CSV, TSV, and XLS, on remote file systems. Additionally, connectors provide remote browsing and data preview to facilitate virtualization configuration.
- Improve query performance
- Remote connectors enable distributed aggregations and join filters, and accelerate query processing on multiple worker pods. Connectors also enable greater numbers of data source connections and enhance parallelism during processing. As the number of connected sources increases, the distribution and parallelism of processing benefit query performance. Moving the connector closer to the data source therefore moves that processing closer to the data source.
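The idea behind a distributed aggregation can be illustrated with a small, purely conceptual sketch. It is not Watson Query code and does not reflect how the product implements the feature; the function names `partial_aggregate` and `merge_partials` and the sample rows are hypothetical. Each data source location reduces its own rows to a partial result, and only those compact partial results are combined centrally.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Runs near one data source: reduce raw rows to (sum, count) per key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return {key: (total, count) for key, (total, count) in acc.items()}


def merge_partials(partials):
    """Runs centrally: combine the partial results into an average per key."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


# Hypothetical rows held by two separate data sources.
source_a = [("east", 10.0), ("west", 20.0), ("east", 30.0)]
source_b = [("east", 20.0), ("west", 40.0)]

averages = merge_partials(
    [partial_aggregate(source_a), partial_aggregate(source_b)]
)
print(averages)
```

In this sketch, five raw rows are reduced to two small partial results before anything is combined, which is the kind of saving that grows as more sources are connected.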
Recommendations:
- Locate the remote connector as close as possible to the data source. When it is on the same machine as the data source, you eliminate network latency between the data source and the remote connector. If it is located within the same data center, you have a stable high-speed network between them. Latency increases the farther the remote connector is from the data source. Latency still exists along the connector communication path, but the connector performs more of the operations on the result data near the data source.
- Adjust the number of data sources on each remote connector. The maximum recommended number of data sources per remote connector is 10 because of the memory settings that are defined for each connector.
- Ensure that you have IBM Java 8 installed on the machine where the remote connector will be located.
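The first two recommendations can be turned into a simple planning aid. The following is a minimal sketch, not a Watson Query tool: the function `plan_connectors`, the data source names, and the location labels are hypothetical. It groups data sources by location so that each connector can sit close to its sources, then splits each group so that no connector serves more than the recommended maximum of 10 data sources.

```python
from collections import defaultdict

MAX_SOURCES_PER_CONNECTOR = 10  # recommended maximum from the list above


def plan_connectors(sources, max_per_connector=MAX_SOURCES_PER_CONNECTOR):
    """Return a list of (location, [source names]), one entry per connector."""
    if max_per_connector < 1:
        raise ValueError("max_per_connector must be at least 1")

    # Keep sources that share a location together.
    by_location = defaultdict(list)
    for name, location in sources:
        by_location[location].append(name)

    # Split each location into chunks of at most max_per_connector sources.
    plan = []
    for location, names in by_location.items():
        for start in range(0, len(names), max_per_connector):
            plan.append((location, names[start:start + max_per_connector]))
    return plan


# Hypothetical inventory: 23 databases in one data center, 4 file shares in another.
sources = [(f"db{i}", "dc1") for i in range(23)]
sources += [(f"files{i}", "dc2") for i in range(4)]

plan = plan_connectors(sources)
for location, names in plan:
    print(location, len(names))
```

With this inventory the sketch proposes three connectors in `dc1` and one in `dc2`. Treat the result as a starting point only; the right number of connectors also depends on the memory settings and workload of each connector.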
How to access data on remote data sources
Use the following workflow to understand how to access data on remote data sources.
To try it out, see Improve performance for your data virtualization data sources with remote connectors.