ibm-watson-studio-lib for R

Last updated: Oct 09, 2024

The ibm-watson-studio-lib library for R provides access to assets. It can be used in notebooks that are created in the notebook editor or in RStudio in a project. ibm-watson-studio-lib provides support for working with data assets and connections, as well as browsing functionality for all other asset types.

There are two kinds of data assets:

Stored data assets refer to files in the storage associated with the current project. The library can load and save these files. This is not recommended for files larger than one megabyte, because the library keeps the data in memory in its entirety, which can be inefficient when processing huge data sets.
Connected data assets represent data that must be accessed through a connection. Using the library, you can retrieve the properties (metadata) of the connected data asset and its connection. The functions do not return the data of a connected data asset. To access the data, either use the code that is generated for you when you click Read data on the Code snippets panel, or write your own code.

Note: The `ibm-watson-studio-lib` functions do not encode or decode data when saving data to or getting data from a file. Additionally, the `ibm-watson-studio-lib` functions can't be used to access connected folder assets (files on a path to the project storage).
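
Because the library passes bytes through unchanged, text encoding has to be handled in your own code. The following is a minimal sketch in base R, assuming the text is UTF-8. The helper names text_to_raw and raw_to_text are hypothetical and are not part of the library.

```r
# Hypothetical helpers (not part of ibm-watson-studio-lib).

# Encode text as UTF-8 bytes before passing it to a save function.
text_to_raw <- function(text) {
  charToRaw(enc2utf8(paste(text, collapse = "\n")))
}

# Decode bytes that are known to hold UTF-8 text after loading them.
raw_to_text <- function(bytes) {
  text <- rawToChar(bytes)
  Encoding(text) <- "UTF-8"
  text
}

# Round trip with non-ASCII characters
original <- "caf\u00e9,1\nna\u00efve,2"
bytes <- text_to_raw(original)
restored <- raw_to_text(bytes)
print(restored == original)
```

If a file uses a different encoding, base R's iconv() can convert the decoded text, provided that you know the source encoding.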

Setting up the `ibm-watson-studio-lib` library

The ibm-watson-studio-lib library for R is pre-installed and can be imported directly in a notebook in the notebook editor. To use the ibm-watson-studio-lib library in your notebook, you need the ID of the project and the project token.

To insert the project token into your notebook:

Click the More icon on your notebook toolbar and then click Insert project token.

If a project token exists, a cell is added to your notebook with the following information:
```
library(ibmWatsonStudioLib)
wslib <- access_project_or_space(list("token"="<ProjectToken>"))
```
<ProjectToken> is the value of the project token.

If you are told in a message that no project token exists, click the link in the message to be redirected to the project's Access Control page where you can create a project token. You must be eligible to create a project token. For details, see Manually adding the project token.

To create a project token:
1. From the Manage tab, select the Access Control page, and click New access token under Access tokens.
2. Enter a name, select Editor role for the project, and create a token.
3. Go back to your notebook, click the More icon on the notebook toolbar and then click Insert project token.

The `ibm-watson-studio-lib` functions

The ibm-watson-studio-lib library exposes a set of functions that are grouped in the following way:

Get project information
Get authentication token
Fetch data
Save data
Get connection information
Get connected data information
Access assets by ID instead of name
Access project storage directly
Spark support
Browse project assets

Get project information

While developing code, you might not know the exact names of data assets or connections. The following functions provide lists of assets, from which you can pick the relevant ones. In all examples, you can use wslib$show(assets) to pretty-print the list. The index of each item is printed in front of the item.

list_connections()

This function returns a list of the connections. The list of returned connections is not sorted by any criterion and can change when you call the function again. You can pass a list item instead of a name to the get_connection function.
```
# Import the lib
library("ibmWatsonStudioLib")
wslib <- access_project_or_space(list("token"="<ProjectToken>"))

assets <- wslib$list_connections()
wslib$show(assets)
connprops <- wslib$get_connection(assets[[1]])
```
list_connected_data()

This function returns the connected data assets. The list of returned connected data assets is not sorted by any criterion and can change when you call the function again. You can pass a list item instead of a name to the get_connected_data function.
list_stored_data()

This function returns a list of the stored data assets (data files). The list of returned data assets is not sorted by any criterion and can change when you call the function again. You can pass a list item instead of a name to the load_data and save_data functions.

Note: A heuristic is applied to distinguish between connected data assets and stored data assets. However, there may be cases where a data asset of the wrong kind appears in the returned lists.
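
As a sketch of passing a list item instead of a name, the following helper loads one stored data asset by its position in the list. The helper name load_stored_item is hypothetical, and the function assumes that you pass in an initialized wslib object.

```r
# Hypothetical helper (not part of ibm-watson-studio-lib): load one stored
# data asset by its position in the list returned by list_stored_data().
load_stored_item <- function(wslib, index = 1) {
  assets <- wslib$list_stored_data()
  if (length(assets) < index) {
    stop("The project has fewer than ", index, " stored data assets.")
  }
  # R lists are 1-indexed; [[index]] extracts the item itself
  wslib$load_data(assets[[index]])
}

# Usage in a notebook where wslib has been initialized:
# bytes <- load_stored_item(wslib, 1)
```

Because the order of the returned list can change between calls, check the list with wslib$show() before relying on a position.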

wslib$here

By using this entry point, you can retrieve metadata about the project that the library is working with. The entry point wslib$here provides the following functions:
- get_name()
  
  This function returns the name of the project.
- get_description()
  
  This function returns the description of the project.
- get_ID()
  
  This function returns the ID of the project.
- get_storage()
  
  This function returns storage information for the project.
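
The four functions can be combined into one summary of the project. This is a minimal sketch that assumes an initialized wslib object; the helper name describe_project is hypothetical.

```r
# Hypothetical helper (not part of ibm-watson-studio-lib): collect the
# project metadata that wslib$here exposes into a single named list.
describe_project <- function(wslib) {
  here <- wslib$here
  list(
    name = here$get_name(),
    description = here$get_description(),
    id = here$get_ID(),
    storage = here$get_storage()
  )
}

# Usage in a notebook where wslib has been initialized:
# str(describe_project(wslib))
```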

Get authentication token

Some tasks require an authentication token. For example, if you want to run your own requests against the Watson Data API, you need an authentication token.

You can use the following function to get the bearer token:

get_current_token()

For example:

```
library("ibmWatsonStudioLib")
wslib <- access_project_or_space(list("token"="<ProjectToken>"))
token <- wslib$auth$get_current_token()
```

This function returns the bearer token that is currently used by the ibm-watson-studio-lib library.
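
When you send your own HTTP requests, a bearer token is conventionally passed in the Authorization header. The following sketch only builds that header value. The helper name bearer_header is hypothetical, and the request itself is left to the HTTP client of your choice.

```r
# Hypothetical helper (not part of ibm-watson-studio-lib): build the
# Authorization header for a bearer token as a named character vector.
bearer_header <- function(token) {
  c(Authorization = paste("Bearer", token))
}

# Usage in a notebook where wslib has been initialized:
# headers <- bearer_header(wslib$auth$get_current_token())
```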

Fetch data

You can use the following functions to fetch data from a stored data asset (a file) in your project.

load_data(asset_name_or_item, attachment_type_or_item = NULL)

This function loads the data of a stored data asset into a bytes buffer (a raw vector in R). The function is not recommended for very large files.

The function takes the following parameters:
- asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
- attachment_type_or_item: (Optional) Attachment type to load. A data asset can have more than one attachment with data. Without this parameter, the default attachment type, namely data_asset, is loaded. Specify this parameter if the attachment type is not data_asset. For example, if a plain text data asset has an attached profile from Natural Language Analysis, this can be loaded as attachment type data_profile_nlu.
  
  Here is an example that shows you how to load the data of a data asset:
```
# Import the lib
library("ibmWatsonStudioLib")
wslib <- access_project_or_space(list("token"="<ProjectToken>"))

# Fetch the data from a file
my_file <- wslib$load_data("MyFile.csv")

# Read the CSV data file into a data frame
df <-  read.csv(text = rawToChar(my_file))
head(df)
```
download_file(asset_name_or_item, file_name = NULL, attachment_type_or_item = NULL)

This function downloads the data of a stored data asset and stores it in the specified file in the file system of your runtime. The file is overwritten if it already exists.

The function takes the following parameters:
- asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
- file_name: (Optional) The name of the file that the downloaded data is stored to. It defaults to the asset's attachment name.
- attachment_type_or_item: (Optional) The attachment type to download. A data asset can have more than one attachment with data. Without this parameter, the default attachment type, namely data_asset, is downloaded. Specify this parameter if the attachment type is not data_asset. For example, if a plain text data asset has an attached profile from Natural Language Analysis, this can be downloaded as attachment type data_profile_nlu.
  
  Here is an example that shows you how you can use download_file to make your custom R script available in your notebook:
```
# Import the lib
library("ibmWatsonStudioLib")
wslib <- access_project_or_space(list("token"="<ProjectToken>"))

# Let's assume you have an R script "helpers.R" with helper functions on your local machine.
# Upload the script to your project using the Data Panel on the right.

# Download the script to the file system of your runtime
wslib$download_file("helpers.R")

# Source the script to use the contained functions, e.g. 'my_func', in your notebook.
source("helpers.R")
my_func()
```

Save data

The functions to store data in your project storage do multiple things:

Store the data in project storage.
Add the data as a data asset (by creating an asset or overwriting an existing asset) to your project so you can see the data in the data assets list in your project.
Associate the asset with the file in the storage.

You can use the following functions to save data:

save_data(asset_name_or_item, data, overwrite = NULL, mime_type = NULL, file_name = NULL)

This function saves data in memory to the project storage.

The function takes the following parameters:
- asset_name_or_item: (Required) The name of the created asset or list item that is returned by list_stored_data(). You can use the item if you want to overwrite an existing file.
- data: (Required) The data to upload. The expected data type is raw.
- overwrite: (Optional) Overwrites the data of a stored data asset if it already exists. Defaults to FALSE. If an asset item is passed instead of a name, the behavior is to overwrite the asset.
- mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. If you use asset names without a suffix, specify the MIME type here. For example, mime_type='application/text' for plain text data. This parameter is ignored when overwriting an asset.
- file_name: (Optional) The file name to be used in the project storage. The data is saved in the storage associated with the project. When creating a new asset, the file name is derived from the asset name, but might be different. If you want to access the file directly, you can specify a file name. This parameter is ignored when overwriting an asset.
  
  Here is an example that shows you how to save data to a file:
```
# Import the lib
library("ibmWatsonStudioLib")
wslib <- access_project_or_space(list("token"="<ProjectToken>"))

# let's assume you have a data frame df which contains the data
# you want to save as a csv file
csv <- capture.output(write.csv(df, row.names=FALSE), type="output")
csv_raw <- charToRaw(paste0(csv, collapse='\n'))
wslib$save_data("my_asset_name.csv", csv_raw)

# the function returns a list which contains the asset_name, asset_id, file_name and additional information upon successful saving of the data
```
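
The conversion between a data frame and raw bytes can be wrapped in two small helpers and checked locally before anything is saved. This is a sketch in base R. The helper names and the asset name example.csv are hypothetical, and the save step runs only where a wslib object exists.

```r
# Hypothetical helpers (not part of ibm-watson-studio-lib).

# Serialize a data frame to CSV text and convert it to a raw vector.
df_to_csv_raw <- function(df) {
  lines <- capture.output(write.csv(df, row.names = FALSE))
  charToRaw(paste0(lines, collapse = "\n"))
}

# Parse CSV bytes, such as those returned by load_data(), into a data frame.
csv_raw_to_df <- function(bytes) {
  read.csv(text = rawToChar(bytes), stringsAsFactors = FALSE)
}

# Local round trip to check the conversion
example_df <- data.frame(id = 1:3, label = c("a", "b", "c"))
payload <- df_to_csv_raw(example_df)
round_trip <- csv_raw_to_df(payload)

# Runs only where wslib has been initialized, for example in a notebook.
if (exists("wslib")) {
  result <- wslib$save_data("example.csv", payload, overwrite = TRUE)
  wslib$show(result)
}
```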
upload_file(file_path, asset_name = NULL, file_name = NULL, overwrite = FALSE, mime_type = NULL)

This function saves data in the file system in the runtime to a file associated with your project.

The function takes the following parameters:
- file_path: (Required) The path to the file in the file system.
- asset_name: (Optional) The name of the data asset that is created. It defaults to the name of the file to be uploaded.
- file_name: (Optional) The name of the file that is created in the storage associated with the project. It defaults to the name of the file to be uploaded.
- overwrite: (Optional) Overwrites an existing file in storage. Defaults to FALSE.
- mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. If you use asset names without a suffix, specify the MIME type here. For example mime_type='application/text' for plain text data. This parameter is ignored when overwriting an asset.
  
  Here is an example that shows you how you can upload a file to the project:
```
# Import the lib
library("ibmWatsonStudioLib")
wslib <- access_project_or_space(list("token"="<ProjectToken>"))

# Let's assume you have downloaded a file and want to save it
# in your project.
download.file("https://some/url/data_file.csv", "data_file.csv")
wslib$upload_file("data_file.csv")

# The function returns a list which contains the asset_name, asset_id, file_name and additional information upon successful saving of the data.
```

Get connection information

You can use the following function to access the connection metadata of a given connection.

get_connection(name_or_item)

This function returns the properties (metadata) of a connection which you can use to fetch data from the connection data source. Use wslib$show(connprops) to view the properties. The special key "." in the returned list item provides information about the connection asset.

The function takes the following required parameter:
- name_or_item: Either a string with the name of a connection or an item like those returned by list_connections().
Note that when you work with notebooks, you can click Read data on the Code snippets panel to generate code that loads data from a connection into a data frame, for example.
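
The special key "." can be separated from the other properties in your own code. The following sketch assumes an initialized wslib object; the helper name split_connection_properties is hypothetical, and the property names that a connection returns depend on the connection type.

```r
# Hypothetical helper (not part of ibm-watson-studio-lib): separate the
# asset information stored under "." from the remaining properties.
split_connection_properties <- function(wslib, name_or_item) {
  props <- wslib$get_connection(name_or_item)
  list(
    asset_info = props[["."]],
    properties = props[setdiff(names(props), ".")]
  )
}

# Usage in a notebook where wslib has been initialized
# ("MyConnection" is a placeholder name):
# parts <- split_connection_properties(wslib, "MyConnection")
# names(parts$properties)
```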

Get connected data information

You can use the following function to access the metadata of a connected data asset.

get_connected_data(name_or_item)

This function returns the properties of a connected data asset, including the properties of the underlying connection. Use wslib$show() to view the properties. The special key "." in the returned list provides information about the data and the connection assets.

The function takes the following required parameter:
- name_or_item: Either a string with the name of a connected data asset or an item like those returned by list_connected_data().
Note that when you work with notebooks, you can click Read data on the Code snippets panel to generate code that loads data from a connected data asset into a data frame, for example.

Access assets by ID instead of name

Preferably, always access data assets and connections by a unique name. Asset names are not necessarily unique, and the ibm-watson-studio-lib functions raise an exception when a name is ambiguous. You can rename data assets in the UI to resolve the conflict.

Accessing assets by a unique ID is possible but is discouraged because IDs are valid only in the current project and break code when it is transferred to a different project. This can happen, for example, when projects are exported and re-imported. You can get the ID of a connection, a connected data asset, or a stored data asset by using the corresponding list function, for example list_connections().

The entry point wslib$by_id provides the following functions:

get_connection(asset_id)

This function accesses a connection by the connection asset ID.
get_connected_data(asset_id)

This function accesses a connected data asset by the connected data asset ID.
load_data(asset_id, attachment_type_or_item = NULL)

This function loads the data of a stored data asset by passing the asset ID. See load_data() for a description of the other parameters you can pass.
save_data(asset_id, data, overwrite = NULL, mime_type = NULL, file_name = NULL)

This function saves data to a stored data asset by passing the asset ID. This implies overwrite=TRUE. See save_data() for a description of the other parameters you can pass.
download_file(asset_id, file_name = NULL, attachment_type_or_item = NULL)

This function downloads the data of a stored data asset by passing the asset ID. See download_file() for a description of the other parameters you can pass.
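
One way to obtain an asset ID is the list that save_data returns, which contains the asset_id. The following sketch saves data by name and reads it back by ID. It assumes an initialized wslib object, and the helper name save_and_reload_by_id is hypothetical.

```r
# Hypothetical helper (not part of ibm-watson-studio-lib): save data under a
# name, then load it again through wslib$by_id with the returned asset ID.
save_and_reload_by_id <- function(wslib, asset_name, data) {
  saved <- wslib$save_data(asset_name, data, overwrite = TRUE)
  # The returned list is documented to contain the asset_id
  wslib$by_id$load_data(saved$asset_id)
}

# Usage in a notebook where wslib has been initialized:
# bytes <- save_and_reload_by_id(wslib, "notes.txt", charToRaw("hello"))
```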

Access project storage directly

You can fetch data from project storage and store data in project storage without synchronizing the project assets using the entry point wslib$storage.

The entry point wslib$storage provides the following functions:

fetch_data(filename)

This function returns the data in a file as a bytes buffer (a raw vector in R). The file does not need to be registered as a data asset.

The function takes the following required parameter:
- filename: The name of the file in the project.
store_data(filename, data, overwrite = FALSE)

This function saves data in memory to storage, but does not create a new data asset. The function returns a list which contains the file name, file path and additional information. Use wslib$show() to print the information.

The function takes the following parameters:
- filename: (Required) The name of the file in the project storage.
- data: (Required) The data to save as a raw object.
- overwrite: (Optional) Overwrites the data of a file in storage if it already exists. By default, this is set to FALSE.
download_file(storage_filename, local_filename = NULL)

This function downloads the data in a file in storage and stores it in the specified local file. The local file is overwritten if it already exists.

The function takes the following parameters:
- storage_filename: (Required) The name of the file in storage to download.
- local_filename: (Optional) The name of the file in the local file system of your runtime to download the file to. Omit this parameter to use the storage file name.
register_asset(storage_path, asset_name = NULL, mime_type = NULL)

This function registers the file in storage as a data asset in your project. This operation fails if a data asset with the same name already exists. You can use this function if you have very large files that you cannot upload via save_data(). You can upload large files directly to the IBM Cloud Object Storage bucket of your project, for example via the UI, and then register them as data assets using register_asset().

The function takes the following parameters:
- storage_path: (Required) The path of the file in storage.
- asset_name: (Optional) The name of the created asset. It defaults to the file name.
- mime_type: (Optional) The MIME type for the created asset. By default the MIME type is determined from the asset name suffix. Use this parameter to specify a MIME type if your file name does not have a file extension or if you want to set a different MIME type.
Note: You can register a file several times as a different data asset. Deleting one of those assets in the project also deletes the file in storage, which means that other asset references to the file might be broken.
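
The storage functions can be chained: store a file, read it back to verify it, and then register it as a data asset. The following is a sketch under the assumption that the file name can be passed as the storage path. The helper name store_and_register is hypothetical, and an initialized wslib object is expected.

```r
# Hypothetical helper (not part of ibm-watson-studio-lib): write bytes to
# project storage, verify them, and register the file as a data asset.
store_and_register <- function(wslib, filename, data, asset_name = filename) {
  wslib$storage$store_data(filename, data, overwrite = TRUE)

  # Read the file back to confirm that the stored bytes match
  check <- wslib$storage$fetch_data(filename)
  if (!identical(check, data)) {
    stop("Stored data does not match the data that was written.")
  }

  # Assumption: the storage path of the file equals its file name.
  # This call fails if a data asset with the same name already exists.
  wslib$storage$register_asset(filename, asset_name = asset_name)
}

# Usage in a notebook where wslib has been initialized:
# store_and_register(wslib, "notes.txt", charToRaw("hello"))
```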

Spark support

The entry point wslib$spark provides functions to access files in storage with Spark.

The entry point wslib$spark provides the following functions:

provide_spark_context(sc)

Use this function to enable Spark support.

The function takes the following required parameter:
- sc: The SparkContext. It is provided in the notebook runtime.
  
  The following example shows you how to set up Spark support:
```
library(ibmWatsonStudioLib)
wslib <- access_project_or_space(list("token"="<ProjectToken>"))
wslib$spark$provide_spark_context(sc)
```
get_data_url(asset_name)

This function returns a URL to access a file in storage from Spark via Hadoop.

The function takes the following required parameter:
- asset_name: The name of the asset.
storage.get_data_url(file_name)

This function returns a URL to access a file in storage from Spark via Hadoop. The function expects the file name and not the asset name.

The function takes the following required parameter:
- file_name: The name of a file in the project storage.
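
The URL that get_data_url returns can be handed to a Spark reader. The following sketch assumes that SparkR is the Spark interface in your runtime and that the asset is a CSV file with a header row; both are assumptions, as is the helper name read_asset_with_spark. The reader is a parameter so that you can substitute a different Spark interface.

```r
# Hypothetical helper (not part of ibm-watson-studio-lib): enable Spark
# support, resolve the URL of an asset, and read it with a Spark reader.
read_asset_with_spark <- function(
    wslib, sc, asset_name,
    # Assumption: SparkR is available; replace the reader if it is not
    reader = function(url) SparkR::read.df(url, source = "csv", header = "true")) {
  wslib$spark$provide_spark_context(sc)
  url <- wslib$spark$get_data_url(asset_name)
  reader(url)
}

# Usage in a notebook where wslib and sc are available:
# sdf <- read_asset_with_spark(wslib, sc, "MyFile.csv")
```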

Browse project assets

The entry point wslib$assets provides generic, read-only access to assets of any type. For selected asset types, there are dedicated functions that provide additional data.

The following naming conventions apply:

Functions named list_<something> return a list of named lists. Each contained list represents one asset and includes a small set of properties (metadata) that identifies the asset.
Functions named get_<something> return a single named list with the properties for the asset.

To pretty-print a list or list of named lists, use wslib$show().

The functions expect either the name of an asset, or an item from a list as the parameter. By default, the functions return only a subset of the available asset properties. By setting the parameter raw_info=TRUE, you can get the full set of asset properties.

The entry point wslib$assets provides the following functions:

list_assets(asset_type, name = NULL, query = NULL, selector = NULL, raw_info = FALSE)

This function lists all assets for the given type with respect to the given constraints.

The function takes the following parameters:
- asset_type: (Required) The type of the assets to list, for example data_asset. See list_asset_types() for a list of the available asset types. Use asset type asset for the list of all available assets in the project.
- name: (Optional) The name of the asset to list. Use this parameter if more than one asset with the same name exists. You can specify either name or query, but not both.
- query: (Optional) A query string that is passed to the Watson Data API to search for assets. You can specify either name or query, but not both.
- selector: (Optional) A custom filter function on the candidate asset list items. If the selector function returns TRUE, the asset is included in the returned asset list.
- raw_info: (Optional) Returns all of the available metadata. By default, the parameter is set to FALSE and only a subset of the properties is returned.
  
  Examples of using the list_assets function:
```
# Import the lib
library("ibmWatsonStudioLib")
wslib <- access_project_or_space(list("token"="<ProjectToken>"))

# List all assets in the project
all_assets <- wslib$assets$list_assets("asset")
wslib$show(all_assets)

# List all data assets with name 'MyFile.csv'
assets_by_name <- wslib$assets$list_assets("data_asset", name = "MyFile.csv")

# List all data assets whose name starts with "MyF"
assets_by_query <- wslib$assets$list_assets("data_asset", query = "asset.name:(MyF*)")

# List all data assets which are larger than 1MB
sizeFilter <- function(asset) asset$metadata$size > 1000000
large_assets <- wslib$assets$list_assets("data_asset", selector = sizeFilter, raw_info = TRUE)
wslib$show(large_assets)

# List all notebooks
notebooks <- wslib$assets$list_assets("notebook")
```
list_asset_types(raw_info = FALSE)

This function lists all available asset types.

The function can take the following parameter:
- raw_info: (Optional) Returns the full set of metadata. By default, the parameter is FALSE and only a subset of the properties is returned.
list_datasource_types(raw_info = FALSE)

This function lists all available data source types.

The function can take the following parameter:
- raw_info: (Optional) Returns the full set of metadata. By default, the parameter is FALSE and only a subset of the properties is returned.
get_asset(name_or_item, asset_type = NULL, raw_info = FALSE)

The function returns the metadata of an asset.

The function takes the following parameters:
- name_or_item: (Required) The name of the asset or an item like those returned by list_assets().
- asset_type: (Optional) The type of the asset. If the parameter name_or_item contains a string for the name of the asset, setting asset_type is required.
- raw_info: (Optional) Returns the full set of metadata. By default, the parameter is FALSE and only a subset of the properties is returned.
  
  Example of using the list_assets and get_asset functions:
```
notebooks <- wslib$assets$list_assets("notebook")
wslib$show(notebooks)

notebook <- wslib$assets$get_asset(notebooks[[1]])
wslib$show(notebook)
```
get_connection(name_or_item, with_datasourcetype = FALSE, raw_info = FALSE)

This function returns the metadata of a connection.

The function takes the following parameters:
- name_or_item: (Required) The name of the connection or an item like those returned by list_connections().
- with_datasourcetype: (Optional) Returns additional information about the data source type of the connection.
- raw_info: (Optional) Returns the full set of metadata. By default, the parameter is FALSE and only a subset of the properties is returned.
get_connected_data(name_or_item, with_datasourcetype = FALSE, raw_info = FALSE)

This function returns the metadata of a connected data asset.

The function takes the following parameters:
- name_or_item: (Required) The name of the connected data asset or an item like those returned by list_connected_data().
- with_datasourcetype: (Optional) Returns additional information about the data source type of the associated connected data asset.
- raw_info: (Optional) Returns the full set of metadata. By default, the parameter is FALSE and only a subset of the properties is returned.
get_stored_data(name_or_item, raw_info = FALSE)

This function returns the metadata of a stored data asset.

The function takes the following parameters:
- name_or_item: (Required) The name of the stored data asset or an item like those returned by list_stored_data().
- raw_info: (Optional) Returns the full set of metadata. By default, the parameter is FALSE and only a subset of the properties is returned.
list_attachments(name_or_item_or_asset, asset_type = NULL, raw_info = FALSE)

This function returns a list of the attachments of an asset.

The function takes the following parameters:
- name_or_item_or_asset: (Required) The name of the asset or an item like those returned by list_stored_data() or get_asset().
- asset_type: (Optional) The type of the asset. It defaults to type data_asset.
- raw_info: (Optional) Returns the full set of metadata. By default, the parameter is FALSE and only a subset of the properties is returned.
  
  Example of using the list_attachments function to read an attachment of a stored data asset:
```
assets <- wslib$list_stored_data()
wslib$show(assets)

asset <- assets[[1]]
attachments <- wslib$assets$list_attachments(asset)
wslib$show(attachments)
buffer <- wslib$load_data(asset, attachments[[1]])
```

Parent topic: Using ibm-watson-studio-lib
