The ibm-watson-studio-lib library for Python provides access to assets. It can be used in notebooks that are created in the notebook editor. ibm-watson-studio-lib provides support for working with data assets and connections, as well as browsing functionality for all other asset types.
There are two kinds of data assets:
- Stored data assets refer to files in the storage associated with the current project. The library can load and save these files. However, this is not recommended for data larger than one megabyte, because the library keeps the data in memory in its entirety, which can be inefficient when processing huge data sets.
- Connected data assets represent data that must be accessed through a connection. With the library, you can retrieve the properties (metadata) of a connected data asset and its connection, but the functions do not return the data of the connected data asset. To access the data, either use the code that is generated for you when you click Read data on the Code snippets pane, or write your own code.
Setting up the ibm-watson-studio-lib library
The ibm-watson-studio-lib library for Python is pre-installed and can be imported directly in a notebook in the notebook editor. To use the ibm-watson-studio-lib library in your notebook, you need the ID of the project and the project token.
To insert the project token into your notebook:
- Click the More icon on your notebook toolbar and then click Insert project token.
  If a project token exists, a cell is added to your notebook with the following information:
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space({"token":"<ProjectToken>"})
  <ProjectToken> is the value of the project token.
  If a message tells you that no project token exists, click the link in the message to be redirected to the project's Access Control page, where you can create a project token. You must be eligible to create a project token. For details, see Manually adding the project token.
To create a project token:
- From the Manage tab, select the Access Control page, and click New access token under Access tokens.
- Enter a name, select Editor role for the project, and create a token.
- Go back to your notebook, click the More icon on the notebook toolbar and then click Insert project token.
Helper functions
You can get information about the supported functions in the ibm-watson-studio-lib library programmatically by using help(wslib), or for an individual function by using help(wslib.<function_name>), for example help(wslib.get_connection).
You can use the helper function wslib.show(...) for formatted printing of Python dictionaries and lists of dictionaries, which are the most common output type of the ibm-watson-studio-lib functions.
The ibm-watson-studio-lib functions
The ibm-watson-studio-lib library exposes a set of functions that are grouped in the following way:
Get project information
While developing code, you might not know the exact names of data assets or connections. The following functions provide lists of assets, from which you can pick the relevant ones. In all examples, you can use wslib.show(assets) to pretty-print the list. The index of each item is printed in front of the item.
- list_connections()
  This function returns a list of the connections. The list of returned connections is not sorted by any criterion and can change when you call the function again. You can pass a dictionary item instead of a name to the get_connection function. For example:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space({"token":"<ProjectToken>"})
    # List the connections and show the properties of the first one
    assets = wslib.list_connections()
    wslib.show(assets)
    connprops = wslib.get_connection(assets[0])
    wslib.show(connprops)
- list_connected_data()
  This function returns the connected data assets. The list of returned connected data assets is not sorted by any criterion and can change when you call the function again. You can pass a dictionary item instead of a name to the get_connected_data function.
- list_stored_data()
  This function returns a list of the stored data assets (data files). The list of returned data assets is not sorted by any criterion and can change when you call the function again. You can pass a dictionary item instead of a name to the load_data and save_data functions.
  Note: A heuristic is applied to distinguish between connected data assets and stored data assets. However, there might be cases where a data asset of the wrong kind appears in the returned lists.
- wslib.here
  By using this entry point, you can retrieve metadata about the project that the library is working with. The entry point wslib.here provides the following functions:
  - get_name()
    This function returns the name of the project.
  - get_description()
    This function returns the description of the project.
  - get_ID()
    This function returns the ID of the project.
  - get_storage()
    This function returns storage information for the project.
Get authentication token
Some tasks require an authentication token. For example, if you want to run your own requests against the Watson Data API, you need an authentication token.
You can use the following function to get the bearer token:
get_current_token()
This function is available through the entry point wslib.auth and returns the bearer token that is currently used by the ibm-watson-studio-lib library. For example:
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space({"token":"<ProjectToken>"})
token = wslib.auth.get_current_token()
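If you run your own requests against the Watson Data API, the bearer token is typically sent in an HTTP Authorization header with the Bearer scheme. The following is a minimal standard-library sketch that only builds the request. The URL and the token string are hypothetical placeholders; in a notebook, the token would come from wslib.auth.get_current_token().

```python
import urllib.request


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a bearer token."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


# In a notebook, the token would come from the library:
#   token = wslib.auth.get_current_token()
# The URL and token below are hypothetical placeholders.
request = build_authorized_request("https://api.example.com/v2/some/endpoint", "<token>")
print(request.get_header("Authorization"))

# Sending the request is a separate step, for example:
#   with urllib.request.urlopen(request) as response:
#       body = response.read()
```

Keeping request construction in its own function makes it easy to fetch a fresh token with get_current_token() before each call instead of reusing an old one.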
Fetch data
You can use the following functions to fetch data from a stored data asset (a file) in your project.
- load_data(asset_name_or_item, attachment_type_or_item=None)
  This function loads the data of a stored data asset into a BytesIO buffer. The function is not recommended for very large files.
  The function takes the following parameters:
  - asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
  - attachment_type_or_item: (Optional) The attachment type to load. A data asset can have more than one attachment with data. Without this parameter, the default attachment type, namely data_asset, is loaded. Specify this parameter if the attachment type is not data_asset. For example, if a plain text data asset has an attached profile from Natural Language Analysis, this can be loaded as attachment type data_profile_nlu.
  Here is an example that shows you how to load the data of a data asset:
    # Import the lib and pandas
    from ibm_watson_studio_lib import access_project_or_space
    import pandas as pd
    wslib = access_project_or_space({"token":"<ProjectToken>"})
    # Fetch the data from a file
    my_file = wslib.load_data("MyFile.csv")
    # Rewind the buffer and read the first 10 rows of the CSV data into a pandas DataFrame
    my_file.seek(0)
    pd.read_csv(my_file, nrows=10)
- download_file(asset_name_or_item, file_name=None, attachment_type_or_item=None)
  This function downloads the data of a stored data asset and stores it in the specified file in the file system of your runtime. The file is overwritten if it already exists.
  The function takes the following parameters:
  - asset_name_or_item: (Required) Either a string with the name of a stored data asset or an item like those returned by list_stored_data().
  - file_name: (Optional) The name of the file that the downloaded data is stored to. It defaults to the asset's attachment name.
  - attachment_type_or_item: (Optional) The attachment type to download. A data asset can have more than one attachment with data. Without this parameter, the default attachment type, namely data_asset, is downloaded. Specify this parameter if the attachment type is not data_asset. For example, if a plain text data asset has an attached profile from Natural Language Analysis, this can be downloaded as attachment type data_profile_nlu.
  Here is an example that shows you how you can use download_file to make your custom Python script available in your notebook:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space({"token":"<ProjectToken>"})
    # Let's assume you have a Python script "helpers.py" with helper functions on your local machine.
    # Upload the script to your project using the Data Panel on the right of the opened notebook.
    # Download the script to the file system of your runtime
    wslib.download_file("helpers.py")
    # Import the required functions to use them in your notebook
    from helpers import my_func
    my_func()
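Because load_data() returns an in-memory BytesIO buffer, you are not limited to pandas for parsing it. The following sketch simulates that buffer locally with io.BytesIO (an assumption: the real buffer behaves like any BytesIO object) and parses it with the standard csv module. As in the load_data example, it rewinds the buffer first, because reading starts at the buffer's current position.

```python
import csv
import io
from itertools import islice


def read_csv_rows(buffer: io.BytesIO, encoding: str = "utf-8", max_rows: int = 10) -> list:
    """Parse up to max_rows CSV records from a byte buffer into dictionaries."""
    buffer.seek(0)  # rewind: reading starts at the current position
    # Wrap the binary buffer in a text layer so that csv receives decoded text.
    text = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    try:
        return list(islice(csv.DictReader(text), max_rows))
    finally:
        # Detach so that discarding the wrapper does not close the caller's buffer.
        text.detach()


# Local stand-in for: my_file = wslib.load_data("MyFile.csv")
my_file = io.BytesIO(b"id,city\n1,Berlin\n2,Paris\n")
my_file.read()  # leaves the position at the end, as earlier processing might
rows = read_csv_rows(my_file)
print(rows)
```

The max_rows limit plays the same role as nrows=10 in the pandas example; the whole buffer is still held in memory, so the size caveat for load_data() applies unchanged.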
Save data
The functions to save data in your project storage do multiple things:
- Store the data in project storage
- Add the data as a data asset (by creating an asset or overwriting an existing asset) to your project so you can see the data in the data assets list in your project.
- Associate the asset with the file in the storage.
You can use the following functions to save data:
- save_data(asset_name_or_item, data, overwrite=None, mime_type=None, file_name=None)
  This function saves data in memory to the project storage.
  The function takes the following parameters:
  - asset_name_or_item: (Required) The name of the created asset or a list item that is returned by list_stored_data(). You can use the item if you want to overwrite an existing file.
  - data: (Required) The data to upload. This can be any bytes-like object, for example a byte buffer.
    Note if using Python 3.9: The data to save must not exceed 2 GB in size.
  - overwrite: (Optional) Overwrites the data of a stored data asset if it already exists. By default, this is set to False. If an asset item is passed instead of a name, the asset is overwritten.
  - mime_type: (Optional) The MIME type for the created asset. By default, the MIME type is determined from the asset name suffix. If you use asset names without a suffix, specify the MIME type here, for example mime_type='application/text' for plain text data. This parameter is ignored when overwriting an asset.
  - file_name: (Optional) The file name to be used in the project storage. The data is saved in the storage associated with the project. When creating a new asset, the file name is derived from the asset name, but might be different. If you want to access the file directly, you can specify a file name. This parameter is ignored when overwriting an asset.
  Here is an example that shows you how to save data to a file:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space({"token":"<ProjectToken>"})
    # Let's assume you have the pandas DataFrame pandas_df which contains the data
    # you want to save as a CSV file
    wslib.save_data("my_asset_name.csv", pandas_df.to_csv(index=False).encode())
    # Upon successful saving of the data, the function returns a dictionary
    # which contains the asset_name, asset_id, file_name and additional information
- upload_file(file_path, asset_name=None, file_name=None, overwrite=False, mime_type=None)
  This function saves a file from the file system of the runtime to a file associated with your project.
  Note if using Python 3.9: The size of the file to upload must not exceed 2 GB.
  The function takes the following parameters:
  - file_path: (Required) The path to the file in the file system.
  - asset_name: (Optional) The name of the data asset that is created. It defaults to the name of the file to be uploaded.
  - file_name: (Optional) The name of the file that is created in the storage associated with the project. It defaults to the name of the file to be uploaded.
  - overwrite: (Optional) Overwrites an existing file in storage. Defaults to False.
  - mime_type: (Optional) The MIME type for the created asset. By default, the MIME type is determined from the asset name suffix. If you use asset names without a suffix, specify the MIME type here, for example mime_type='application/text' for plain text data. This parameter is ignored when overwriting an asset.
  Here is an example that shows you how you can upload a file to the project:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    import urllib.request
    wslib = access_project_or_space({"token":"<ProjectToken>"})
    # Let's assume you have downloaded a file and want to save it in your project.
    urllib.request.urlretrieve("https://some/url/data_file.csv", "data_file.csv")
    wslib.upload_file("data_file.csv")
    # Upon successful saving of the data, the function returns a dictionary
    # which contains the asset_name, asset_id, file_name and additional information.
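The data parameter of save_data() expects a bytes-like object. The save_data example produces one with pandas (to_csv(...).encode()); when you assemble data by hand, the standard csv module does the same job. This is a sketch: the column names and rows are made up for illustration, and the save_data calls are shown only as comments.

```python
import csv
import io


def rows_to_csv_bytes(header, rows, encoding="utf-8") -> bytes:
    """Serialize a header and rows to CSV text and encode it as bytes."""
    text = io.StringIO()
    writer = csv.writer(text)  # uses "\r\n" as the line terminator by default
    writer.writerow(header)
    writer.writerows(rows)
    return text.getvalue().encode(encoding)


data = rows_to_csv_bytes(["id", "city"], [[1, "Berlin"], [2, "Paris"]])
print(data)

# In a notebook, the bytes would be passed as the data argument (not executed here):
#   wslib.save_data("my_asset_name.csv", data)
#   wslib.save_data("my_asset_name.csv", data, overwrite=True)  # replace an existing asset
```

Since the asset name ends in .csv, save_data can determine the MIME type from the suffix, so mime_type is not needed in this case.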
Get connection information
You can use the following function to access the connection metadata of a given connection.
- get_connection(name_or_item)
  This function returns the properties (metadata) of a connection, which you can use to fetch data from the connection data source. Use wslib.show(connprops) to view the properties. The special key "." in the returned dictionary provides information about the connection asset.
  The function takes the following required parameter:
  - name_or_item: Either a string with the name of a connection or an item like those returned by list_connections().
Note that when you work with notebooks, you can click Read data on the Code snippets pane to generate code that loads data from a connection into a pandas DataFrame, for example.
Get connected data information
You can use the following function to access the metadata of a connected data asset.
- get_connected_data(name_or_item)
  This function returns the properties of a connected data asset, including the properties of the underlying connection. Use wslib.show() to view the properties. The special key "." in the returned dictionary provides information about the data and the connection assets.
  The function takes the following required parameter:
  - name_or_item: Either a string with the name of a connected data asset or an item like those returned by list_connected_data().
Note that when you work with notebooks, you can click Read data on the Code snippets pane to generate code that loads data from a connected data asset into a pandas DataFrame, for example.
Access asset by ID instead of name
Whenever possible, access data assets and connections by a unique name. Asset names are not necessarily unique, and the ibm-watson-studio-lib functions raise an exception when a name is ambiguous. You can rename data assets in the UI to resolve the conflict.
Accessing assets by a unique ID is possible but discouraged, because IDs are valid only in the current project and break code that is transferred to a different project. This can happen, for example, when projects are exported and re-imported.
You can get the ID of a connection, connected data asset, or stored data asset by using the corresponding list function, for example list_connections().
The entry point wslib.by_id provides the following functions:
- get_connection(asset_id)
  This function accesses a connection by the connection asset ID.
- get_connected_data(asset_id)
  This function accesses a connected data asset by the connected data asset ID.
- load_data(asset_id, attachment_type_or_item=None)
  This function loads the data of a stored data asset by passing the asset ID. See load_data() for a description of the other parameters you can pass.
- save_data(asset_id, data, overwrite=None, mime_type=None, file_name=None)
  This function saves data to a stored data asset by passing the asset ID. This implies overwrite=True. See save_data() for a description of the other parameters you can pass.
- download_file(asset_id, file_name=None, attachment_type_or_item=None)
  This function downloads the data of a stored data asset by passing the asset ID. See download_file() for a description of the other parameters you can pass.
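Because the library raises an exception for ambiguous names, it can help to find duplicate names before choosing between renaming and access by ID. The following sketch groups list items by name. The item keys "name" and "asset_id" are assumptions; check the actual keys with wslib.show() and pass different key names if needed. The sample items are hypothetical.

```python
from collections import defaultdict


def ids_by_name(items, name_key="name", id_key="asset_id"):
    """Group asset IDs by asset name."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item[name_key]].append(item[id_key])
    return dict(grouped)


def ambiguous_names(items, name_key="name", id_key="asset_id"):
    """Return the sorted names that more than one asset shares."""
    grouped = ids_by_name(items, name_key, id_key)
    return sorted(name for name, ids in grouped.items() if len(ids) > 1)


# Hypothetical items; in a notebook they would come from a list function such as
# wslib.list_stored_data(). The key names "name" and "asset_id" are assumptions.
items = [
    {"name": "sales.csv", "asset_id": "id-1"},
    {"name": "sales.csv", "asset_id": "id-2"},
    {"name": "customers.csv", "asset_id": "id-3"},
]
print(ambiguous_names(items))
print(ids_by_name(items)["customers.csv"])
```

Renaming the duplicates in the UI keeps code portable across projects; passing an ID to a wslib.by_id function works too, but with the portability caveat for IDs noted earlier in this section.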
Access project storage directly
You can fetch data from project storage and store data in project storage without synchronizing the project assets by using the entry point wslib.storage.
The entry point wslib.storage provides the following functions:
- fetch_data(filename)
  This function returns the data in a file as a BytesIO buffer. The file does not need to be registered as a data asset.
  The function takes the following required parameter:
  - filename: The name of the file in the project storage.
- store_data(filename, data, overwrite=False)
  This function saves data in memory to storage, but does not create a new data asset. The function returns a dictionary which contains the file name, file path and additional information. Use wslib.show() to print the information.
  The function takes the following parameters:
  - filename: (Required) The name of the file in the project storage.
  - data: (Required) The data to save as a bytes-like object.
  - overwrite: (Optional) Overwrites the data of a file in storage if it already exists. By default, this is set to False.
- download_file(storage_filename, local_filename=None)
  This function downloads the data in a file in storage and stores it in the specified local file. The local file is overwritten if it already exists.
  The function takes the following parameters:
  - storage_filename: (Required) The name of the file in storage to download.
  - local_filename: (Optional) The name of the file in the local file system of your runtime to download the file to. Omit this parameter to use the storage file name.
- register_asset(storage_path, asset_name=None, mime_type=None)
  This function registers the file in storage as a data asset in your project. This operation fails if a data asset with the same name already exists.
  You can use this function if you have very large files that you cannot upload with save_data(). You can upload large files directly to the IBM Cloud Object Storage bucket of your project, for example through the UI, and then register them as data assets by using register_asset().
  The function takes the following parameters:
  - storage_path: (Required) The path of the file in storage.
  - asset_name: (Optional) The name of the created asset. It defaults to the file name.
  - mime_type: (Optional) The MIME type for the created asset. By default, the MIME type is determined from the asset name suffix. Use this parameter to specify a MIME type if your file name does not have a file extension or if you want to set a different MIME type.
  Note: You can register a file several times as different data assets. Deleting one of those assets in the project also deletes the file in storage, which means that other asset references to the file might be broken.
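store_data() takes a bytes-like object and fetch_data() returns a BytesIO buffer, so small structured files such as JSON settings need a serialization step in each direction. The following sketch shows that round trip with the standard json module; the file name settings.json is hypothetical, and a local io.BytesIO stands in for the buffer that fetch_data() returns.

```python
import io
import json


def to_json_bytes(obj) -> bytes:
    """Serialize a JSON-compatible object to UTF-8 bytes for the data argument of store_data()."""
    return json.dumps(obj, indent=2, sort_keys=True).encode("utf-8")


def from_json_buffer(buffer: io.BytesIO):
    """Parse a byte buffer, such as the one fetch_data() returns, back into Python objects."""
    buffer.seek(0)  # read from the beginning regardless of the current position
    return json.loads(buffer.read().decode("utf-8"))


settings = {"threshold": 0.75, "columns": ["id", "city"]}
payload = to_json_bytes(settings)

# In a notebook (not executed here; "settings.json" is a hypothetical file name):
#   wslib.storage.store_data("settings.json", payload, overwrite=True)
#   buffer = wslib.storage.fetch_data("settings.json")
buffer = io.BytesIO(payload)  # local stand-in for the buffer fetch_data() returns
restored = from_json_buffer(buffer)
print(restored == settings)
```

Because store_data() does not create a data asset, a file written this way does not appear in the data assets list unless you register it with register_asset().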
Spark support
The entry point wslib.spark provides functions to access files in storage with Spark. To get help information about the available functions, use help(wslib.spark.API).
The entry point wslib.spark provides the following functions:
- provide_spark_context(sc)
  Use this function to enable Spark support.
  The function takes the following required parameter:
  - sc: The SparkContext. It is provided in the notebook runtime.
  The following example shows you how to set up Spark support:
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space({"token":"<ProjectToken>"})
    wslib.spark.provide_spark_context(sc)
- get_data_url(asset_name)
  This function returns a URL to access a file in storage from Spark via Hadoop.
  The function takes the following required parameter:
  - asset_name: The name of the asset.
- storage.get_data_url(file_name)
  This function returns a URL to access a file in storage from Spark via Hadoop. The function expects the file name and not the asset name.
  The function takes the following required parameter:
  - file_name: The name of a file in the project storage.
Browse project assets
The entry point wslib.assets provides generic, read-only access to assets of any type. For selected asset types, there are dedicated functions that provide additional data. To get help on the available functions, use help(wslib.assets.API).
The following naming conventions apply:
- Functions named list_<something> return a list of Python dictionaries. Each dictionary represents one asset and includes a small set of properties (metadata) that identifies the asset.
- Functions named get_<something> return a single Python dictionary with the properties for the asset.
To pretty-print a dictionary or list of dictionaries, use wslib.show().
The functions expect either the name of an asset or an item from a list as the parameter. By default, the functions return only a subset of the available asset properties. By setting the parameter raw=True, you can get the full set of asset properties.
The entry point wslib.assets provides the following functions:
- list_assets(asset_type, name=None, query=None, selector=None, raw=False)
  This function lists all assets of the given type that match the given constraints.
  The function takes the following parameters:
  - asset_type: (Required) The type of the assets to list, for example data_asset. See list_asset_types() for a list of the available asset types. Use asset type asset for the list of all available assets in the project.
  - name: (Optional) The name of the asset to list. Use this parameter if more than one asset with the same name exists. You can specify either name or query, but not both.
  - query: (Optional) A query string that is passed to the Watson Data API to search for assets. You can specify either name or query, but not both.
  - selector: (Optional) A custom filter function on the candidate asset dictionary items. If the selector function returns True, the asset is included in the returned asset list.
  - raw: (Optional) Returns all of the available metadata. By default, the parameter is set to False and only a subset of the properties is returned.
  Examples of using the list_assets function:
    # Import the lib
    from ibm_watson_studio_lib import access_project_or_space
    wslib = access_project_or_space({"token":"<ProjectToken>"})
    # List all assets in the project
    all_assets = wslib.assets.list_assets("asset")
    wslib.show(all_assets)
    # List all data assets with name 'MyFile.csv'
    assets_by_name = wslib.assets.list_assets("data_asset", name="MyFile.csv")
    # List all data assets whose name starts with "MyF"
    assets_by_query = wslib.assets.list_assets("data_asset", query="asset.name:(MyF*)")
    # List all data assets which are larger than 1 MB
    size_filter = lambda x: x['metadata']['size'] > 1000000
    large_assets = wslib.assets.list_assets("data_asset", selector=size_filter, raw=True)
    # List all notebooks
    notebooks = wslib.assets.list_assets("notebook")
- list_asset_types(raw=False)
  This function lists all available asset types.
  The function can take the following parameter:
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- list_datasource_types(raw=False)
  This function lists all available data source types.
  The function can take the following parameter:
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- get_asset(name_or_item, asset_type=None, raw=False)
  This function returns the metadata of an asset.
  The function takes the following parameters:
  - name_or_item: (Required) The name of the asset or an item like those returned by list_assets().
  - asset_type: (Optional) The type of the asset. If the parameter name_or_item contains a string for the name of the asset, setting asset_type is required.
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
  Example of using the list_assets and get_asset functions:
    notebooks = wslib.assets.list_assets('notebook')
    wslib.show(notebooks)
    notebook = wslib.assets.get_asset(notebooks[0])
    wslib.show(notebook)
- get_connection(name_or_item, with_datasourcetype=False, raw=False)
  This function returns the metadata of a connection.
  The function takes the following parameters:
  - name_or_item: (Required) The name of the connection or an item like those returned by list_connections().
  - with_datasourcetype: (Optional) Returns additional information about the data source type of the connection.
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- get_connected_data(name_or_item, with_datasourcetype=False, raw=False)
  This function returns the metadata of a connected data asset.
  The function takes the following parameters:
  - name_or_item: (Required) The name of the connected data asset or an item like those returned by list_connected_data().
  - with_datasourcetype: (Optional) Returns additional information about the data source type of the associated connected data asset.
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- get_stored_data(name_or_item, raw=False)
  This function returns the metadata of a stored data asset.
  The function takes the following parameters:
  - name_or_item: (Required) The name of the stored data asset or an item like those returned by list_stored_data().
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
- list_attachments(name_or_item_or_asset, asset_type=None, raw=False)
  This function returns a list of the attachments of an asset.
  The function takes the following parameters:
  - name_or_item_or_asset: (Required) The name of the asset or an item like those returned by list_stored_data() or get_asset().
  - asset_type: (Optional) The type of the asset. It defaults to type data_asset.
  - raw: (Optional) Returns the full set of metadata. By default, the parameter is False and only a subset of the properties is returned.
  Example of using the list_attachments function to read an attachment of a stored data asset:
    assets = wslib.list_stored_data()
    wslib.show(assets)
    asset = assets[0]
    attachments = wslib.assets.list_attachments(asset)
    wslib.show(attachments)
    buffer = wslib.load_data(asset, attachments[0])
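The selector parameter of list_assets() accepts any function that receives a candidate asset dictionary and returns True to keep it, as the size filter in the list_assets examples shows. Because selectors are ordinary Python functions, small factories can build and combine them. This sketch assumes raw=True, so that the metadata is complete. The "size" key is taken from the list_assets example, while the "name" key under "metadata" is an assumption to verify with wslib.show(). The sample items are hypothetical.

```python
def size_at_least(min_bytes):
    """Selector factory: keep assets whose raw metadata reports at least min_bytes."""
    # Assets without a "size" entry are treated as size 0 and therefore excluded.
    return lambda asset: asset["metadata"].get("size", 0) >= min_bytes


def name_endswith(suffix):
    """Selector factory: keep assets whose name ends with suffix (case-insensitive).

    Assumption: the asset name is stored under asset["metadata"]["name"].
    """
    return lambda asset: asset["metadata"].get("name", "").lower().endswith(suffix.lower())


def all_of(*selectors):
    """Combine selectors: an asset is kept only if every selector returns True."""
    return lambda asset: all(selector(asset) for selector in selectors)


large_csv = all_of(name_endswith(".csv"), size_at_least(1_000_000))

# In a notebook (not executed here):
#   large_csv_assets = wslib.assets.list_assets("data_asset", selector=large_csv, raw=True)
# Hypothetical raw items, to show how the selector is applied to each candidate:
candidates = [
    {"metadata": {"name": "big.csv", "size": 5_000_000}},
    {"metadata": {"name": "small.csv", "size": 2_000}},
    {"metadata": {"name": "big.parquet", "size": 9_000_000}},
]
selected = [asset for asset in candidates if large_csv(asset)]
print([asset["metadata"]["name"] for asset in selected])
```

Combining predicates in Python keeps each filter small and testable; for name patterns alone, the name or query parameters of list_assets() are the more direct option.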