Using Cloud Object Storage for deep learning with Watson Machine Learning

Training a deep learning model with IBM Watson Machine Learning relies on IBM Cloud Object Storage for reading input (such as training data) and for storing results (such as log files). This topic describes how to perform deep learning-related administrative tasks with Cloud Object Storage.

About this topic

Working in a project in Watson Studio simplifies the tasks described in this topic:

  • An instance of the Cloud Object Storage service is automatically created, if needed.
  • Cloud Object Storage HMAC credentials are automatically generated.
  • When you use the experiment builder, buckets for training data and for results are created and used in Cloud Object Storage automatically.

However, you can train deep learning models using Watson Machine Learning without using a Watson Studio project. For example, you can use the IBM Cloud graphical interface to perform service-related administrative tasks, and you can use the Watson Machine Learning CLI to train and deploy your model.

This topic describes how to perform Cloud Object Storage tasks outside of a Watson Studio project.

 

Creating a Cloud Object Storage instance

When you create a machine learning project in Watson Studio, an instance of the Cloud Object Storage service is created for you. If you are not working in a Watson Studio project, there are two ways to create a new Cloud Object Storage service instance, depending on where you start:

  • Watson Studio
  • IBM Cloud

Option 1: From Watson Studio

  1. From the Services menu in Watson Studio, choose "Data Services".
  2. Click Add service.
  3. Click Add on the Cloud Object Storage tile.
  4. Choose a plan and click Create.

Option 2: From IBM Cloud

You can create a new instance of the Cloud Object Storage service from the IBM Cloud catalog.

 

Credentials

Service credentials are what apps and tools use to authenticate with a service like Cloud Object Storage.

Cloud Object Storage GUI

You can look up credentials or generate new credentials on the Cloud Object Storage GUI.

There are two ways to get to the Cloud Object Storage GUI, depending on where you start:

  • Option A: From Watson Studio

    1. From the Services menu in Watson Studio, choose "Data Services".
    2. Select "Manage in IBM Cloud" from the ACTIONS menu beside the Cloud Object Storage service instance for which you want to create credentials. (This opens the Cloud Object Storage GUI.)

  • Option B: From IBM Cloud

    1. Log in to IBM Cloud. (This takes you to your IBM Cloud dashboard.)
    2. In your IBM Cloud dashboard, click the Cloud Object Storage service instance for which you want to retrieve credentials. (This opens the Cloud Object Storage GUI.)

Generating HMAC credentials

Watson Machine Learning uses HMAC authentication to integrate with Cloud Object Storage. When you generate Cloud Object Storage service credentials for use with Watson Machine Learning, you need to generate HMAC-compatible credentials as described here.

  1. In the Cloud Object Storage GUI, click Service credentials.

  2. Click the New credential button.

  3. Fill in the fields of the Add new credential window:

    • Enter a name that indicates what these credentials will be used for, such as: "WML-tutorial-credentials".

    • Select "Writer" from the drop-down menu labeled Access role. (Choose that role because Watson Machine Learning needs permission to read and write data, as well as create and destroy buckets and objects in them.)
      See also: Bucket permissions

    • In the text box labeled Add Inline Configuration Parameters, enter the following text:

      {"HMAC":true}
      
      See also: Using HMAC credentials
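
After the credential is created, its JSON view should contain an HMAC key pair. The following Python sketch shows one way to pull that pair out of the JSON. It assumes the pair is stored under a "cos_hmac_keys" object with "access_key_id" and "secret_access_key" fields. The sample values and the helper name extract_hmac_keys are hypothetical placeholders, not part of any IBM library.

```python
import json

# Hypothetical, abbreviated credential JSON; all values are placeholders.
SAMPLE_CREDENTIALS = """
{
  "apikey": "example-api-key",
  "cos_hmac_keys": {
    "access_key_id": "example-access-key-id",
    "secret_access_key": "example-secret-access-key"
  }
}
"""


def extract_hmac_keys(credentials_json):
    """Return (access_key_id, secret_access_key) from a credential JSON string."""
    credentials = json.loads(credentials_json)
    # Assumption: the HMAC pair lives under the "cos_hmac_keys" key.
    hmac_keys = credentials.get("cos_hmac_keys")
    if hmac_keys is None:
        raise ValueError(
            'No HMAC keys found; was {"HMAC":true} entered when the '
            "credential was created?"
        )
    return hmac_keys["access_key_id"], hmac_keys["secret_access_key"]


access_key_id, secret_access_key = extract_hmac_keys(SAMPLE_CREDENTIALS)
print(access_key_id)
```

If the helper raises ValueError, the credential was probably generated without the inline configuration parameter shown in step 3.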

 

Creating a bucket

In Cloud Object Storage, files are grouped in buckets. (You could think of buckets like directories, except there are no subdirectories in buckets.) When training machine learning models, keep things organized by creating one bucket for training data and one bucket for training results.

  1. In the Cloud Object Storage GUI, click Buckets.

  2. Click Create bucket.

  3. Give the bucket a name that is unique and that helps you remember what the bucket is used for. (Over time, you might create many buckets.)

  4. Set Resiliency to "Regional". In general, regional resiliency gives the best performance at the lowest cost. However, if the ability to survive a regional outage is essential to you, set the resiliency to "Cross Region".

  5. For best performance, set Location to the same location as your Watson Machine Learning service instance. (You can determine the location of your Watson Machine Learning service instance by looking at the "url" field of your service credentials. See: Looking up your service credentials.)

  6. Usually, the default for Storage class is suitable for use with Watson Machine Learning. And usually, you won't need to use ADVANCED CONFIGURATION.

Note: You can also create a bucket using the AWS CLI.

 

Configuring training runs to use your buckets

You need to specify details of your training data bucket and your results bucket to Watson Machine Learning:

  • In the training run manifest file:

    • Specify details of your training data bucket in training_data_reference
    • Specify details of your results bucket in training_results_reference

  • In your model-building code, use the environment variable $RESULT_DIR to send additional output to the result bucket.
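
As an illustration of the second point, the following Python sketch writes an extra output file under the directory named by RESULT_DIR. The helper name write_extra_output and the file name are hypothetical, and the fallback to the current directory is an assumption added here so that the same script can also run outside a training run.

```python
import os
from pathlib import Path


def write_extra_output(filename, text):
    """Write text to a file under $RESULT_DIR and return the file's path."""
    # RESULT_DIR is expected to be set during a training run. Falling back
    # to the current directory is an assumption made for local testing.
    result_dir = Path(os.environ.get("RESULT_DIR", "."))
    result_dir.mkdir(parents=True, exist_ok=True)
    output_path = result_dir / filename
    output_path.write_text(text)
    return output_path


# Example call from model-building code (file name is a placeholder):
# write_extra_output("notes.txt", "extra training output\n")
```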

 

Looking up the endpoint URL for a bucket

Files are stored in Cloud Object Storage grouped in buckets. To read training data or write log files in a bucket in Cloud Object Storage, the Watson Machine Learning service locates the bucket using an endpoint URL. You can look up the endpoint URL for a Cloud Object Storage bucket in the Cloud Object Storage GUI.

  1. In the Cloud Object Storage GUI, click Buckets.

  2. Search or scroll to find the bucket you want the endpoint URL for, and then click the bucket in the list.

  3. In the navigation menu, click Configuration.

  4. In the Overview panel of the bucket configuration information, scroll down to view the Endpoints list for the bucket.

  5. When using a bucket with Watson Machine Learning, you should use the private, regional endpoint URL for the region closest to you.
    See also: Cloud Object Storage regions and endpoints

 

Uploading data files

The Watson Machine Learning service reads training data from Cloud Object Storage according to the training configuration you set. After creating a bucket for training data, upload your training data to that bucket.

  1. In the Cloud Object Storage GUI, click Buckets.

  2. Search or scroll to find the bucket that you want to upload training data into, and click the bucket in the list.

  3. Click Add objects.

  4. Click Add files, follow the prompts, and then click Upload.

Note: You can also upload files to a bucket using the AWS CLI.

 

Viewing training results

The Watson Machine Learning service writes output (such as log files) to Cloud Object Storage according to the training configuration you set. After training your model, you can download log files and other output from the results bucket.

  1. In the Cloud Object Storage GUI of your Cloud Object Storage instance, click Buckets.

  2. Search or scroll to find the results bucket, and click the bucket in the list.

  3. Hover over the row of the file you want to view to make a three-dots menu appear.

  4. Click the three-dots menu, and then select "Download".

Note: You can also view the contents of a bucket using the AWS CLI.

File naming conventions

There are no "subdirectories" in buckets. Buckets contain only objects, which you can think of as equivalent to files.

You can use naming schemes to identify files that are logically related to each other. And you can include the forward-slash character (/) in file names to give the appearance of a familiar directory-like structure.

For example, after running a training run with identifier "training-HrlzIHskg", the following files appeared in the training results bucket:

training-HrlzIHskg/learner-1/load-data.log
training-HrlzIHskg/learner-1/load-model.log
training-HrlzIHskg/learner-1/training-log.txt
training-HrlzIHskg/model/saved_model.pb
training-HrlzIHskg/model/variables/variables.data-00000-of-00001
training-HrlzIHskg/model/variables/variables.index
training-HrlzIHskg/saved_model.tar.gz

In this example, you can see these naming conventions in use:

  • The file names all start with "training-HrlzIHskg"
  • Log files have "/learner-1/" in their name
  • Model artifact files have "/model" in their name
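
Because the slashes are just characters in the object names, any grouping has to be done by the reader or by a script. The following Python sketch groups the object names from the example above by their leading name parts. The helper name group_by_prefix and its depth parameter are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# Object names copied from the example listing above.
OBJECT_NAMES = [
    "training-HrlzIHskg/learner-1/load-data.log",
    "training-HrlzIHskg/learner-1/load-model.log",
    "training-HrlzIHskg/learner-1/training-log.txt",
    "training-HrlzIHskg/model/saved_model.pb",
    "training-HrlzIHskg/model/variables/variables.data-00000-of-00001",
    "training-HrlzIHskg/model/variables/variables.index",
    "training-HrlzIHskg/saved_model.tar.gz",
]


def group_by_prefix(object_names, depth=2):
    """Group object names by their first `depth` slash-separated parts."""
    groups = defaultdict(list)
    for name in object_names:
        parts = name.split("/")
        # Never count the final part (the "file name") as part of the prefix.
        cut = min(depth, len(parts) - 1)
        groups["/".join(parts[:cut])].append("/".join(parts[cut:]))
    return dict(groups)


for prefix, names in group_by_prefix(OBJECT_NAMES).items():
    print(prefix, names)
```

With depth=2, the log files end up under "training-HrlzIHskg/learner-1", the model artifacts under "training-HrlzIHskg/model", and the archive under "training-HrlzIHskg".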

 

AWS CLI

The instructions in this topic have all described how to perform Cloud Object Storage tasks using the Cloud Object Storage graphical interface. You can also perform Cloud Object Storage tasks using the AWS CLI.

See: Using the AWS CLI with Cloud Object Storage
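
The AWS CLI reads HMAC keys from a named profile, and the examples in this topic use a profile called "ibm_cos". The following Python sketch renders the text of such a profile section in the format of the AWS CLI shared credentials file (typically ~/.aws/credentials). The helper name render_profile and the key values are hypothetical placeholders. The sketch only builds the text; it does not modify any file.

```python
import configparser
import io


def render_profile(profile_name, access_key_id, secret_access_key):
    """Return the text of one AWS CLI credentials-file profile section."""
    # interpolation=None keeps any "%" in a secret from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config[profile_name] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values; substitute the HMAC keys from your service credentials.
print(render_profile("ibm_cos", "example-access-key-id", "example-secret-access-key"))
```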

Examples

Note: These examples use the private, regional endpoint "s3-api.dal-us-geo.objectstorage.service.networklayer.com". You can look up the correct endpoint for your Cloud Object Storage service instance by clicking Endpoint in the Cloud Object Storage GUI.

Creating a bucket called "my-training-data"

aws --endpoint-url=https://s3-api.dal-us-geo.objectstorage.service.networklayer.com --profile ibm_cos s3 mb s3://my-training-data

Uploading a file called "my-file.txt" to a bucket called "my-training-data"

aws --endpoint-url=https://s3-api.dal-us-geo.objectstorage.service.networklayer.com --profile ibm_cos s3 cp my-file.txt s3://my-training-data

Listing the files in a bucket called "my-training-data"

aws --endpoint-url=https://s3-api.dal-us-geo.objectstorage.service.networklayer.com --profile ibm_cos s3 ls s3://my-training-data

Downloading a file called "training--i9JH47mR/learner-1/training-log.txt" from a bucket called "my-results"

aws --endpoint-url=https://s3-api.dal-us-geo.objectstorage.service.networklayer.com --profile ibm_cos s3 cp s3://my-results/training--i9JH47mR/learner-1/training-log.txt training-log.txt

See also: AWS CLI command reference
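
If you run these commands often, you can script them. The following Python sketch builds the argument lists for the commands shown above so that the endpoint and profile are written in one place. The helper name aws_s3_command is hypothetical, and the "https://" scheme on the endpoint is an assumption; adjust it to match the endpoint you looked up. The sketch only prints the commands. To execute one, pass its list to subprocess.run.

```python
import shlex

# Assumption: the endpoint is reached over HTTPS; replace with your own endpoint.
ENDPOINT_URL = "https://s3-api.dal-us-geo.objectstorage.service.networklayer.com"
PROFILE = "ibm_cos"


def aws_s3_command(*s3_args, endpoint_url=ENDPOINT_URL, profile=PROFILE):
    """Build the argument list for one `aws s3` command."""
    return [
        "aws",
        f"--endpoint-url={endpoint_url}",
        "--profile",
        profile,
        "s3",
        *s3_args,
    ]


commands = [
    aws_s3_command("mb", "s3://my-training-data"),
    aws_s3_command("cp", "my-file.txt", "s3://my-training-data"),
    aws_s3_command("ls", "s3://my-training-data"),
]
for command in commands:
    # Print only; nothing is executed here.
    print(shlex.join(command))
```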