CLI tutorial: Build a TensorFlow model to recognize handwritten digits using the MNIST data set

This tutorial guides you through using the MNIST computer vision data set to train a TensorFlow model to recognize handwritten digits. In this tutorial, you will train, deploy, and test the model using the IBM Watson Machine Learning command line interface (CLI).

 

Prerequisite

 

Steps overview

This tutorial presents the basic steps for training a deep learning model with Watson Machine Learning:

  1. Set up data files in IBM Cloud Object Storage
  2. Download sample code
  3. Train the model
  4. Monitor training progress and results
  5. Deploy the trained model
  6. Test the deployed model

This tutorial does not demonstrate running multiple training runs in an experiment, distributed deep learning, or using the Watson Machine Learning hyperparameter optimization feature.

 

Step 1: Set up data files in Cloud Object Storage

Training a deep learning model with Watson Machine Learning relies on Cloud Object Storage both for reading input (such as training data) and for storing results (such as log files).

  1. Download the MNIST sample data files to your local computer from here: MNIST sample files
    Note: Some browsers automatically uncompress the sample data files, which causes errors later in this tutorial. Follow the instructions on the MNIST download page to verify how your browser handled the files.

  2. Create an instance of Cloud Object Storage.
    See: Creating a Cloud Object Storage service instance

  3. Perform these steps in the Cloud Object Storage GUI:

    1. Create two buckets: one for storing training data, and one for storing training results.
      For each bucket, make a note of the endpoint_url and the bucket name.
      See: Creating a Cloud Object Storage bucket

    2. Upload all of the MNIST sample data files to the training data bucket.
      See: Uploading data to Cloud Object Storage

    3. Generate new HMAC credentials for working through this tutorial. Create the new credentials with {"HMAC":true} in the inline configuration parameters.
      Make a note of the access_key_id and the secret_access_key.
      See: Creating HMAC credentials for Cloud Object Storage
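
The note in step 1 warns that some browsers silently uncompress the downloaded files. The following is a minimal sketch of a local check you could run before uploading. It assumes the sample files are gzip archives with a .gz extension, which this tutorial does not state, and the function names are hypothetical.

```python
from pathlib import Path

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def find_uncompressed(directory, pattern="*.gz"):
    """Return the names of files matching `pattern` that are not gzip data.

    The "*.gz" default is an assumption about how the sample files are named.
    """
    return sorted(
        p.name for p in Path(directory).glob(pattern) if not is_gzip_file(p)
    )
```

If find_uncompressed returns any names for your download folder, those files were probably uncompressed by the browser and should be downloaded again before you upload them to the training data bucket.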

 

Step 2: Download sample code

Training a deep learning model with Watson Machine Learning involves one or more training runs. A training run requires two things: a .zip file containing model-building code, and metadata. When using the CLI, you specify the metadata in a training run manifest file.

  1. Download sample TensorFlow model-building Python code from here: tf-model.zip

    tf-model.zip contains two files:

    1. convolutional_network.py - Model-building Python code
    2. input_data.py - A "helper" file for reading the MNIST data files

    Point of interest: The sample file convolutional_network.py shows how to use the environment variable $RESULT_DIR to send extra output to the Cloud Object Storage results bucket:

    model_path = os.environ["RESULT_DIR"]+"/model"
    ...
    builder = tf.saved_model.builder.SavedModelBuilder(model_path)
    
    In this case, the trained model is saved in protobuf format to the results bucket. You can send any other output to the results bucket in the same way, by writing it to a path under $RESULT_DIR.

  2. Download a sample training run manifest file from here: tf-train.yaml

    Point of interest: The command in the execution section shows how to use the environment variable $DATA_DIR to read data from the Cloud Object Storage training data bucket.
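
The two points of interest above can be sketched together in a few lines of Python. This is an illustration only, not an excerpt from convolutional_network.py: the helper names and the file names in the usage note are hypothetical, and the only thing taken from this tutorial is that the training run provides the DATA_DIR and RESULT_DIR environment variables.

```python
import os


def data_path(file_name):
    """Build the path of an input file in the training data location."""
    return os.path.join(os.environ["DATA_DIR"], file_name)


def write_result(file_name, text):
    """Write extra text output to the results location and return its path."""
    result_dir = os.environ["RESULT_DIR"]
    # Create the directory if needed so the sketch also works in a local test.
    os.makedirs(result_dir, exist_ok=True)
    path = os.path.join(result_dir, file_name)
    with open(path, "w") as handle:
        handle.write(text)
    return path
```

For example, model-building code could open data_path("some-input-file") for reading and call write_result("notes.txt", "...") to leave an extra file next to the training logs.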

 

Step 3: Train the model

This tutorial demonstrates running one training run.

  1. Update the sample training run manifest file, tf-train.yaml, with your details:

    1. Specify your IBMid email address in the email field.

    2. Update both the training_data_reference section and the training_results_reference section with the details of the Cloud Object Storage instance that you are using for this tutorial:

      • Update the endpoint_url field
      • Update the access_key_id field and the secret_access_key field

    The training_data_reference section sets up the $DATA_DIR environment variable, and the training_results_reference section sets up the $RESULT_DIR environment variable.

  2. Submit the training run using the Machine Learning CLI:

    bx ml train tf-model.zip tf-train.yaml

    Example output:

    Starting to train ...
    OK
    Model-ID is 'training-HrlzIHskg'
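
The later steps all need the identifier printed by bx ml train. If you script this tutorial, the submission and the extraction of the identifier might look like the following sketch. It assumes that the CLI writes output in the same shape as the example above to standard output; the function names are hypothetical.

```python
import re
import subprocess

# Matches the identifier line shown in the example output above.
MODEL_ID_PATTERN = re.compile(r"Model-ID is '([^']+)'")


def parse_model_id(cli_output):
    """Return the identifier from CLI output, or None if no line matches."""
    match = MODEL_ID_PATTERN.search(cli_output)
    return match.group(1) if match else None


def submit_training_run(model_zip="tf-model.zip", manifest="tf-train.yaml"):
    """Run `bx ml train` and return the identifier of the new training run."""
    completed = subprocess.run(
        ["bx", "ml", "train", model_zip, manifest],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    return parse_model_id(completed.stdout)
```

The returned value can then be passed to the bx ml monitor, bx ml show, and bx ml store commands in the next steps instead of being copied by hand.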
    

 

Step 4: Monitor training progress and results

  • You can monitor the progress of a training run using the CLI:

    bx ml monitor training-runs training-HrlzIHskg
    
    In this example, the identifier returned from the previous bx ml train command, "training-HrlzIHskg", is specified. Replace that with the identifier that was returned for you.

  • You can view the details of a training run using the CLI:

    bx ml show training-runs training-HrlzIHskg
    
    In this example, the identifier returned from the previous bx ml train command, "training-HrlzIHskg", is specified. Replace that with the identifier that was returned for you.

  • After the training run finishes, you can view log files and other output in the training results bucket of your Cloud Object Storage.
    See: Viewing results in Cloud Object Storage

 

Step 5: Deploy the trained model

You can use your trained model to classify new images only after the model has been deployed.

  1. Before you can deploy the model, you must store the trained model in the Watson Machine Learning repository:

    bx ml store training-runs training-HrlzIHskg
    
    In this example, the identifier returned from the previous bx ml train command, "training-HrlzIHskg", is specified. Replace that with the identifier that was returned for you.

    Sample output:

    OK
    Model store successful. 
    Model-ID is 'a8379aaa-ea31-4c22-824d-89a01315dd6d'
    

  2. Deploy the model:

    bx ml deploy a8379aaa-ea31-4c22-824d-89a01315dd6d "my-first-deployment"
    
    In this example, the Model-ID returned from the bx ml store command, "a8379aaa-ea31-4c22-824d-89a01315dd6d", is specified. Replace that with the Model-ID that was returned for you.

    Sample output:

    Deploying the model with MODEL-ID 'a8379aaa-ea31-4c22-824d-89a01315dd6d'...
    DeploymentId       9d6a656c-e9d4-4d89-b335-f9da40e52179
    Scoring endpoint   https://2000ab8b-7e81-41b3-ad07-b70f849594f5...
    Name               my-first-deployment
    Type               tensorflow-1.5
    Runtime            python-3.5
    Created at         2017-11-28T12:46:19.770Z
    OK
    Deploy model successful
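
The deploy output above is a two-column table, and Step 6 needs the DeploymentId from it. A minimal sketch of reading that table in Python follows. It assumes that field names and values are separated by at least two spaces, as in the sample output; the function name is hypothetical.

```python
import re


def parse_deploy_output(cli_output):
    """Turn the two-column part of the deploy output into a dictionary.

    Lines without a run of two or more spaces (status messages such as
    "OK") are ignored.
    """
    fields = {}
    for line in cli_output.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1]
    return fields
```

With the sample output above, parse_deploy_output(...)["DeploymentId"] gives the value to place in the payload file in Step 6.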
    

 

Step 6: Test the deployed model

You can quickly test your deployed model using the Watson Machine Learning CLI.

  1. Download this sample payload JSON file with input data corresponding to the handwritten digits "5" and "4": tf-mnist-test-payload.json

  2. Update the sample payload file, tf-mnist-test-payload.json, with your model details:

    • modelId: Specify the Model-ID returned from the bx ml store command
    • deploymentId: Specify the DeploymentId returned from the bx ml deploy command

  3. Test the model using the Watson Machine Learning CLI:

    bx ml score tf-mnist-test-payload.json
    
    Sample output:
    Fetching scoring results for the deployment '9d6a656c-e9d4-4d89-b335-f9da40e52179' ...
    {"classes":[5, 4]}
    OK
    Score request successful
    
    This output shows that the first input was correctly classified as belonging to the class "5", and the second input was correctly classified as belonging to the class "4".
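
Updating the payload file and reading the scoring result can also be scripted. The sketch below assumes that modelId and deploymentId are top-level keys of the payload JSON, which this tutorial does not state, and that the scoring result appears as a single line of JSON as in the sample output. Both function names are hypothetical.

```python
import json


def update_payload(path, model_id, deployment_id):
    """Set modelId and deploymentId in the payload file, keeping other keys."""
    with open(path) as handle:
        payload = json.load(handle)
    # Assumption: both keys live at the top level of the payload document.
    payload["modelId"] = model_id
    payload["deploymentId"] = deployment_id
    with open(path, "w") as handle:
        json.dump(payload, handle, indent=2)
    return payload


def parse_classes(cli_output):
    """Return the "classes" list from the first JSON line of the output."""
    for line in cli_output.splitlines():
        line = line.strip()
        if line.startswith("{"):
            return json.loads(line).get("classes")
    return None
```

Called on the sample output above, parse_classes returns the list of predicted classes, which you can compare with the digits you expect.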