Watson Machine Learning Python client tutorial: Build a TensorFlow model to recognize handwritten digits using the MNIST data set

This tutorial guides you through using the MNIST computer vision data set to train a TensorFlow model to recognize handwritten digits. In this tutorial, you will train, deploy, and test the model using the IBM Watson Machine Learning Python client from a notebook in IBM Watson Studio. The training is done by running a training run on Watson Machine Learning specialized infrastructure, not by running TensorFlow code directly in the notebook environment.

 

Prerequisites

 

Steps overview

This tutorial presents the basic steps for training a deep learning model by running a training run on Watson Machine Learning infrastructure using the Watson Machine Learning Python client from a notebook in Watson Studio:

  1. Set up data files in IBM Cloud Object Storage
  2. Download sample code
  3. Train the model in a training run
  4. Monitor training progress and results
  5. Deploy the trained model
  6. Test the deployed model

This tutorial does not demonstrate distributed deep learning, or using the Watson Machine Learning hyperparameter optimization feature.

 

Step 1: Set up data files in Cloud Object Storage

Running a training run on Watson Machine Learning infrastructure relies on Cloud Object Storage for reading input (such as training data) and for storing results (such as log files).

  1. Download MNIST sample data files to your local computer from here: MNIST sample files external link
    Note: Some browsers automatically uncompress the sample data files, which causes errors later in this tutorial. Follow the instructions on the MNIST download page to verify how your browser handled the files.
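If you want to check locally whether your browser left the files compressed, a minimal sketch using only the Python standard library is to test for the gzip magic bytes (0x1f 0x8b) at the start of each file:

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip file

def is_gzip_file(path):
    """Return True if the file at path starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC

# Check each of the four MNIST sample files, if present in the current directory
mnist_files = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]
for name in mnist_files:
    if os.path.isfile(name):
        print(name, "still compressed:", is_gzip_file(name))
```

If any file reports False, re-download it or re-compress it with gzip before uploading.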

  2. Open the Cloud Object Storage GUI:

    1. From the Services drop-down menu in Watson Studio, choose "Data Services".
    2. Select "Manage in IBM Cloud" from the ACTIONS menu beside the service instance of Cloud Object Storage that is associated with your deep learning project. (This opens the Cloud Object Storage GUI.)

  3. Perform these steps in the Cloud Object Storage GUI:

    1. Create two buckets: one for storing training data, and one for storing training results.
      * For each bucket, make a note of the endpoint_url and the bucket name.
      See: Creating a Cloud Object Storage bucket

    2. Upload all of the MNIST sample data files to the training data bucket.
      See: Uploading data to Cloud Object Storage

    3. Generate new HMAC credentials for working through this tutorial (create new credentials with {"HMAC":true} in the inline configuration parameters).
      * Make a note of the access_key_id and the secret_access_key.
      See: Creating HMAC credentials for Cloud Object Storage
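The endpoint URLs, bucket names, and HMAC keys you noted above are needed again when you define the training run metadata in step 3. One convenience, sketched below, is to collect them into a small helper now; the key names are chosen to match the "connection" block used later in the training run metadata:

```python
def cos_connection(endpoint_url, access_key_id, secret_access_key):
    """Build the 'connection' block used by the training run metadata in step 3."""
    return {
        "endpoint_url": endpoint_url,
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
    }

# Placeholder values -- replace *** with the values you noted in this step
training_data_connection = cos_connection("***", "***", "***")
training_data_bucket     = "***"
results_bucket           = "***"
```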

 

Step 2: Download sample model-building code to your notebook working directory

In this tutorial, you don't have to code the model from scratch. Instead, you'll use these sample Python files that build a TensorFlow model: tf-model.zip external link.

Note: You do not have to download the sample model-building code to your local computer to complete the tutorial. However, you can download it if you want to see how it works.

  1. Create a notebook in your Watson Studio project:

    1. Click Add to project, and then choose "NOTEBOOK".
    2. Specify a name for the notebook.
    3. Accept the default language, Python, and accept the default runtime.
    4. Click Create Notebook.

  2. Install a required library into your notebook environment by running this code in a cell in your notebook:

    !pip install --upgrade wget
    

  3. Download the sample model-building code to your notebook working directory by running this code in a cell in your notebook:

    import os
    import wget
    filename = 'tf-model.zip'
    url = 'https://github.com/pmservice/wml-sample-models/blob/master/tensorflow/hand-written-digit-recognition/definition/tf-model.zip?raw=true'
    if not os.path.isfile( filename ): wget.download( url )
    

 

Step 3: Train the model by running a training run

In this tutorial, you don't train the model by running the sample model-building code in your notebook directly. Instead, you submit a job to run the model-building code in a training run on Watson Machine Learning specialized, high-performance infrastructure.

Submitting a training run requires two things:

  • A .zip file containing model-building code
  • Metadata

When using the Python client, you can specify the metadata in your Python code, or in a training run manifest file. This tutorial demonstrates specifying metadata in the Python code. The CLI MNIST tutorial demonstrates specifying training run metadata in a manifest file.

Terminology note: The model-building code and the metadata together are referred to as a training definition. To keep those terms straight, you can think of "training definition" as being short for "training run definition".

  1. Install the Watson Machine Learning Python client library into your notebook environment by running this code in a cell in your notebook:

    !pip install watson-machine-learning-client
    

  2. Instantiate a client object by running this code in a cell in your notebook:

    Note: Replace *** here with the Watson Machine Learning service credentials you collected as a prerequisite at the beginning of this tutorial. These are not the Cloud Object Storage service credentials you created in step 1.

    from watson_machine_learning_client import WatsonMachineLearningAPIClient
    wml_credentials = { "url"         : "https://ibm-watson-ml.mybluemix.net",
                        "username"    : "***",
                        "password"    : "***",
                        "instance_id" : "***"
                       }
    client = WatsonMachineLearningAPIClient( wml_credentials )
    

  3. Store the training definition in the Watson Machine Learning repository by running this code in a cell in your notebook:

    Note: Replace *** with your IBMid email.

    metadata = {
        client.repository.DefinitionMetaNames.NAME              : "python-client-tutorial_training-definition",
        client.repository.DefinitionMetaNames.AUTHOR_EMAIL      : "***",
        client.repository.DefinitionMetaNames.FRAMEWORK_NAME    : "tensorflow",
        client.repository.DefinitionMetaNames.FRAMEWORK_VERSION : "1.5",
        client.repository.DefinitionMetaNames.RUNTIME_NAME      : "python",
        client.repository.DefinitionMetaNames.RUNTIME_VERSION   : "3.5",
        client.repository.DefinitionMetaNames.EXECUTION_COMMAND : "python3 convolutional_network.py --trainImagesFile ${DATA_DIR}/train-images-idx3-ubyte.gz --trainLabelsFile ${DATA_DIR}/train-labels-idx1-ubyte.gz --testImagesFile ${DATA_DIR}/t10k-images-idx3-ubyte.gz --testLabelsFile ${DATA_DIR}/t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters 20000"
    }
    definition_details = client.repository.store_definition( "tf-model.zip", meta_props=metadata )
    definition_uid     = client.repository.get_definition_uid( definition_details )
    print( "definition_uid: ", definition_uid )
    

  4. Define training run metadata by running this code in a cell in your notebook:

    Note: Replace *** here with your IBMid email, the bucket names and endpoint URLs of the buckets you created in step 1, as well as the Cloud Object Storage service credentials you created in step 1.

    metadata = {
        client.training.ConfigurationMetaNames.NAME         : "python-client-tutorial_training-run",
        client.training.ConfigurationMetaNames.AUTHOR_EMAIL : "***",
        client.training.ConfigurationMetaNames.TRAINING_DATA_REFERENCE : {
            "connection" : {
                "endpoint_url"      : "***",
                "access_key_id"     : "***",
                "secret_access_key" : "***"
            },
            "source" : {
                "bucket" : "***"
            },
            "type" : "s3"
        },
        client.training.ConfigurationMetaNames.TRAINING_RESULTS_REFERENCE : {
            "connection" : {
                "endpoint_url"      : "***",
                "access_key_id"     : "***",
                "secret_access_key" : "***"
            },
            "target" : {
                "bucket" : "***"
            },
            "type" : "s3"
        }
    }
    
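Before starting the run, it can help to confirm that every *** placeholder in the metadata was actually replaced; a forgotten placeholder produces a confusing failure later. The recursive checker below is a minimal sketch that assumes the nested-dictionary layout shown above:

```python
def check_no_placeholders(obj, path=""):
    """Recursively collect paths of any values still set to the '***' placeholder."""
    problems = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            problems.extend(check_no_placeholders(value, path + "/" + str(key)))
    elif obj == "***":
        problems.append(path)
    return problems

# Example usage with the metadata dictionary from the previous cell:
# leftover = check_no_placeholders(metadata)
# if leftover:
#     print("Placeholders still present at:", leftover)
```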

  5. Start the training run by running this code in a cell in your notebook:

    run_details = client.training.run( definition_uid, meta_props=metadata )
    run_uid     = client.training.get_run_uid( run_details )
    print( "run_uid: ", run_uid )
    

 

Step 4: Monitor training progress and results

  • You can monitor the progress of a training run by running this code in a cell in your notebook:

    client.training.get_status( run_uid )
    

  • After the training run finishes, you can view log files and other output in the training results bucket of your Cloud Object Storage.
    See: Viewing results in Cloud Object Storage
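Rather than re-running the status cell by hand, you can poll in a loop until the run reaches a terminal state. The sketch below assumes that client.training.get_status( run_uid ) returns a dictionary with a "state" field; the terminal state names listed are assumptions you may need to adjust for your client version:

```python
import time

def wait_for_run(get_status, run_uid, poll_seconds=30,
                 terminal_states=("completed", "error", "failed", "canceled")):
    """Poll a status function until the training run reaches a terminal state."""
    while True:
        status = get_status(run_uid)
        state = status.get("state")
        print("state:", state)
        if state in terminal_states:
            return status
        time.sleep(poll_seconds)

# Example usage in the notebook:
# final_status = wait_for_run(client.training.get_status, run_uid)
```

Passing the status function as an argument keeps the helper easy to test and independent of any one client object.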

 

Step 5: Deploy the trained model

You can use your trained model to classify new images only after the model has been deployed.

  1. Before you can deploy the model, you must store the trained model in the Watson Machine Learning repository. Store the trained model by running this code in a cell in your notebook:

    stored_model_name    = "python-client-tutorial_model"
    stored_model_details = client.repository.store_model( run_uid, stored_model_name )
    model_uid            = client.repository.get_model_uid( stored_model_details )
    print( "model_uid: ", model_uid )
    

  2. Deploy the stored model by running this code in a cell in your notebook:

    deployment_name  = "MNIST-handwriting"
    deployment_desc  = "Online deployment of Python client tutorial model"
    deployment       = client.deployments.create( model_uid, deployment_name, deployment_desc )
    scoring_endpoint = client.deployments.get_scoring_url( deployment )
    print( "scoring_endpoint: ", scoring_endpoint )
    

 

Step 6: Test the deployed model

  1. Download a sample file with images of the handwritten digits "5" and "4" by running this code in a cell in your notebook:

    filename = 'tf-mnist-test-payload.json'
    url = 'https://raw.githubusercontent.com/pmservice/wml-sample-models/master/tensorflow/hand-written-digit-recognition/test-data/tf-mnist-test-payload.json'
    if not os.path.isfile( filename ): wget.download( url )
    

  2. Load the contents of the file into a structure by running this code in a cell in your notebook:

    import json
    with open( 'tf-mnist-test-payload.json' ) as data_file: test_data = json.load( data_file )
    payload = test_data[ 'payload' ]
    

  3. Classify the images in the test data by running this code in a cell in your notebook:

    client.deployments.score( scoring_endpoint, payload )
    
    The results should be:
    {'values': [5, 4]}
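If you want to verify the result programmatically rather than by eye, a trivial sketch is to compare the "values" field of the scoring response against the digits you expect (here, response stands for the dictionary returned by score):

```python
def predictions_match(response, expected_digits):
    """Return True if the scoring response's 'values' equal the expected digits."""
    return response.get("values") == list(expected_digits)

# Example with the response shown above:
sample_response = {"values": [5, 4]}
print(predictions_match(sample_response, [5, 4]))  # True
```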