CLI tutorial: Build a TensorFlow model to recognize handwritten digits using the MNIST data set
This tutorial guides you through using the MNIST computer vision data set to train a TensorFlow model to recognize handwritten digits. In this tutorial, you will train, deploy, and test the model using the IBM Watson Machine Learning command line interface (CLI).
Prerequisite
Steps overview
This tutorial presents the basic steps for training a deep learning model with Watson Machine Learning:
- Set up data files in IBM Cloud Object Storage
- Download sample code
- Train the model
- Monitor training progress and results
- Deploy the trained model
- Test the deployed model
This tutorial does not demonstrate running multiple training runs in an experiment, distributed deep learning, or using the Watson Machine Learning hyperparameter optimization feature.
Step 1: Set up data files in Cloud Object Storage
Training a deep learning model using Watson Machine Learning relies on Cloud Object Storage for reading input (such as training data) and for storing results (such as log files).
- Download the MNIST sample data files to your local computer from here: MNIST sample files
  Note: Some browsers automatically uncompress the sample data files, which causes errors later in this tutorial. Follow the instructions on the MNIST download page to verify how your browser handled the files.
- Create an instance of Cloud Object Storage.
  See: Creating a Cloud Object Storage service instance
- Perform these steps in the Cloud Object Storage GUI:
  - Create two buckets: one for storing training data, and one for storing training results.
    For each bucket, make a note of the endpoint_url and the bucket name.
    See: Creating a Cloud Object Storage bucket
  - Upload all of the MNIST sample data files to the training data bucket.
    See: Uploading data to Cloud Object Storage
  - Generate new HMAC credentials for working through this tutorial (create new credentials with {"HMAC":true} in the inline configuration parameters).
    Make a note of the access_key_id and the secret_access_key.
    See: Creating HMAC credentials for Cloud Object Storage
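The note about browsers uncompressing the sample files can be checked locally before you upload anything. The following is a minimal sketch, not part of the tutorial's sample code. It assumes the MNIST sample files are gzip archives with a .gz extension; the function names are illustrative. It reports any file that no longer starts with the gzip magic bytes.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def find_uncompressed(directory, pattern="*.gz"):
    """Return the names of files matching `pattern` that are not gzip data.

    Assumption: the downloaded sample files keep a .gz extension even when a
    browser has silently uncompressed their contents.
    """
    return sorted(
        p.name for p in Path(directory).glob(pattern) if not is_gzip_file(p)
    )
```

Calling find_uncompressed on your download directory should return an empty list; any name it returns is a file to download again before uploading to the training data bucket.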
Step 2: Download sample code
Training a deep learning model using Watson Machine Learning involves running one or more training runs. A training run requires two things: a .zip file containing model-building code, and metadata. When using the CLI, you specify the metadata in a training run manifest file.
- Download sample TensorFlow model-building Python code from here: tf-model.zip
  tf-model.zip contains two files:
  - convolutional_network.py - Model-building Python code
  - input_data.py - A "helper" file for reading the MNIST data files
  Point of interest: The sample file convolutional_network.py demonstrates using the environment variable $RESULT_DIR to cause extra output to be sent to the Cloud Object Storage results bucket:
    model_path = os.environ["RESULT_DIR"]+"/model"
    ...
    builder = tf.saved_model.builder.SavedModelBuilder(model_path)
  In this case, the trained model is saved in protobuf format to the results bucket. You could send any output to the results bucket using the $RESULT_DIR variable like this.
- Download a sample training run manifest file from here: tf-train.yaml
  Point of interest: The command in the execution section demonstrates using the environment variable $DATA_DIR to cause data to be read from the Cloud Object Storage training data bucket.
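The two points of interest above follow one pattern: read input from the directory named by $DATA_DIR and write output under the directory named by $RESULT_DIR. The sketch below illustrates that pattern with the standard library only. It is not taken from convolutional_network.py; the function names and the local fallback directories are assumptions added so that the same script could also run outside a training run.

```python
import os


def resolve_dirs(environ=None):
    """Return (data_dir, result_dir) read from the environment.

    Assumption: falling back to ./data and ./results is a convenience for
    local runs; the sample code reads the variables directly.
    """
    env = os.environ if environ is None else environ
    data_dir = env.get("DATA_DIR", "./data")
    result_dir = env.get("RESULT_DIR", "./results")
    return data_dir, result_dir


def write_result(result_dir, filename, text):
    """Write extra output under result_dir and return the file path."""
    os.makedirs(result_dir, exist_ok=True)
    out_path = os.path.join(result_dir, filename)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
    return out_path
```

During a training run, anything written this way under the results directory would be expected to appear in the results bucket, in the same way as the saved model in the sample code.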
Step 3: Train the model
This tutorial demonstrates running one training run.
Update the sample training run manifest file, tf-train.yaml, with your details:
- Specify your IBMid email address in the email field.
- Update both the training_data_reference section and the training_results_reference section with details of the Cloud Object Storage that you are using for this tutorial:
  - Update the endpoint_url field
  - Update the access_key_id field and the secret_access_key field
  - Update the
The training_data_reference section sets up the $DATA_DIR environment variable, and the training_results_reference section sets up the $RESULT_DIR environment variable.
Submit the training run using the Machine Learning CLI:
  bx ml train tf-model.zip tf-train.yaml
Example output:
  Starting to train ... OK Model-ID is 'training-HrlzIHskg'
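The identifier in this output is needed by the commands in the following steps, so it can be convenient to extract it in a script rather than copy it by hand. The sketch below assumes the output contains the phrase "Model-ID is" followed by a quoted identifier, as in the example output above; the function name is illustrative and the CLI does not guarantee this format.

```python
import re

# Matches the quoted identifier in text such as:
#   Model-ID is 'training-HrlzIHskg'
_ID_PATTERN = re.compile(r"Model-ID is '([^']+)'")


def extract_model_id(cli_output):
    """Return the identifier printed after "Model-ID is", or None if absent."""
    match = _ID_PATTERN.search(cli_output)
    return match.group(1) if match else None
```

Given the example output above, extract_model_id returns the string training-HrlzIHskg.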
Step 4: Monitor training progress and results
- You can monitor the progress of a training run using the CLI:
    bx ml monitor training-runs training-HrlzIHskg
  In this example, the identifier returned from the previous bx ml train command, "training-HrlzIHskg", is specified. Replace that with the identifier that was returned for you.
- You can view the details of a training run using the CLI:
    bx ml show training-runs training-HrlzIHskg
  In this example, the identifier returned from the previous bx ml train command, "training-HrlzIHskg", is specified. Replace that with the identifier that was returned for you.
- After the training run finishes, you can view log files and other output in the training results bucket of your Cloud Object Storage.
  See: Viewing results in Cloud Object Storage
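If you script these checks, the monitor and show commands above can be built from the training run identifier instead of typed by hand. The following sketch uses only the two command forms shown in this step. The function names are illustrative, and the identifier check is an assumption added to catch copy and paste mistakes.

```python
import shlex
import subprocess


def training_run_command(action, run_id):
    """Build the argv for `bx ml <action> training-runs <run_id>`."""
    if action not in ("monitor", "show"):
        raise ValueError(f"unsupported action: {action!r}")
    if not run_id or any(ch.isspace() for ch in run_id):
        raise ValueError("run_id must be non-empty and contain no whitespace")
    return ["bx", "ml", action, "training-runs", run_id]


def run_training_run_command(action, run_id):
    """Run the command with the installed CLI and return its output text."""
    argv = training_run_command(action, run_id)
    print("Running:", shlex.join(argv))
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems; run_training_run_command requires the CLI to be installed and logged in.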
Step 5: Deploy the trained model
You can use your trained model to classify new images only after the model has been deployed.
- Before you can deploy the model, you must store the trained model in the Watson Machine Learning repository:
    bx ml store training-runs training-HrlzIHskg
  In this example, the identifier returned from the previous bx ml train command, "training-HrlzIHskg", is specified. Replace that with the identifier that was returned for you.
  Sample output:
    OK Model store successful. Model-ID is 'a8379aaa-ea31-4c22-824d-89a01315dd6d'
- Deploy the model:
    bx ml deploy a8379aaa-ea31-4c22-824d-89a01315dd6d "my-first-deployment"
  In this example, the Model-ID returned from the bx ml store command, "a8379aaa-ea31-4c22-824d-89a01315dd6d", is specified. Replace that with the Model-ID that was returned for you.
  Sample output:
    Deploying the model with MODEL-ID 'a8379aaa-ea31-4c22-824d-89a01315dd6d'...
    DeploymentId       9d6a656c-e9d4-4d89-b335-f9da40e52179
    Scoring endpoint   https://2000ab8b-7e81-41b3-ad07-b70f849594f5...
    Name               my-first-deployment
    Type               tensorflow-1.5
    Runtime            python-3.5
    Created at         2017-11-28T12:46:19.770Z
    OK Deploy model successful
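The store and deploy commands above chain together: the Model-ID printed by bx ml store is the argument to bx ml deploy. The sketch below shows one way to automate that hand-off. It assumes the output formats match the sample output in this step, which the CLI does not guarantee. The function names are illustrative, and the run parameter exists only so that the flow can be exercised without the CLI installed.

```python
import re
import subprocess


def run_cli(argv):
    """Default runner: execute the command and return its standard output."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def store_and_deploy(training_id, deployment_name, run=run_cli):
    """Store a finished training run, deploy it, and return both IDs."""
    store_out = run(["bx", "ml", "store", "training-runs", training_id])
    model = re.search(r"Model-ID is '([^']+)'", store_out)
    if model is None:
        raise RuntimeError("no Model-ID found in store output")
    model_id = model.group(1)

    deploy_out = run(["bx", "ml", "deploy", model_id, deployment_name])
    deployment = re.search(r"DeploymentId\s+(\S+)", deploy_out)
    if deployment is None:
        raise RuntimeError("no DeploymentId found in deploy output")
    return model_id, deployment.group(1)
```

Both identifiers are returned because the test payload in the next step needs the Model-ID and the DeploymentId.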
Step 6: Test the deployed model
You can quickly test your deployed model using the Watson Machine Learning CLI.
Download this sample payload JSON file with input data corresponding to the handwritten digits "5" and "4": tf-mnist-test-payload.json
Update the sample payload file, tf-mnist-test-payload.json, with your model details:
- modelId: Specify the Model-ID returned from the bx ml store command
- deploymentId: Specify the DeploymentId returned from the bx ml deploy command
Test the model using the Watson Machine Learning CLI:
  bx ml score tf-mnist-test-payload.json
Sample output:
  Fetching scoring results for the deployment '9d6a656c-e9d4-4d89-b335-f9da40e52179' ...
  {"classes":[5, 4]}
  OK Score request successful
This output shows that the first input was correctly classified as belonging to the class "5", and the second input was correctly classified as belonging to the class "4".
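The payload update and the check of the scoring result can also be scripted. The sketch below assumes that modelId and deploymentId are top-level keys in the payload file and that the scoring output contains a JSON object with a "classes" list, as in the sample output above. Neither layout is confirmed beyond what this tutorial shows, and the function names are illustrative.

```python
import json
import re


def update_payload(path, model_id, deployment_id):
    """Set modelId and deploymentId in the payload file; keep other fields."""
    with open(path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    payload["modelId"] = model_id
    payload["deploymentId"] = deployment_id
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    return payload


def extract_classes(score_output):
    """Return the list under "classes" in the scoring output, or None."""
    match = re.search(r'\{"classes":\s*\[[^\]]*\]\}', score_output)
    return json.loads(match.group(0))["classes"] if match else None
```

For the sample output above, extract_classes returns [5, 4], which can be compared with the digits the payload is known to contain.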