Troubleshoot Deep Learning

Here are the answers to common troubleshooting questions about using deep learning in IBM Watson Machine Learning.

Getting help and support for deep learning

If you have problems or questions when using deep learning, you can get help by searching for information or by asking questions through a forum. You can also open a support ticket.

When using the forums to ask a question, tag your question so that it is seen by the deep learning development teams.

If you have technical questions about deep learning, post your question on Stack Overflow and tag it with "ibm-bluemix" and "deep-learning".

For questions about the service and getting started instructions, use the IBM developerWorks dW Answers forum. Include the "deep-learning" and "bluemix" tags. See Getting help for more details about using the forums.

For information about opening an IBM support ticket, or about support levels and ticket severities, see Contacting support.

Online deployment error: DL: load_model_failure: list index out of range

What's happening

During online deployment, the deep learning model fails to load and the error reports that a list index is out of range.

Why it's happening

The Python training code did not store the model in the correct location in IBM Cloud Object Storage.

How to fix it

If this error occurs, ensure that the Python training code stores the model in the correct location in IBM Cloud Object Storage.

You can adapt the following code to ensure that this is done correctly:

###############################################################################
# Set up working directories for data, model and logs.
###############################################################################
import os

model_filename = "mnist_cnn.h5"

# Write the trained model under $RESULT_DIR/model when RESULT_DIR is set;
# otherwise fall back to a local "model" folder.
if os.environ.get("RESULT_DIR") is not None:
    output_model_folder = os.path.join(os.environ["RESULT_DIR"], "model")
    output_model_path = os.path.join(output_model_folder, model_filename)
else:
    output_model_folder = "model"
    output_model_path = os.path.join(output_model_folder, model_filename)
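The code above only computes the output path. The following is a minimal sketch of the remaining step, assuming that the output folder might not exist yet and that the model object has a Keras-style save method. The helper name prepare_model_path and the model variable are hypothetical and are not part of the service.

```python
import os


def prepare_model_path(model_filename, env=None):
    """Return the path to save the model at, creating its folder if needed."""
    env = os.environ if env is None else env
    result_dir = env.get("RESULT_DIR")
    if result_dir is not None:
        folder = os.path.join(result_dir, "model")
    else:
        # Local fallback when RESULT_DIR is not set.
        folder = "model"
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, model_filename)


# Hypothetical usage with a Keras model object named `model`:
# model.save(prepare_model_path("mnist_cnn.h5"))
```

Creating the folder before saving avoids a failure when the save call does not create missing parent directories.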

Starting to train a model returns a RESOURCE_EXHAUSTED error

What's happening

Running bx ml train training.zip training.yaml fails with output similar to the following:

Starting to train ...
FAILED
RESOURCE_EXHAUSTED: grpc: received message larger than max (4418860 vs. 4194304)

Why it's happening

Your training.zip file is probably too large. In the example output, the message is 4418860 bytes, which exceeds the maximum of 4194304 bytes (4 MB).
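To check whether this is the cause before you submit a training run, you can compare the size of the archive with the limit shown in the error message. The following is a rough sketch only: the helper name exceeds_limit is hypothetical, and the request might carry some overhead beyond the archive itself, so an archive slightly under the limit could still fail.

```python
import os

# Limit taken from the error message: 4194304 bytes (4 * 1024 * 1024).
MAX_MESSAGE_BYTES = 4 * 1024 * 1024


def exceeds_limit(path, limit=MAX_MESSAGE_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit


# Hypothetical usage:
# if exceeds_limit("training.zip"):
#     print("training.zip is larger than 4 MB; move large files out of the zip")
```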

How to fix it

If your program needs large artifacts, store them in IBM Cloud Object Storage with the data files rather than in the zip file. Your program can then access them in the same way as the training data, under $DATA_DIR.
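As an illustration, the training program could build the path to such an artifact from the DATA_DIR environment variable. This is a minimal sketch: the helper name artifact_path, the file name embeddings.bin, and the local "data" fallback folder are assumptions made for the example.

```python
import os


def artifact_path(name, env=None):
    """Build the path of a file stored alongside the training data."""
    env = os.environ if env is None else env
    # Fall back to a local "data" folder when DATA_DIR is not set
    # (an assumption that is convenient for local test runs).
    data_dir = env.get("DATA_DIR", "data")
    return os.path.join(data_dir, name)


# Hypothetical usage: "embeddings.bin" is a placeholder file name.
# with open(artifact_path("embeddings.bin"), "rb") as f:
#     payload = f.read()
```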