The following are the answers to common troubleshooting questions about using IBM Watson Machine Learning.
Getting help and support for Watson Machine Learning
If you have problems or questions when you use Watson Machine Learning, you can get help by searching for information or by asking questions through a forum. You can also open a support ticket.
When you use the forums to ask a question, tag your question so that it is seen by the Watson Machine Learning development teams.
If you have technical questions about Watson Machine Learning, post your question on Stack Overflow and tag your question with ibm-bluemix and machine-learning.
For questions about the service and getting started instructions, use the IBM developerWorks dW Answers forum. You must include the machine-learning and bluemix tags.
Contents

- Deploying a custom foundation model from a deployment space fails
- Training an AutoAI experiment fails with service ID credentials
- Creating a job for an SPSS Modeler flow in a deployment space fails
- The authorization token and instance_id that was used in the request are not the same
- The public key that is needed for authentication is not available
- Evaluation requires a learning configuration that is specified for the model
- Evaluation requires spark instance to be provided in X-Spark-Service-Instance header
- Patch operation can modify existing learning configuration only
- The payload is missing the required fields: FIELD or the values of the fields are corrupted
- Provided evaluation method: METHOD is not supported. Supported values: VALUE
- For type {{type}} spark instance must be provided in X-Spark-Service-Instance header
- Path {{path}} is not allowed. The only allowed path for patch stream is /status
- Patch operation is not allowed for instance of type {{$type}}
- Error extracting X-Spark-Service-Instance header: ({{message}})
- ValueError: Training_data_ref name and connection cannot be None, if Pipeline Artifact is not given.
Follow these tips to resolve common problems you might encounter when you work with Watson Machine Learning.
Training an AutoAI experiment fails with service ID credentials
If you train an AutoAI experiment by using the API key for a service ID, training might fail with this error:
User specified in query parameters does not match user from token.
One way to resolve this issue is to run the experiment with your user credentials. If you want to run the experiment with credentials for the service, follow these steps to update the roles and policies for the service ID.
- Open your service ID on IBM Cloud.
- Create a new service ID or update the existing ID with the following access policy: All IAM Account Management services with the roles API key reviewer, User API key creator, Viewer, Operator, and Editor. Ideally, create a new API key for this service ID.
- Run the training again with the credentials for the updated service ID, as shown in the sketch after these steps.
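For reference, a minimal sketch of authenticating with the updated credentials follows, assuming the ibm-watson-machine-learning Python client. The endpoint URL and the API key value are placeholders; substitute your region's endpoint and the new API key that was created for the service ID.

```python
from ibm_watson_machine_learning import APIClient

# Placeholder credentials: use your region's endpoint and the new API key
# that was created for the service ID.
wml_credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "<service-id-api-key>",
}

# Authenticate with the service ID's API key before rerunning the experiment.
client = APIClient(wml_credentials)
```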
Deploying a custom foundation model from a deployment space fails
When you create a deployment for a custom foundation model from your deployment space, your deployment might fail for various reasons. Follow these tips to resolve common problems that you might encounter when you deploy custom foundation models from a deployment space.
Case 1: Parameter value is out of range
When you create a deployment for a custom foundation model from your deployment space, you must make sure that your base model parameter values are within the specified range. For more information, see Properties and parameters for custom foundation models. If you enter a value that is beyond the specified range, you might encounter an error.
For example, the value of the max_new_tokens parameter must be less than the value of max_sequence_length. When you update the base model parameter values, if you enter a value for max_new_tokens that is greater than or equal to the value of max_sequence_length (2048), you might encounter an error.
An example error message: Value must be an integer between 20 and 1000000000000000 and be greater than 'Max New Tokens'.
If the default values for your model parameters result in an error, contact your administrator to modify the model's registry in the watsonxaiifm CR.
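As an illustration of the constraint, here is a hedged sketch of base model parameter values that stay in range; the parameter names come from the error above, and the 2048 limit is the max_sequence_length value from this example.

```python
# Illustrative parameter values for a custom foundation model deployment.
# max_new_tokens must stay below max_sequence_length (2048 in this example).
base_model_parameters = {
    "max_sequence_length": 2048,
    "max_new_tokens": 512,    # valid: 512 < 2048
    # "max_new_tokens": 2048  # invalid: would trigger the range error above
}
```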
Case 2: Unsupported data type
You must make sure that you select a data type that is supported by your custom foundation model. When you update the base model parameter values, if you update the data type for your deployed model with an unsupported data type, your deployment might fail.
For example, the LLaMA-Pro-8B-Instruct-GPTQ model supports the float16 data type only. If you deploy the LLaMA-Pro-8B-Instruct-GPTQ model with the float16 Enum and then update the Enum parameter from float16 to bfloat16, your deployment fails.
If the data type that you selected for your custom foundation model results in an error, you can override the data type for the custom foundation model during deployment creation or contact your administrator to modify the model's registry in the watsonxaiifm CR.
Case 3: Parameter value is too large
If you enter a very large value for the max_sequence_length or max_new_tokens parameters, you might encounter an error. For example, if you set the value of max_sequence_length to 1000000000000000, you encounter the following error message:
Failed to deploy the custom foundation model. The operation failed due to 'max_batch_weight (19596417433) not large enough for (prefill) max_sequence_length (1000000000000000)'. Retry the operation. Contact IBM support if the problem persists.
Make sure that the value you enter for the parameter is less than the value that is defined in the model configuration file (config.json).
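One way to check the model-defined limit before you deploy is to read the model's config.json directly. This is a sketch only; the name of the field that caps the sequence length varies by model architecture (for example, max_position_embeddings or max_seq_len), so inspect your own file to find it.

```python
import json

# Read the model configuration that ships with the custom foundation model.
with open("config.json") as f:
    config = json.load(f)

# Common candidates for the sequence-length cap, not a definitive list.
for key in ("max_position_embeddings", "max_seq_len", "n_positions"):
    if key in config:
        print(f"{key} = {config[key]}")  # keep max_sequence_length below this
```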
Case 4: model.safetensors file is saved with unsupported libraries
If the model.safetensors file for your custom foundation model uses an unsupported data format in the metadata header, your deployment might fail.
For example, if you import the OccamRazor/mpt-7b-storywriter-4bit-128g custom foundation model from Hugging Face to your deployment space and create an online deployment, your deployment might fail. This is because the model.safetensors file for the OccamRazor/mpt-7b-storywriter-4bit-128g model is saved with the save_pretrained method of an unsupported library. You might receive the following error message:
The operation failed due to 'NoneType' object has no attribute 'get'.
Make sure that your custom foundation model is saved with the supported transformers library.
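If you control the model source, one hedged workaround is to reload the weights with the transformers library and save them again, so that the safetensors metadata header is written by a supported library. This sketch assumes the model loads with AutoModelForCausalLM; quantized models such as GPTQ variants may need their original tooling instead, and the paths are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the model with transformers, then save it again so the
# model.safetensors metadata header is produced by the transformers library.
model = AutoModelForCausalLM.from_pretrained("path/to/original-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/original-model")

model.save_pretrained("path/to/resaved-model", safe_serialization=True)
tokenizer.save_pretrained("path/to/resaved-model")
```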
Case 5: Deployment of a Llama 3.1 model fails
If your Llama 3.1 model deployment fails, try editing the contents of your model's config.json file:
- Find the eos_token_id entry.
- Change the value of the entry from an array to an integer.
Then try redeploying your model.
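A small script can make this edit; this sketch takes the first element when eos_token_id is an array, which is one reasonable choice but should be checked against your model card.

```python
import json

# Rewrite eos_token_id in config.json from an array to a single integer.
with open("config.json") as f:
    config = json.load(f)

if isinstance(config.get("eos_token_id"), list):
    # Taking the first token ID is an assumption; confirm it for your model.
    config["eos_token_id"] = config["eos_token_id"][0]

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```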
Creating a job for an SPSS Modeler flow in a deployment space fails
During the process of configuring a batch job for your SPSS Modeler flow in a deployment space, the automatic mapping of data assets with their respective connections might fail.
To fix the error with the automatic mapping of data assets and connections, follow these steps:
- Click Create and save to save your progress and exit from the New job configuration dialog box.
- In your deployment space, click the Jobs tab and select your SPSS Modeler flow job to review the details of your job.
- On the job details page, click the Edit icon to manually update the mapping of your data assets and connections.
- After you update the mapping of data assets and connections, you can resume configuring the settings for your job in the New job dialog box. For more information, see Creating deployment jobs for SPSS Modeler flows.
Inactive Watson Machine Learning instance
Symptoms
After you try to submit an inference request to a foundation model by clicking the Generate button in the Prompt Lab, the following error message is displayed:
'code': 'no_associated_service_instance_error',
'message': 'WML instance {instance_id} status is not active, current status: Inactive'
Possible causes
The association between your watsonx.ai project and the related Watson Machine Learning service instance was lost.
Possible solutions
Recreate or refresh the association between your watsonx.ai project and the related Watson Machine Learning service instance. To do so, complete the following steps:
- From the main menu, expand Projects, and then click View all projects.
- Click your watsonx.ai project.
- From the Manage tab, click Services & integrations.
- If the appropriate Watson Machine Learning service instance is listed, disassociate it temporarily by selecting the instance, and then clicking Remove. Confirm the removal.
- Click Associate service.
- Choose the appropriate Watson Machine Learning service instance from the list, and then click Associate.
The public key that is needed for authentication is not available.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Contact the support team.
Operation timed out after {{timeout}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The timeout occurred while performing the requested operation.
How to fix it
Try to invoke the operation again.
Unhandled exception of type {{type}} with {{status}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Try to invoke the operation again. If it happens again, contact the support team.
Unhandled exception of type {{type}} with {{response}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Try to invoke the operation again. If it happens again, contact the support team.
Unhandled exception of type {{type}} with {{json}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Try to invoke the operation again. If it happens again, contact the support team.
Unhandled exception of type {{type}} with {{message}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Try to invoke the operation again. If it happens again, contact the support team.
The requested object is not found.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The requested resource was not found.
How to fix it
Make sure that you refer to an existing resource.
The underlying database reported too many requests.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
Too many requests were sent within a specific time period.
How to fix it
Try to invoke the operation again.
The definition of the evaluation is not defined in the artifactModelVersion or deployment. It must be specified in at least one of the places.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The learning configuration does not contain all the required information.
How to fix it
Provide the definition in the learning configuration.
Evaluation requires a learning configuration that is specified for the model.
What's happening
It is not possible to create a learning iteration.
Why it's happening
A learning configuration is not defined for the model.
How to fix it
Create a learning configuration and try to create the learning iteration again.
Evaluation requires spark instance to be provided in X-Spark-Service-Instance header
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The learning configuration does not have the required information.
How to fix it
Provide spark_service in the learning configuration or in the X-Spark-Service-Instance header.
Model does not contain any version.
What's happening
It is not possible to create a deployment or set the learning configuration.
Why it's happening
This problem can happen due to an inconsistency that is related to the persistence of the model.
How to fix it
Persist the model again, and then try to perform the action again.
Data module not found in IBM Federated Learning.
What's happening
The data handler for IBM Federated Learning is trying to extract a data module from the FL library but is unable to find it. You might see the following error message:
ModuleNotFoundError: No module named 'ibmfl.util.datasets'
Why it's happening
Your DataHandler might be outdated.
How to fix it
Review and update your DataHandler to conform to the most recent MNIST data handler or make sure that your sample versions are up to date.
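As a hedged illustration of what updating the data handler can mean in practice, the sketch below loads the MNIST data from a local file instead of importing it from the removed ibmfl.util.datasets module. The class name and the data-config key are illustrative; match your handler to the current MNIST sample in the IBM Federated Learning documentation.

```python
import numpy as np
from ibmfl.data.data_handler import DataHandler


class LocalMnistDataHandler(DataHandler):
    """Loads MNIST from a local .npz file instead of ibmfl.util.datasets."""

    def __init__(self, data_config=None):
        super().__init__()
        # The file path comes from the party's data configuration; the
        # "npz_file" key used here is illustrative.
        self.file_name = (data_config or {}).get("npz_file", "mnist.npz")

    def get_data(self):
        # Return ((x_train, y_train), (x_test, y_test)), as the FL party expects.
        with np.load(self.file_name) as data:
            return (
                (data["x_train"], data["y_train"]),
                (data["x_test"], data["y_test"]),
            )
```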
Patch operation can modify existing learning configuration only.
What's happening
It is not possible to invoke the patch REST API method to patch the learning configuration.
Why it's happening
A learning configuration is not set for this model, or the model does not exist.
How to fix it
Ensure that the model exists and already has a learning configuration set.
Patch operation expects exactly one replace operation.
What's happening
The deployment cannot be patched.
Why it's happening
The patch payload contains more than one operation, or the patch operation is not a replace operation.
How to fix it
Use only one operation in the patch payload: a replace operation.
The payload is missing the required fields: FIELD or the values of the fields are corrupted.
What's happening
It is not possible to process an action that is related to access to the underlying data set.
Why it's happening
The access to the data set is not properly defined.
How to fix it
Correct the access definition for the data set.
Provided evaluation method: METHOD is not supported. Supported values: VALUE.
What's happening
It is not possible to create the learning configuration.
Why it's happening
The wrong evaluation method was used to create the learning configuration.
How to fix it
Use a supported evaluation method: regression, binary, or multiclass.
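For illustration, the evaluation method is set inside the evaluation definition of the learning configuration. The sketch below shows the general shape only; the exact field names are an assumption based on the legacy learning configuration API, so check the API reference for your service version.

```python
# Illustrative learning configuration fragment; field names are assumptions.
learning_configuration = {
    "evaluation_definition": {
        "method": "multiclass",  # must be one of: regression, binary, multiclass
        "metrics": [
            {"name": "accuracy", "threshold": 0.8},  # example metric
        ],
    },
}
```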
You can have only one active evaluation per model. The request cannot be completed because of existing active evaluation: {{url}}
What's happening
It is not possible to create another learning iteration.
Why it's happening
You can have only one running evaluation for the model.
How to fix it
Check the evaluation that is already running, or wait for it to end and then start a new one.
The deployment type {{type}} is not supported.
What's happening
It is not possible to create the deployment.
Why it's happening
An unsupported deployment type was used.
How to fix it
Use a supported deployment type.
Incorrect input: ({{message}})
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem happens due to an issue with parsing JSON.
How to fix it
Make sure that the correct JSON is passed in the request.
Insufficient data - metric {{name}} cannot be calculated
What's happening
Learning iteration failed.
Why it's happening
The value for a metric with a defined threshold cannot be calculated because of insufficient feedback data.
How to fix it
Review and improve the data in the feedback_data_ref data source in the learning configuration.
For type {{type}} spark instance must be provided in X-Spark-Service-Instance header
What's happening
The deployment cannot be created.
Why it's happening
batch and streaming deployments require a spark instance to be provided.
How to fix it
Provide a spark instance in the X-Spark-Service-Instance header.
Action {{action}} failed with message {{message}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem happens due to an issue with invoking the underlying service.
How to fix it
If the message provides a suggestion to fix the issue, follow the suggestion. Otherwise, contact the support team.
Path {{path}} is not allowed. The only allowed path for patch stream is /status
What's happening
Stream deployment cannot be patched.
Why it's happening
The wrong path was used to patch the stream deployment.
How to fix it
Patch the stream deployment with the supported path option, /status, which allows you to start or stop stream processing.
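Combined with the single-replace rule described earlier, a valid patch body for a stream deployment looks like the following sketch; the start and stop values reflect the start/stop behavior noted above.

```python
# A valid patch payload: exactly one replace operation on /status.
patch_payload = [
    {"op": "replace", "path": "/status", "value": "start"}  # or "stop"
]
```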
Patch operation is not allowed for instance of type {{$type}}
What's happening
The deployment cannot be patched.
Why it's happening
The wrong deployment type is being patched.
How to fix it
Patch the stream deployment type.
Data connection {{data}} is invalid for feedback_data_ref
What's happening
The learning configuration cannot be created for the model.
Why it's happening
A supported data source was not used when feedback_data_ref was defined.
How to fix it
Use only the supported data source type, dashdb.
Path {{path}} is not allowed. The only allowed path for patch model is /deployed_version/url or /deployed_version/href for V2
What's happening
The model cannot be patched.
Why it's happening
The wrong path was used when patching the model.
How to fix it
Patch the model with a supported path, which you can use to update the version of the deployed model.
Parsing failure: {{msg}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The request payload cannot be parsed successfully.
How to fix it
Make sure that your request payload is correct and can be parsed correctly.
Runtime environment for selected model: {{env}} is not supported for learning configuration. Supported environments: [{{supported_envs}}].
What's happening
It is not possible to create the learning configuration.
Why it's happening
The model for which you tried to create the learning_configuration has an unsupported runtime.
How to fix it
Create the learning configuration for a model that has a supported runtime.
Current plan '{{plan}}' allows {{limit}} deployments only
What's happening
It is not possible to create the deployment.
Why it's happening
The limit for number of deployments was reached for the current plan.
How to fix it
Upgrade to a plan that does not have this limitation.
Database connection definition is not valid ({{code}})
What's happening
It is not possible to use the learning configuration function.
Why it's happening
The database connection definition is invalid.
How to fix it
Try to fix the issue that is described by the code that is returned by the underlying database.
Problems connecting underlying {{system}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem might happen due to an issue during connection to the underlying system. It might be a temporary network issue.
How to fix it
Try to invoke the operation again. If you get an error again, contact the support team.
Error extracting X-Spark-Service-Instance header: ({{message}})
What's happening
A REST API that requires Spark credentials cannot be invoked successfully.
Why it's happening
This problem might happen due to an issue with base-64 decoding or parsing Spark credentials.
How to fix it
Make sure that the Spark credentials are correctly base-64 encoded. For more information, see the documentation.
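A minimal sketch of building the header follows. The structure of the credentials JSON is an assumption (use the credentials of your Spark service instance); the key point is that the JSON document must be base-64 encoded before it is sent.

```python
import base64
import json

# Placeholder Spark service credentials; substitute the values from your
# Spark service instance.
spark_instance = {
    "credentials": {"tenant_id": "<tenant-id>", "cluster_master_url": "<url>"},
    "version": "2.0",  # illustrative
}

# Base-64 encode the JSON document for the X-Spark-Service-Instance header.
encoded = base64.b64encode(json.dumps(spark_instance).encode("utf-8")).decode("ascii")
headers = {"X-Spark-Service-Instance": encoded}
```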
This function is forbidden for non-beta users.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The REST API that was invoked is in beta.
How to fix it
If you are interested in participating, add yourself to the wait list. The details can be found in the documentation.
{{code}} {{message}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem might happen due to an issue with invoking underlying service.
How to fix it
If the message provides a suggestion to fix the issue, follow the suggestion. Otherwise, contact the support team.
Rate limit exceeded.
What's happening
Rate limit exceeded.
Why it's happening
The rate limit for current plan is exceeded.
How to fix it
To solve this problem, acquire another plan with a greater rate limit.
Invalid query parameter {{paramName}} value: {{value}}
What's happening
A validation error occurred because an incorrect value was passed for a query parameter.
Why it's happening
An error occurred while getting the result for the query.
How to fix it
Correct the query parameter value. You can find the details in the documentation.
Invalid token type: {{type}}
What's happening
Error regarding token type.
Why it's happening
Error in authorization.
How to fix it
The token must start with the Bearer prefix.
Invalid token format. You must use bearer token format.
What's happening
Error regarding token format.
Why it's happening
Error in authorization.
How to fix it
The token must be a bearer token and must start with the Bearer prefix.
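For example, a correctly formatted Authorization header in a Python request looks like this sketch; the token value and the endpoint URL are placeholders for your IAM access token and your region's endpoint.

```python
import requests

access_token = "<your-IAM-access-token>"  # placeholder

# The header value must use the bearer scheme: the literal "Bearer " prefix
# followed by the token.
headers = {"Authorization": f"Bearer {access_token}"}

# Example endpoint; substitute the URL for your region and request.
response = requests.get(
    "https://us-south.ml.cloud.ibm.com/ml/v4/deployments",
    headers=headers,
)
```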
Input JSON file is missing or invalid: 400
What's happening
The following message displays when you try to score online: Input JSON file is missing or invalid.
Why it's happening
This message displays when the scoring input payload doesn't match the expected input type that is required for scoring the model. Specifically, the following reasons might apply:
- The input payload is empty.
- The input payload schema is not valid.
- The input data types do not match the expected data types.
How to fix it
Correct the input payload. Make sure that the payload has correct syntax, a valid schema, and proper data types. After you make corrections, try to score online again. For syntax issues, verify the JSON file by using the jsonlint command.
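As a reference point, a well-formed online scoring payload generally follows the fields/values structure shown in this sketch. The field names and row values are invented for illustration; they must match the schema that the model was trained with.

```python
# Illustrative scoring payload (v4-style input_data structure).
payload = {
    "input_data": [
        {
            "fields": ["AGE", "INCOME"],      # placeholder feature names
            "values": [[34, 25000], [51, 68000]],  # one inner list per row
        }
    ]
}
```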
Unknown deployment identification: 404
What's happening
The following message displays when you try to score online: Unknown deployment identification.
Why it's happening
This message displays when the deployment ID that is used for scoring does not exist.
How to fix it
Make sure that you provide the correct deployment ID. If the deployment does not exist, deploy the model to get a deployment ID, and then try scoring again.
Internal server error: 500
What's happening
The following message displays when you try to score online: Internal server error
Why it's happening
This message displays if the downstream data flow on which the online scoring depends fails.
How to fix it
Wait for some time and try to score online again. If it fails again, contact IBM Support.
Invalid type for ml_artifact: Pipeline
What's happening
The following message displays when you try to publish a Spark model by using the Common API client library on your workstation.
Why it's happening
This message displays if you have an invalid pyspark setup in the operating system.
How to fix it
Set up the system environment paths according to the following instructions:
SPARK_HOME={installed_spark_path}
JAVA_HOME={installed_java_path}
PYTHONPATH=$SPARK_HOME/python/
ValueError: Training_data_ref name and connection cannot be None, if Pipeline Artifact is not given.
What's happening
The training data set is missing or is not referenced properly.
Why it's happening
The Pipeline Artifact is a training data set in this instance.
How to fix it
You must supply a training data set when you persist a Spark PipelineModel. If you don't, the client reports that it doesn't support PipelineModels, rather than stating that a PipelineModel must be accompanied by the training data set.
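A hedged sketch with the legacy Watson Machine Learning Python client follows; the essential point is to pass the training DataFrame (and the pipeline) when you store a Spark PipelineModel, so the training data reference can be derived. The meta property shown is illustrative; check the client documentation for the full set.

```python
from pyspark.ml import Pipeline

# Assumes an authenticated Watson Machine Learning repository client in
# `client`, a pyspark.ml.Pipeline in `pipeline`, and the Spark DataFrame
# used for training in `train_df`.
pipeline_model = pipeline.fit(train_df)

stored_model = client.repository.store_model(
    model=pipeline_model,
    meta_props={"name": "my-spark-model"},  # illustrative meta property
    training_data=train_df,                 # required for a Spark PipelineModel
    pipeline=pipeline,
)
```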