The following are the answers to common troubleshooting questions about using IBM Watson Machine Learning.
Getting help and support for Watson Machine Learning
If you have problems or questions when you use Watson Machine Learning, you can get help by searching for information or by asking questions through a forum. You can also open a support ticket.
When you use the forums to ask a question, tag your question so that it is seen by the Watson Machine Learning development teams.
If you have technical questions about Watson Machine Learning, post your question on Stack Overflow and tag your question with ibm-bluemix and machine-learning.
For questions about the service and getting started instructions, use the IBM developerWorks dW Answers forum. You must include the machine-learning and bluemix tags.
Contents

- Deploying a custom foundation model from a deployment space fails
- Training an AutoAI experiment fails with service ID credentials
- Creating a job for an SPSS Modeler flow in a deployment space fails
- The authorization token and instance_id that was used in the request are not the same
- The public key that is needed for authentication is not available
- Evaluation requires a learning configuration that is specified for the model
- Evaluation requires spark instance to be provided in X-Spark-Service-Instance header
- Patch operation can modify existing learning configuration only
- The payload is missing the required fields: FIELD or the values of the fields are corrupted
- Provided evaluation method: METHOD is not supported. Supported values: VALUE
- For type {{type}} spark instance must be provided in X-Spark-Service-Instance header
- Path {{path}} is not allowed. The only allowed path for patch stream is /status
- Patch operation is not allowed for instance of type {{$type}}
- Error extracting X-Spark-Service-Instance header: ({{message}})
- ValueError: Training_data_ref name and connection cannot be None, if Pipeline Artifact is not given.
Follow these tips to resolve common problems you might encounter when you work with Watson Machine Learning.
Training an AutoAI experiment fails with service ID credentials
If you train an AutoAI experiment by using the API key for a service ID, training might fail with this error:
User specified in query parameters does not match user from token.
One way to resolve this issue is to run the experiment with your user credentials. If you want to run the experiment with credentials for the service, follow these steps to update the roles and policies for the service ID.
- Open your service ID on IBM Cloud.
- Create a new service ID or update the existing ID with the following access policy: All IAM Account Management services with the roles API key reviewer, User API key creator, Viewer, Operator, and Editor. Ideally, create a new API key for this service ID.
- Run the training again with the credentials for the updated service ID, as shown in the sketch after these steps.
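For reference, a minimal sketch of authenticating with the updated credentials follows, assuming the ibm-watson-machine-learning Python client. The endpoint URL and the API key value are placeholders; substitute your region's endpoint and the new API key that was created for the service ID.

```python
from ibm_watson_machine_learning import APIClient

# Placeholder credentials: use your region's endpoint and the new API key
# that was created for the service ID.
wml_credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "<service-id-api-key>",
}

# Authenticate with the service ID's API key before rerunning the experiment.
client = APIClient(wml_credentials)
```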
Deploying a custom foundation model from a deployment space fails
When you create a deployment for a custom foundation model from your deployment space, your deployment might fail for various reasons. Follow these tips to resolve common problems that you might encounter when you deploy custom foundation models from a deployment space.
Case 1: Parameter value is out of range
When you create a deployment for a custom foundation model from your deployment space, you must make sure that your base model parameter values are within the specified range. For more information, see Properties and parameters for custom foundation models. If you enter a value that is beyond the specified range, you might encounter an error.
For example, the value of the max_new_tokens parameter must be less than the value of max_sequence_length. When you update the base model parameter values, if you enter a value for max_new_tokens that is greater than or equal to the value of max_sequence_length (2048), you might encounter an error.
An example error message: Value must be an integer between 20 and 1000000000000000 and be greater than 'Max New Tokens'.
If the default values for your model parameters result in an error, contact your administrator to modify the model's registry in the watsonxaiifm CR.
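As an illustration of the constraint, here is a hedged sketch of base model parameter values that stay in range; the parameter names come from the error above, and the 2048 limit is the max_sequence_length value from this example.

```python
# Illustrative parameter values for a custom foundation model deployment.
# max_new_tokens must stay below max_sequence_length (2048 in this example).
base_model_parameters = {
    "max_sequence_length": 2048,
    "max_new_tokens": 512,    # valid: 512 < 2048
    # "max_new_tokens": 2048  # invalid: would trigger the range error above
}
```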
Case 2: Unsupported data type
You must make sure that you select a data type that is supported by your custom foundation model. When you update the base model parameter values, if you update the data type for your deployed model with an unsupported data type, your deployment might fail.
For example, the LLaMA-Pro-8B-Instruct-GPTQ model supports the float16 data type only. If you deploy the LLaMA-Pro-8B-Instruct-GPTQ model with the float16 Enum and then update the Enum parameter from float16 to bfloat16, your deployment fails.
If the data type that you selected for your custom foundation model results in an error, you can override the data type for the custom foundation model during deployment creation or contact your administrator to modify the model's registry in the watsonxaiifm CR.
Case 3: Parameter value is too large
If you enter a very large value for the max_sequence_length or max_new_tokens parameters, you might encounter an error. For example, if you set the value of max_sequence_length to 1000000000000000, you encounter the following error message:
Failed to deploy the custom foundation model. The operation failed due to 'max_batch_weight (19596417433) not large enough for (prefill) max_sequence_length (1000000000000000)'. Retry the operation. Contact IBM support if the problem persists.
Make sure that the value you enter for the parameter is less than the value that is defined in the model configuration file (config.json).
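One way to check the model-defined limit before you deploy is to read the model's config.json directly. This is a sketch only; the name of the field that caps the sequence length varies by model architecture (for example, max_position_embeddings or max_seq_len), so inspect your own file to find it.

```python
import json

# Read the model configuration that ships with the custom foundation model.
with open("config.json") as f:
    config = json.load(f)

# Common candidates for the sequence-length cap, not a definitive list.
for key in ("max_position_embeddings", "max_seq_len", "n_positions"):
    if key in config:
        print(f"{key} = {config[key]}")  # keep max_sequence_length below this
```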
Case 4: model.safetensors file is saved with unsupported libraries
If the model.safetensors file for your custom foundation model uses an unsupported data format in the metadata header, your deployment might fail.
For example, if you import the OccamRazor/mpt-7b-storywriter-4bit-128g custom foundation model from Hugging Face to your deployment space and create an online deployment, your deployment might fail. This is because the model.safetensors file for the OccamRazor/mpt-7b-storywriter-4bit-128g model is saved with the save_pretrained method of an unsupported library. You might receive the following error message:
The operation failed due to 'NoneType' object has no attribute 'get'.
Make sure that your custom foundation model is saved with the supported transformers library.
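If you control the model source, one hedged workaround is to reload the weights with the transformers library and save them again, so that the safetensors metadata header is written by a supported library. This sketch assumes the model loads with AutoModelForCausalLM; quantized models such as GPTQ variants may need their original tooling instead, and the paths are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the model with transformers, then save it again so the
# model.safetensors metadata header is produced by the transformers library.
model = AutoModelForCausalLM.from_pretrained("path/to/original-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/original-model")

model.save_pretrained("path/to/resaved-model", safe_serialization=True)
tokenizer.save_pretrained("path/to/resaved-model")
```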
Case 5: Deployment of a Llama 3.1 model fails
If your Llama 3.1 model deployment fails, try editing the contents of your model's config.json file:
- Find the eos_token_id entry.
- Change the value of the entry from an array to an integer.
Then try redeploying your model.
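A small script can make this edit; this sketch takes the first element when eos_token_id is an array, which is one reasonable choice but should be checked against your model card.

```python
import json

# Rewrite eos_token_id in config.json from an array to a single integer.
with open("config.json") as f:
    config = json.load(f)

if isinstance(config.get("eos_token_id"), list):
    # Taking the first token ID is an assumption; confirm it for your model.
    config["eos_token_id"] = config["eos_token_id"][0]

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```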
Creating a job for an SPSS Modeler flow in a deployment space fails
During the process of configuring a batch job for your SPSS Modeler flow in a deployment space, the automatic mapping of data assets with their respective connections might fail.
To fix the error with the automatic mapping of data assets and connections, follow these steps:
- Click Create and save to save your progress and exit from the New job configuration dialog box.
- In your deployment space, click the Jobs tab and select your SPSS Modeler flow job to review the details of your job.
- On the job details page, click the Edit icon to manually update the mapping of your data assets and connections.
- After you update the mapping of data assets and connections, you can resume configuring the settings for your job in the New job dialog box. For more information, see Creating deployment jobs for SPSS Modeler flows.
Inactive Watson Machine Learning instance
Symptoms
After you try to submit an inference request to a foundation model by clicking the Generate button in the Prompt Lab, the following error message is displayed:
'code': 'no_associated_service_instance_error',
'message': 'WML instance {instance_id} status is not active, current status: Inactive'
Possible causes
The association between your watsonx.ai project and the related Watson Machine Learning service instance was lost.
Possible solutions
Recreate or refresh the association between your watsonx.ai project and the related Watson Machine Learning service instance. To do so, complete the following steps:
- From the main menu, expand Projects, and then click View all projects.
- Click your watsonx.ai project.
- From the Manage tab, click Services & integrations.
- If the appropriate Watson Machine Learning service instance is listed, disassociate it temporarily by selecting the instance, and then clicking Remove. Confirm the removal.
- Click Associate service.
- Choose the appropriate Watson Machine Learning service instance from the list, and then click Associate.
The public key that is needed for authentication is not available.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Contact the support team.
Operation timed out after {{timeout}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The timeout occurred while performing the requested operation.
How to fix it
Try to invoke the operation again.
Unhandled exception of type {{type}} with {{status}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Try to invoke the operation again. If it happens again, contact the support team.
Unhandled exception of type {{type}} with {{response}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Try to invoke the operation again. If it happens again, contact the support team.
Unhandled exception of type {{type}} with {{json}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Try to invoke the operation again. If it happens again, contact the support team.
Unhandled exception of type {{type}} with {{message}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem can happen due to internal service issues.
How to fix it
Try to invoke the operation again. If it happens again, contact the support team.
The requested object is not found.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The requested resource was not found.
How to fix it
Make sure that you refer to an existing resource.
The underlying database reported too many requests.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
Too many requests were sent within a specific time period.
How to fix it
Try to invoke the operation again.
The definition of the evaluation is not defined in the artifactModelVersion or deployment. It must be specified in at least one of the places.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The learning configuration does not contain all the required information.
How to fix it
Provide the definition in the learning configuration.
Evaluation requires a learning configuration that is specified for the model.
What's happening
It is not possible to create a learning iteration.
Why it's happening
A learning configuration is not defined for the model.
How to fix it
Create a learning configuration and try to create the learning iteration again.
Evaluation requires spark instance to be provided in X-Spark-Service-Instance header
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The learning configuration does not have the required information.
How to fix it
Provide spark_service in the learning configuration or in the X-Spark-Service-Instance header.
Model does not contain any version.
What's happening
It is not possible to create a deployment or set the learning configuration.
Why it's happening
This problem can happen due to an inconsistency that is related to the persistence of the model.
How to fix it
Persist the model again, and then try to perform the action again.
Data module not found in IBM Federated Learning.
What's happening
The data handler for IBM Federated Learning is trying to extract a data module from the FL library but is unable to find it. You might see the following error message:
ModuleNotFoundError: No module named 'ibmfl.util.datasets'
Why it's happening
Your DataHandler might be outdated.
How to fix it
Review and update your DataHandler to conform to the most recent MNIST data handler or make sure that your sample versions are up to date.
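As a hedged illustration of what updating the data handler can mean in practice, the sketch below loads the MNIST data from a local file instead of importing it from the removed ibmfl.util.datasets module. The class name and the data-config key are illustrative; match your handler to the current MNIST sample in the IBM Federated Learning documentation.

```python
import numpy as np
from ibmfl.data.data_handler import DataHandler


class LocalMnistDataHandler(DataHandler):
    """Loads MNIST from a local .npz file instead of ibmfl.util.datasets."""

    def __init__(self, data_config=None):
        super().__init__()
        # The file path comes from the party's data configuration; the
        # "npz_file" key used here is illustrative.
        self.file_name = (data_config or {}).get("npz_file", "mnist.npz")

    def get_data(self):
        # Return ((x_train, y_train), (x_test, y_test)), as the FL party expects.
        with np.load(self.file_name) as data:
            return (
                (data["x_train"], data["y_train"]),
                (data["x_test"], data["y_test"]),
            )
```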
Patch operation can modify existing learning configuration only.
What's happening
It is not possible to invoke the patch REST API method to patch the learning configuration.
Why it's happening
A learning configuration is not set for this model, or the model does not exist.
How to fix it
Ensure that the model exists and already has a learning configuration set.
Patch operation expects exactly one replace operation.
What's happening
The deployment cannot be patched.
Why it's happening
The patch payload contains more than one operation, or the patch operation is not a replace operation.
How to fix it
Use only one operation in the patch payload: a replace operation.
The payload is missing the required fields: FIELD or the values of the fields are corrupted.
What's happening
It is not possible to process an action that is related to access to the underlying data set.
Why it's happening
The access to the data set is not properly defined.
How to fix it
Correct the access definition for the data set.
Provided evaluation method: METHOD is not supported. Supported values: VALUE.
What's happening
It is not possible to create the learning configuration.
Why it's happening
The wrong evaluation method was used to create the learning configuration.
How to fix it
Use a supported evaluation method: regression, binary, or multiclass.
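For illustration, the evaluation method is set inside the evaluation definition of the learning configuration. The sketch below shows the general shape only; the exact field names are an assumption based on the legacy learning configuration API, so check the API reference for your service version.

```python
# Illustrative learning configuration fragment; field names are assumptions.
learning_configuration = {
    "evaluation_definition": {
        "method": "multiclass",  # must be one of: regression, binary, multiclass
        "metrics": [
            {"name": "accuracy", "threshold": 0.8},  # example metric
        ],
    },
}
```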
You can have only one active evaluation per model. The request cannot be completed because of existing active evaluation: {{url}}
What's happening
It is not possible to create another learning iteration.
Why it's happening
You can have only one running evaluation for the model.
How to fix it
Check the evaluation that is already running, or wait for it to end and then start a new one.
The deployment type {{type}} is not supported.
What's happening
It is not possible to create the deployment.
Why it's happening
An unsupported deployment type was used.
How to fix it
Use a supported deployment type.
Incorrect input: ({{message}})
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem happens due to an issue with parsing JSON.
How to fix it
Make sure that the correct JSON is passed in the request.
Insufficient data - metric {{name}} cannot be calculated
What's happening
Learning iteration failed.
Why it's happening
The value for a metric with a defined threshold cannot be calculated because of insufficient feedback data.
How to fix it
Review and improve the data in the feedback_data_ref data source in the learning configuration.
For type {{type}} spark instance must be provided in X-Spark-Service-Instance header
What's happening
The deployment cannot be created.
Why it's happening
batch and streaming deployments require a spark instance to be provided.
How to fix it
Provide a spark instance in the X-Spark-Service-Instance header.
Action {{action}} failed with message {{message}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem happens due to an issue with invoking the underlying service.
How to fix it
If the message provides a suggestion to fix the issue, follow the suggestion. Otherwise, contact the support team.
Path {{path}} is not allowed. The only allowed path for patch stream is /status
What's happening
Stream deployment cannot be patched.
Why it's happening
The wrong path was used to patch the stream deployment.
How to fix it
Patch the stream deployment with the supported path option, /status, which allows you to start or stop stream processing.
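Combined with the single-replace rule described earlier, a valid patch body for a stream deployment looks like the following sketch; the start and stop values reflect the start/stop behavior noted above.

```python
# A valid patch payload: exactly one replace operation on /status.
patch_payload = [
    {"op": "replace", "path": "/status", "value": "start"}  # or "stop"
]
```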
Patch operation is not allowed for instance of type {{$type}}
What's happening
The deployment cannot be patched.
Why it's happening
The wrong deployment type is being patched.
How to fix it
Patch the stream deployment type.
Data connection {{data}} is invalid for feedback_data_ref
What's happening
The learning configuration cannot be created for the model.
Why it's happening
A supported data source was not used when feedback_data_ref was defined.
How to fix it
Use only the supported data source type, dashdb.
Path {{path}} is not allowed. The only allowed path for patch model is /deployed_version/url or /deployed_version/href for V2
What's happening
The model cannot be patched.
Why it's happening
The wrong path was used when patching the model.
How to fix it
Patch the model with a supported path, which you can use to update the version of the deployed model.
Parsing failure: {{msg}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The request payload cannot be parsed successfully.
How to fix it
Make sure that your request payload is correct and can be parsed correctly.
Runtime environment for selected model: {{env}} is not supported for learning configuration. Supported environments: [{{supported_envs}}].
What's happening
It is not possible to create the learning configuration.
Why it's happening
The model for which you tried to create the learning_configuration has an unsupported runtime.
How to fix it
Create the learning configuration for a model that has a supported runtime.
Current plan '{{plan}}' allows {{limit}} deployments only
What's happening
It is not possible to create the deployment.
Why it's happening
The limit for number of deployments was reached for the current plan.
How to fix it
Upgrade to a plan that does not have this limitation.
Database connection definition is not valid ({{code}})
What's happening
It is not possible to use the learning configuration function.
Why it's happening
The database connection definition is invalid.
How to fix it
Try to fix the issue that is described by the code that is returned by the underlying database.
Problems connecting underlying {{system}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem might happen due to an issue during connection to the underlying system. It might be a temporary network issue.
How to fix it
Try to invoke the operation again. If you get an error again, contact the support team.
Error extracting X-Spark-Service-Instance header: ({{message}})
What's happening
A REST API that requires Spark credentials cannot be invoked successfully.
Why it's happening
This problem might happen due to an issue with base-64 decoding or parsing Spark credentials.
How to fix it
Make sure that the Spark credentials are correctly base-64 encoded. For more information, see the documentation.
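A minimal sketch of building the header follows. The structure of the credentials JSON is an assumption (use the credentials of your Spark service instance); the key point is that the JSON document must be base-64 encoded before it is sent.

```python
import base64
import json

# Placeholder Spark service credentials; substitute the values from your
# Spark service instance.
spark_instance = {
    "credentials": {"tenant_id": "<tenant-id>", "cluster_master_url": "<url>"},
    "version": "2.0",  # illustrative
}

# Base-64 encode the JSON document for the X-Spark-Service-Instance header.
encoded = base64.b64encode(json.dumps(spark_instance).encode("utf-8")).decode("ascii")
headers = {"X-Spark-Service-Instance": encoded}
```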
This function is forbidden for non-beta users.
What's happening
The REST API cannot be invoked successfully.
Why it's happening
The REST API that was invoked is in beta.
How to fix it
If you are interested in participating, add yourself to the wait list. The details can be found in the documentation.
{{code}} {{message}}
What's happening
The REST API cannot be invoked successfully.
Why it's happening
This problem might happen due to an issue with invoking underlying service.
How to fix it
If the message provides a suggestion to fix the issue, follow the suggestion. Otherwise, contact the support team.
Rate limit exceeded.
What's happening
Rate limit exceeded.
Why it's happening
The rate limit for current plan is exceeded.
How to fix it
To solve this problem, acquire another plan with a greater rate limit.
Invalid query parameter {{paramName}} value: {{value}}
What's happening
A validation error occurred because an incorrect value was passed for a query parameter.
Why it's happening
An error occurred while getting the result for the query.
How to fix it
Correct the query parameter value. You can find the details in the documentation.
Invalid token type: {{type}}
What's happening
Error regarding token type.
Why it's happening
Error in authorization.
How to fix it
The token must start with the Bearer prefix.
Invalid token format. You must use bearer token format.
What's happening
Error regarding token format.
Why it's happening
Error in authorization.
How to fix it
The token must be a bearer token and must start with the Bearer prefix.
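For example, a correctly formatted Authorization header in a Python request looks like this sketch; the token value and the endpoint URL are placeholders for your IAM access token and your region's endpoint.

```python
import requests

access_token = "<your-IAM-access-token>"  # placeholder

# The header value must use the bearer scheme: the literal "Bearer " prefix
# followed by the token.
headers = {"Authorization": f"Bearer {access_token}"}

# Example endpoint; substitute the URL for your region and request.
response = requests.get(
    "https://us-south.ml.cloud.ibm.com/ml/v4/deployments",
    headers=headers,
)
```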
Input JSON file is missing or invalid: 400
What's happening
The following message displays when you try to score online: Input JSON file is missing or invalid.
Why it's happening
This message displays when the scoring input payload doesn't match the expected input type that is required for scoring the model. Specifically, the following reasons might apply:
- The input payload is empty.
- The input payload schema is not valid.
- The input data types do not match the expected data types.
How to fix it
Correct the input payload. Make sure that the payload has correct syntax, a valid schema, and proper data types. After you make corrections, try to score online again. For syntax issues, verify the JSON file by using the jsonlint command.
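As a reference point, a well-formed online scoring payload generally follows the fields/values structure shown in this sketch. The field names and row values are invented for illustration; they must match the schema that the model was trained with.

```python
# Illustrative scoring payload (v4-style input_data structure).
payload = {
    "input_data": [
        {
            "fields": ["AGE", "INCOME"],      # placeholder feature names
            "values": [[34, 25000], [51, 68000]],  # one inner list per row
        }
    ]
}
```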
Unknown deployment identification: 404
What's happening
The following message displays when you try to score online: Unknown deployment identification.
Why it's happening
This message displays when the deployment ID that is used for scoring does not exist.
How to fix it
Make sure that you provide the correct deployment ID. If the deployment does not exist, deploy the model to get a deployment ID, and then try scoring again.
Internal server error: 500
What's happening
The following message displays when you try to score online: Internal server error
Why it's happening
This message displays if the downstream data flow on which the online scoring depends fails.
How to fix it
Wait for some time and try to score online again. If it fails again, contact IBM Support.
Invalid type for ml_artifact: Pipeline
What's happening
The following message displays when you try to publish a Spark model by using the Common API client library on your workstation.
Why it's happening
This message displays if you have an invalid pyspark setup in the operating system.
How to fix it
Set up the system environment paths according to the following instructions:
SPARK_HOME={installed_spark_path}
JAVA_HOME={installed_java_path}
PYTHONPATH=$SPARK_HOME/python/
ValueError: Training_data_ref name and connection cannot be None, if Pipeline Artifact is not given.
What's happening
The training data set is missing or is not referenced properly.
Why it's happening
The Pipeline Artifact is a training data set in this instance.
How to fix it
You must supply a training data set when you persist a Spark PipelineModel. If you don't, the client reports that it doesn't support PipelineModels, rather than stating that a PipelineModel must be accompanied by the training data set.
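A hedged sketch with the legacy Watson Machine Learning Python client follows; the essential point is to pass the training DataFrame (and the pipeline) when you store a Spark PipelineModel, so the training data reference can be derived. The meta property shown is illustrative; check the client documentation for the full set.

```python
from pyspark.ml import Pipeline

# Assumes an authenticated Watson Machine Learning repository client in
# `client`, a pyspark.ml.Pipeline in `pipeline`, and the Spark DataFrame
# used for training in `train_df`.
pipeline_model = pipeline.fit(train_df)

stored_model = client.repository.store_model(
    model=pipeline_model,
    meta_props={"name": "my-spark-model"},  # illustrative meta property
    training_data=train_df,                 # required for a Spark PipelineModel
    pipeline=pipeline,
)
```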