After you upload your custom foundation model to cloud object storage, create a connection to the model and then use that connection to create a model asset in a project or space. If you want to first test your custom foundation model in a project (for example, by evaluating it in a Jupyter notebook), add the model asset to a project and then promote it to a space.
After you add the model asset, you can deploy it and use Prompt Lab to inference it.
Important:
If you upload your model to remote cloud storage, you must create a connection that is based on your personal credentials. Only connections that use personal credentials are supported with remote cloud storage. As a result, other users of the same deployment space cannot access the model content, but they can still inference the model deployments. Create the connection by using your access key and your secret access key. For information about how to enable personal credentials for your account, see Account settings.
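The shape of such a personal-credentials connection can be sketched as follows. This is an illustrative payload only; the exact property names (`bucket`, `access_key`, `secret_key`, `url`) and the datasource type identifier depend on your storage provider and the connection type you choose, and are assumptions here.

```python
# Illustrative sketch of a personal-credentials connection for remote
# cloud storage. All property names and values are placeholders; consult
# your storage provider's connection type for the exact fields.
connection = {
    "name": "custom-fm-storage",                # display name for the connection
    "datasource_type": "<datasource-type-id>",  # e.g. an S3-compatible type id
    "properties": {
        "bucket": "my-model-bucket",            # bucket that holds the model files
        "access_key": "<ACCESS_KEY>",           # your personal access key
        "secret_key": "<SECRET_ACCESS_KEY>",    # your personal secret access key
        "url": "https://s3.example.com",        # storage endpoint
    },
}

def uses_personal_credentials(conn: dict) -> bool:
    """Check that a connection carries personal credentials (access key and
    secret key), as required for remote cloud storage."""
    props = conn.get("properties", {})
    return bool(props.get("access_key")) and bool(props.get("secret_key"))
```

Because the credentials live in the connection rather than in the space, other space members can inference the resulting deployment without ever seeing the keys or the model files.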
Before you begin
You must enable task credentials to be able to deploy a custom foundation model. For more information, see Adding task credentials.
You can use global parameters to deploy your custom foundation models. Set the value of each base model parameter within the range that is specified in the following table. Otherwise, your deployment might fail and inferencing will not be possible.
Global parameters for custom foundation models

| Parameter | Type | Range of values | Default value | Description |
| --- | --- | --- | --- | --- |
| dtype | String | float16, bfloat16 | float16 | Use this parameter to specify the data type for your model. |
| max_batch_size | Number | max_batch_size >= 1 | 256 | Use this parameter to specify the maximum batch size for your model. |
| max_concurrent_requests | Number | max_concurrent_requests >= 1 and max_concurrent_requests >= max_batch_size | 1024 | Use this parameter to specify the maximum number of concurrent requests that can be made to your model. This parameter is not available to deployments that use the watsonx-cfm-caikit-1.1 software specification. |
| max_new_tokens | Number | max_new_tokens >= 20 | 2047 | Use this parameter to specify the maximum number of tokens that your model generates for an inference request. |
| max_sequence_length | Number | max_sequence_length >= 20 and max_sequence_length > max_new_tokens | 2048 | Use this parameter to specify the maximum sequence length for your model. |
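The ranges in the table can be checked before you submit a deployment. The following sketch encodes those constraints as a simple validation function; the defaults and ranges come from the table, while the function itself is illustrative and not part of any SDK.

```python
# Defaults for the global deployment parameters, as listed in the table.
DEFAULTS = {
    "dtype": "float16",
    "max_batch_size": 256,
    "max_concurrent_requests": 1024,
    "max_new_tokens": 2047,
    "max_sequence_length": 2048,
}

def validate_params(params: dict) -> list:
    """Return a list of range violations; an empty list means all values
    fall within the ranges from the table."""
    p = {**DEFAULTS, **params}  # unspecified parameters keep their defaults
    errors = []
    if p["dtype"] not in ("float16", "bfloat16"):
        errors.append("dtype must be float16 or bfloat16")
    if p["max_batch_size"] < 1:
        errors.append("max_batch_size must be >= 1")
    if p["max_concurrent_requests"] < 1 or p["max_concurrent_requests"] < p["max_batch_size"]:
        errors.append("max_concurrent_requests must be >= 1 and >= max_batch_size")
    if p["max_new_tokens"] < 20:
        errors.append("max_new_tokens must be >= 20")
    if p["max_sequence_length"] < 20 or p["max_sequence_length"] <= p["max_new_tokens"]:
        errors.append("max_sequence_length must be >= 20 and > max_new_tokens")
    return errors

print(validate_params({"dtype": "bfloat16"}))          # [] -> all values in range
print(validate_params({"max_sequence_length": 1000}))  # violation: <= default max_new_tokens
```

Note the coupled constraints: raising `max_new_tokens` without also raising `max_sequence_length` (or lowering `max_batch_size` below `max_concurrent_requests`'s value without adjusting the latter) can silently push a configuration out of range.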