Deploying foundation models on-demand with Python client library
Last updated: Dec 12, 2024
Deploy foundation models on-demand programmatically with the watsonx.ai Python client library. Deploying a foundation model on-demand makes it available on dedicated hardware for the exclusive use of your organization. IBM provides a set of curated models that are available for you to deploy on-demand.
Before you begin
You must set up or enable your task credentials to deploy foundation models on-demand. For more information, see Managing task credentials.
Review supported foundation model architectures, deployment types, and other considerations for deploying a foundation model on-demand. For more information, see Deploying foundation models on-demand.
Deploying foundation models on-demand with Python client library
To deploy a foundation model on-demand by using the Python client library, create a model asset in the repository by creating the metadata for your asset and storing the model. Then, retrieve the asset ID and create an online deployment for
the asset.
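The examples in this topic assume an authenticated API client with a default deployment space. The following setup is a minimal sketch; the endpoint URL, API key, and deployment space ID are placeholders that you must replace with your own values.
from ibm_watsonx_ai import APIClient, Credentials

# Placeholder credentials: replace the URL, API key, and space ID with your own values
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="<your API key>"
)

client = APIClient(credentials)
client.set.default_space("<your deployment space ID>")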
Creating model asset in the watsonx.ai repository
You must create an asset for the foundation model that you want to deploy on-demand in the watsonx.ai service repository. To store the model as an asset in the repository, create the metadata for your asset. After creating the metadata for
your model asset, you can store the model in your repository.
The following code snippet shows how to create metadata for your foundation model asset in the watsonx.ai repository:
metadata = {
    client.repository.ModelMetaNames.NAME: "curated FM asset",
    client.repository.ModelMetaNames.TYPE: client.repository.ModelAssetTypes.CURATED_FOUNDATION_MODEL_1_0,
}
After creating the metadata for your foundation model asset, store the model by using the client.repository.store_model() function.
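The following snippet is a sketch; the model value is a placeholder for the identifier of the curated foundation model that you want to deploy on-demand:
stored_model_details = client.repository.store_model(
    model="<curated foundation model ID>",   # placeholder model identifier
    meta_props=metadata
)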
Once the foundation model asset is stored in the watsonx.ai repository, you can retrieve the asset ID for your model. The asset ID is required to create the deployment for your foundation model.
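For example, the following sketch assumes the stored_model_details object that is returned by the previous call:
# Retrieve the asset ID of the stored foundation model
model_asset_id = client.repository.get_model_id(stored_model_details)
print(model_asset_id)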
You can list all stored curated foundation models and filter them by framework type:
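# List stored models and filter by framework type; the filter value below is an
# assumption that mirrors the CURATED_FOUNDATION_MODEL_1_0 asset type that is used above
client.repository.list(framework_filter="curated_foundation_model_1.0")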
Creating an online deployment for the foundation model
To create a new deployment for a foundation model that can be deployed on-demand with the Python client library, you must define a meta_props dictionary with the metadata that contains the details for your deployment.
You can optionally overwrite the model parameters when you create the metadata for your deployment. To overwrite the model parameters, pass a dictionary with new parameter values in the FOUNDATION_MODEL field.
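The following metadata is a sketch: the hardware request size, the number of nodes, and the overridden parameter value are placeholders, and the meta name constants are assumed to follow the client.deployments.ConfigurationMetaNames interface.
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "Curated FM deployment",
    client.deployments.ConfigurationMetaNames.DESCRIPTION: "On-demand deployment of a curated foundation model",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    # Hardware request for the dedicated deployment; the size value is a placeholder
    client.deployments.ConfigurationMetaNames.HARDWARE_REQUEST: {
        "size": "<hardware request size>",
        "num_nodes": 1
    },
    # Optional: overwrite default model parameters
    client.deployments.ConfigurationMetaNames.FOUNDATION_MODEL: {"max_new_tokens": 128}
}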
The following sample shows how to create an online deployment for your foundation model by using the watsonx.ai Python client library:
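# A sketch that assumes the model_asset_id and meta_props values from the previous steps
deployment_details = client.deployments.create(model_asset_id, meta_props)

# Retrieve the deployment ID for later inferencing and management calls
deployment_id = client.deployments.get_id(deployment_details)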
Testing a foundation model deployed on-demand with Python client library
You can test a foundation model that is deployed on-demand for online inferencing from the Python client library, as shown in the following code sample:
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

generate_params = {
    GenParams.MAX_NEW_TOKENS: 25,
    GenParams.STOP_SEQUENCES: ["\n"]
}

deployed_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)

input_prompt = "Example prompt"

# Generate text for the prompt
response_text = deployed_model.generate_text(prompt=input_prompt)

# Generate the full response, including metadata, as JSON
full_response_json = deployed_model.generate(prompt=input_prompt, params=generate_params)

# Generate a token stream for the prompt
for token in deployed_model.generate_text_stream(prompt=input_prompt):
    print(token, end="")
Managing deployed foundation models on-demand with Python client library
Update, scale, or delete foundation models that are deployed on-demand with the Python client library.
Retrieving deployment details
To retrieve the details of a deployment, use the get_details() function of the Python client library.
The following code sample shows how to use the Python client library to retrieve the details of foundation models that are deployed on-demand:
deployment_details = deployed_model.get_details()
Alternatively, you can retrieve the details of a specific deployment by passing the deployment_id, as shown in the following code sample.
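For example, assuming the deployment_id value that was retrieved when the deployment was created:
deployment_details = client.deployments.get_details(deployment_id)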
Scaling a deployment
You can deploy only one instance of a foundation model that is deployed on-demand in a deployment space. To handle increased demand, you can scale the deployment by creating additional copies.
The following code sample shows how to scale the number of replicas for your deployment by updating the number of hardware requests from the Python client library:
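# A sketch of scaling to two copies; the key names in the hardware request mirror the
# deployment metadata that is used earlier in this topic, and the size value is a placeholder
change_meta = {
    client.deployments.ConfigurationMetaNames.HARDWARE_REQUEST: {
        "size": "<hardware request size>",
        "num_nodes": 2
    }
}

updated_deployment_details = client.deployments.update(deployment_id, change_meta)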