Deploy foundation models on-demand programmatically with the watsonx.ai Python client library. Deploying a foundation model on-demand makes it available on dedicated hardware for the exclusive use of your organization. IBM provides a set of curated models that are available for you to deploy on-demand.
Before you begin
- You must set up or enable your task credentials to deploy foundation models on-demand. For more information, see Managing task credentials.
- Review supported foundation model architectures, deployment types, and other considerations for deploying a foundation model on-demand. For more information, see Deploying foundation models on-demand.
Deploying foundation models on-demand with Python client library
To deploy a foundation model on-demand by using the Python client library, create a model asset in the repository by creating the metadata for your asset and storing the model. Then, retrieve the asset ID and create an online deployment for the asset.
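The code samples that follow assume an authenticated watsonx.ai client named `client` with a default deployment space set. A minimal setup sketch is shown below; the URL, API key, and space ID are placeholder values that you must replace with your own:

```python
from ibm_watsonx_ai import APIClient, Credentials

# Placeholder values: replace with your own service URL, API key, and space ID
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_API_KEY",
)
client = APIClient(credentials)

# Deployments are created in a deployment space; set yours as the default
client.set.default_space("YOUR_SPACE_ID")
```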
Creating model asset in the watsonx.ai repository
You must create an asset for the foundation model that you want to deploy on-demand in the watsonx.ai service repository. To store the model as an asset in the repository, first create the metadata for the asset, and then store the model.
The following code snippet shows how to create metadata for your foundation model asset in the watsonx.ai repository:
metadata = {
client.repository.ModelMetaNames.NAME: "curated FM asset",
client.repository.ModelMetaNames.TYPE: client.repository.ModelAssetTypes.CURATED_FOUNDATION_MODEL_1_0,
}
After creating the metadata for your foundation model asset, store the model by using the client.repository.store_model() function:
stored_model_details = client.repository.store_model(model='ibm/granite-13b-chat-v2-curated', meta_props=metadata)
Retrieving the identifier for your asset
Once the foundation model asset is stored in the watsonx.ai repository, you can retrieve the asset ID for your model. The asset ID is required to create the deployment for your foundation model.
You can list all stored curated foundation models and filter them by framework type:
client.repository.list(framework_filter='curated_foundation_model_1.0')
The following code snippet shows how to retrieve the ID for your foundation model asset:
stored_model_asset_id = client.repository.get_model_id(stored_model_details)
Deploying foundation model on-demand
To create a new deployment for a foundation model that can be deployed on-demand with the Python client library, you must define a meta_props dictionary with the metadata that contains the details for your deployment.
You can optionally overwrite the model parameters when you create the metadata for your deployment. To overwrite the model parameters, pass a dictionary with the new parameter values in the FOUNDATION_MODEL field.
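For example, a deployment metadata dictionary that overrides model parameters might look like the following sketch. The parameter names and values inside the FOUNDATION_MODEL dictionary are illustrative assumptions, not a definitive list; check the documentation for your model to see which parameters it supports.

```python
# Sketch: overriding model parameters at deployment time.
# The keys inside FOUNDATION_MODEL below are example values (assumptions);
# assumes `client` is an authenticated APIClient with a default space set.
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "curated_fm_deployment_custom",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.FOUNDATION_MODEL: {
        "max_sequence_length": 4096,  # example override value
    },
}
```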
The following sample shows how to create an online deployment for your foundation model by using the watsonx.ai Python client library:
meta_props = {
client.deployments.ConfigurationMetaNames.NAME: "curated_fm_deployment",
client.deployments.ConfigurationMetaNames.DESCRIPTION: "Testing deployment using curated foundation model",
client.deployments.ConfigurationMetaNames.ONLINE: {},
client.deployments.ConfigurationMetaNames.SERVING_NAME: "test_curated_fm_01"
}
deployment_details = client.deployments.create(stored_model_asset_id, meta_props)
deployment_id = client.deployments.get_id(deployment_details)
print("The deployment id:", deployment_id)
Testing deployed foundation model on-demand with Python client library
You can test a foundation model that is deployed on-demand for online inferencing from the Python client library, as shown in the following code sample:
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
generate_params = {
GenParams.MAX_NEW_TOKENS: 25,
GenParams.STOP_SEQUENCES: ["\n"]
}
deployed_model = ModelInference(
deployment_id=deployment_id,
params=generate_params,
api_client=client
)
response_text = deployed_model.generate_text(prompt="Example prompt")
full_response_json = deployed_model.generate(prompt="Example prompt", params=generate_params)

# Generate a streamed response token by token
for token in deployed_model.generate_text_stream(prompt="Example prompt"):
    print(token, end="")
Managing deployed foundation models on-demand with Python client library
Update, scale, or delete foundation models that are deployed on-demand with the Python client library.
Retrieving deployment details
To retrieve the details of a deployment, use the get_details() function of the Python client library.
The following code sample shows how to use the Python client library to retrieve the details of foundation models that are deployed on-demand:
deployment_details = deployed_model.get_details()
Alternatively, you can retrieve the details of a specific deployment by passing the deployment_id, as shown in the following code sample:
deployment_details = client.deployments.get_details(deployment_id)
Updating the deployment
You can update the deployment for your foundation model that is deployed on-demand.
The following code sample shows how to update the deployment details from the Python client library:
metadata = {client.deployments.ConfigurationMetaNames.NAME: "Deployment on Demand v2"}
updated_deployment_details = client.deployments.update(deployment_id, changes=metadata)
Scaling the deployment
You can deploy only one instance of a foundation model on-demand in a deployment space. To handle increased demand, you can scale the deployment by creating additional replicas.
The following code sample shows how to scale the number of replicas for your deployment by updating the number of hardware requests from the Python client library:
metadata = {client.deployments.ConfigurationMetaNames.HARDWARE_REQUEST: {"num_nodes": 2}}
deployment_details = client.deployments.update(deployment_id, changes=metadata)
Deleting the deployment
You can delete your deployed foundation model when you no longer need it, to stop incurring billing charges.
The following code sample shows how to delete a foundation model deployed on-demand with the Python client library:
client.deployments.delete(deployment_id)
Parent topic: Deploying foundation models on-demand