Managing predictive deployments

Last updated: Dec 09, 2024

To deploy assets, you must first set up a deployment space, and then select and configure a deployment type. After you deploy assets, you can manage and update them to make sure they perform well and to monitor their accuracy.

To be able to deploy assets from a space, you must have a machine learning service instance that is provisioned and associated with that space.

Online and batch deployments provide simple ways to create an online scoring endpoint or do batch scoring with your models.

If you want to implement custom logic:

Create a Python function to use for creating your online endpoint
Write a notebook or script for batch scoring

Note: If you create a notebook or a script to perform batch scoring, the asset runs as a platform job, not as a batch deployment.
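
As a rough illustration of the custom-logic option for online endpoints, a deployable Python function is commonly written as an outer function that returns an inner score function. The sketch below assumes that closure pattern and a payload layout with input_data and predictions keys. The outer function name, the threshold parameter, and the sample field names are hypothetical, so check the exact requirements for your release before you deploy anything like it.

```python
def deployable_function(threshold=0.5):
    """Outer function (hypothetical name): a place for one-time setup."""

    def score(payload):
        """Inner function: called for every scoring request."""
        predictions = []
        for batch in payload["input_data"]:
            # Custom logic: label each row by comparing its first value
            # with the threshold captured from the outer function.
            labels = [
                ["high" if row[0] > threshold else "low"]
                for row in batch["values"]
            ]
            predictions.append({"fields": ["label"], "values": labels})
        return {"predictions": predictions}

    return score


# Local smoke test with an assumed payload layout, run before deployment.
scorer = deployable_function(threshold=0.5)
sample = {"input_data": [{"fields": ["risk"], "values": [[0.2], [0.9]]}]}
result = scorer(sample)
print(result)
```

Calling the function locally in this way is a cheap check that the custom logic accepts and returns the expected structure before the function is stored in a space.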

Deployable assets

The following table lists the assets that you can deploy from a watsonx.ai Runtime space and the deployment types that apply to each:

List of assets that you can deploy
Asset type	Batch deployment	Online deployment
Functions	Yes	Yes
Models	Yes	Yes
Scripts	Yes	No

Notes:

A deployment job is a way of running a batch deployment, or a self-contained asset like a flow in watsonx.ai Runtime. You can select the input and output for your job and choose to run it manually or on a schedule. For more information, see Creating a deployment job.
You can deploy a Natural Language Processing model by using Python functions or Python scripts. Both online and batch deployments are supported.
Notebooks and flows use notebook environments. You can run them in a deployment space, but they are not deployable.

You can manage or update a deployment in the following ways:

Manage deployment jobs. After you create one or more jobs, you can view and manage them from the Jobs tab of your deployment space.
Update a deployment. For example, you can replace a model with a better-performing version without having to create a new deployment.
Scale a deployment to increase availability and throughput by creating replicas of the deployment.
Delete a deployment to remove it and free up resources.
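
The update and scale operations in this list are usually sent to the deployment as a patch request. The sketch below only builds the request bodies and sends nothing. It assumes a JSON Patch (RFC 6902) format. The /asset and /hardware_spec paths, the num_nodes key, the hardware specification name S, the placeholder asset ID, and the helper names are all assumptions for illustration and should be checked against the API reference for your release.

```python
import json


def replace_asset_patch(new_asset_id):
    """Build a patch body that points a deployment at a different asset."""
    # Assumption: the deployed asset is addressed by the "/asset" path.
    return [{"op": "replace", "path": "/asset", "value": {"id": new_asset_id}}]


def scale_patch(hardware_spec_name, replicas):
    """Build a patch body that changes the number of deployment replicas."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Assumption: replicas are set through "num_nodes" under "/hardware_spec".
    return [{
        "op": "replace",
        "path": "/hardware_spec",
        "value": {"name": hardware_spec_name, "num_nodes": replicas},
    }]


# Serialize the bodies as they would appear in a PATCH request.
update_body = json.dumps(replace_asset_patch("<new-asset-id>"))
scale_body = json.dumps(scale_patch("S", 2))
print(update_body)
print(scale_body)
```

Keeping the body construction separate from the HTTP call makes it possible to review or log the exact change before it is applied to a live deployment.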

Configuring API gateways to provide stable endpoints

watsonx.ai Runtime provides stable endpoints to prevent downtime. However, you might experience downtime if you move to a new Cloud Pak for Data instance or add an instance.

API gateways provide a stable URL that can be used with your Watson Machine Learning API endpoint. You can use an API gateway (available in Cloud Pak for Integration) with your deployment endpoints to handle downtime in the following cases:

If you have more than one instance of Cloud Pak for Data in a high-availability configuration and one of the instances fails, you can use an API gateway to switch automatically to another instance, which prevents complete failure.
If more than one application uses the same endpoint and the deployment endpoint becomes unavailable (for example, because the deployment was accidentally deleted), you can update the endpoint in the API gateway so that the applications can continue to use the same gateway URL.
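
The first case can be pictured as ordered failover across instances. The sketch below is a simplified model of that routing idea, not a description of how the API gateway in Cloud Pak for Integration is implemented. The instance URLs, the call_with_failover helper, and the fake transport are hypothetical and exist only to make the example runnable.

```python
from typing import Callable, Sequence


def call_with_failover(endpoints: Sequence[str], payload: dict,
                       send: Callable[[str, dict], dict]) -> dict:
    """Try each endpoint in order and return the first successful response."""
    last_error = None
    for url in endpoints:
        try:
            return send(url, payload)
        except ConnectionError as error:
            # This instance is unavailable, so move on to the next one.
            last_error = error
    raise RuntimeError("all endpoints failed") from last_error


# Hypothetical scoring URLs on two instances.
ENDPOINTS = [
    "https://cpd-a.example.com/predictions",
    "https://cpd-b.example.com/predictions",
]


def fake_send(url: str, payload: dict) -> dict:
    """Stand-in transport: instance A is down, instance B answers."""
    if "cpd-a" in url:
        raise ConnectionError("instance A is down")
    return {"served_by": url, "echo": payload}


response = call_with_failover(ENDPOINTS, {"input_data": []}, fake_send)
print(response["served_by"])
```

Because applications call only the stable gateway URL, the list of backing endpoints can change without any change to the applications themselves.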

Enabling GPU and MIG support for deployment runtimes

If you are deploying a predictive machine learning model that requires significant processing power for inferencing, you can optionally configure a GPU for deployment runtimes.

You can also enable MIG support for GPUs when you want to deploy an application that does not require the full power of an entire GPU. If you are configuring MIG for GPU-accelerated workloads, all GPU-enabled nodes should adhere to a single strategy determined in the prior configuration steps. This ensures consistent behavior across all GPU-enabled nodes in the cluster. To configure MIG support, see NVIDIA Guide for configuring MIG support.
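
One way to verify that all GPU-enabled nodes follow the same strategy is to compare a strategy label across nodes. The sketch below works on a plain dictionary of node labels and does not query a cluster. The label key nvidia.com/mig.strategy, the node names, and the helper name are assumptions for illustration, so confirm the label that your GPU configuration actually sets.

```python
MIG_STRATEGY_LABEL = "nvidia.com/mig.strategy"  # assumed label key


def group_nodes_by_mig_strategy(node_labels):
    """Return (nodes grouped by strategy, nodes without the label)."""
    strategies = {}
    missing = []
    for node, labels in node_labels.items():
        strategy = labels.get(MIG_STRATEGY_LABEL)
        if strategy is None:
            missing.append(node)
        else:
            strategies.setdefault(strategy, []).append(node)
    return strategies, missing


# Hypothetical inventory of GPU-enabled nodes and their labels.
nodes = {
    "gpu-node-1": {MIG_STRATEGY_LABEL: "single"},
    "gpu-node-2": {MIG_STRATEGY_LABEL: "single"},
    "gpu-node-3": {MIG_STRATEGY_LABEL: "mixed"},
}
strategies, missing = group_nodes_by_mig_strategy(nodes)

# The cluster is consistent only if every node reports the same strategy.
consistent = len(strategies) == 1 and not missing
print(consistent, strategies)
```

In this sample inventory the third node reports a different strategy, so the check flags the cluster as inconsistent.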

Learn more

Full list of asset types that can be added to a deployment space