Deploying generative AI assets
Last updated: Jan 30, 2025

Deploy generative AI assets to use them in production and monitor these deployed assets.

Types of deployable assets for generative AI applications

You can use watsonx.ai to deploy the following assets for your generative AI applications:

Deploying prompt templates

After you save a prompt template as a project asset, you can promote it to a deployment space. From the deployment space, you can deploy your prompt template to production and get the endpoint for inferencing.
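As a sketch of how such an endpoint might be called, the helper below builds a REST inference request for a deployed prompt template. The URL pattern, API version, and `prompt_variables` field are assumptions based on typical watsonx.ai deployment endpoints; the actual endpoint and deployment ID come from your deployment space.

```python
import json

def build_inference_request(deployment_id, prompt_variables):
    """Build the URL and JSON body for a prompt-template inference call.

    Illustrative only: check the endpoint details shown in your
    deployment space for the exact URL and payload schema.
    """
    url = (
        "https://us-south.ml.cloud.ibm.com/ml/v1/deployments/"
        f"{deployment_id}/text/generation?version=2024-05-01"
    )
    # Prompt variables fill the placeholders defined in the prompt template.
    payload = {"parameters": {"prompt_variables": prompt_variables}}
    return url, json.dumps(payload)

url, body = build_inference_request("my-deployment-id", {"country": "France"})
```

The request is then sent as an HTTP POST with a bearer token in the `Authorization` header.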

If you have the watsonx.governance service, you can also capture and track the deployment details for a prompt template to meet governance requirements.

For more information, see Deploying a prompt template.

Deploying AI services

An AI service is a deployable unit of code that captures the logic of your generative AI use cases, such as Retrieval Augmented Generation (RAG). When your AI services are successfully deployed, you can use the endpoint for inferencing from your application.

AI services are created automatically when you deploy a complex generative AI solution with visual tools such as the Agent Lab, Prompt Lab, or AutoAI. For example, if you use the Agent Lab or Prompt Lab to build and deploy your agentic or generative AI solution, the tool automatically detects the complexity of the solution and presents the correct type of deployment asset.

Although you can use prompt templates to create and deploy saved prompts from the Prompt Lab, you cannot use them to deploy generative AI applications for complex use cases such as RAG.

If you choose to code a generative AI application for these complex use cases, you must create an AI service that meets certain requirements. You can deploy an AI service programmatically with the watsonx.ai REST API or the Python client library. After you deploy the AI service, you can use its endpoint for inferencing.
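The shape of an AI service can be sketched as a Python function whose inner function handles inference requests. The function name, the `context` accessor, and the response shape below are assumptions for illustration; consult the watsonx.ai documentation for the exact contract your deployed code must follow.

```python
def deployable_ai_service(context, **custom_params):
    """Hypothetical AI service: outer function runs once at deployment,
    inner function runs per inference request."""
    # One-time setup (loading a vector index, creating model clients,
    # reading custom_params) would go here.

    def generate(context):
        # Assumed accessor for the incoming request payload.
        payload = context.get_json()
        question = payload.get("question", "")
        # A real RAG service would retrieve relevant documents and call a
        # foundation model here; this placeholder just echoes the question.
        answer = f"Retrieved answer for: {question}"
        return {"body": {"answer": answer}}

    return generate
```

Once deployed, requests sent to the AI service's endpoint are routed to the inner `generate` function.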

For more information, see Deploying AI services.

Deploying tuned models

After you tune a foundation model and save the tuned model as a project asset, you can promote it to a deployment space. From the deployment space, you can test the tuned model and get the endpoint for inferencing.

For more information, see Deploying a tuned foundation model.

Deploying custom foundation models

In addition to working with foundation models that are curated by IBM, you can upload and deploy your own foundation models. After the models are deployed and registered with watsonx.ai, create prompts that inference the custom models from the Prompt Lab.

Deploying a custom foundation model provides the flexibility for you to implement the AI solutions that are right for your use case.

For more information, see Deploying a custom foundation model.

Deploying foundation models on-demand

Deploy a foundation model on-demand on dedicated hardware to make it available to your applications and services as needed. This approach gives you access to the capabilities of these powerful foundation models without requiring extensive computational resources of your own. Foundation models that you deploy on-demand are hosted in a dedicated deployment space, where you can use them for inferencing.

For more information, see Deploying foundation models on-demand.

Parent topic: Deploying assets with watsonx.ai Runtime