Deploying AI services

Last updated: Feb 21, 2025

An AI service is a deployable unit of code that captures the logic of your generative AI use case. After an AI service is successfully deployed, you can call its endpoint for inferencing from your application.

Deploying generative AI applications with AI services

While Python functions are the traditional way to deploy machine learning assets, AI services offer a more flexible way to deploy code for generative AI applications, including applications that stream their responses.

Unlike the standard Python function for deploying a predictive machine learning model, which requires input in a fixed schema, an AI service accepts multiple inputs and can be customized.
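The contrast between a fixed input schema and flexible inputs can be sketched in plain Python. This is a minimal illustration, not the product API: the function names (score_fixed, answer_flexible, answer_stream) and the payload keys ("input_data", "values", "question", "language") are assumptions chosen for the example.

```python
from typing import Any, Dict, Iterator


def score_fixed(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Fixed-schema style: the caller must send rows under an assumed
    {"input_data": [{"values": [[...], ...]}]} layout."""
    rows = payload["input_data"][0]["values"]
    # Stand-in for a predictive model: sum each row.
    return {"predictions": [{"values": [[sum(row)] for row in rows]}]}


def answer_flexible(body: Dict[str, Any]) -> Dict[str, Any]:
    """Flexible style: the code reads whichever keys the application sends."""
    question = body.get("question", "")
    language = body.get("language", "en")
    return {"answer": f"[{language}] You asked: {question}"}


def answer_stream(body: Dict[str, Any]) -> Iterator[str]:
    """Streaming style: yield the response one chunk at a time."""
    for word in body.get("question", "").split():
        yield word
```

The first function fails if the payload does not match the expected layout, while the other two accept any JSON body and decide for themselves which fields to use.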

AI services offer a secure way to deploy your code. For example, the service generates the credentials that are required for authentication, such as bearer tokens, from task credentials and makes the token available to the AI service asset. You can use this token to get connection assets, download data assets, and more.
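The following sketch shows how code inside an AI service might use a token that the service supplies instead of embedding a credential. StubContext is a hypothetical stand-in used only so that the example runs on its own, and the get_token method name is an assumption; check the product reference for the actual context interface.

```python
from typing import Dict


class StubContext:
    """Hypothetical stand-in for the object that carries the generated token."""

    def __init__(self, token: str) -> None:
        self._token = token

    def get_token(self) -> str:
        # Assumed accessor name; the token is supplied by the service,
        # not hard-coded in the deployed code.
        return self._token


def build_auth_headers(context: StubContext) -> Dict[str, str]:
    """Build request headers for calls that fetch connection or data assets."""
    return {"Authorization": f"Bearer {context.get_token()}"}
```

Because the token is read from the context at call time, no secret needs to be stored in the code that you deploy.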

Deploying AI services visually

You can deploy your AI service directly to a deployment space by following a no-code approach from the user interface. Use this approach to create an online or batch deployment for your use case.

For more information, see Deploying AI services visually.

Deploying AI services with tools

You can use the following visual tools to create a generative AI solution in watsonx.ai:

Prompt Lab
AutoAI
Agent Lab

When you use visual tools to create a generative AI solution for a complex use case, such as RAG, your solution is deployed as an AI service. You can deploy your solution directly from the user interface or export it as an editable Python notebook that deploys the AI service. The notebook automatically generates the code to create an AI service in a standard format, and gives you a way to add functionality or update the solution after testing. While tools provide a user-friendly interface to create and deploy AI services, coding offers more flexibility and customization options.

For more information, see Deploying AI services with tools.

Deploying AI services with code

When you build your generative AI applications from the ground up, you can use an AI service to capture the programming logic of your application, which can be deployed with an endpoint for inferencing. For example, if you build a RAG application with a framework such as LangChain or LlamaIndex, you can capture the logic for retrieving answers from the vector index in an AI service and then deploy the AI service.
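As a rough sketch of capturing retrieval logic in one deployable unit, the following example wraps a toy retriever in an outer function that returns the request handler. Everything here is an assumption made for illustration: the names deployable_ai_service, generate, StubContext, and get_json, the "documents" parameter, and the response shape. Word overlap stands in for a real vector index query through a framework such as LangChain or LlamaIndex.

```python
import re
from typing import Any, Callable, Dict, List, Set


class StubContext:
    """Hypothetical stand-in for the per-request context object."""

    def __init__(self, body: Dict[str, Any]) -> None:
        self._body = body

    def get_json(self) -> Dict[str, Any]:
        # Assumed accessor for the JSON body of the inference request.
        return self._body


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def deployable_ai_service(context: StubContext, **custom: Any) -> Callable[[StubContext], Dict[str, Any]]:
    # Setup runs once: a toy in-memory "index" replaces a real vector index.
    documents: List[str] = custom.get("documents", [])

    def generate(context: StubContext) -> Dict[str, Any]:
        # Runs per request: return the document that shares the most words
        # with the question, or an empty string if there are no documents.
        question = context.get_json().get("question", "")
        q_tokens = tokenize(question)
        best = max(documents, key=lambda d: len(q_tokens & tokenize(d)), default="")
        return {"body": {"question": question, "answer": best}}

    return generate
```

Calling deployable_ai_service(StubContext({}), documents=[...]) returns the generate function, which can then be invoked with a new StubContext for each request. Keeping setup in the outer function and request handling in the inner one means that the index is built once rather than on every call.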

For more information, see Deploying AI services with code.

Learn more

Deploying Python functions

Parent topic: Deploying foundation model assets