You can build a customized AI service that is tailored to your generative AI application from the ground up. For example, if you are deploying an asset that uses retrieval-augmented generation (RAG), you can capture the logic for retrieving answers from the grounding documents in the AI service.
Process overview
The following graphic illustrates the process of coding AI services.
You create a notebook within your project that contains the AI service and its connections. The AI service captures the logic of your RAG application and contains the generation function, which is a deployable unit of code. You promote the generation function to a deployment space and use it to create a deployment. The deployment is exposed as a REST API endpoint that other applications can access. To use the deployed AI service for inferencing, you send a request to the REST API endpoint; the deployed AI service processes the request and returns a response.
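The flow above can be sketched as a Python function pattern: an outer function performs one-time setup when the deployment initializes, and returns an inner generation function, which is the deployable unit of code that handles each request. This is a minimal illustrative sketch; the function and field names (`deployable_ai_service`, `question`) and the `get_json` context method are assumptions, not a fixed API.

```python
# Sketch of an AI service: an outer function captures setup, and
# returns an inner generation function, the deployable unit of code.
def deployable_ai_service(context, **custom):
    # One-time setup runs when the deployment initializes, for example
    # opening connections to grounding documents (omitted here).
    model_id = custom.get("model_id", "meta-llama/llama-3-1-70b-instruct")

    def generate(context):
        # Called for each inference request sent to the REST API endpoint.
        payload = context.get_json()
        question = payload["question"]
        # A real RAG service would retrieve grounding passages and call a
        # foundation model here; this stub just echoes the input.
        return {"body": {"model_id": model_id, "answer": f"Echo: {question}"}}

    return generate
```

Because the generation function is returned by the outer function, setup work such as loading credentials or connections runs once per deployment rather than once per request.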
Tasks for creating and deploying AI services
Here are the steps that you must follow to create, deploy, and manage AI services:
- Create AI service: Define an AI service in a notebook by using Python. The AI service must meet specific requirements for deploying as an AI service.
- Test AI service: Test the coding logic of your AI service locally.
- Create AI service assets: After creating and testing the AI service, you must package the AI service as a deployable asset.
- Deploy AI service assets: Deploy the AI service asset as an online or a batch deployment.
- Test AI service deployment: Test your deployed AI service for online inferencing or batch scoring.
- Manage AI services: Access and update the deployment details. Scale or delete the deployment from the user interface or programmatically.
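For the local testing step, one approach is to fake the request context object so that the generation function can be called directly in the notebook before any asset or deployment exists. This is a sketch under assumptions: `FakeContext` and the service shown here are illustrative stand-ins, not part of an SDK.

```python
# Sketch of testing an AI service's logic locally, before packaging it
# as a deployable asset. FakeContext is an illustrative stand-in for
# the request context that the deployment passes to generate().
class FakeContext:
    """Mimics a request context: holds a JSON body for get_json()."""
    def __init__(self, json_body):
        self._json_body = json_body

    def get_json(self):
        return self._json_body


def deployable_ai_service(context, **custom):
    def generate(context):
        payload = context.get_json()
        # Real logic would retrieve grounding documents and call a model.
        return {"body": {"answer": payload["question"].upper()}}
    return generate


# Call the generation function locally, the same way a deployment would.
generate = deployable_ai_service(FakeContext({}))
response = generate(FakeContext({"question": "what is rag?"}))
print(response["body"]["answer"])  # WHAT IS RAG?
```

Once the logic behaves as expected locally, you can package the same function as a deployable asset and create the online or batch deployment from it.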
Sample notebooks for creating and deploying AI services
To learn how to create and deploy AI services programmatically, see the following sample notebooks:
Sample name | Framework | Techniques demonstrated |
---|---|---|
Use watsonx, and meta-llama/llama-3-2-11b-vision-instruct to run as an AI service | Python | Set up the environment, create an AI service, test the AI service's function locally, deploy the AI service, run the AI service |
Use watsonx, Elasticsearch, and LangChain to answer questions (RAG) | LangChain | Set up the environment, download the test dataset, define the foundation model on watsonx, set up connectivity information to Elasticsearch, generate a retrieval-augmented response to a question, create an AI service, test the AI service's function locally, deploy the AI service |
Use watsonx and meta-llama/llama-3-1-70b-instruct to create AI service | LangGraph | Set up the environment, create an AI service, test the AI service's function locally, deploy the AI service, run the AI service |
Parent topic: Deploying AI services