Use foundation models in IBM watsonx.ai programmatically for text generation tasks.
Ways to develop
You can inference foundation models by using the following programming methods:
- Node.js
- Python library
- REST API
Alternatively, you can use graphical tools from the watsonx.ai UI to inference foundation models. See Prompt Lab.
Inference types
You can prompt a foundation model by using one of the following text generation methods:
- Infer text: Returns the complete output that is generated by the foundation model in a single response.
- Infer text event stream: Returns the output as it is generated by the foundation model. This method is useful in conversational use cases, where you want a chatbot or virtual assistant to respond to a user in a fluid way that mimics a real conversation.
For chat use cases, use the Chat API. See Adding generative chat function to your applications with the chat API.
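For illustration, the following Python sketch contrasts the two inference types by using the watsonx.ai Python library. It is a minimal sketch that assumes the ibm-watsonx-ai package is installed and that the placeholder API key, project ID, and region URL are replaced with your own values.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

# Placeholder credentials: replace with your own API key, region URL, and project ID.
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_API_KEY",
)

model = ModelInference(
    model_id="ibm/granite-3-8b-instruct",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# Infer text: blocks until the full output is generated, then returns it all at once.
text = model.generate_text(
    prompt="Tell me about interest rates",
    params={"max_new_tokens": 200},
)
print(text)

# Infer text event stream: yields chunks of output as the model generates them,
# which suits conversational use cases.
for chunk in model.generate_text_stream(prompt="Tell me about interest rates"):
    print(chunk, end="", flush=True)
```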
Node.js
See the following resources:
- Text generation
- Text generation stream
Python library
See the ModelInference class of the watsonx.ai Python library.
Sample notebooks that demonstrate how to inference foundation models with the Python library are also available.
REST API
The method that you use to inference a foundation model differs depending on whether the foundation model is provided with watsonx.ai or is associated with a deployment.
- To inference a foundation model that is deployed by IBM in watsonx.ai, use the Text generation method.

  ```bash
  curl -X POST \
    -H 'Authorization: Bearer {token}' \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    --data-raw '{
      "input": "Tell me about interest rates",
      "parameters": {
        "max_new_tokens": 200
      },
      "model_id": "ibm/granite-3-8b-instruct",
      "project_id": "{project_id}"
    }' \
    "https://{region}.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-02-11"
  ```
- To inference a tuned foundation model, a custom foundation model, or a deploy on demand foundation model, use the Deployments>Infer text method. The {model_id} is not required with this type of request because only one model is supported by the deployment.
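If you are calling the REST API from your own code rather than from curl, you can send the same request with any HTTP client. The following Python sketch assumes the requests package and a bearer token that you already generated, and reads the generated text from the results array of the JSON response:

```python
import requests

# Assumptions: a valid IAM bearer token, your own project ID, and the us-south region.
token = "YOUR_BEARER_TOKEN"
url = "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-02-11"

payload = {
    "input": "Tell me about interest rates",
    "parameters": {"max_new_tokens": 200},
    "model_id": "ibm/granite-3-8b-instruct",
    "project_id": "YOUR_PROJECT_ID",
}

response = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
response.raise_for_status()

# The generated text is returned in the results array of the response body.
print(response.json()["results"][0]["generated_text"])
```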
Applying AI guardrails when inferencing
When you prompt a foundation model by using the API, you can use the moderations field to apply AI guardrails to foundation model input and output. For more information, see Removing harmful language from model input and output.
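In the watsonx.ai Python library, the generation methods expose this capability through a guardrails flag. The following minimal sketch assumes that the placeholder credentials are replaced with your own values; setting guardrails=True applies the default hate, abuse, and profanity (HAP) filter.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id="ibm/granite-3-8b-instruct",
    credentials=Credentials(
        url="https://us-south.ml.cloud.ibm.com",
        api_key="YOUR_API_KEY",  # placeholder credential
    ),
    project_id="YOUR_PROJECT_ID",
)

# guardrails=True asks the service to apply the default HAP filter to the
# model input and output; this corresponds to the moderations field of the
# underlying REST request.
text = model.generate_text(
    prompt="Tell me about interest rates",
    params={"max_new_tokens": 200},
    guardrails=True,
)
print(text)
```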
Inferencing with a prompt template
You can inference a foundation model with input text that follows a pattern that is defined by a prompt template.
For more information, see Create a prompt template.
To extract prompt template text to use as input to the text generation method, take the following steps:
- Use the Search asset types method of the Watson Data API to get the prompt template ID.

  ```bash
  curl -X POST \
    'https://api.dataplatform.cloud.ibm.com/v2/asset_types/wx_prompt/search?version=2024-07-29&project_id={project_id}' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer ACCESS_TOKEN' \
    --data '{
      "query": "asset.name:{template_name}"
    }'
  ```

  The prompt template ID is specified as the metadata.asset_id.

- Use the Get the inference input string for a given prompt method to get the prompt template text.

  ```bash
  curl -X POST \
    'https://api.dataplatform.cloud.ibm.com/wx/v1/prompts/{prompt-template-id}/input?version=2024-07-29&project_id={project_id}' ...
  ```

  For more information, see Get the inference input string for a given prompt.
You can submit the extracted prompt text as input to the Generate text method.
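The following Python sketch chains these steps together with the requests package. It is a minimal sketch: the endpoints and the metadata.asset_id field come from the steps above, while the bearer token, the template name, and the input field of the extraction response are placeholder assumptions to illustrate the flow.

```python
import requests

token = "YOUR_BEARER_TOKEN"  # assumption: a valid IAM bearer token
project_id = "YOUR_PROJECT_ID"
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# Step 1: look up the prompt template ID by asset name.
search = requests.post(
    "https://api.dataplatform.cloud.ibm.com/v2/asset_types/wx_prompt/search"
    f"?version=2024-07-29&project_id={project_id}",
    headers=headers,
    json={"query": "asset.name:my-template"},  # 'my-template' is a placeholder name
    timeout=60,
)
search.raise_for_status()
template_id = search.json()["results"][0]["metadata"]["asset_id"]

# Step 2: extract the inference input string for the prompt template.
extracted = requests.post(
    f"https://api.dataplatform.cloud.ibm.com/wx/v1/prompts/{template_id}/input"
    f"?version=2024-07-29&project_id={project_id}",
    headers=headers,
    json={},  # request body options are omitted here; see the API reference
    timeout=60,
)
extracted.raise_for_status()
prompt_text = extracted.json()["input"]  # field name is an assumption

# Step 3: submit the extracted text to the text generation method.
generation = requests.post(
    "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-02-11",
    headers=headers,
    json={
        "input": prompt_text,
        "parameters": {"max_new_tokens": 200},
        "model_id": "ibm/granite-3-8b-instruct",
        "project_id": project_id,
    },
    timeout=60,
)
generation.raise_for_status()
print(generation.json()["results"][0]["generated_text"])
```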
Parent topic: Coding generative AI solutions