Inferencing a foundation model programmatically (Python)
Last updated: Feb 03, 2025

You can prompt foundation models in IBM watsonx.ai programmatically by using the Python library.

After you create a prompt in the Prompt Lab, you can save the prompt as a notebook, and then edit the notebook. Starting from a generated notebook is useful because it handles the initial setup steps, such as getting credentials and the project ID, for you.

Alternatively, you can work with foundation models directly from a new notebook that you create in watsonx.ai.

For more information, see the ModelInference class of the watsonx.ai Python library.

Sample notebooks

To find sample notebooks that prompt various foundation models available in watsonx.ai, see Python sample notebooks. Most of these notebooks are also available from the Resource hub.

For example, Use watsonx to analyze car rentals reviews is a sample notebook that you can run to learn the steps involved in inferencing a foundation model in watsonx.ai.

Prompting a foundation model

Before you prompt a foundation model, get a list of the foundation models that are available for inferencing in watsonx.ai. For more information, see Getting information about available foundation models.
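
For example, a minimal sketch that lists the available text foundation models might look like the following. The get_model_specs() call and the "resources" and "model_id" keys in its response are assumptions about the library version you use; adjust them if your version differs.

from ibm_watsonx_ai import APIClient

# Replace {region} and {my-IBM-Cloud-API-key} with values for your environment.
my_credentials = {
  "url": "https://{region}.ml.cloud.ibm.com",
  "apikey": "{my-IBM-Cloud-API-key}",
}

client = APIClient(my_credentials)

# List the text foundation models that are available for inferencing.
# Assumes get_model_specs() returns a dictionary with a "resources" list.
model_specs = client.foundation_models.get_model_specs()
for resource in model_specs.get("resources", []):
    print(resource["model_id"])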

The following sample code shows you how to prompt the flan-t5-xxl-11b model by using different methods so that you can compare the type of output that is generated by each method.

The sample passes prompting parameters to each method: the generate() and generate_text() calls reuse the same parameter override, and the text-stream call uses its own set of parameters. You can specify parameters for any of the methods.

Sample Python code for prompting a foundation model

import json
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

my_credentials = {
  "url": "https://{region}.ml.cloud.ibm.com",
  "apikey": {my-IBM-Cloud-API-key},
}

client = APIClient(my_credentials)

gen_parms = {
    GenParams.DECODING_METHOD: DecodingMethods.SAMPLE,
    GenParams.MAX_NEW_TOKENS: 100
}
model_id = client.foundation_models.TextModels.FLAN_T5_XXL
project_id = "{my-project-ID}"
space_id = None
verify = False

model = ModelInference(
  model_id=model_id,
  credentials=my_credentials,
  params=gen_parms,
  project_id=project_id,
  space_id=space_id,
  verify=verify,
)

prompt_txt = "In today's sales meeting, we "
gen_parms_override = {
    GenParams.DECODING_METHOD: DecodingMethods.SAMPLE,
    GenParams.MAX_NEW_TOKENS: 50
}

# GENERATE

generated_response = model.generate(prompt=prompt_txt, params=gen_parms_override)

print("Output from generate() method:")
print(json.dumps(generated_response, indent=2))

# GENERATE TEXT

generated_text_response = model.generate_text(prompt=prompt_txt, params=gen_parms_override)

print("Output from generate_text() method:")
print(generated_text_response)

# GENERATE STREAM

gen_stream_params = {
  GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
  GenParams.TEMPERATURE: 0.5,
  GenParams.MIN_NEW_TOKENS: 10,
  GenParams.MAX_NEW_TOKENS: 20
}

print("Output from generate_text_stream() method:")
stream_response = model.generate_text_stream(prompt=prompt_txt, params=gen_stream_params)

for chunk in stream_response:
  print(chunk, end='')
Note:

To use the sample code, you must replace {region}, {my-IBM-Cloud-API-key}, and {my-project-ID} with valid values for your environment.

Output from generate() method:

{
  "model_id": "google/flan-t5-xxl",
  "created_at": "2023-07-27T03:40:17.575Z",
  "results": [
    {
      "generated_text": "will discuss the new product line.",
      "generated_token_count": 8,
      "input_token_count": 10,
      "stop_reason": "EOS_TOKEN"
    }
  ],
  ...
}

Output from generate_text() method:

will discuss the new product line.

Output from generate_text_stream() method:

will discuss the new product line. Let's start with marketing plans
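
Because the generate() method returns a dictionary, as shown in the sample output, you typically extract the generated text from the results list before you use it. The following minimal sketch assumes the response structure shown above.

# Extract the generated text from the generate() response dictionary.
# Assumes the response structure shown in the sample output above.
first_result = generated_response["results"][0]
print(first_result["generated_text"])
print(first_result["stop_reason"])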

Removing harmful content

When you submit a prompt to a foundation model, the hate, abuse, and profanity (HAP) filter and the personally identifiable information (PII) filter are disabled by default. You can enable them and specify the sensitivity of the HAP filter by using the moderations field in the API.

For more information about how the filter works, see Removing harmful language from model input and output.

To enable the filters with their default settings when you use the Python library, include the following parameter in the request:

response = model.generate(prompt, guardrails=True)

The following code example shows you how to enable and configure the filters.

from ibm_watsonx_ai.metanames import GenTextModerationsMetaNames

guardrails_hap_params = {
  GenTextModerationsMetaNames.INPUT: False,
  GenTextModerationsMetaNames.THRESHOLD: 0.45
}
guardrails_pii_params = {
  GenTextModerationsMetaNames.INPUT: False,
  GenTextModerationsMetaNames.OUTPUT: True,
  GenTextModerationsMetaNames.MASK: {"remove_entity_value": True}
}

response = model.generate(prompt,
  guardrails=True,
  guardrails_hap_params=guardrails_hap_params,
  guardrails_pii_params=guardrails_pii_params)
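
You can pass the same guardrail settings to the other inference methods as well. The following sketch assumes that generate_text_stream() accepts the same guardrails arguments as generate().

# Stream a response with the same guardrail settings applied.
stream_response = model.generate_text_stream(prompt,
  guardrails=True,
  guardrails_hap_params=guardrails_hap_params,
  guardrails_pii_params=guardrails_pii_params)

for chunk in stream_response:
  print(chunk, end='')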

Parent topic: Python library