Inferencing a foundation model programmatically
Last updated: Nov 27, 2024

You can prompt foundation models in IBM watsonx.ai programmatically by using the Python library.

After you create a prompt in the Prompt Lab, you can save the prompt as a notebook, and then edit the notebook. Using the generated notebook as a starting point is useful because it handles the initial setup steps, such as getting credentials and the project ID information for you.

Alternatively, you can work with foundation models directly from a notebook in watsonx.ai by using the watsonx.ai Python library. For more information, see the ibm-watsonx-ai library.
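
If you are working in your own notebook rather than one generated from the Prompt Lab, a minimal setup might look like the following sketch. The pip package name ibm-watsonx-ai matches the library name above.

# Install the library once in your notebook environment, for example:
#   pip install -U ibm-watsonx-ai

# Import the classes that are used throughout this topic
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference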

Sample notebooks

To find sample notebooks that prompt various foundation models available in watsonx.ai, see Python sample notebooks. Most of these notebooks are also available from the Resource hub.

For example, the Use watsonx to analyze car rentals reviews notebook is a sample that you can run to learn the steps involved in inferencing a foundation model in watsonx.ai.

Prompting a foundation model

Before you prompt a foundation model, get a list of the foundation models that are available for inferencing in watsonx.ai. For more information, see Getting information about available foundation models.
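
For example, the following sketch lists the model IDs that your environment returns. It assumes that the get_model_specs() helper of the client's foundation_models manager returns a dictionary with a resources list, and that you replace the credential placeholders as described in the note after the sample code.

from ibm_watsonx_ai import APIClient

# Replace the placeholders with values for your environment
my_credentials = {
  "url": "https://{region}.ml.cloud.ibm.com",
  "apikey": "{my-IBM-Cloud-API-key}",
}

client = APIClient(my_credentials)

# List the foundation models that are available for inferencing
model_specs = client.foundation_models.get_model_specs()
for spec in model_specs["resources"]:
  print(spec["model_id"])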

The following sample code shows you how to prompt the flan-t5-xxl-11b model by using different methods so that you can compare the type of output that is generated by each method.

Although only the example that shows how to generate a text stream includes model parameters for prompting, you can specify parameters for any of the methods.

Sample Python code for prompting a foundation model

import json
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

my_credentials = {
  "url": "https://{region}.ml.cloud.ibm.com",
  "apikey": {my-IBM-Cloud-API-key},
}

client = APIClient(my_credentials)

gen_parms = {
    GenParams.DECODING_METHOD: DecodingMethods.SAMPLE,
    GenParams.MAX_NEW_TOKENS: 100
}
model_id = client.foundation_models.TextModels.FLAN_T5_XXL
project_id = "{my-project-ID}"
space_id = None
verify = False  # set to True to verify the server's TLS certificate

model = ModelInference(
  model_id=model_id,
  credentials=my_credentials,
  params=gen_parms,
  project_id=project_id,
  space_id=space_id,
  verify=verify,
)

prompt_txt = "In today's sales meeting, we "
gen_parms_override = {
    GenParams.DECODING_METHOD: DecodingMethods.SAMPLE,
    GenParams.MAX_NEW_TOKENS: 50
}

# GENERATE

generated_response = model.generate(prompt=prompt_txt, params=gen_parms_override)

print("Output from generate() method:")
print(json.dumps(generated_response, indent=2))

# GENERATE TEXT

generated_text_response = model.generate_text(prompt=prompt_txt, params=gen_parms_override)

print("Output from generate_text() method:")
print(generated_text_response)

# GENERATE STREAM

gen_stream_params = {
  GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
  GenParams.TEMPERATURE: 0.5,
  GenParams.MIN_NEW_TOKENS: 10,
  GenParams.MAX_NEW_TOKENS: 20
}

print("Output from generate_text_stream() method:")
stream_response = model.generate_text_stream(prompt=prompt_txt, params=gen_stream_params)

for chunk in stream_response:
  print(chunk, end='')
Note:

To use the sample code, you must replace {region}, {my-IBM-Cloud-API-key}, and {my-project-ID} with valid values for your environment.

Output from generate() method:

{
  "model_id": "google/flan-t5-xxl",
  "created_at": "2023-07-27T03:40:17.575Z",
  "results": [
    {
      "generated_text": "will discuss the new product line.",
      "generated_token_count": 8,
      "input_token_count": 10,
      "stop_reason": "EOS_TOKEN"
    }
  ],
  ...
}

Output from generate_text() method:

will discuss the new product line.

Output from generate_text_stream() method:

will discuss the new product line. Let's start with marketing plans
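
If you want to work with the full response from the generate() method rather than only the text, you can read the fields shown in the sample output above. The following is a minimal sketch based on that response structure.

# Extract the generated text and metadata from the generate() response dictionary
result = generated_response["results"][0]

print(result["generated_text"])         # "will discuss the new product line."
print(result["generated_token_count"])  # number of tokens in the output
print(result["stop_reason"])            # why generation stopped, for example EOS_TOKEN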

Removing harmful content

When you submit a prompt to a foundation model, the hate, abuse, and profanity (HAP) filter and the personally identifiable information (PII) filter are disabled by default. You can enable them and specify the sensitivity of the HAP filter by using the moderations field in the API.

For more information about how the filter works, see Removing harmful language from model input and output.

To enable the filters with default settings applied when using the Python library, include the following parameter in the request:

response = model.generate(prompt, guardrails=True)

The following code example shows you how to enable and configure the filters.

from ibm_watsonx_ai.metanames import GenTextModerationsMetaNames

guardrails_hap_params = {
  GenTextModerationsMetaNames.INPUT: False,
  GenTextModerationsMetaNames.THRESHOLD: 0.45
}
guardrails_pii_params = {
  GenTextModerationsMetaNames.INPUT: False,
  GenTextModerationsMetaNames.OUTPUT: True,
  GenTextModerationsMetaNames.MASK: {"remove_entity_value": True}
}

response = model.generate(prompt,
  guardrails=True,
  guardrails_hap_params=guardrails_hap_params,
  guardrails_pii_params=guardrails_pii_params)

Parent topic: Python library
