With the manual coding approach, you write custom code to deploy and manage AI services. This approach provides full control over the deployment process and allows you to customize the deployment to meet specific requirements.
Process overview
The following graphic illustrates the process of coding AI services.
You can create a notebook that contains the AI service and its connections within your project. The AI service captures the logic of your RAG application and contains the generation function, which is a deployable unit of code. The generation function is promoted to the deployment space, which is used to create a deployment. The deployment is exposed as a REST API endpoint that other applications can access. You can send a request to the REST API endpoint to use the deployed AI service for inferencing. The deployed AI service processes the request and returns a response.
Tasks for creating and deploying AI services
Follow these steps to create, deploy, and manage AI services:
- Create an AI service: Define an AI service in a notebook by using Python. The AI service must meet specific requirements to be deployable.
- Test the AI service: Test the coding logic of your AI service locally.
- Create AI service assets: After you create and test the AI service, package the AI service as a deployable asset.
- Deploy AI service assets: Deploy the AI service asset as an online or a batch deployment.
- Test the AI service deployment: Test your deployed AI service for online inferencing or batch scoring.
- Manage AI services: Access and update the deployment details. Scale or delete the deployment from the user interface or programmatically.
Creating AI services in a notebook
You can create an AI service directly in a notebook. You must define the AI service in Python, and it must meet certain requirements. To deploy the AI service, you must create a watsonx.ai Runtime repository asset and upload the Python file to that asset.
Defining an AI service with Python client library
To define an AI service in a notebook by using the watsonx.ai Python client library, follow these steps:
- To work with AI services in Python, install the ibm-watsonx-ai Python SDK:

pip install ibm-watsonx-ai
- After you install the Python client library, initialize the client and set the default deployment space:

from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url=url,
    api_key=apikey
)

client = APIClient(credentials)
client.set.default_space(space_id=space_id)
- Define your AI service in Python by using the following layout. Depending on your use case, you must include at least one of these functions as a nested function: generate(), generate_stream(), or generate_batch(). For more information, see Requirements for defining an AI service.
def basic_generate_demo(context, model="google/flan-t5-xl", **parameters):
    # "parameters" is a reserved argument and will be enabled in future
    # generate token from task credentials api
    task_token = context.generate_token()

    def generate(context):
        user_token = context.get_token()
        # extract token from header
        user_headers = context.get_headers()
        json_body = context.get_json()
        # example 1: json
        return {
            "headers": {
                "Content-Type": "application/json",
                "user-custom-header": "my-header-x1",
            },
            "body": {
                "model": model
            },
        }

    def generate_stream(context):
        user_token = context.get_token()
        # extract token from header
        user_headers = context.get_headers()
        json_body = context.get_json()
        # return a generator
        data_to_stream = json_body.get("sse", "Default message!")
        for x in data_to_stream:
            yield x

    def generate_batch(input_data_references, output_data_reference):
        # generate token from task credentials api
        task_token = context.generate_token()
        # do something.
        # ...

    return generate, generate_stream, generate_batch
Requirements for defining an AI service
The AI service captures the logic of your generative AI use case (such as a retrieval-augmented generation application) and handles the REST API call to the deployment endpoint /ml/v4/deployments.
Follow these guidelines to define an AI service:
- Create a Python function. You can specify any name for your function. To learn more about the function parameters, see the watsonx.ai REST API documentation.
- Depending on your use case, the Python function that you want to deploy must include at least one of these functions as a nested function in its scope: generate(), generate_stream(), or generate_batch().
- When you use the watsonx.ai Python client library to save a Python function, only the code in the scope of the outer function (including its nested functions) is saved. The code outside the outer function's scope is not saved and is therefore not available when you deploy the function, as illustrated in the sketch that follows this list.
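For example, in the following minimal sketch (all names are illustrative, not part of the product API), the module-level variable MODULE_PROMPT is not saved with the asset because it is defined outside the outer function, while default_prompt is saved because it is defined inside the outer function's scope:

MODULE_PROMPT = "You are a helpful assistant."  # NOT saved: defined outside the outer function

def deployable_service(context, **custom):
    # Saved: defined inside the outer function's scope
    default_prompt = "You are a helpful assistant."

    def generate(context):
        json_body = context.get_json()
        # Use default_prompt here, not MODULE_PROMPT; module-level variables
        # are not available after the AI service is deployed.
        return {"body": {"prompt": default_prompt, "input": json_body}}

    return generate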
Guidelines for defining the generate() function
The generate() function can be used to process your authorization token. This function handles the REST call to the inference endpoint /ml/v4/deployments/{id_or_name}/ai_service.
Follow these guidelines to define the generate() function inside the AI service:
- You must use the name generate to define the function.
- You can only provide one argument to the generate() function: context.
- The generate() function must return a value of the dict (dictionary) data type.
- Optional: You can specify the body or headers keys.
Example
def generate(context):
    user_token = context.get_token()
    headers = context.get_headers()
    json_body = context.get_json()

    return {
        "headers": {
            "Content-Type": "text/plain"
        },
        "body": "Hello WatsonX"
    }
Guidelines for defining the generate_stream() function
You can use the generate_stream() function for generative AI use cases that require streaming. This function handles the REST call to the Server-Sent Events (SSE) inference endpoint POST /ml/v4/deployments/{id_or_name}/ai_service_stream.
Follow these guidelines to define the generate_stream() function inside the AI service:
- You must use the name generate_stream to define the function.
- You can only provide one argument to the generate_stream() function: context.
Example
def generate_stream(context):
    user_token = context.get_token()
    headers = context.get_headers()
    json_body = context.get_json()

    for x in ["Hello", "WatsonX", "!"]:
        yield x
Output
id: 1
event: message
data: Hello
id: 2
event: message
data: WatsonX
id: 3
event: message
data: !
id: 4
event: eos
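On the client side, the streamed events can be consumed from the ai_service_stream endpoint of the deployment. The following is a minimal sketch that uses the requests library; HOST, VERSION, deployment_id, and headers are placeholders that you must replace with your own values:

import requests

# Placeholders: replace HOST, VERSION, deployment_id, and headers with your own values
with requests.post(
    f"{HOST}/ml/v4/deployments/{deployment_id}/ai_service_stream?version={VERSION}",
    headers=headers,
    json={"sse": ["Hello", "WatsonX", "!"]},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # Each SSE event arrives as lines such as "id: 1", "event: message", "data: Hello"
        if line:
            print(line)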
Guidelines for defining the generate_batch() function
The generate_batch() function can be used for use cases that require batch inferencing. This function handles the REST API call to the jobs endpoint /ml/v4/deployments_jobs.
Follow these guidelines to define the generate_batch() function inside the AI service:
- You must use the name generate_batch to define the function.
Example
def generate_batch(input_data_references: list[dict], output_data_reference: dict):
    # context from outer function is visible
    batch_token = context.generate_token()
    print(f"batch_token: {batch_token[-5:]}", flush=True)
    print(
        f"generate_batch:\n{input_data_references=}\n{output_data_reference=}",
        flush=True,
    )
Sample code to create an AI service
The sample code defines an AI service named deployable_ai_service_f1. When a REST API request is sent to the /ml/v4/deployments endpoint, deployable_ai_service_f1 is called. The function takes a JSON input payload and includes the following nested functions as part of its scope:
- generate(): Handles the REST API call to the /ml/v4/deployments/{id_or_name}/ai_service endpoint. It takes in a context object, extracts the token, headers, and JSON body, and returns a response based on the mode key in the JSON body. The response format can be JSON, bytes, or string, with optional custom headers.
- generate_stream(): Handles the REST API call to the SSE (Server-Sent Events) inference endpoint /ml/v4/deployments/{id_or_name}/ai_service_stream. It takes in a context object, extracts the token, headers, and JSON body, and returns a stream of SSE events that ends with an eos (End of Stream) event.
- generate_batch(): Handles the REST API call to the jobs endpoint /ml/v4/deployments_jobs. It takes in input_data_references and output_data_reference from the request JSON body, generates a batch token, and logs the input and output data references.
def deployable_ai_service_f1(context, params={"k1": "v1"}, **custom):
    """
    The outer function handles the REST call to the deployment endpoint
    POST /ml/v4/deployments

    context.generate_token() - generate a token from the task credentials

    To use `generate` and `generate_stream`, the deployment has to be ONLINE
    To use `generate_batch`, the deployment has to be BATCH
    """

    task_token = context.generate_token()
    print(f"outer function: {task_token[-5:]}", flush=True)

    def generate(context) -> dict:
        """
        The `generate` function handles the REST call to the inference endpoint
        POST /ml/v4/deployments/{id_or_name}/ai_service

        context.get_token() - get the Bearer token from the header of the request
        context.get_json() - get the body of the request
        context.get_headers() - get the headers of the request

        The generate function should return a dict
        The following optional keys are supported currently
        - body
        - headers

        This particular example accepts a json body of the format:
        { "mode" : <value> }
        Depending on the <value> of the mode, it will return different response
        """

        user_token = context.get_token()
        headers = context.get_headers()
        json_body = context.get_json()

        print(f"my_generate: {user_token=}", flush=True)
        print(f"request headers: {headers=}", flush=True)
        print(f"json body: {json_body=}", flush=True)

        match json_body.get("mode", "no-match"):
            case "json":
                # response Content-Type is "application/json"
                return {
                    "headers": {
                        "Content-Type": "application/json",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "json-no-header":
                # response Content-Type is "application/json"
                return {
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "json-custom-header":
                # response Content-Type is "text/plain; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "text/plain; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "bytes":
                # response Content-Type is "application/octet-stream"
                return {
                    "headers": {
                        "Content-Type": "application/octet-stream",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": b"12345678910",
                }
            case "bytes-no-header":
                # response Content-Type is 'text/html; charset=utf-8'
                return {
                    "body": b"12345678910",
                }
            case "bytes-custom-header":
                # response Content-Type is "text/plain; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "text/plain; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": b"12345678910",
                }
            case "str":
                # response Content-Type is "text/plain"
                return {
                    "headers": {
                        "Content-Type": "text/plain",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "str-no-header":
                # response Content-Type is "text/html; charset=utf-8"
                return {
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "str-custom-header":
                # response Content-Type is "application/octet-stream; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "application/octet-stream; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "negative-str-return":
                # Bad request
                return "Should give 400 bad request"
            case _:
                # response Content-Type is "text/html; charset=utf-8"
                return {"body": "No match"}
    def generate_stream(context):
        """
        The generate_stream function handles the REST call to the SSE inference endpoint
        POST /ml/v4/deployments/{id_or_name}/ai_service_stream

        context.get_token() - get the Bearer token from the header of the request
        context.get_json() - get the body of the request
        context.get_headers() - get the headers of the request

        The generate_stream function must be a Python `generator` with yield
        The data in each yield will be the "data" for the SSE event

        Example: The following request json
        { "sse": ["Hello" , "", "WatsonX"," ", "!"]}
        will return the following stream of events
        --------------
        id: 1
        event: message
        data: Hello
        id: 2
        event: message
        data:
        id: 3
        event: message
        data: WatsonX
        id: 4
        event: message
        data:
        id: 5
        event: message
        data: !
        id: 6
        event: eos
        ---------------
        The end of the stream is marked by the event "eos"
        """

        user_token = context.get_token()
        headers = context.get_headers()
        json_body = context.get_json()

        print(f"generate_stream: {user_token=}", flush=True)
        print(f"generate_stream: {headers=}", flush=True)
        print(f"generate_stream: {json_body=}", flush=True)

        import time

        for x in json_body.get("sse", ["default", "message"]):
            time.sleep(1)
            yield x

    def generate_batch(input_data_references: list[dict], output_data_reference: dict) -> None:
        """
        The generate_batch function handles the REST jobs endpoint
        POST /ml/v4/deployments_jobs

        Arguments to the function are from the json body of the request to jobs
        - input_data_references : scoring.input_data_references
        - output_data_reference : scoring.output_data_reference

        context.generate_token() : can access context object
        from outer function scope if token is required
        """

        batch_token = context.generate_token()
        print(f"batch_token: {batch_token[-5:]}", flush=True)
        print(
            f"generate_batch:\n{input_data_references=}\n{output_data_reference=}",
            flush=True,
        )

    return generate, generate_stream, generate_batch
Testing AI services
After you create your AI service, you can test the coding logic of your AI service by using the watsonx.ai Python client library.
Testing AI services with Python client library
To test the logic of your AI service locally by using the RuntimeContext class of the watsonx.ai Python client library, follow these steps:
- Use the RuntimeContext class of the Python client library to test your AI service locally:

from ibm_watsonx_ai.deployments import RuntimeContext

context = RuntimeContext(
    api_client=client,
    request_payload_json={}
)

# custom is optional argument which is specified during the time of creation of deployment
custom_object = {"space_id": space_id}

generate, generate_stream, generate_batch = basic_generate_demo(context, **custom_object)
For more information, see watsonx.ai Python client library documentation for using RuntimeContext for AI services.
- Depending on your use case, you can test the generate(), generate_stream(), or generate_batch() functions as follows:

To test the generate() function:

context.request_payload_json = {"test": "ai_service inference payload"}
print(generate(context))

To test the generate_stream() function:

context.request_payload_json = {"sse": ["ai_service_stream", "inference", "test"]}
for data in generate_stream(context):
    print(data)

To test the generate_batch() function:

input_data_references = [
    {
        "type": "connection_asset",
        "connection": {"id": "2d07a6b4-8fa9-43ab-91c8-befcd9dab8d2"},
        "location": {
            "bucket": "wml-v4-fvt-batch-pytorch-connection-input",
            "file_name": "testing-123",
        },
    }
]
output_data_reference = {
    "type": "data_asset",
    "location": {"name": "nb-pytorch_output.zip"},
}

generate_batch(input_data_references, output_data_reference)
Creating AI service assets
To deploy an AI service, you must create a repository asset in watsonx.ai Runtime that contains the AI service and upload the Python file to the asset.
Requirements for creating AI service assets
When you use an integrated development environment (IDE) such as VSCode, Eclipse, or PyCharm to build your generative AI application, you must create a Python file to store your AI service. After you define the function, you must compress the AI service into a gzip archive (.gz file format).
When you use the watsonx.ai Python client library to create your AI service asset, the library automatically stores the function in a gzip archive for you. However, when you create an AI service asset by using the REST API, you must manually compress your Python file into a gzip archive.
You must use the runtime-24.1-py3.11 software specification to create and deploy an AI service asset that is coded in Python.
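When you follow the REST API path, you compress the Python file yourself. The following is a minimal sketch that uses the Python standard library; the file name ai_service.py is an assumption:

import gzip
import shutil

# Assumption: the AI service is stored in a file named ai_service.py
with open("ai_service.py", "rb") as f_in, gzip.open("ai_service.py.gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)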
Creating AI service assets with Python client library
You can use the store_ai_service function of the watsonx.ai Python client library to create an AI service asset.
The following code sample shows how to create an AI service asset by using the Python client library:
documentation_request = {
    "application/json": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "parameters": {
                "properties": {
                    "max_new_tokens": {"type": "integer"},
                    "top_p": {"type": "number"},
                },
                "required": ["max_new_tokens", "top_p"],
            },
        },
        "required": ["query"],
    }
}

documentation_response = {
    "application/json": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {"query": {"type": "string"}, "result": {"type": "string"}},
        "required": ["query", "result"],
    }
}

meta_props = {
    client.repository.AIServiceMetaNames.NAME: "AI service example",
    client.repository.AIServiceMetaNames.DESCRIPTION: "This is AI service function",
    client.repository.AIServiceMetaNames.SOFTWARE_SPEC_ID: client.software_specifications.get_id_by_name(
        "runtime-24.1-py3.11"
    ),
    client.repository.AIServiceMetaNames.REQUEST_DOCUMENTATION: documentation_request,
    client.repository.AIServiceMetaNames.RESPONSE_DOCUMENTATION: documentation_response,
}

stored_ai_service_details = client.repository.store_ai_service(
    basic_generate_demo, meta_props
)

ai_service_id = client.repository.get_ai_service_id(stored_ai_service_details)
print("The AI service asset id:", ai_service_id)
- The REQUEST_DOCUMENTATION and RESPONSE_DOCUMENTATION parameters are optional. You can use these parameters to store the schema of the request and response of the generate and generate_stream functions.
- The client.repository.store_ai_service function call saves the AI service function basic_generate_demo into a gzip file internally.
For more information, see watsonx.ai Python client library documentation for creating an AI service asset.
Creating an AI service asset with REST API
You can use the /ml/v4/ai_services REST API endpoint to create the AI service asset in the watsonx.ai Runtime repository. For more information, see the watsonx.ai REST API documentation.
Deploying AI service assets
Depending on your use case, you can create an online or a batch deployment for your AI service asset from your deployment space. Deploy your AI service programmatically by using the watsonx.ai REST API or the Python client library.
Types of deployments for AI service
Depending on your use case, you can deploy the AI service asset as an online or a batch deployment. Choose the deployment type based on the functions used in the AI service.
- You must create an online deployment for your AI service asset for online scoring (the AI service contains the generate() function) or streaming applications (the AI service contains the generate_stream() function).
- You must create a batch deployment for your AI service asset for batch scoring applications (the AI service contains the generate_batch() function).
Prerequisites
- You must set up your task credentials for deploying your AI services. For more information, see Adding task credentials.
- You must promote your AI services asset to your deployment space.
Deploying AI services with Python client library
You can create an online or a batch deployment for your AI service asset by using the Python client library.
Creating online deployment
The following example shows how to create an online deployment for your AI service by using the watsonx.ai Python client library:
deployment_details = client.deployments.create(
    artifact_id=ai_service_id,
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "ai-service - online test",
        client.deployments.ConfigurationMetaNames.ONLINE: {},
        client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {
            "id": client.hardware_specifications.get_id_by_name("XS")
        },
    },
)
deployment_id = client.deployments.get_uid(deployment_details)
print("The deployment id:", deployment_id)
Creating batch deployment
The following example shows how to create a batch deployment for your AI service by using the watsonx.ai Python client library:
deployment_details = client.deployments.create(
    artifact_id=ai_service_id,
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "ai-service - batch",
        client.deployments.ConfigurationMetaNames.BATCH: {},
        client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {
            "id": client.hardware_specifications.get_id_by_name("XS")
        },
    },
)
deployment_id = client.deployments.get_uid(deployment_details)
print("The batch deployment id:", deployment_id)
Deploying AI services with REST API
You can use the /ml/v4/deployments
watsonx.ai REST API endpoint to create an online or a batch deployment for your AI service asset.
Creating online deployment
The following example shows how to create an online deployment for your AI service by using the REST API:
# POST /ml/v4/deployments
response = requests.post(
    f'{HOST}/ml/v4/deployments?version={VERSION}',
    headers=headers,
    verify=False,
    json={
        "space_id": space_id,
        "name": "genai flow online",
        "custom": {
            "key1": "value1",
            "key2": "value2",
            "model": "meta-llama/llama-3-8b-instruct"
        },
        "asset": {
            "id": asset_id
        },
        "online": {}
    }
)
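After the online deployment is created, you can call the inference endpoint of the AI service. The following is a minimal sketch; the JSON body is an example only and must match what your generate() function expects:

deployment_id = response.json()["metadata"]["id"]

# POST /ml/v4/deployments/{id_or_name}/ai_service
inference_response = requests.post(
    f'{HOST}/ml/v4/deployments/{deployment_id}/ai_service?version={VERSION}',
    headers=headers,
    json={"mode": "json"},  # example payload; use the body that your generate() function expects
)
print(inference_response.status_code, inference_response.text)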
Creating batch deployment
The following example shows how to create a batch deployment for your AI service by using the REST API:
response = requests.post(
    f'{HOST}/ml/v4/deployments?version={VERSION}',
    headers=headers,
    verify=False,
    json={
        "hardware_spec": {
            "id": "........",
            "num_nodes": 1
        },
        "space_id": space_id,
        "name": "ai service batch dep",
        "custom": {
            "key1": "value1",
            "key2": "value2",
            "model": "meta-llama/llama-3-8b-instruct"
        },
        "asset": {
            "id": asset_id
        },
        "batch": {}
    }
)

print(f'POST {HOST}/ml/v4/deployments?version={VERSION}', response.status_code)
print(json.dumps(response.json(), indent=2))

dep_id = response.json()["metadata"]["id"]
print(f"{dep_id=}")
Sample notebooks for creating and deploying AI services
To learn how to create and deploy AI services programmatically, see the following sample notebooks:
Sample name | Framework | Techniques demonstrated |
---|---|---|
Use watsonx and meta-llama/llama-3-2-11b-vision-instruct to run as an AI service | Python | Set up the environment, create an AI service, test the AI service function locally, deploy the AI service, run the AI service |
Use watsonx, Elasticsearch, and LangChain to answer questions (RAG) | LangChain | Set up the environment, download the test dataset, define the foundation model on watsonx, set up connectivity information to Elasticsearch, generate a retrieval-augmented response to a question, create an AI service, test the AI service function locally, deploy the AI service |
Use watsonx and meta-llama/llama-3-1-70b-instruct to create AI service | LangGraph | Set up the environment, create an AI service, test the AI service function locally, deploy the AI service, run the AI service |
Parent topic: Deploying AI services with code