With the manual coding approach, you write custom code to deploy and manage AI services. This approach gives you full control over the deployment process and lets you customize it to meet specific requirements.
Process overview
The process of coding AI services works as follows: you create a notebook in your project that contains the AI service and its connections. The AI service captures the logic of your RAG application and contains the generation function, which is a deployable unit of code. The generation function is promoted to the deployment space, where it is used to create a deployment. The deployment is exposed as a REST API endpoint that other applications can access. You can send a request to the REST API endpoint to use the deployed AI service for inferencing. The deployed AI service processes the request and returns a response.
Tasks for creating and deploying AI services
Follow these steps to create, deploy, and manage AI services:
Create an AI service: Define an AI service in a notebook by using Python. The AI service must meet specific requirements to be deployable as an AI service.
Test the AI service: Test the coding logic of your AI service locally.
Create an AI service asset: After you create and test the AI service, package the AI service as a deployable asset.
Deploy the AI service asset: Deploy the AI service asset as an online or a batch deployment from your deployment space.
Manage AI services: Access and update the deployment details. Scale or delete the deployment from the user interface or programmatically.
Creating AI services in a notebook
You can create an AI service directly in a notebook. You must define your AI service in Python, and it must meet certain requirements. To deploy the AI service, you then create a watsonx.ai Runtime repository asset and upload the Python file to the asset.
Defining an AI service with Python client library
To define an AI service in a notebook by using the watsonx.ai Python client library, follow these steps:
To work with AI services in Python, install the ibm-watsonx-ai Python SDK:
pip install ibm-watsonx-ai
After you install the Python client library, initialize the client and set the default deployment space:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url=url,
    api_key=apikey,
)

client = APIClient(credentials)
client.set.default_space(space_id=space_id)
Define your AI service in Python by using the following layout. Depending on your use case, you must include at least one of the functions generate(), generate_stream(), or generate_batch() as a nested function:
def basic_generate_demo(context, model="google/flan-t5-xl", **parameters):
    # "parameters" is a reserved argument and will be enabled in future
    # generate token from task credentials api
    task_token = context.generate_token()

    def generate(context):
        user_token = context.get_token()  # extract token from header
        user_headers = context.get_headers()
        json_body = context.get_json()

        # example 1: json
        return {
            "headers": {
                "Content-Type": "application/json",
                "user-custom-header": "my-header-x1",
            },
            "body": {
                "model": model
            },
        }

    def generate_stream(context):
        user_token = context.get_token()  # extract token from header
        user_headers = context.get_headers()
        json_body = context.get_json()

        # return a generator
        data_to_stream = json_body.get("sse", "Default message!")
        for x in data_to_stream:
            yield x

    def generate_batch(input_data_references, output_data_reference):
        # generate token from task credentials api
        task_token = context.generate_token()
        # do something.
        # ...

    return generate, generate_stream, generate_batch
Requirements for defining an AI service
The AI service captures the logic of your generative AI use case, such as a retrieval-augmented generation (RAG) application, and handles the REST API call to the deployment endpoint /ml/v4/deployments.
Follow these guidelines to define an AI service:
Create a Python function. You can specify any name for your function. To learn more about the function parameters, see the watsonx.ai REST API documentation.
Depending on your use case, the Python function that you want to deploy must include at least one of these functions as a nested function in its scope:
generate()
generate_stream()
generate_batch()
When you use the watsonx.ai Python client library to save the Python function that contains a reference to an outer function, only the code in the scope of the outer function (including its nested functions) is saved. The code outside the outer function's scope is not saved and therefore is not available when you deploy the function.
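For example, in this minimal sketch (the function and helper names are hypothetical), only the body of my_ai_service and its nested generate function are captured when the function is saved; the module-level import and helper are not available at deployment time:

import json  # module-level import: NOT saved with the AI service

def helper():  # defined outside the outer function: NOT saved
    return "not available after deployment"

def my_ai_service(context, **custom):
    # everything in this scope, including nested functions, IS saved
    import json  # import inside the outer function so that it is available at deployment time

    def generate(context):
        return {"body": {"message": json.dumps({"status": "ok"})}}

    return generate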
Guidelines for defining the generate() function
The generate() function can be used to process your authorization token. This function handles the REST call to the inference endpoint /ml/v4/deployments/{id_or_name}/ai_service.
Follow these guidelines to define the generate() function inside the AI service:
You must use the name generate to define the function.
You can only provide one argument to the generate() function: context.
The generate() function must return a value of the data type dict (dictionary).
Optional: You can specify the body and headers keys in the returned dictionary.
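Example

A minimal sketch of a nested generate() function that echoes the request body back as JSON (the header values and the echo key are illustrative):

def generate(context):
    user_token = context.get_token()
    json_body = context.get_json()
    return {
        "headers": {"Content-Type": "application/json"},
        "body": {"echo": json_body},
    }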
Guidelines for defining the generate_stream() function
You can use the generate_stream() function for generative AI use cases that require streaming. This function handles the REST call to the Server-Sent Events (SSE) inference endpoint POST /ml/v4/deployments/{id_or_name}/ai_service_stream.
Follow these guidelines to define the generate_stream() function inside the AI service:
You must use the name generate_stream to define the function.
You can only provide one argument to the generate_stream() function: context.
Example
def generate_stream(context):
    user_token = context.get_token()
    headers = context.get_headers()
    json_body = context.get_json()
    for x in ["Hello", "WatsonX", "!"]:
        yield x
Output
id: 1
event: message
data: Hello

id: 2
event: message
data: WatsonX

id: 3
event: message
data: !

id: 4
event: eos
Guidelines for defining the generate_batch() function
The generate_batch() function can be used for use cases that require batch inferencing. This function handles the REST API call to the jobs endpoint /ml/v4/deployments_jobs.
Follow these guidelines to define the generate_batch() function inside the AI service:
You must use the name generate_batch to define the function.
Example
def generate_batch(input_data_references: list[dict], output_data_reference: dict):
    # context from outer function is visible
    batch_token = context.generate_token()
    print(f"batch_token: {batch_token[-5:]}", flush=True)
    print(
        f"generate_batch:\n{input_data_references=}\n{output_data_reference=}",
        flush=True,
    )
Sample code to create an AI service
The sample code defines an AI service named deployable_ai_service_f1. When a REST API request is sent to the /ml/v4/deployments endpoint, deployable_ai_service_f1 is called. The function takes a JSON input payload and includes the following nested functions as part of its scope:
generate(): Makes a REST API call to the /ml/v4/deployments/{id_or_name}/ai_service endpoint. It takes in a context object, extracts the token, headers, and JSON body, and returns a response based on the mode key
in the JSON body. The response format can be JSON, bytes, or string, with optional custom headers.
generate_stream(): Makes a REST API call to the SSE (Server-Sent Events) inference endpoint /ml/v4/deployments/{id_or_name}/ai_service_stream. It takes in a context object, extracts the token, headers, and JSON body, and returns a stream of SSE events. The end of the stream is marked by the eos (end of stream) event.
generate_batch(): Makes a REST API call to the jobs endpoint /ml/v4/deployments_jobs. It takes in input_data_references and output_data_reference from the request JSON body, generates a
batch token, and logs the input and output data references.
def deployable_ai_service_f1(context, params={"k1": "v1"}, **custom):
    """
    The outer function handles the REST call to the deployment endpoint
    POST /ml/v4/deployments

    context.generate_token() - generate a token from the task credentials

    To use `generate` and `generate_stream`, the deployment has to be ONLINE
    To use `generate_batch`, the deployment has to be BATCH
    """
    task_token = context.generate_token()
    print(f"outer function: {task_token[-5:]}", flush=True)

    def generate(context) -> dict:
        """
        The `generate` function handles the REST call to the inference endpoint
        POST /ml/v4/deployments/{id_or_name}/ai_service

        context.get_token() - get the Bearer token from the header of the request
        context.get_json() - get the body of the request
        context.get_headers() - get the headers of the request

        The generate function should return a dict
        The following optional keys are supported currently
        - body
        - headers

        This particular example accepts a json body of the format:
        { "mode" : <value> }
        Depending on the <value> of the mode, it will return a different response
        """
        user_token = context.get_token()
        headers = context.get_headers()
        json_body = context.get_json()

        print(f"my_generate: {user_token=}", flush=True)
        print(f"request headers: {headers=}", flush=True)
        print(f"json body: {json_body=}", flush=True)

        match json_body.get("mode", "no-match"):
            case "json":
                # response Content-Type is "application/json"
                return {
                    "headers": {
                        "Content-Type": "application/json",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "json-no-header":
                # response Content-Type is "application/json"
                return {
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "json-custom-header":
                # response Content-Type is "text/plain; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "text/plain; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "bytes":
                # response Content-Type is "application/octet-stream"
                return {
                    "headers": {
                        "Content-Type": "application/octet-stream",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": b"12345678910",
                }
            case "bytes-no-header":
                # response Content-Type is "text/html; charset=utf-8"
                return {
                    "body": b"12345678910",
                }
            case "bytes-custom-header":
                # response Content-Type is "text/plain; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "text/plain; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": b"12345678910",
                }
            case "str":
                # response Content-Type is "text/plain"
                return {
                    "headers": {
                        "Content-Type": "text/plain",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "str-no-header":
                # response Content-Type is "text/html; charset=utf-8"
                return {
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "str-custom-header":
                # response Content-Type is "application/octet-stream; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "application/octet-stream; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "negative-str-return":
                # Bad request
                return "Should give 400 bad request"
            case _:
                # response Content-Type is "text/html; charset=utf-8"
                return {"body": "No match"}

    def generate_stream(context):
        """
        The generate_stream function handles the REST call to the SSE inference endpoint
        POST /ml/v4/deployments/{id_or_name}/ai_service_stream

        context.get_token() - get the Bearer token from the header of the request
        context.get_json() - get the body of the request
        context.get_headers() - get the headers of the request

        The generate_stream function must be a python `generator` with yield
        The data in yield will be the "data" for the SSE event

        Example: The following request json
        { "sse": ["Hello", "", "WatsonX", " ", "!"] }
        will return the following stream of events
        --------------
        id: 1
        event: message
        data: Hello

        id: 2
        event: message
        data:

        id: 3
        event: message
        data: WatsonX

        id: 4
        event: message
        data:

        id: 5
        event: message
        data: !

        id: 6
        event: eos
        ---------------
        The end of the stream will be marked by the event "eos"
        """
        user_token = context.get_token()
        headers = context.get_headers()
        json_body = context.get_json()

        print(f"generate_stream: {user_token=}", flush=True)
        print(f"generate_stream: {headers=}", flush=True)
        print(f"generate_stream: {json_body=}", flush=True)

        import time

        for x in json_body.get("sse", ["default", "message"]):
            time.sleep(1)
            yield x

    def generate_batch(input_data_references: list[dict], output_data_reference: dict) -> None:
        """
        The generate_batch function handles the REST jobs endpoint
        POST /ml/v4/deployments_jobs

        Arguments to the function are from the json body of the request to jobs
        - input_data_references : scoring.input_data_references
        - output_data_reference : scoring.output_data_reference

        context.generate_token() : can access the context object
        from the outer function scope if a token is required
        """
        batch_token = context.generate_token()
        print(f"batch_token: {batch_token[-5:]}", flush=True)
        print(
            f"generate_batch:\n{input_data_references=}\n{output_data_reference=}",
            flush=True,
        )

    return generate, generate_stream, generate_batch
Testing AI services
After you create your AI service, you can test the coding logic of your AI service by using the watsonx.ai Python client library.
Testing AI services with Python client library
To test the logic of your AI service locally, use the RuntimeContext class of the watsonx.ai Python client library:
from ibm_watsonx_ai.deployments import RuntimeContext

context = RuntimeContext(
    api_client=client,
    request_payload_json={},
)

# custom is an optional argument that is specified when the deployment is created
custom_object = {"space_id": space_id}

generate, generate_stream, generate_batch = basic_generate_demo(context, **custom_object)
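You can then call the returned functions directly to verify their behavior. A minimal sketch, assuming the basic_generate_demo function from the earlier example; the request payloads are illustrative:

# Call the generate function with an illustrative JSON payload
context = RuntimeContext(api_client=client, request_payload_json={"mode": "test"})
print(generate(context))

# Iterate over the generator returned by generate_stream
stream_context = RuntimeContext(api_client=client, request_payload_json={"sse": ["Hello", "WatsonX", "!"]})
for data in generate_stream(stream_context):
    print(data)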
Creating AI service assets
To deploy an AI service, you must create a repository asset in watsonx.ai Runtime that contains the AI service and upload the Python file to the asset.
Requirements for creating AI service assets
When you use an integrated development environment (IDE) such as VSCode, Eclipse, or PyCharm to build your generative AI application, you must create a Python file to store your AI service. After you define the function, you must compress the AI service into a gzip archive (.gz file format).
When you use the watsonx.ai Python client library to create your AI service asset, the library automatically stores the function in a gzip archive for you. However, when you create an AI service asset by using the REST API, you must manually compress your Python file into a gzip archive.
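For example, a minimal sketch that compresses a Python file into a gzip archive with the standard library; the file name ai_service.py is an assumption:

import gzip
import shutil

# Compress ai_service.py into ai_service.py.gz for upload through the REST API
with open("ai_service.py", "rb") as src, gzip.open("ai_service.py.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)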
You must use the runtime-24.1-py3.11 software specification to create and deploy an AI service asset that is coded in Python.
Creating AI service assets with Python client library
You can use the store_ai_service function of the watsonx.ai Python client library to create an AI service asset.
The following code sample shows how to create an AI service asset by using the Python client library:
The REQUEST_DOCUMENTATION and RESPONSE_DOCUMENTATION parameters are optional. You can use these parameters to store the schema of the request and response of the generate and generate_stream functions.
The function call client.repository.store_ai_service saves the AI service function basic_generate_demo into a gzip file internally.
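A minimal sketch of the call, assuming the basic_generate_demo function from the earlier example. The meta-property and helper names used here (AIServiceMetaNames, get_ai_service_id) follow the client library's usual conventions but may differ in your version, so verify them against the ibm-watsonx-ai documentation:

meta_props = {
    client.repository.AIServiceMetaNames.NAME: "Basic generate demo AI service",
    client.repository.AIServiceMetaNames.SOFTWARE_SPEC_ID: client.software_specifications.get_id_by_name(
        "runtime-24.1-py3.11"
    ),
}

# The function is packaged into a gzip archive and stored as a repository asset
stored_ai_service_details = client.repository.store_ai_service(basic_generate_demo, meta_props)
ai_service_id = client.repository.get_ai_service_id(stored_ai_service_details)
print(ai_service_id)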
You can use the /ml/v4/ai_services REST API endpoint to create the AI service asset in the watsonx.ai Runtime repository. For more information, see the watsonx.ai REST API documentation.
Deploying AI service assets
Depending on your use case, you can create an online or a batch deployment for your AI service asset from your deployment space. Deploy your AI service programmatically by using the watsonx.ai REST API or the Python client library.
Types of deployments for AI service
Depending on your use case, you can deploy the AI service asset as an online or a batch deployment. Choose the deployment type based on the functions used in the AI service.
You must create an online deployment for your AI service asset for online scoring (AI service contains the generate() function) or streaming applications (AI service contains the generate_stream() function).
You must create a batch deployment for your AI service asset for batch scoring applications (AI service contains the generate_batch() function).
Prerequisites
You must set up your task credentials for deploying your AI services. For more information, see Adding task credentials.
You must promote your AI services asset to your deployment space.
Deploying AI services with Python client library
You can create an online or a batch deployment for your AI service asset by using the Python client library.
Creating online deployment
The following example shows how to create an online deployment for your AI service by using the watsonx.ai Python client library:
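A minimal sketch, assuming ai_service_id is the ID of the AI service asset that you promoted to your deployment space. The deployment name is illustrative, and the artifact_id keyword and get_id helper follow the client library's usual conventions; verify them against the ibm-watsonx-ai documentation:

deployment_details = client.deployments.create(
    artifact_id=ai_service_id,
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "AI service - online deployment",
        client.deployments.ConfigurationMetaNames.ONLINE: {},
    },
)

# Retrieve the deployment ID for later inferencing calls
deployment_id = client.deployments.get_id(deployment_details)
print(deployment_id)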