Create the AI service asset: After you create and test the AI service, you must package the AI service as a deployable asset.
Deploy the AI service asset: Deploy the AI service asset as an online or a batch deployment.
Test the AI service deployment: Test the AI service that is deployed for online inferencing or batch scoring.
Manage the AI service: Access and update the deployment details. Scale or delete the deployment from the user interface or programmatically.
Creating AI services in a notebook
You can create an AI service directly in a notebook and deploy it from there. You must define the AI service in Python, and it must meet specific requirements. To deploy the AI service, you must create a watsonx.ai Runtime repository asset and upload the Python file to the asset. The following example shows a basic AI service definition:
def basic_generate_demo(context, model="google/flan-t5-xl", **parameters):
    # "parameters" is a reserved argument and will be enabled in future
    # generate token from task credentials api
    task_token = context.generate_token()

    def generate(context):
        user_token = context.get_token()  # extract token from header
        user_headers = context.get_headers()
        json_body = context.get_json()
        # example 1: json
        return {
            "headers": {
                "Content-Type": "application/json",
                "user-custom-header": "my-header-x1",
            },
            "body": {
                "model": model
            },
        }

    def generate_stream(context):
        user_token = context.get_token()  # extract token from header
        user_headers = context.get_headers()
        json_body = context.get_json()
        # return a generator
        data_to_stream = json_body.get("sse", "Default message!")
        for x in data_to_stream:
            yield x

    def generate_batch(input_data_references, output_data_reference):
        # generate token from task credentials api
        task_token = context.generate_token()
        # do something
        # ...

    return generate, generate_stream, generate_batch
Requirements for defining an AI service
The AI service captures the logic of your generative AI use case (such as a retrieval-augmented generation application) and handles the REST API call to the deployment endpoint /ml/v4/deployments. For example, the following generate_stream function is a Python generator that yields the data for each server-sent event:
def generate_stream(context):
    user_token = context.get_token()
    headers = context.get_headers()
    json_body = context.get_json()
    for x in ["Hello", "WatsonX", "!"]:
        yield x
The following example shows a deployable AI service that implements the generate, generate_stream, and generate_batch functions:

def deployable_ai_service_f1(context, params={"k1": "v1"}, **custom):
    """
    The outer function handles the REST call to the deployment endpoint
    POST /ml/v4/deployments

    context.generate_token() - generate a token from the task credentials

    To use `generate` and `generate_stream`, the deployment has to be ONLINE
    To use `generate_batch`, the deployment has to be BATCH
    """
    task_token = context.generate_token()
    print(f"outer function: {task_token[-5:]}", flush=True)

    def generate(context) -> dict:
        """
        The `generate` function handles the REST call to the inference endpoint
        POST /ml/v4/deployments/{id_or_name}/ai_service

        context.get_token() - get the Bearer token from the header of the request
        context.get_json() - get the body of the request
        context.get_headers() - get the headers of the request

        The generate function should return a dict.
        The following optional keys are supported currently:
        - body
        - headers

        This particular example accepts a json body of the format:
        { "mode" : <value> }
        Depending on the <value> of the mode, it will return a different response.
        """
        user_token = context.get_token()
        headers = context.get_headers()
        json_body = context.get_json()

        print(f"my_generate: {user_token=}", flush=True)
        print(f"request headers: {headers=}", flush=True)
        print(f"json body: {json_body=}", flush=True)

        match json_body.get("mode", "no-match"):
            case "json":
                # response Content-Type is "application/json"
                return {
                    "headers": {
                        "Content-Type": "application/json",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "json-no-header":
                # response Content-Type is "application/json"
                return {
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "json-custom-header":
                # response Content-Type is "text/plain; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "text/plain; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": {
                        "user_token": user_token[-5:],
                        "task_token": task_token[-5:],
                        "json_body": json_body,
                        "params": params,
                        "custom": custom,
                    },
                }
            case "bytes":
                # response Content-Type is "application/octet-stream"
                return {
                    "headers": {
                        "Content-Type": "application/octet-stream",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": b"12345678910",
                }
            case "bytes-no-header":
                # response Content-Type is "text/html; charset=utf-8"
                return {
                    "body": b"12345678910",
                }
            case "bytes-custom-header":
                # response Content-Type is "text/plain; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "text/plain; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": b"12345678910",
                }
            case "str":
                # response Content-Type is "text/plain"
                return {
                    "headers": {
                        "Content-Type": "text/plain",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "str-no-header":
                # response Content-Type is "text/html; charset=utf-8"
                return {
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "str-custom-header":
                # response Content-Type is "application/octet-stream; charset=utf-8; test-2"
                return {
                    "headers": {
                        "Content-Type": "application/octet-stream; charset=utf-8; test-2",
                        "User-Defined-Head": "x-genai",
                    },
                    "body": f"Hello WatsonX: {json_body}",
                }
            case "negative-str-return":
                # Bad request
                return "Should give 400 bad request"
            case _:
                # response Content-Type is "text/html; charset=utf-8"
                return {"body": "No match"}

    def generate_stream(context):
        """
        The generate_stream function handles the REST call to the SSE inference endpoint
        POST /ml/v4/deployments/{id_or_name}/ai_service_stream

        context.get_token() - get the Bearer token from the header of the request
        context.get_json() - get the body of the request
        context.get_headers() - get the headers of the request

        The generate_stream function must be a python `generator` with yield.
        The data in each yield will be the "data" for the SSE event.

        Example: The following request json
        { "sse": ["Hello", "", "WatsonX", " ", "!"] }
        will return the following stream of events
        --------------
        id: 1
        event: message
        data: Hello

        id: 2
        event: message
        data:

        id: 3
        event: message
        data: WatsonX

        id: 4
        event: message
        data:

        id: 5
        event: message
        data: !

        id: 6
        event: eos
        ---------------
        The end of the stream is marked by the event "eos"
        """
        user_token = context.get_token()
        headers = context.get_headers()
        json_body = context.get_json()

        print(f"generate_stream: {user_token=}", flush=True)
        print(f"generate_stream: {headers=}", flush=True)
        print(f"generate_stream: {json_body=}", flush=True)

        import time

        for x in json_body.get("sse", ["default", "message"]):
            time.sleep(1)
            yield x

    def generate_batch(input_data_references: list[dict], output_data_reference: dict) -> None:
        """
        The generate_batch function handles the REST call to the jobs endpoint
        POST /ml/v4/deployment_jobs

        Arguments to the function are from the json body of the request to jobs:
        - input_data_references : scoring.input_data_references
        - output_data_reference : scoring.output_data_reference

        context.generate_token() : the context object can be accessed from the
        outer function scope if a token is required
        """
        batch_token = context.generate_token()
        print(f"batch_token: {batch_token[-5:]}", flush=True)
        print(
            f"generate_batch:\n{input_data_references=}\n{output_data_reference=}",
            flush=True,
        )

    return generate, generate_stream, generate_batch
You can test your AI service locally before you deploy it by creating a RuntimeContext object with the watsonx.ai Python client library:

from ibm_watsonx_ai.deployments import RuntimeContext

context = RuntimeContext(api_client=client, request_payload_json={})

# "custom" is an optional argument that is specified when the deployment is created
custom_object = {"space_id": space_id}
generate, generate_stream, generate_batch = basic_generate_demo(context, **custom_object)
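As a quick local check, a sketch under the same assumptions (a connected client and the basic_generate_demo function from the earlier example), you can pass a request payload through the RuntimeContext constructor and call the returned functions directly:

# A sketch of a local test: build a context that carries a request body,
# then call the returned inference functions outside of a deployment.
test_context = RuntimeContext(
    api_client=client,
    request_payload_json={"sse": ["Hello", "WatsonX", "!"]},
)

generate, generate_stream, generate_batch = basic_generate_demo(test_context, **custom_object)

print(generate(test_context))            # the dict response with headers and body
for event in generate_stream(test_context):
    print(event)                         # each yielded value becomes the SSE "data"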
To deploy an AI service, you must create a repository asset in watsonx.ai Runtime that contains the AI service and upload the Python file to the asset.
When you use the watsonx.ai Python client library to create your AI service asset, the library automatically stores the function in a gzip archive for you. However, when you create an AI service asset by using the REST API, you must manually compress your Python file into a gzip archive.
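For example, the following sketch compresses the Python file with the standard library before you upload it with the REST API; the file names are illustrative placeholders:

import gzip
import shutil

# Compress the Python file that defines the AI service into a gzip archive.
# The file names are illustrative placeholders.
with open("ai_service.py", "rb") as source:
    with gzip.open("ai_service.py.gz", "wb") as target:
        shutil.copyfileobj(source, target)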
You must use the runtime-24.1-py3.11 software specification to create and deploy an AI service asset that is coded in Python.
Creating AI service assets with the Python client library
You can use the store_ai_service function of the watsonx.ai Python client library to create an AI service asset.
The following code sample sketches how to create an AI service asset by using the Python client library. It assumes that client is an APIClient instance that is connected to your deployment space and reuses the deployable_ai_service_f1 function from the earlier example:
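# A minimal sketch: "client", the software specification lookup, and the
# asset name are assumptions based on the requirements described above.
sw_spec_id = client.software_specifications.get_id_by_name("runtime-24.1-py3.11")

meta_props = {
    client.repository.AIServiceMetaNames.NAME: "AI service example",
    client.repository.AIServiceMetaNames.SOFTWARE_SPEC_ID: sw_spec_id,
}

stored_ai_service_details = client.repository.store_ai_service(
    deployable_ai_service_f1, meta_props
)
ai_service_id = client.repository.get_ai_service_id(stored_ai_service_details)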
Alternatively, you can use the /ml/v4/ai_services REST API endpoint to create the AI service asset in the watsonx.ai Runtime repository. For more information, see the watsonx.ai REST API documentation.
You must create an online deployment for your AI service asset for online scoring (when the AI service contains the generate() function) or streaming applications (when the AI service contains the generate_stream() function).
You must create a batch deployment for your AI service asset for batch scoring applications (when the AI service contains the generate_batch() function).
Prerequisites
You must set up task credentials to deploy an AI service. For more information, see Adding task credentials.
You must promote the AI service asset to a deployment space.
Deploying AI services with the Python client library
You can create an online or a batch deployment for your AI service asset by using the Python client library.
Creating the deployment
The following example sketches how to create an online deployment for your AI service by using the watsonx.ai Python client library. It assumes client and the ai_service_id from the asset-creation step:
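# A minimal sketch: "ai_service_id" identifies the AI service asset that was
# promoted to the deployment space; the deployment name is illustrative.
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "AI service deployment",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
}

deployment_details = client.deployments.create(ai_service_id, meta_props)
deployment_id = client.deployments.get_id(deployment_details)

For batch scoring, the same call can be made with a BATCH configuration instead of ONLINE; see the Python client reference for the exact metadata names.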