An AI service is a deployable unit of code that you can use to capture the logic of your generative AI use cases. When your AI services are successfully deployed, you can use the endpoint for inferencing from your application.
Deploying generative AI applications with AI services
While Python functions are the traditional way to deploy machine learning assets, AI services offer a more flexible way to deploy code for generative AI applications, including support for capabilities such as streaming.
Unlike the standard Python function for deploying a predictive machine learning model, which requires input in a fixed schema, an AI service accepts multiple inputs in a flexible format and can be customized for your use case.
AI services offer a secure way to deploy your code. For example, credentials that are required for authentication, such as bearer tokens, are generated by the service from task credentials, and the token is made available to the AI service asset. You can use this token to get connection assets, download data assets, and more.
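The points above (flexible input, an optional streaming path, and a token supplied to the code) can be illustrated with a minimal sketch. The names used here are assumptions for illustration and are not confirmed by this page: `deployable_ai_service`, `generate`, `generate_stream`, and the context methods `get_json` and `get_token` mirror a common pattern for AI service code, and `FakeContext` is a hypothetical stand-in for the request context that the runtime would supply.

```python
from typing import Any, Callable, Dict, Iterator, Tuple


class FakeContext:
    """Hypothetical stand-in for the request context supplied at run time."""

    def __init__(self, payload: Dict[str, Any], token: str) -> None:
        self._payload = payload
        self._token = token

    def get_json(self) -> Dict[str, Any]:
        # Assumed accessor for the request body.
        return self._payload

    def get_token(self) -> str:
        # Assumed accessor for the bearer token made available to the service.
        return self._token


def deployable_ai_service(
    context: FakeContext, **custom: Any
) -> Tuple[Callable[[FakeContext], dict], Callable[[FakeContext], Iterator[dict]]]:
    # Outer function: one-time setup that uses caller-supplied customization.
    greeting = custom.get("greeting", "Hello")

    def generate(context: FakeContext) -> dict:
        # No fixed schema: read whichever keys this use case needs.
        payload = context.get_json()
        token = context.get_token()
        name = payload.get("name", "world")
        return {
            "body": {
                "message": f"{greeting}, {name}!",
                # A real service might use the token to fetch connection
                # or data assets; this sketch only checks that one exists.
                "authenticated": bool(token),
            }
        }

    def generate_stream(context: FakeContext) -> Iterator[dict]:
        # Streaming path: yield the response one piece at a time.
        payload = context.get_json()
        for word in str(payload.get("text", "")).split():
            yield {"chunk": word}

    return generate, generate_stream


if __name__ == "__main__":
    ctx = FakeContext({"name": "Ada", "text": "streamed word by word"}, "example-token")
    generate, generate_stream = deployable_ai_service(ctx, greeting="Hi")
    print(generate(ctx))
    print(list(generate_stream(ctx)))
```

The outer function holds setup that runs once, while the inner functions handle individual requests, so both the regular and the streaming path can share the same configuration.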
Deploying AI services with Prompt Lab
You can use visual tools such as Prompt Lab to automatically generate AI services in a standard format. Then, you can modify the AI service for your use case. For example, if you are deploying an asset that uses Retrieval Augmented Generation (RAG), you can use Prompt Lab to capture the logic for retrieving answers from the vector index in an AI service, and then deploy the AI service.
For more information, see Deploying AI services with Prompt Lab.
Deploying AI services with direct coding
When you build your generative AI applications from the ground up, you can use an AI service to capture the programming logic of your application, which can be deployed with an endpoint for inferencing. For example, if you build a RAG application with a framework such as LangChain or LlamaIndex, you can use an AI service to capture the logic for retrieving answers from the vector index, and then deploy the AI service.
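As a rough illustration of capturing retrieval logic in a single deployable function, the following sketch uses only the Python standard library. Everything in it is an assumption made for illustration: the in-memory `VECTOR_INDEX`, the bag-of-words `embed` function (a stand-in for a real embedding model), and the `build_rag_service` and `generate` names. A real application would typically use a framework such as LangChain or LlamaIndex with an actual vector store, and would send the assembled prompt to a foundation model.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Toy corpus standing in for documents stored in a vector index.
DOCUMENTS: List[str] = [
    "AI services capture the logic of generative AI use cases.",
    "Prompt Lab can generate an AI service in a standard format.",
    "A deployed AI service exposes an endpoint for inferencing.",
]


def embed(text: str) -> Counter:
    # Hypothetical embedding: lowercase bag-of-words term counts.
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return Counter(cleaned.split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical in-memory index: (document, vector) pairs.
VECTOR_INDEX: List[Tuple[str, Counter]] = [(doc, embed(doc)) for doc in DOCUMENTS]


def retrieve(question: str, k: int = 1) -> List[str]:
    # Rank indexed documents by similarity to the question and keep the top k.
    query = embed(question)
    ranked = sorted(VECTOR_INDEX, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_rag_service(top_k: int = 1) -> Callable[[Dict[str, str]], dict]:
    # Outer function holds configuration; inner function handles one request.
    def generate(payload: Dict[str, str]) -> dict:
        question = payload["question"]
        passages = retrieve(question, top_k)
        # A real service would pass this prompt to a foundation model.
        prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {question}"
        return {"body": {"passages": passages, "prompt": prompt}}

    return generate


if __name__ == "__main__":
    service = build_rag_service(top_k=1)
    result = service({"question": "Which tool can generate an AI service in a standard format?"})
    print(result["body"]["passages"])
```

Keeping retrieval and prompt assembly inside one function means the application logic is deployed together as a unit rather than split between the client and the endpoint.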
For more information, see Deploying AI services with direct coding.