When you have a strategy for your generative AI solution, you can plan a workflow that contains the tasks that you need to complete.
The following table lists the high-level tasks that you can include in your plan and indicates whether each task is required in all situations, required in only some situations (Sometimes), or required in only some situations but recommended in all situations (Sometimes, Recommended).
| Task | Required? |
|---|---|
| Define an AI use case | Sometimes, Recommended |
| Develop governance workflows | Sometimes |
| Set up a project | Required |
| Prepare data | Sometimes |
| Experiment with prompts | Required |
| Evaluate your prompts | Sometimes, Recommended |
| Optimize a foundation model | Sometimes |
| Deploy your solution | Required |
| Monitor and maintain your solution | Sometimes, Recommended |
Defining an AI use case
An AI use case consists of a set of factsheets that contain lineage, history, and other relevant information about the lifecycle of an AI asset such as a model or prompt template.
Your organization might require that you track and document your AI solution for transparency or regulatory compliance. However, AI use cases are useful even when they are not required because they provide an integrated way to track progress, decisions, and metrics about your solution.
To create an AI use case, first create an inventory, and then create the use case. Add data scientists, data engineers, and other users who are involved in the creation, testing, or governance of your solution as use case collaborators.
Learn about defining an AI use case
Developing governance workflows
A governance workflow enforces a review and approval process for AI use cases and model use.
Your organization might require one or more of the following types of governance workflows:
- Model risk governance workflows to approve AI use cases, approve foundation model lifecycle events, run risk assessments, or automate performance monitoring of models.
- Regulatory compliance management workflows to process alerts published by regulatory agencies.
- Operational risk management workflows to track model risk along with other operational risks across the enterprise.
To set up a governance workflow, configure it in the Governance console.
Learn about developing governance workflows
Setting up a project
A project is a collaborative workspace where people work with models and data to fulfill a shared goal.
You need a project to build prompts, run experiments, and tune models.
A project for a gen AI solution typically contains the following types of items that are either explicitly added by AI engineers or are created as the result of a process:
- Connection assets for data sources, such as a vector store or the location where you store training or tuning data.
- Data assets that represent data sets for training or tuning models.
- Prompt session assets that you save for future reference.
- Prompt template assets that provide endpoints for inferencing.
- Notebooks that you create or that are generated by processes such as saving a prompt as a notebook or running an AutoAI experiment.
- Vector indexes that represent vectorized documents for a RAG pattern.
- Experiment and flow assets that you create from running tools, such as AutoAI, Tuning Studio, or Synthetic Data Generator.
- AI service assets that provide endpoints for gen AI patterns, such as RAG.
- Jobs that are created by running assets in tools.
A sandbox project is created for you automatically. However, you might want to create a project with a name that reflects your goal. You can create a project from the home page or from the navigation menu. Add everyone who works on the solution as a collaborator, and assign a role to each collaborator to control their permissions in the project.
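If you prefer to work with your project programmatically, you can authenticate a client against it. The following minimal sketch assumes the ibm-watsonx-ai Python SDK; the service URL, API key, and project ID are placeholder values that you replace with your own.

```python
# Minimal sketch: connect to an existing project programmatically.
# Assumes the ibm-watsonx-ai Python SDK (pip install ibm-watsonx-ai).
# The URL, API key, and project ID are placeholders for your own values.
from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # your watsonx.ai endpoint
    api_key="YOUR_API_KEY",
)

client = APIClient(credentials)
client.set.default_project("YOUR_PROJECT_ID")  # scope subsequent calls to your project
```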
Learn more about creating projects
Preparing your data
Data preparation involves providing access to the required data in the format and at the quality level that your solution needs.
You need to prepare data if you plan to ground the model with your documents in a RAG pattern, to evaluate a prompt template, or to tune a foundation model. You might not need to prepare data if your use case is to translate, summarize, classify, or generate text, unless you want to run evaluations.
For a RAG pattern, you transform documents into embedded vectors for efficient retrieval. You store vectorized documents in a vector store and retrieve them with a vector index. You can include documents in a RAG pattern in the following ways:
- Upload document files from your local system and add them to a vector store
- Specify documents that are in an existing vector store
- Add documents from a connected data source into a vector store
You can choose from different methods of creating a RAG pattern, depending on the total size of your documents, the level of automation for experimenting, and other factors.
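The following sketch illustrates the general pattern of chunking, embedding, and retrieving documents. It is a conceptual, self-contained example: the toy embed function and in-memory list stand in for the real embedding model, vector store, and vector index that a production RAG pattern uses.

```python
# Conceptual sketch of vectorizing documents for retrieval in a RAG pattern.
# The embed() function is a toy stand-in for a real embedding model or service,
# and the in-memory list stands in for a vector store and vector index.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash character trigrams into a fixed-size vector."""
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk(document: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

documents = [
    "The quarterly meeting covered the budget, hiring plans, and office moves.",
    "The new release adds semantic search and fixes several login issues.",
]

# Tiny in-memory "vector store" of (chunk, vector) pairs.
store = [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: -float(np.dot(item[1], q)))
    return [text for text, _ in ranked[:k]]

print(retrieve("What did the meeting cover?"))
```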
For evaluating a prompt template or tuning a foundation model, you provide a data set that contains representative input to the model and the appropriate output for the model to generate in response. You can provide tuning data in the following ways:
- Upload a file from your local system
- Connect to the data source that contains the data set
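For example, a tuning or evaluation data set pairs each sample input with the expected output. The following sketch writes such a data set as JSON Lines; the input and output field names are an assumption, so check the exact format that your tuning or evaluation tool expects.

```python
# Sketch of a small tuning or evaluation data set of input/output pairs,
# written as JSON Lines. The field names ("input", "output") are an assumption;
# check the exact format that your tuning or evaluation tool expects.
import json

examples = [
    {"input": "Summarize: The meeting covered budget and hiring plans.",
     "output": "Budget and hiring plans were discussed."},
    {"input": "Summarize: The release adds search and fixes login bugs.",
     "output": "The release adds search and fixes login issues."},
]

with open("tuning_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```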
Learn more about preparing data
Experimenting with prompts
A prompt is how you instruct a foundation model to generate a response.
You can experiment with prompts by altering conditions in the following ways:
- Switching between chat and non-chat modes
- Altering your prompt text or the system prompt
- Changing the foundation model
- Adjusting the model parameters
- Enabling and disabling guardrails
- Adding images or documents to a chat
- Adding variables to dynamically change prompt text
- Configuring agents to call tools
To develop your prompts, you can experiment in the Prompt Lab or work programmatically with the REST API, Python, or Node.js. To automate finding an optimal RAG pattern, run an AutoAI for RAG experiment.
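For example, the following minimal sketch sends a prompt with Python. It assumes the ibm-watsonx-ai SDK; the model ID, credentials, project ID, and parameter values are placeholders that you replace with your own choices.

```python
# Minimal sketch: send a prompt to a foundation model programmatically.
# Assumes the ibm-watsonx-ai Python SDK; the model ID, project ID,
# credentials, and parameter values are placeholders.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id="ibm/granite-13b-instruct-v2",  # any available foundation model
    credentials=Credentials(
        url="https://us-south.ml.cloud.ibm.com",
        api_key="YOUR_API_KEY",
    ),
    project_id="YOUR_PROJECT_ID",
    params={"decoding_method": "greedy", "max_new_tokens": 200},
)

response = model.generate_text(prompt="Summarize the following meeting notes:\n...")
print(response)
```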
Learn more about experimenting with prompts
Evaluating your prompts
An evaluation of a prompt tests the quality of the model output for the set of metrics that you choose. Some metrics are based on comparing model output against the appropriate output that you provide in the testing data set. How efficiently your model generates responses is also evaluated.
Your organization might require evaluations for regulatory compliance or internal policies. However, evaluations are useful even when they are not required because the metric scores can indicate the quality of your solution and might predict decreased user satisfaction when scores drop.
When you evaluate a prompt, you can configure the following factors:
- The sample size to test
- Which metrics to include
- The threshold value for each metric
Generative AI metrics provide the following types of information about the prompt:
- How similar the output text is to the input text
- How similar the output text is to reference outputs
- Whether the input or output text contains harmful or sensitive information
You view the current results and the results over time. The results of each evaluation are added to the use case for the prompt.
To evaluate a prompt, run an evaluation in the Prompt Lab, with code, or from a deployed prompt template. If you run an AutoAI experiment for a RAG pattern, the candidate prompts are evaluated and ranked automatically.
To evaluate and compare multiple prompt templates simultaneously, run an evaluation experiment in Evaluation Studio.
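As an illustration of comparison-based metrics, the following sketch scores model outputs against reference outputs with a simple token-overlap measure and checks the average against a threshold. It is a conceptual stand-in, not the platform's metric implementation.

```python
# Conceptual sketch: score model outputs against reference outputs with a
# simple token-overlap metric and flag results that fall below a threshold.
# A stand-in for the platform's generative AI metrics, not their implementation.

def token_overlap(generated: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the generated text."""
    gen_tokens = set(generated.lower().split())
    ref_tokens = set(reference.lower().split())
    return len(gen_tokens & ref_tokens) / len(ref_tokens) if ref_tokens else 0.0

# Each pair is (model output, reference output from the testing data set).
test_set = [
    ("Budget and hiring were discussed.", "The meeting discussed budget and hiring."),
    ("The release adds search.", "The release adds search and fixes login issues."),
]

THRESHOLD = 0.6
scores = [token_overlap(generated, reference) for generated, reference in test_set]
average = sum(scores) / len(scores)
print(f"average overlap = {average:.2f}, below threshold: {average < THRESHOLD}")
```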
Learn more about evaluating prompts
Optimizing a foundation model
Optimizing a foundation model improves one or more performance indicators of the model.
You can optimize the foundation model in your solution for accuracy, cost, inferencing throughput, or control of the model lifecycle.
The methods to deploy a foundation model vary based on the following characteristics:
- The billing method: per token inferenced or per hour hosted
- The hosting environment: multi-tenant or dedicated hardware
- The deployment mechanism: the model is deployed by IBM or by you
- The tuning status: the model is tuned or not tuned
- The deprecation policy: controlled by IBM or by you
To run a model on multi-tenant hardware, pay per token, and have IBM control the model lifecycle, select a model that IBM provides and deploys.
To tune a foundation model, select a tuning method, add tuning data, and run the job in the Tuning Studio or with code. Then, deploy the tuned model on multi-tenant hardware, pay per token, and control the model lifecycle.
To run a model on dedicated hardware, pay per hour, and control the model lifecycle, you can either import and deploy a custom model, or deploy a deploy on demand model.
Learn more about optimizing foundation models
Deploying your solution
Deploying an asset makes it available through an endpoint for testing or for productive use. After you create deployments, you can test and manage them, and prepare your assets to deploy into pre-production and production environments.
You create deployments in deployment spaces, which are separate workspaces from projects and to which you can add a different set of collaborators.
To deploy most types of gen AI assets, you promote the asset to a deployment space and create a deployment that contains an endpoint. You can then call the endpoint from your application to inference the foundation model. For prompt templates that do not contain variables, you can copy the endpoint code directly from Prompt Lab. You can create separate deployment spaces for testing, staging, and production deployments to support your ModelOps workflow.
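For example, an application can call the deployment endpoint over HTTP. In the following sketch, the endpoint path, version parameter, and payload fields are assumptions modeled on the watsonx.ai text generation API; copy the exact snippet from the API reference for your deployment.

```python
# Sketch: call a deployment endpoint from an application.
# The endpoint path, version query parameter, and payload fields are assumptions
# based on the watsonx.ai text generation API; copy the exact values from the
# API reference for your deployment. Tokens and IDs are placeholders.
import requests

DEPLOYMENT_URL = (
    "https://us-south.ml.cloud.ibm.com/ml/v1/deployments/"
    "YOUR_DEPLOYMENT_ID/text/generation?version=2023-05-29"
)

payload = {
    # For a prompt template with variables, supply the variable values.
    "parameters": {"prompt_variables": {"document": "Text to summarize..."}}
}

response = requests.post(
    DEPLOYMENT_URL,
    headers={
        "Authorization": "Bearer YOUR_IAM_TOKEN",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
print(response.json())
```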
Learn more about deploying AI assets
Monitoring and maintaining your solution
After you embed your solution in your application and put it into production, you must maintain it, and you can also monitor model performance. Maintaining your solution can include updating or replacing the foundation model with newer versions or optimizing the model based on evaluations or user feedback. Monitoring your solution evaluates the performance of the model in your production environment.
Your organization might require you to monitor your solution and ensure that performance does not fall below specified thresholds.
To monitor your solution, open the deployment of your solution in the deployment space and activate evaluations. You can use the payload logging endpoint to send scoring requests for fairness and drift evaluations and use the feedback logging endpoint to provide feedback data for quality evaluations.
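For example, you might send scoring records to the payload logging endpoint over HTTP. In the following sketch, the endpoint URL is a hypothetical placeholder and the record structure is an assumption; use the exact format that is documented for your deployment's evaluations.

```python
# Sketch: send a scoring record to a payload logging endpoint for evaluations.
# The endpoint URL is a hypothetical placeholder, and the record structure
# (request fields/values plus the model response) is an assumption; use the
# exact format documented for your deployment's evaluations.
import requests

PAYLOAD_LOGGING_URL = "https://example.com/your-payload-logging-endpoint"

record = {
    "request": {
        "fields": ["input"],
        "values": [["Summarize the following meeting notes: ..."]],
    },
    "response": {
        "fields": ["generated_text"],
        "values": [["Budget and hiring plans were discussed."]],
    },
}

response = requests.post(
    PAYLOAD_LOGGING_URL,
    headers={"Authorization": "Bearer YOUR_IAM_TOKEN"},
    json=[record],  # send a list of scoring records
    timeout=60,
)
print(response.status_code)
```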
If the foundation model for your solution is provided and deployed by IBM, IBM might replace that model with a newer version. When IBM deprecates the model for your solution, you must update your solution to change the model before the model is removed. If you deployed the foundation model for your solution, you might want to periodically update your model to improve performance.
Learn more about monitoring and maintaining your solution
- Foundation model lifecycle
- Evaluating prompt templates in deployment spaces
- Sending model transactions
Parent topic: Planning a generative AI solution