When you have a strategy for your generative AI solution, you can plan a workflow that contains the tasks that you need to complete.
The following table lists the high-level tasks that you can include in your plan and indicates whether each task is required in all situations, required in only some situations (Sometimes), or required in only some situations but recommended in all situations (Sometimes, Recommended).
| Task | Required? |
|---|---|
| Define an AI use case | Sometimes, Recommended |
| Develop governance workflows | Sometimes |
| Set up a project | Required |
| Prepare data | Sometimes |
| Experiment with prompts | Required |
| Evaluate your prompts | Sometimes, Recommended |
| Optimize a foundation model | Sometimes |
| Deploy your solution | Required |
| Monitor and maintain your solution | Sometimes, Recommended |
Defining an AI use case
An AI use case consists of a set of factsheets that contain lineage, history, and other relevant information about the lifecycle of an AI asset such as a model or prompt template.
Your organization might require that you track and document your AI solution for transparency or regulatory compliance. However, AI use cases are useful even when they are not required because they provide an integrated way to track progress, decisions, and metrics about your solution.
To create an AI use case, first create an inventory, and then create the use case. Add data scientists, data engineers, and other users who are involved in the creation, testing, or governance of your solution as use case collaborators.
Learn about defining an AI use case
Developing governance workflows
A governance workflow enforces a review and approval process for AI use cases and model use.
Your organization might require one or more of the following types of governance workflows:
- Model risk governance workflows to approve AI use cases, approve foundation model lifecycle events, run risk assessments, or automate performance monitoring of models.
- Regulatory compliance management workflows to process alerts published by regulatory agencies.
- Operational risk management workflows to track model risk along with other operational risks across the enterprise.
To set up a governance workflow, configure it in the Governance console.
Learn about developing governance workflows
Setting up a project
A project is a collaborative workspace where people work with models and data to fulfill a shared goal.
You need a project to build prompts, run experiments, and tune models.
A project for a gen AI solution typically contains the following types of items that are either explicitly added by AI engineers or are created as the result of a process:
- Connection assets for data sources, such as a vector store or the location where you store training or tuning data.
- Data assets that represent data sets for training or tuning models.
- Prompt session assets that you save for future reference.
- Prompt template assets that provide endpoints for inferencing.
- Notebooks that you create or that are generated by processes such as saving a prompt as a notebook or running an AutoAI experiment.
- Vector indexes that represent vectorized documents for a RAG pattern.
- Experiment and flow assets that you create from running tools, such as AutoAI, Tuning Studio, or Synthetic Data Generator.
- AI service assets that provide endpoints for gen AI patterns, such as RAG.
- Jobs that are created by running assets in tools.
A sandbox project is created for you automatically. However, you might want to create a project with a name that reflects your goal. You can create a project from the home page or from the navigation menu. Add everyone who works on the solution as a collaborator, and assign a role to each collaborator to control their permissions in the project.
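If you prefer to work with your project programmatically, you can authenticate a client against it. The following minimal sketch assumes the ibm-watsonx-ai Python SDK; the service URL, API key, and project ID are placeholder values that you replace with your own.

```python
# Minimal sketch: connect to an existing project programmatically.
# Assumes the ibm-watsonx-ai Python SDK (pip install ibm-watsonx-ai).
# The URL, API key, and project ID are placeholders for your own values.
from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # your watsonx.ai endpoint
    api_key="YOUR_API_KEY",
)

client = APIClient(credentials)
client.set.default_project("YOUR_PROJECT_ID")  # scope subsequent calls to your project
```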
Learn more about creating projects
Preparing your data
Data preparation involves providing access to the required data in the format and at the quality level that your solution needs.
You need to prepare data if you plan to ground the model with your documents in a RAG pattern, to evaluate a prompt template, or to tune a foundation model. You might not need to prepare data if your use case is to translate, summarize, classify, or generate text, unless you want to run evaluations.
For a RAG pattern, you transform documents into embedded vectors for efficient retrieval. You store vectorized documents in a vector store and retrieve them with a vector index. You can include documents in a RAG pattern in the following ways:
- Upload document files from your local system and add them to a vector store
- Specify documents that are in an existing vector store
- Add documents from a connected data source into a vector store
You can choose from different methods of creating a RAG pattern, depending on the total size of your documents, the level of automation for experimenting, and other factors.
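The following sketch illustrates the general pattern of chunking, embedding, and retrieving documents. It is a conceptual, self-contained example: the toy embed function and in-memory list stand in for the real embedding model, vector store, and vector index that a production RAG pattern uses.

```python
# Conceptual sketch of vectorizing documents for retrieval in a RAG pattern.
# The embed() function is a toy stand-in for a real embedding model or service,
# and the in-memory list stands in for a vector store and vector index.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash character trigrams into a fixed-size vector."""
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk(document: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

documents = [
    "The quarterly meeting covered the budget, hiring plans, and office moves.",
    "The new release adds semantic search and fixes several login issues.",
]

# Tiny in-memory "vector store" of (chunk, vector) pairs.
store = [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: -float(np.dot(item[1], q)))
    return [text for text, _ in ranked[:k]]

print(retrieve("What did the meeting cover?"))
```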
For evaluating a prompt template or tuning a foundation model, you provide a data set that contains representative input to the model and the appropriate output for the model to generate in response. You can provide tuning data in the following ways:
- Upload a file from your local system
- Connect to the data source that contains the data set
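For example, a tuning or evaluation data set pairs each sample input with the expected output. The following sketch writes such a data set as JSON Lines; the input and output field names are an assumption, so check the exact format that your tuning or evaluation tool expects.

```python
# Sketch of a small tuning or evaluation data set of input/output pairs,
# written as JSON Lines. The field names ("input", "output") are an assumption;
# check the exact format that your tuning or evaluation tool expects.
import json

examples = [
    {"input": "Summarize: The meeting covered budget and hiring plans.",
     "output": "Budget and hiring plans were discussed."},
    {"input": "Summarize: The release adds search and fixes login bugs.",
     "output": "The release adds search and fixes login issues."},
]

with open("tuning_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```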
Learn more about preparing data
Experimenting with prompts
A prompt is how you instruct a foundation model to generate a response.
You can experiment with prompts by altering conditions in the following ways:
- Switching between chat and non-chat modes
- Altering your prompt text or the system prompt
- Changing the foundation model
- Adjusting the model parameters
- Enabling and disabling guardrails
- Adding images or documents to a chat
- Adding variables to dynamically change prompt text
- Configuring agents to call tools
To develop your prompts, you can experiment in the Prompt Lab or work programmatically with the REST API, Python, or Node.js. To automate finding an optimal RAG pattern, run an AutoAI for RAG experiment.
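For example, the following minimal sketch sends a prompt with Python. It assumes the ibm-watsonx-ai SDK; the model ID, credentials, project ID, and parameter values are placeholders that you replace with your own choices.

```python
# Minimal sketch: send a prompt to a foundation model programmatically.
# Assumes the ibm-watsonx-ai Python SDK; the model ID, project ID,
# credentials, and parameter values are placeholders.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id="ibm/granite-13b-instruct-v2",  # any available foundation model
    credentials=Credentials(
        url="https://us-south.ml.cloud.ibm.com",
        api_key="YOUR_API_KEY",
    ),
    project_id="YOUR_PROJECT_ID",
    params={"decoding_method": "greedy", "max_new_tokens": 200},
)

response = model.generate_text(prompt="Summarize the following meeting notes:\n...")
print(response)
```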
Learn more about experimenting with prompts
Evaluating your prompts
An evaluation of a prompt tests the quality of the model output for the set of metrics that you choose. Some metrics are based on comparing model output against the appropriate output that you provide in the testing data set. How efficiently your model generates responses is also evaluated.
Your organization might require evaluations for regulatory compliance or internal policies. However, evaluations are useful even when they are not required because the metric scores can indicate the quality of your solution and might predict decreased user satisfaction when scores drop.
When you evaluate a prompt, you can configure the following factors:
- The sample size to test
- Which metrics to include
- The threshold value for each metric
Generative AI metrics provide the following types of information about the prompt:
- How similar the output text is to the input text
- How similar the output text is to reference outputs
- Whether the input or output text contains harmful or sensitive information
You view the current results and the results over time. The results of each evaluation are added to the use case for the prompt.
To evaluate a prompt, run an evaluation in the Prompt Lab, with code, or from a deployed prompt template. If you run an AutoAI experiment for a RAG pattern, the candidate prompts are evaluated and ranked automatically.
To evaluate and compare multiple prompt templates simultaneously, run an evaluation experiment in Evaluation Studio.
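As an illustration of comparison-based metrics, the following sketch scores model outputs against reference outputs with a simple token-overlap measure and checks the average against a threshold. It is a conceptual stand-in, not the platform's metric implementation.

```python
# Conceptual sketch: score model outputs against reference outputs with a
# simple token-overlap metric and flag results that fall below a threshold.
# A stand-in for the platform's generative AI metrics, not their implementation.

def token_overlap(generated: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the generated text."""
    gen_tokens = set(generated.lower().split())
    ref_tokens = set(reference.lower().split())
    return len(gen_tokens & ref_tokens) / len(ref_tokens) if ref_tokens else 0.0

# Each pair is (model output, reference output from the testing data set).
test_set = [
    ("Budget and hiring were discussed.", "The meeting discussed budget and hiring."),
    ("The release adds search.", "The release adds search and fixes login issues."),
]

THRESHOLD = 0.6
scores = [token_overlap(generated, reference) for generated, reference in test_set]
average = sum(scores) / len(scores)
print(f"average overlap = {average:.2f}, below threshold: {average < THRESHOLD}")
```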
Learn more about evaluating prompts
Optimizing a foundation model
Optimizing a foundation model improves one or more performance indicators of the model.
You can optimize the foundation model in your solution for accuracy, cost, inferencing throughput, or control of the model lifecycle.
The methods to deploy a foundation model vary based on the following characteristics:
- The billing method: per token inferenced or per hour hosted
- The hosting environment: multi-tenant or dedicated hardware
- The deployment mechanism: the model is deployed by IBM or by you
- The tuning status: the model is tuned or not tuned
- The deprecation policy: controlled by IBM or by you
To run a model on multi-tenant hardware, pay per token, and have IBM control the model lifecycle, select a model that IBM provides and deploys.
To tune a foundation model, select a tuning method, add tuning data, and run the job in the Tuning Studio or with code. Then, deploy the tuned model on multi-tenant hardware, pay per token, and control the model lifecycle.
To run a model on dedicated hardware, pay per hour, and control the model lifecycle, you can either import and deploy a custom model, or deploy a deploy on demand model.
Learn more about optimizing foundation models
Deploying your solution
Deploying an asset makes it available through an endpoint for testing or for productive use. After you create deployments, you can test and manage them, and prepare your assets to deploy into pre-production and production environments.
You create deployments in deployment spaces, which are separate workspaces from projects and to which you can add a different set of collaborators.
To deploy most types of gen AI assets, you promote the asset to a deployment space and create a deployment that contains an endpoint. You can then call the endpoint from your application to inference the foundation model. For prompt templates that do not contain variables, you can copy the endpoint code directly from Prompt Lab. You can create separate deployment spaces for testing, staging, and production deployments to support your ModelOps workflow.
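For example, an application can call the deployment endpoint over HTTP. In the following sketch, the endpoint path, version parameter, and payload fields are assumptions modeled on the watsonx.ai text generation API; copy the exact snippet from the API reference for your deployment.

```python
# Sketch: call a deployment endpoint from an application.
# The endpoint path, version query parameter, and payload fields are assumptions
# based on the watsonx.ai text generation API; copy the exact values from the
# API reference for your deployment. Tokens and IDs are placeholders.
import requests

DEPLOYMENT_URL = (
    "https://us-south.ml.cloud.ibm.com/ml/v1/deployments/"
    "YOUR_DEPLOYMENT_ID/text/generation?version=2023-05-29"
)

payload = {
    # For a prompt template with variables, supply the variable values.
    "parameters": {"prompt_variables": {"document": "Text to summarize..."}}
}

response = requests.post(
    DEPLOYMENT_URL,
    headers={
        "Authorization": "Bearer YOUR_IAM_TOKEN",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
print(response.json())
```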
Learn more about deploying AI assets
Monitoring and maintaining your solution
After you embed your solution in your application and put it into production, you must maintain it, and you can also monitor model performance. Maintaining your solution can include updating or replacing the foundation model with newer versions or optimizing the model based on evaluations or user feedback. Monitoring your solution evaluates the performance of the model in your production environment.
Your organization might require you to monitor your solution and ensure that performance does not fall below specified thresholds.
To monitor your solution, open the deployment of your solution in the deployment space and activate evaluations. You can use the payload logging endpoint to send scoring requests for fairness and drift evaluations and use the feedback logging endpoint to provide feedback data for quality evaluations.
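For example, you might send scoring records to the payload logging endpoint over HTTP. In the following sketch, the endpoint URL is a hypothetical placeholder and the record structure is an assumption; use the exact format that is documented for your deployment's evaluations.

```python
# Sketch: send a scoring record to a payload logging endpoint for evaluations.
# The endpoint URL is a hypothetical placeholder, and the record structure
# (request fields/values plus the model response) is an assumption; use the
# exact format documented for your deployment's evaluations.
import requests

PAYLOAD_LOGGING_URL = "https://example.com/your-payload-logging-endpoint"

record = {
    "request": {
        "fields": ["input"],
        "values": [["Summarize the following meeting notes: ..."]],
    },
    "response": {
        "fields": ["generated_text"],
        "values": [["Budget and hiring plans were discussed."]],
    },
}

response = requests.post(
    PAYLOAD_LOGGING_URL,
    headers={"Authorization": "Bearer YOUR_IAM_TOKEN"},
    json=[record],  # send a list of scoring records
    timeout=60,
)
print(response.status_code)
```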
If the foundation model for your solution is provided and deployed by IBM, IBM might replace that model with a newer version. When IBM deprecates the model for your solution, you must update your solution to change the model before the model is removed. If you deployed the foundation model for your solution, you might want to periodically update your model to improve performance.
Learn more about monitoring and maintaining your solution
- Foundation model lifecycle
- Evaluating prompt templates in deployment spaces
- Sending model transactions
Parent topic: Planning a generative AI solution