Before you start planning your generative AI solution, you must make some key strategic decisions. Your strategy must account for your organization's requirements and priorities, the skills and preferences of your development team, your data requirements, and the solution requirements.
To set the strategy for implementing your generative AI solution, make the following decisions:
- What is your use case
- Who to involve
- How to manage risk and compliance
- How to develop your solution
- How to choose a model
- How to prepare data
- How to evaluate quality and mitigate risk
- How to optimize and manage foundation models
- How to deliver and maintain your solution
What is your use case
Understand the goal of your solution, whether that goal is feasible and valuable, and how you can determine when your solution is ready:
- Gen AI task
- What do you need the model to do?
- Conversational or separate outputs
- Understand whether you need a conversational experience, where users chat with a model that retains the context of previous interactions, or whether each interaction with the model is separate. Some tasks, such as classification, summarization, or generation, might not benefit from a conversation. Conversations cost more than separate outputs because the chat history is sent with every input.
- Feasibility
- Understand the limits of foundation models so that you can evaluate whether your use case is feasible. For example, you can check model benchmark scores for the type of use case that you want to implement.
- Business value
- Consider whether the benefits of the solution are greater than the cost of running the solution.
- Success criteria
- Decide how to measure whether the solution is successful. For example, you can rely on evaluation metrics or user feedback from target users.
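The cost difference between conversational and separate outputs can be illustrated with a small sketch. This is not a watsonx.ai API; it is back-of-the-envelope arithmetic with made-up token counts, showing why resending the chat history with every input makes conversation costs grow much faster than separate calls.

```python
# Illustrative sketch (not a watsonx.ai API): compare the input tokens billed
# for a multi-turn conversation, where the chat history is resent with every
# input, against the same number of independent (separate) calls.
# The token counts are hypothetical placeholders.

def tokens_billed(turns, tokens_per_message, conversational):
    """Return the total input tokens billed across all turns."""
    total = 0
    history = 0
    for _ in range(turns):
        if conversational:
            # Each request includes the accumulated history plus the new message.
            total += history + tokens_per_message
            # The history grows by the user message and the model's reply.
            history += 2 * tokens_per_message
        else:
            # Separate outputs: each request carries only the new message.
            total += tokens_per_message
    return total

separate = tokens_billed(turns=10, tokens_per_message=100, conversational=False)
chat = tokens_billed(turns=10, tokens_per_message=100, conversational=True)
print(separate, chat)  # conversation cost grows quadratically with turn count
```

With these placeholder numbers, ten separate calls bill 1,000 input tokens, while a ten-turn conversation bills 10,000, which is why tasks that do not need context are cheaper as separate outputs.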
Who to involve
If you involve the appropriate stakeholders from the beginning, you have less risk of needing to change direction or repeat parts of the process. At a minimum, involve stakeholders from these teams in the planning process:
- People who define your organization's priorities and processes
- You need these people to tell you about the requirements and restrictions that you must abide by. You might need to document specific information about your solution, follow a workflow to comply with a regulation, or select a model with a specific type of source. For example, your organization might require that you select an open source model.
- People who use the solution
- You need these people to define the requirements of the solution and to help test and validate that the solution works in their processes.
- People who create the solution
- You need these people involved in design and operational decisions. This team of collaborators might include designers, data engineers, AI engineers, data scientists, and risk and compliance managers. If you are implementing a retrieval-augmented generation solution on your documentation, consider including your content writers, who can adapt the content for AI.
How to manage risk and compliance
If you understand your risks and compliance needs before you start developing your solution, you can be better prepared for audits later.
- AI-related risks
- Understand key risk dimensions like reputational, regulatory, and operational risk. The risks for using generative AI include the same risks as for traditional machine learning models, risks that are amplified by generative AI, and new risks that are specific to generative AI. For example, the risk of generating output that contains factually inaccurate or untruthful content, which is referred to as hallucinating, is specific to generative AI.
- Legal and regulatory compliance
- Determine which laws and regulations you must comply with, the methods for tracking compliance, and the methods for ensuring compliance. For example, your organization might require a formal risk assessment or an approval workflow for AI solutions.
- Use case documentation
- Create an AI use case to gather all of the information for managing a model or prompt template from the request phase through development and into production. Documenting your use case provides a convenient way to track your progress whether or not your organization requires it for regulatory purposes.
More information about risk and compliance
How to develop your solution
You and your development team can choose between working with tools and methods in the watsonx.ai user interface and working entirely with code:
- Coding language
- If you want to write code, you can choose between REST APIs, Python, and Node.js code. Factors for choosing the language include the preferences and skills of your developers, how you want to deploy the solution, and how much work your team wants to do in their integrated development environment (IDE) versus the watsonx.ai user interface.
- Level of automation
- You can choose how much of your solution code is generated for you:
- All code: You can write code with REST APIs in your IDE. You can write and run code with Python libraries in the notebook editor.
- Some code: You can generate Python notebooks with many tools and then adapt the code as needed. For example, you can generate a notebook based on a prompt template or generate multiple notebooks for embedding and vectorizing documents.
- No code: You can complete all prompt engineering, model tuning, and document embedding and vectorizing tasks with tools.
- Functionality
- Most of the functionality that you can find in the watsonx.ai user interface is available with APIs. However, capabilities differ between tools.
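As a sketch of the all-code approach, the following Python snippet assembles a JSON request body for a text-generation REST call. The field names follow the general shape of the watsonx.ai text generation API, but the model ID, project ID, and exact payload structure are illustrative assumptions; check the current API reference before use.

```python
# Sketch: build the JSON body for a text-generation REST request.
# The payload fields and model ID below are illustrative assumptions;
# verify them against the current watsonx.ai API reference.
import json

def build_generation_request(model_id, prompt, project_id, max_new_tokens=200):
    """Assemble the JSON body for a text-generation request."""
    return {
        "model_id": model_id,
        "input": prompt,
        "project_id": project_id,
        "parameters": {"max_new_tokens": max_new_tokens},
    }

body = build_generation_request(
    model_id="ibm/granite-13b-instruct-v2",   # example model ID
    prompt="Summarize the meeting notes:\n...",
    project_id="<your-project-id>",
)
print(json.dumps(body, indent=2))

# You would then send the body with an authenticated POST, for example:
# requests.post(endpoint_url,
#               headers={"Authorization": f"Bearer {token}"},
#               json=body)
```

The same request can be issued from a notebook with a Python library or from any IDE, which is the trade-off described above.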
More information about working methods
How to choose a model
You don't need to choose a model before you make a plan for your solution. However, if you understand which criteria are most important to you and your organization, you can reduce the risk of selecting an inappropriate model.
- Task
- The task that you want the model to do can be a limiting factor for choosing a model. For many tasks, you can choose from many models. However, for other tasks, such as translation or responding in a language other than English, you have fewer choices.
- Cost
- The cost of inferencing varies among models. If keeping inferencing costs low is a priority for you, choose a cheaper model, a smaller model, a quantized model, or a model that you can tune.
- Environmental impact
- In general, larger models have a larger environmental impact during both training and inferencing. Smaller models and quantized models have a smaller environmental impact.
- Accuracy and other scores
- You can compare model benchmarks and choose the model that has high scores in the areas that are most important to you.
- Indemnity and model origin
- Your organization might have policies about choosing models that are transparent about their training data, are open source, or offer indemnity.
- Customization
- You can customize a model for a specific domain by tuning it. You can choose to tune some models that are provided with watsonx.ai in the Tuning Studio. Alternatively, you can tune a model in an external tool and import your custom model into watsonx.ai.
- You can add knowledge or skills to IBM Granite models with InstructLab.
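One way to apply the criteria above is a simple weighted ranking of candidate models. The model names, criterion scores, and weights in this sketch are made up; the point is the technique of scoring candidates against the criteria that matter most to your organization.

```python
# Illustrative sketch: rank candidate models by weighted criteria such as
# task fit, cost, and benchmark scores. All names and numbers are made-up
# placeholders; substitute your own criteria and normalized scores.

def rank_models(candidates, weights):
    """Sort models by a weighted sum of criterion scores (higher is better)."""
    def score(model):
        return sum(weights[criterion] * model[criterion] for criterion in weights)
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"name": "model-a", "task_fit": 0.9, "low_cost": 0.4, "benchmark": 0.85},
    {"name": "model-b", "task_fit": 0.7, "low_cost": 0.9, "benchmark": 0.70},
]
# Weighting cost heavily can favor a cheaper model over a higher-scoring one.
weights = {"task_fit": 0.5, "low_cost": 0.3, "benchmark": 0.2}
ranked = rank_models(candidates, weights)
print([m["name"] for m in ranked])
```

Changing the weights changes the winner, which is why agreeing on priorities before model selection reduces the risk of choosing an inappropriate model.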
More information about choosing a foundation model
How to prepare data
Foundation models are trained on large amounts of data, but not on your internal company data. If you need a foundation model to know your company data, you must decide how to provide your data to the model.
- Grounding documents
- If you need a solution that answers questions by grounding the model with the information in your documents, you can set up a retrieval-augmented generation (RAG) pattern. In a RAG pattern, you vectorize your documents for efficient retrieval of passages that answer user questions.
- Tuning and testing data
- If you need to improve or tailor the output for natural language processing tasks such as classification, summarization, and generation, you can tune the model. If you want to test the quality of your prompt, you can evaluate it with generative AI metrics. For both tasks, you must provide a set of validated prompt input and output examples. If your data contains any sensitive information, such as personally identifiable information (PII), make sure that you know your organization's policy about PII. For example, you might need to mask PII or generate synthetic data for tuning or testing your model.
- Knowledge or skills
- Provide data sets that inform the model. You can use InstructLab to augment an existing foundation model with the capabilities that are needed for your use case. You provide seed examples or grounding data that is the basis for generating synthetic data for instruction tuning the foundation model.
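The retrieval step of a RAG pattern can be sketched in a few lines. In practice you use an embedding model and a vector store; the 3-dimensional vectors, passages, and question below are toy stand-ins that only illustrate how vectorized documents enable efficient retrieval of relevant passages.

```python
# Toy sketch of RAG retrieval: rank vectorized passages by cosine similarity
# to the question vector and put the best match into the prompt as context.
# The vectors and passages are made-up placeholders, not real embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(question_vec, indexed_passages, top_k=1):
    """Return the top_k passage texts most similar to the question vector."""
    ranked = sorted(indexed_passages,
                    key=lambda p: cosine(question_vec, p["vector"]),
                    reverse=True)
    return [p["text"] for p in ranked[:top_k]]

indexed_passages = [
    {"text": "Refunds are processed within 5 business days.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Our office is open Monday to Friday.", "vector": [0.1, 0.8, 0.3]},
]
question_vec = [0.85, 0.2, 0.05]  # stands in for the embedded question
context = retrieve(question_vec, indexed_passages)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How long do refunds take?"
```

The grounded prompt is then sent to the foundation model, so the answer comes from your documents rather than from the model's training data.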
More information about data preparation
How to evaluate quality and mitigate risk
You must decide how to measure quality and ensure safety.
- Evaluation
- You can evaluate solution performance and risks against industry-standard metrics. You can measure the textual accuracy, similarity, and quality of foundation model output. You can also evaluate fairness, performance, and drift of model output. These metrics help you to ensure that AI solutions are free from bias, can be easily explained and understood by business users, and are auditable in business transactions.
- Guardrails
- You can enable guardrails to remove potentially harmful content or PII content from both input and output text in prompts.
- Testing
- Consider setting up a red team to emulate adversarial attacks.
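One of the simplest textual-similarity evaluations is token-overlap F1 between a model answer and a validated reference answer. Industry metrics such as ROUGE or faithfulness scores are more sophisticated, but this sketch shows the underlying idea: compare generated output against trusted examples and track the score.

```python
# Sketch of a basic evaluation metric: token-overlap F1 between a model's
# answer and a validated reference answer. The example strings are made up.
from collections import Counter

def token_f1(prediction, reference):
    """F1 over the multiset of overlapping lowercase tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = token_f1("refunds take five business days",
                 "refunds take 5 business days")
print(score)  # "five" vs "5" is penalized, so the score is below 1.0
```

Running a metric like this over a validated test set gives you a repeatable quality number that you can compare across prompts, models, and releases.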
More information about evaluating quality and mitigating risk
How to optimize and manage foundation models
You can optimize a foundation model for accuracy, cost, inferencing latency, and control of the model lifecycle.
- Default optimization
- IBM provides a set of foundation models that are deployed on multi-tenant hardware. You pay for inferencing per token. IBM controls the model lifecycle by updating and deprecating models. When a model is deprecated, you must update your solution to inference the new version of the model or a different model.
- Optimize for accuracy and cost
- If you need to improve the accuracy of your prompt and reduce costs by inferencing a smaller foundation model, you can prompt tune a provided foundation model. You deploy a prompt-tuned model on multi-tenant hardware and pay for inferencing per token.
- Optimize for accuracy and control
- If you trained or tuned a model externally to watsonx.ai for your use case, you can import and deploy a custom model. You deploy the model on dedicated hardware. You pay per hour for hosting the model instead of for inferencing. You control the model lifecycle.
- Optimize for latency and control
- If your solution must support a high number of concurrent users, you can deploy a deploy-on-demand model that is provided by IBM on dedicated hardware. Dedicated hardware provides lower latency than multi-tenant hardware. You pay per hour for hosting the model instead of for inferencing. You control the model lifecycle.
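The per-token versus per-hour trade-off above comes down to simple arithmetic. The prices and volumes in this sketch are made-up placeholders; substitute your actual rates and expected request volume to see which hosting option is cheaper for your workload.

```python
# Illustrative cost arithmetic: multi-tenant hosting bills per token, while
# dedicated hosting bills per hour. All rates below are hypothetical.

def monthly_cost_per_token(requests_per_month, tokens_per_request, price_per_1k_tokens):
    """Monthly cost when billed per token (multi-tenant hosting)."""
    return requests_per_month * tokens_per_request / 1000 * price_per_1k_tokens

def monthly_cost_per_hour(price_per_hour, hours=730):
    """Monthly cost when billed per hour (dedicated hosting, ~730 h/month)."""
    return price_per_hour * hours

token_billed = monthly_cost_per_token(500_000, 800, 0.002)  # hypothetical rates
hour_billed = monthly_cost_per_hour(1.50)
print(token_billed, hour_billed)
```

At this placeholder volume, per-token billing is still cheaper; as request volume grows, hourly dedicated hosting eventually wins, which is one reason high-concurrency solutions deploy on dedicated hardware.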
More information about optimizing and managing foundation models
How to deliver and maintain your solution
You must decide how to deliver your gen AI solution and help ensure its continued quality.
- Deploying your solution
- To inference a foundation model, you deploy an endpoint that you call from your application. Depending on the architecture of your gen AI solution, the endpoint might be in a code snippet, a Python function, an AI service, or code that your team developed.
- Managing ModelOps with deployment spaces
- You can support a ModelOps flow by creating separate deployment spaces for testing, staging, and production versions of your solution. You can manage access to your production solution by adding the appropriate collaborators to each space.
- Monitoring
- Similar to evaluating your solution during development, you can monitor solution performance and risks, such as fairness, quality, and explainability. You can view trends over time and set thresholds to alert you when performance dips.
- User feedback
- Consider implementing a user feedback mechanism and creating a process for gathering that feedback and improving your solution with it. For example, if you implement a RAG pattern, you can add a feedback mechanism for users to evaluate the answers to their questions. You can set up a process to evaluate incorrect and inadequate answers and either adapt the RAG pattern or adapt your content to provide better answers.
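Threshold-based monitoring can be sketched in a few lines. The window names, scores, and threshold here are placeholders for whatever your monitoring tool reports; the technique is simply flagging evaluation windows whose quality score dips below an alert threshold.

```python
# Sketch of threshold-based monitoring: flag evaluation windows whose
# quality score fell below the alert threshold. All values are placeholders.

def alerts(metric_history, threshold):
    """Return (window, score) pairs whose score fell below the threshold."""
    return [(window, score) for window, score in metric_history if score < threshold]

metric_history = [
    ("week-1", 0.92),
    ("week-2", 0.90),
    ("week-3", 0.74),  # quality dipped, for example after a content update
]
print(alerts(metric_history, threshold=0.80))
```

An alert like this is the trigger to investigate: review recent user feedback, evaluate the failing answers, and adapt the RAG pattern or the content as described above.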
More information about delivery and maintenance
Learn more
Parent topic: Planning a generative AI solution