Python client samples for model evaluations

Last updated: Nov 21, 2024

Review and run sample Jupyter Notebooks that use the Python client library for model evaluations to see how features and tasks work.

To work with a sample notebook for the Python client, you must be comfortable with coding in a Jupyter Notebook. A Jupyter Notebook is a web-based environment for interactive computing. You can run small pieces of code that process your data, and then immediately view the results of your computation. The sample Jupyter Notebooks provide tutorials that demonstrate tasks such as building, training, and deploying models and configuring model evaluations.
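As a minimal sketch of this run-and-view workflow, the following snippet mimics two notebook cells: one that defines a small data set and one that processes it and displays the result. The `scores` values and variable names are hypothetical and are not taken from any of the sample notebooks.

```python
# Cell 1: define a small piece of data (hypothetical values for illustration).
from statistics import mean

scores = [0.82, 0.91, 0.77, 0.88]

# Cell 2: process the data and view the result immediately.
average_score = round(mean(scores), 3)
print(average_score)
```

In a notebook, each commented section would typically be its own cell, so that you can change the data and rerun only the computation.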

Sample notebooks

View or run the following Jupyter notebooks to learn how to complete different tasks:

Sample name	Tasks demonstrated
Using IBM watsonx.governance metrics toolkit to evaluate the quality of your prompt template	Calculate content analysis and question robustness metrics for prompt template evaluations.
Retrieval and answer quality metrics computation using LLM as Judge in IBM watsonx.governance for RAG task	Calculate retrieval quality and answer quality metrics, with an LLM as a judge, to evaluate the responses that are generated for RAG tasks.
Computing adversarial robustness and prompt leakage risk using IBM watsonx.governance	Calculate the adversarial robustness and prompt leakage risk metrics to measure how well your model defends against attacks such as prompt injections, jailbreaks, and system prompt leakage.
Embeddings generation for LLMs	Use CSV files of scored data to generate embeddings for the input and output columns and download the CSV file with the model output that contains embeddings.
Embeddings generation and persistence for LLMs	Generate embeddings for existing records in the payload table, provide new scored data frames to generate and store records with embeddings in the payload table, or configure and run drift v2 evaluations.
Design time notebook for Multi Lingual support of Generative AI Quality metrics for IBM WatsonX.governance	Demonstrate the generative AI quality prompt template evaluation results in Japanese.
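
To illustrate the data flow that the embeddings notebooks describe (read scored CSV data, generate embeddings for the input and output columns, and write a CSV that contains the embeddings), the following sketch uses only the Python standard library. It does not use the watsonx.governance client. The `toy_embedding` function is a hypothetical stand-in for a real embedding model, and the column names `input` and `output` are assumptions.

```python
import csv
import hashlib
import io
import json


def toy_embedding(text, dims=4):
    """Hypothetical stand-in for an embedding model: hash bytes scaled to [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dims]]


def add_embeddings(csv_text, input_col="input", output_col="output"):
    """Return CSV text with one JSON-encoded embedding column per text column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [
        f"{input_col}_embedding",
        f"{output_col}_embedding",
    ]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        # Store each vector as a JSON string so that it fits in a single CSV cell.
        row[f"{input_col}_embedding"] = json.dumps(toy_embedding(row[input_col]))
        row[f"{output_col}_embedding"] = json.dumps(toy_embedding(row[output_col]))
        writer.writerow(row)
    return out.getvalue()


# Assumed sample of scored data with one record.
scored_csv = "input,output\nWhat is drift?,A change in data over time.\n"
enriched_csv = add_embeddings(scored_csv)
print(enriched_csv)
```

In the sample notebooks, the embeddings come from an embedding model rather than from a hash. Only the overall shape of the read, enrich, and write steps is meant to carry over.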

Next steps

To learn more about using notebook editors, see Notebooks.
To learn more about working with notebooks, see Coding and running notebooks.
To learn more about authenticating in a notebook, see Authenticating.