Python client samples for model evaluations
Last updated: Nov 21, 2024

Review and run sample Jupyter Notebooks that use the Python client library for model evaluations. Each notebook demonstrates specific features and tasks.

To use the sample notebooks with the Python client, you must be comfortable with coding in a Jupyter Notebook. A Jupyter Notebook is a web-based environment for interactive computing. You can run small pieces of code that process your data, and then immediately view the results of your computation. With the sample notebooks, you can work through tutorials that demonstrate tasks such as building, training, and deploying models and configuring model evaluations.
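As a rough illustration of the run-a-cell, see-the-result workflow described above, the following sketch shows what a few notebook cells might look like. It uses only the Python standard library, not the Python client for model evaluations. The data and the `accuracy` helper are hypothetical and exist only for this example.

```python
# Cell 1: a small, hypothetical set of model predictions and reference labels.
predictions = ["yes", "no", "yes", "yes"]
references = ["yes", "no", "no", "yes"]


# Cell 2: a hypothetical helper that computes a simple metric over the data.
def accuracy(preds, refs):
    """Return the fraction of predictions that exactly match their reference."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    if not preds:
        return 0.0
    matches = sum(p == r for p, r in zip(preds, refs))
    return matches / len(preds)


score = accuracy(predictions, references)

# Cell 3: show the result immediately after the computation.
print(f"accuracy = {score:.2f}")
```

In a notebook, each commented section would typically be its own cell, so you can change the data in one cell and rerun only the cells that depend on it.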

Sample notebooks

View or run the following Jupyter notebooks to learn how to complete different tasks:

Sample name | Tasks demonstrated
Using IBM watsonx.governance metrics toolkit to evaluate the quality of your prompt template | Calculate content analysis and question robustness metrics for prompt template evaluations.
Retrieval and answer quality metrics computation using LLM as Judge in IBM watsonx.governance for RAG task | Calculate RAG and answer quality metrics to generate responses for RAG tasks.
Computing adversarial robustness and prompt leakage risk using IBM watsonx.governance | Calculate the Adversarial robustness metric to measure how your model defends against attacks such as prompt injections, jailbreaks, and system prompt leakage.
Embeddings generation for LLMs | Use CSV files of scored data to generate embeddings for the input and output columns and download the CSV file with the model output that contains embeddings.
Embeddings generation and persistence for LLMs | Generate embeddings for existing records in the payload table, provide new scored data frames to generate and store records with embeddings in the payload table, or configure and evaluate drift v2 evaluations.
Design time notebook for Multi Lingual support of Generative AI Quality metrics for IBM WatsonX.governance | Demonstrate the generative AI quality prompt template evaluation results in Japanese.
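
The CSV-in, CSV-with-embeddings-out flow listed for the "Embeddings generation for LLMs" sample can be pictured roughly as follows. This is a minimal sketch, not the notebook's code and not the watsonx.governance API. The `toy_embedding` function is a hypothetical hash-based placeholder for a real embedding model, and the `input` and `output` column names and the `_embedding` suffix are assumptions made for illustration.

```python
import csv
import hashlib
import io
import json


def toy_embedding(text, dim=4):
    """Placeholder embedding: map text to `dim` floats in [0, 1).

    Stand-in only (assumption); a real workflow would call an embedding model.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 256 for i in range(dim)]


def add_embeddings(csv_text, columns=("input", "output")):
    """Return CSV text with a JSON-encoded `<column>_embedding` field per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [f"{c}_embedding" for c in columns]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        for c in columns:
            # Store each vector as a JSON string so it fits in a single CSV cell.
            row[f"{c}_embedding"] = json.dumps(toy_embedding(row[c]))
        writer.writerow(row)
    return out.getvalue()


# Hypothetical scored data: one record with an input and a model output.
scored = "input,output\nWhat is RAG?,Retrieval-augmented generation\n"
result = add_embeddings(scored)
print(result)
```

Storing each vector as a JSON string keeps the result a flat CSV file that can be downloaded and parsed again later; the sample notebook may use a different representation.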

Next steps
