Configuring model monitors

Last updated: Sep 09, 2022

Set up monitors to evaluate each model deployment that you are tracking with Watson OpenScale.

Providing model details

Provide information about your model to enable Watson OpenScale to access your database and understand how the model is set up.

To configure monitors, provide the following information in the Model details section on the model configuration page in Watson OpenScale:

Model input

Select the type of data that the deployment analyzes and the type of algorithm that you use to build your model.

Training data

If you do not use a notebook to provide a training data summary, you must enter the location of the training data. Specify the exact location in the Db2 database or Cloud Object Storage where the training data is located. After you select the storage type (either Db2 or Cloud Object Storage), you must provide the following information:

For a Db2 database, enter the following information:
- Host name or IP address, excluding the initial https:// prefix and the final forward slash (/)
- Port
- Database (name)
- Username
- Password
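
A pasted connection string often still carries the prefix and trailing slash that the Host name field does not accept. The following is a minimal sketch of a clean-up step; the function name `normalize_db2_host` and the example host are hypothetical, and stripping `http://` in addition to `https://` is an assumption rather than something this page requires.

```python
def normalize_db2_host(raw: str) -> str:
    """Return a bare host name or IP address for the Db2 host field."""
    host = raw.strip()
    # Drop a leading scheme if one was pasted in (http:// is an assumption).
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
            break
    # Drop any trailing forward slashes.
    return host.rstrip("/")


print(normalize_db2_host("https://db2.example.com/"))  # db2.example.com
```
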
For Cloud Object Storage, enter the following information:
- Login URL: The Login URL must match the region setting of the bucket where your training data is located. You will specify the training data bucket in the next step.
- Resource instance (ID). To find the Resource instance ID, go to https://cloud.ibm.com/resources, expand the Storage resource, click the Cloud Object Storage service, and then click Service credentials. Expand the desired Key name. Copy the value of resource_instance_id without the quotation marks.
- API key. To find the API key, go to https://cloud.ibm.com/resources, expand the Storage resource, click the Cloud Object Storage service, and then click Service credentials. Expand the desired Key name. Copy the value of apikey without the quotation marks.
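
If you copy the whole credential instead of individual values, the two fields can be read out programmatically, which also removes the quotation marks. This is a sketch only: the credential shown is a shortened, hypothetical example that contains just the `resource_instance_id` and `apikey` fields named above, and `extract_cos_fields` is an illustrative helper, not part of any IBM library.

```python
import json

# Hypothetical, shortened service credential; real credentials contain
# more fields and real values.
credential_json = """
{
  "apikey": "EXAMPLE_API_KEY",
  "resource_instance_id": "crn:v1:example:resource-instance-id::"
}
"""


def extract_cos_fields(raw: str) -> dict:
    """Return the two values that the Cloud Object Storage form asks for."""
    credential = json.loads(raw)
    return {
        "resource_instance_id": credential["resource_instance_id"],
        "api_key": credential["apikey"],
    }


fields = extract_cos_fields(credential_json)
# Parsed strings carry no surrounding quotation marks.
print(fields["resource_instance_id"])
print(fields["api_key"])
```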

To ensure a valid connection, click Connect to connect to the training data. After you successfully connect, you must make additional selections and save your work:

For a Db2 database, select both a schema and a training table that includes columns that are expected by your model.
For Cloud Object Storage, select a Bucket and a Data Set.

Model transaction

Watson OpenScale checks the payload logging automatically for models that you create with IBM Watson Machine Learning. For external machine learning providers, you must send a sample payload either by pasting a JSON file into the JSON payload box or by using the API to send a request.
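
As an illustration of preparing a sample payload to paste into the JSON payload box, the sketch below builds a small JSON document from column names and one row of values. The `fields`/`values` layout, the column names, and the helper `build_sample_payload` are assumptions made for this example; confirm the exact structure that your machine learning provider and Watson OpenScale expect before you use it.

```python
import json


def build_sample_payload(fields, rows):
    """Build a payload dict, checking that every row matches the field list."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"row {row!r} does not have {len(fields)} values")
    # Assumed layout: column names under "fields", rows under "values".
    return {"fields": list(fields), "values": [list(row) for row in rows]}


payload = build_sample_payload(
    ["Gender", "Age", "Income"],  # hypothetical feature columns
    [["F", 34, 52000]],
)
print(json.dumps(payload, indent=2))
```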

Training data label

You must select a single unique feature from the data to serve as the label (prediction) column. The label column holds the value that the model was designed to predict.

Training features

Select all the features that were used to train the model before it was deployed.
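
The relationship between the label column and the training features can be sketched as follows: exactly one column is the label, and the features are chosen from the remaining columns. The column names and the helper `split_label_and_features` are hypothetical, and the sketch assumes that every non-label column was used for training, which might not hold for your model.

```python
def split_label_and_features(columns, label_column):
    """Return the label and the candidate feature columns (label excluded)."""
    if columns.count(label_column) != 1:
        raise ValueError(f"{label_column!r} must appear exactly once in the data")
    features = [column for column in columns if column != label_column]
    return label_column, features


label, features = split_label_and_features(
    ["Gender", "Age", "Income", "Risk"],  # hypothetical training columns
    "Risk",
)
print(label)     # Risk
print(features)  # ['Gender', 'Age', 'Income']
```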

Model output details

Select model output details and save your work. Specifically, you must select a prediction column and a prediction probability column. Watson OpenScale might detect these values for you.

Numeric/categorical data

For numeric or categorical data, you must provide information about the training data for your model to configure the monitors.

Manually configure monitors - Requires you to provide connection information to your training data.

The format of the training data must match the model. For example, if the model expects M and F for the feature Gender, then the training data should have M and F, not Male and Female. Similarly, a feature column that is identified as numeric must also be treated as numeric when the model is trained. If a column is identified as numeric but the model was trained with it as a categorical feature, you must update the column to be categorical.
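
A quick check of this kind can be run on the training data before you connect it. The sketch below is one possible approach, not a Watson OpenScale feature: the rows, the column names, and the helper `find_format_mismatches` are all hypothetical.

```python
def find_format_mismatches(rows, categorical_values, numeric_columns):
    """Return (row index, column, value) for every value the model would not expect."""
    problems = []
    for index, row in enumerate(rows):
        # Categorical columns must use exactly the values the model expects.
        for column, allowed in categorical_values.items():
            if row[column] not in allowed:
                problems.append((index, column, row[column]))
        # Numeric columns must hold numbers; bool is excluded explicitly
        # because it is a subclass of int.
        for column in numeric_columns:
            value = row[column]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append((index, column, value))
    return problems


training_rows = [
    {"Gender": "M", "Age": 41},
    {"Gender": "Female", "Age": 29},  # model expects M or F
    {"Gender": "F", "Age": "35"},     # numeric column stored as text
]
problems = find_format_mismatches(training_rows, {"Gender": {"M", "F"}}, ["Age"])
print(problems)  # [(1, 'Gender', 'Female'), (2, 'Age', '35')]
```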

Watson OpenScale supports either Db2 or Cloud Object Storage locations:
- Specify the Location (either Db2 or Cloud Object Storage), then:
  - For a Db2 database, enter the following information:
    - Host name or IP address
    - Port
    - Database (name)
    - Username
    - Password
  - For Cloud Object Storage, enter the following information:
    - Login URL: The Login URL must match the region setting of the bucket where your training data is located. You will specify the training data bucket in the next step.
    - Resource instance (ID)
    - API key
- To ensure a valid connection, click Test to connect to the training data.
- Specify the exact location in the Db2 database or Cloud Object Storage where the training data is located.
  - For a Db2 database, select both a schema and a training table that includes columns that are expected by your model.
  - For Cloud Object Storage, select a Bucket and a Data Set.

Upload a training data distribution file - Choose this option if you prefer to keep your training data private. You can use a custom Python notebook to provide Watson OpenScale with information to analyze your training data without providing access to the training data itself.

Running the Python notebook lets you capture distinct values in the schema columns, as well as the column names. In addition, you can use the notebook to pre-configure the Fairness monitor.

Download the custom notebook, and replace any credentials with your own credentials.
Review the notebook carefully, specifying data for your model where appropriate. Save the notebook.
Run the notebook to generate a JSON-formatted configuration file.
Upload the JSON configuration file.
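
To make concrete what "distinct values in the schema columns, as well as the column names" means, the sketch below summarizes a few rows into a JSON document. It only illustrates the idea: the real notebook produces its own configuration format, so the keys `columns` and `distinct_values`, the sample rows, and the helper `summarize_training_data` are assumptions, and the output is not suitable for upload.

```python
import json


def summarize_training_data(rows):
    """Collect column names and the distinct values seen in each column."""
    columns = list(rows[0].keys()) if rows else []
    distinct = {
        column: sorted({str(row[column]) for row in rows}) for column in columns
    }
    # The summary describes the data without containing the rows themselves.
    return {"columns": columns, "distinct_values": distinct}


rows = [
    {"Gender": "M", "Risk": "No Risk"},
    {"Gender": "F", "Risk": "Risk"},
    {"Gender": "F", "Risk": "No Risk"},
]
summary = summarize_training_data(rows)
print(json.dumps(summary, indent=2))
```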

Watson OpenScale locates your training data from the metadata that is stored with the model in IBM Watson Machine Learning.
Select the columns that were used to train the model. These are the features that your model deployment expects in a request. Do not select the label (prediction) column.
You can choose either a string column or a numeric column as the prediction column.

Images and unstructured text

Images

For models that accept images as input, each image must be represented in (height) x (width) x (# channels) format, where each point holds the monochrome or RGB values for one pixel.
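
As a small illustration of that layout, the sketch below represents images as nested lists and derives their height, width, and channel count. The tiny sample images and the helper `image_shape` are hypothetical; real deployments typically use array libraries, which this sketch avoids so that it stays self-contained.

```python
def image_shape(image):
    """Return (height, width, channels) for a nested-list image, validating regularity."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    for row in image:
        if len(row) != width or any(len(pixel) != channels for pixel in row):
            raise ValueError("image is not a regular height x width x channels grid")
    return height, width, channels


# 2 rows x 3 columns, 3 channels (RGB) per pixel.
rgb_image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],
]
# 2 rows x 2 columns, 1 channel (monochrome) per pixel.
mono_image = [[[0], [255]], [[255], [0]]]

print(image_shape(rgb_image))   # (2, 3, 3)
print(image_shape(mono_image))  # (2, 2, 1)
```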

Unstructured text

For models that accept text as input, note that the model accepts the entire text, and not a vectorized representation of the text.

Fairness and drift metrics are not supported for unstructured (image or text) data types.