Learn the terms and concepts that are used for evaluating machine learning models.
Acceptable fairness The percentage of favorable outcomes that a monitored group must receive to meet the fairness threshold. It is calculated by multiplying perfect equality by the fairness threshold.
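For example, a minimal sketch of this calculation in Python, using illustrative (not default) values:

# Acceptable fairness is the product of perfect equality and the fairness threshold.
perfect_equality = 0.70      # favorable-outcome rate for the reference groups (70%)
fairness_threshold = 0.80    # configured fairness threshold (80%)
acceptable_fairness = perfect_equality * fairness_threshold
print(f"Acceptable fairness: {acceptable_fairness:.0%}")  # -> 56%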
Alert A notification that a performance metric is outside of the acceptable range specified by configured monitors.
Balanced data set A data set that includes the scoring requests received by the model for the selected hour and the perturbed records.
Baseline data Previous data that is collected before intervention or modification. This data serves as the foundation against which future data is compared.
Batch deployment A method of deploying models that processes input data from a file, data connection, or connected data in a storage bucket, and writes the output to a selected destination.
Batch processing A processing method that is recommended for monitoring deployments that involve large volumes of payload or feedback data.
Bias When a machine learning model produces a result for a monitored person, group, or thing that is considered to be unfair when compared to a reference result. Bias can be caused by a problem with the training data for a model. The Fairness monitor can detect bias when a fairness score falls below a threshold that you set. Related term: Debiasing.
Cloud Object Storage A service offered by IBM for storing and accessing data. If Cloud Object Storage is the repository for machine learning assets, the associated service credentials must be used to connect to the assets for model evaluations.
See also: Resource ID, API key.
Confidence score The probability that a machine learning model's prediction is correct. A higher score indicates a higher probability that the predicted outcome matches the actual outcome.
Contrastive explanation An explanation that indicates the minimal set of feature column value changes that would change the model prediction. It is computed for a single data point.
Data mart The workspace where all the metadata for model evaluations is saved. Behind the scenes, it is connected to a database persistence layer that stores the metadata.
Debiased transactions The transactions for which a debiased outcome is generated.
Debiasing The process of mitigating bias that the Fairness monitor detects. When a monitored group receives biased outcomes, you can take steps to mitigate the bias automatically or manually.
Deployment You deploy a model to make an endpoint available so you can input new data (the request) to the model and get a score, or response. A model deployment can be in a pre-production environment for testing, or a production environment for actual
usage.
Drift When model accuracy declines over time. Drift can be caused by a change in model input data that leads to model performance deterioration. To monitor for drift, you can create alerts that are triggered when model accuracy drops below a specified acceptable threshold.
Evaluation The process of using metrics to assess a machine learning model and measure how well the model performs in areas such as fairness and accuracy. Monitors can assess a model for the areas that are important to your goals.
Explanation An insight into the evaluation of a particular measurement of a model. An explanation helps you understand model evaluation results and also experiment with what-if scenarios to help address issues.
Fairness Determines whether a model produces biased outcomes that favor a monitored group over a reference group. The fairness evaluation checks whether the model shows a tendency to provide a favorable or preferable outcome more often for one group than for another, as sketched below. Typical categories to monitor are age, sex, and race.
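One common way to quantify such a tendency is a disparate impact ratio: the rate of favorable outcomes for the monitored group divided by the rate for the reference group. The following sketch assumes that measure and uses hypothetical column and group values:

import pandas as pd

# Hypothetical scored transactions: one row per request, with the monitored
# feature value and the model's predicted outcome.
scored = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Male", "Male", "Female"],
    "prediction": ["No Risk", "No Risk", "Risk", "No Risk", "No Risk", "No Risk"],
})

favorable = "No Risk"
monitored_rate = (scored.loc[scored["sex"] == "Female", "prediction"] == favorable).mean()
reference_rate = (scored.loc[scored["sex"] == "Male", "prediction"] == favorable).mean()

# Values well below 1.0 suggest that the monitored group receives the
# favorable outcome less often than the reference group.
print(f"Disparate impact: {monitored_rate / reference_rate:.2f}")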
Features The list of data set column names (feature columns) that are used to train a machine learning model.
Example: In a model that predicts whether a person qualifies for a loan, the features for employment status and credit history might be given greater weight than zip code.
Feedback data Labeled data that matches the schema and structure of the data that is used to train a machine learning model (including the target) but that was not used for training. This known, actual data is used by the Quality monitor to measure the accuracy of a deployed model by checking whether predictions are accurate when measured against the known outcomes.
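As an illustration of how feedback data supports a quality evaluation, the following sketch compares a model's predictions on feedback records against the known labels; scikit-learn and the label values are assumptions, not part of the product:

from sklearn.metrics import accuracy_score

# Hypothetical feedback records: known (actual) outcomes and the deployed
# model's predictions for the same records.
actual    = ["No Risk", "Risk", "No Risk", "Risk", "No Risk"]
predicted = ["No Risk", "Risk", "Risk",    "Risk", "No Risk"]

# Accuracy is one of the standard metrics that a quality evaluation can report.
print(f"Accuracy: {accuracy_score(actual, predicted):.2f}")  # -> 0.80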
Global explanation Explains a model's predictions over a sample of data.
Headless subscription A subscription that has a real-time deployment behind the scenes. Through a headless subscription, a user can monitor the deployment by using the payload and feedback data that is supplied to the deployment, without supplying a scoring URL.
Labeled data Data that is labeled in a uniform manner for the machine learning algorithms to recognize during model training.
Example: A table of data with labeled columns is typical for supervised machine learning. Images can also be labeled for use in a machine learning problem.
Local explanation Explains a model's prediction by using specific, individual examples.
Meta-fields Specialized data fields that are unique to each product.
Monitor Tracks performance results for different types of model evaluations.
Example: Fairness, drift, quality, explainability.
Monitored group When evaluating fairness, the monitored group represents the values that are most at risk for biased outcomes.
Example: In the sex feature, Female and Nonbinary can be set as monitored groups.
Online deployment Method of accessing a deployment through an API endpoint that provides a real-time score or solution on new data.
Payload data Any real-time data supplied to a model. Consists of requests to a model (input) and responses from a model (output).
Payload logging Persisting payload data.
Perfect equality The percentage of favorable outcomes delivered to all reference groups. For the balanced and debiased data sets, the calculation includes monitored group transactions that were altered to become reference group transactions.
Perturbations Data points that are simulated around real data points during the computation of metrics that are associated with monitors, such as fairness and explainability.
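As an illustration only (not the exact perturbation strategy that the monitors use), the following sketch simulates data points around a real record by adding small random changes to its numeric features:

import numpy as np

rng = np.random.default_rng(42)

# A single real data point with hypothetical numeric features.
real_point = {"age": 35, "income": 52000}

# Simulate perturbed records near the real point; categorical features
# would need a different strategy, such as swapping category values.
perturbations = [
    {
        "age": int(real_point["age"] + rng.integers(-3, 4)),
        "income": round(real_point["income"] * (1 + rng.normal(0, 0.05)), 2),
    }
    for _ in range(5)
]
print(perturbations)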
Pre-production space An environment that is used to test data for model validation before a deployment is moved to production.
Prediction column The variable that a supervised machine learning model (trained with labeled data) predicts when presented with new data.
See also: Target.
Probability The confidence with which a model predicts the output. Applicable to classification models.
Production space A deployment space used for operationalizing machine learning models. Deployments from a production space are evaluated for comparison of actual performance against specified metrics.
Quality A monitor that evaluates how well a model predicts accurate outcomes based on the evaluation of feedback data. It uses a set of standard data science metrics to evaluate how well the model predicts outcomes that match the actual outcomes
in the labeled data set.
Records Transactions on which monitors are evaluated.
Reference group When evaluating fairness, the reference group represents the values that are least at risk for biased outcomes.
Example: For the Age feature, you can set 30-55 as the reference group and compare results for other cohorts to that group.
Relative weight The weight that a feature has in predicting the target variable. A higher weight indicates greater importance. Knowing the relative weight helps explain the model results.
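A minimal sketch of inspecting relative feature weights with a tree-based scikit-learn model; the data set and model choice are assumptions used only to illustrate the idea:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Train a small model, then inspect how much each feature contributes
# to predicting the target.
data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(data.data, data.target)

weights = sorted(zip(data.feature_names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, weight in weights[:5]:
    print(f"{name}: {weight:.3f}")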
Resource ID The unique identifier for a resource stored in Cloud Object Storage. To obtain it:
Find and expand the resource (such as a storage service).
Copy the value for Resource ID without the quotation marks.
Response time The time that the model deployment takes to process a scoring request.
Runtime data Data that is obtained from running a model through its lifecycle.
Scoring endpoint The HTTPS endpoint that users can call to receive the scoring output of a deployed model.
Scoring request The input to a deployment.
See also: Payload.
Scoring In model inferencing, the action of sending a request to a model and getting a response.
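A minimal sketch of sending a scoring request to a deployment's HTTPS scoring endpoint with the Python requests library; the URL, token, and payload fields are placeholders, and the exact request format depends on the serving engine:

import requests

# Placeholder endpoint and token; the real values come from your deployment.
SCORING_URL = "https://example.com/deployments/<deployment-id>/predictions"
TOKEN = "<access-token>"

# The scoring request (payload) carries input fields and values; the response
# carries the model's predictions and probabilities.
payload = {"input_data": [{"fields": ["age", "income"], "values": [[35, 52000]]}]}

response = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
print(response.json())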
Self-managed Model transactions stored in your own data warehouse and evaluated by your own Spark analytics engine.
Service credentials The access IDs required to connect to IBM Cloud resources.
Service provider A machine learning provider (typically a model engine such as WML, AWS, Azure, or a custom engine) that hosts the deployments.
Subscription A deployment that is being monitored. There is a one-to-one mapping between a deployment and a subscription.
System-managed Model transactions stored in a database and evaluated by computing resources that are managed by the service.
Target The feature or column of a data set that the trained model predicts. The model is trained by using pre-existing data to learn patterns and discover relationships between the features of the data set and the target.
See also: Prediction column.
Threshold A benchmark for an acceptable range of outcomes that is established when monitors are configured to evaluate a machine learning model. When an outcome falls below the configured threshold, an alert is triggered so that you can assess and remedy the situation.
Training data Data used to teach and train a model's learning algorithm.
Transactions The records for machine learning model evaluations that are stored in the payload logging table.
Unlabeled data Data that is not associated with labels that identify characteristics, classifications, and properties. Unstructured data that is not labeled in a uniform manner.
Example: Email or unlabeled images are typical of unlabeled data. Unlabeled data can be used in unsupervised machine learning.
User ID The ID of the user who is associated with the scoring request.