Fairness for a group
The Fairness for a group metric measures the model's propensity to deliver favorable outcomes to one group over another. A group can be defined by any attribute, such as age, sex, or race.
Fairness for a group at a glance
- Description: The propensity of the model to deliver favorable outcomes to one group over another.
- Default thresholds: Lower limit = 80%
- Default recommendation: Use the debiased scoring endpoint in your business application to receive debiased responses from your deployed model.
- Problem type: All
- Data type: Structured
- Chart values: Last value in the timeframe
- Metrics details available: Yes
Interpreting fairness
On the Insights dashboard, you can view the results of the model evaluations that you enable when you configure the fairness monitor.
When you click a model deployment tile, the Fairness section displays a summary of the metrics that describe the outcomes of the evaluation. To see more details about the outcomes, you can click the fairness score metric.
The Evaluations page displays a chart that provides metrics from the results of your model evaluation during specific time periods. For more information, see Viewing data for a deployment.
On the Evaluations page, the fairness monitor provides details about the fairness score. The fairness score is calculated with the disparate impact formula, which compares the rate of favorable outcomes for the monitored group to the rate of favorable outcomes for the reference group.
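The disparate impact calculation can be sketched as follows. This is an illustrative example, not the product's actual implementation; the column values, group labels, and favorable outcome label are hypothetical.

```python
from typing import Sequence

def disparate_impact(outcomes: Sequence[str],
                     groups: Sequence[str],
                     monitored: str,
                     reference: str,
                     favorable: str = "approved") -> float:
    """Ratio of the favorable-outcome rate for the monitored group
    to the favorable-outcome rate for the reference group."""
    def favorable_rate(group: str) -> float:
        group_outcomes = [o for o, g in zip(outcomes, groups) if g == group]
        return sum(o == favorable for o in group_outcomes) / len(group_outcomes)
    return favorable_rate(monitored) / favorable_rate(reference)

# Example: 3 of 5 monitored applicants approved (60%)
# versus 4 of 5 reference applicants approved (80%)
outcomes = ["approved", "approved", "approved", "denied", "denied",
            "approved", "approved", "approved", "approved", "denied"]
groups = ["F"] * 5 + ["M"] * 5
score = disparate_impact(outcomes, groups, monitored="F", reference="M")
print(round(score, 2))  # 0.75 -> below the default 80% threshold, a violation
```

A score below the lower limit (80% by default) indicates that the monitored group receives favorable outcomes at a disproportionately lower rate than the reference group.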
The fairness score pane displays violations for a group that has a lower score than the threshold that you set. The Monitored groups pane displays the fairness score for a monitored group in comparison to the average score for the other groups.
You can click a data point on the chart to view more details about how the fairness score was calculated. For each monitored group, you can view the calculations for the following types of data sets:
- Balanced: The balanced calculation includes the scoring requests that are received during the selected hour. If the minimum number of records that are required for evaluation was not met, the calculation also includes records from previous hours. It also includes perturbed and synthesized records that are used to test the model's response when the value of the monitored feature changes.
- Payload: The actual scoring requests that are received by the model during the selected hour.
- Training: The training data records that are used to train the model.
- Debiased: The output of the debiasing algorithm after processing the runtime and perturbed data.
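The perturbed records mentioned above can be sketched as copies of payload records with the monitored feature flipped, so that the model's response can be compared across groups. This is a minimal illustration under assumed field names and group values; the `perturb` helper is hypothetical and not part of the product's API.

```python
from copy import deepcopy

def perturb(records, feature, value_map):
    """Return copies of the records with the monitored feature swapped
    according to value_map (for example, {"F": "M", "M": "F"})."""
    perturbed = []
    for rec in records:
        new_rec = deepcopy(rec)  # leave the original payload record intact
        new_rec[feature] = value_map.get(rec[feature], rec[feature])
        perturbed.append(new_rec)
    return perturbed

payload = [{"sex": "F", "income": 42000}, {"sex": "M", "income": 38000}]
flipped = perturb(payload, "sex", {"F": "M", "M": "F"})
print(flipped)
# [{'sex': 'M', 'income': 42000}, {'sex': 'F', 'income': 38000}]
```

Scoring both the original and the perturbed records and comparing the outcomes reveals whether the model's prediction changes when only the monitored feature changes.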
Parent topic: Fairness metrics overview