Feature drift in Watson OpenScale drift v2 metrics
Watson OpenScale calculates feature drift by measuring the change in value distribution for important features.
How it works
Watson OpenScale calculates drift for categorical and numeric features by measuring the probability distribution of continuous and discrete values. To identify discrete values for numeric features, Watson OpenScale uses a binary logarithm to compare the number of distinct values of each feature to the total number of values of each feature.
Watson OpenScale uses the following binary logarithm formula to identify discrete numeric features:
$$\text{distinct\_values\_count} < \log_2(\text{total\_count})$$
If the distinct_values_count is less than the binary logarithm of the total_count, the feature is identified as discrete.
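The discreteness rule can be sketched in Python as follows. The function name `is_discrete` is hypothetical and is not part of the Watson OpenScale API. The sketch only applies the inequality between the distinct value count and the binary logarithm of the total count to a list of numeric values.

```python
import math
from typing import Sequence


def is_discrete(values: Sequence[float]) -> bool:
    """Hypothetical helper: apply distinct_values_count < log2(total_count)."""
    total_count = len(values)
    if total_count == 0:
        raise ValueError("cannot classify an empty feature")
    distinct_values_count = len(set(values))
    # A numeric feature is treated as discrete when it has fewer distinct
    # values than the binary logarithm of its total number of values.
    return distinct_values_count < math.log2(total_count)


# 1,000 values drawn from only 5 distinct levels: 5 < log2(1000), about 9.97
print(is_discrete([i % 5 for i in range(1000)]))  # True
# 1,024 values with 10 levels: 10 < log2(1024) = 10 is false
print(is_discrete([i % 10 for i in range(1024)]))  # False
```

Because the comparison is strict, a feature whose distinct value count exactly equals the binary logarithm of its total count is treated as continuous.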
Do the math
Watson OpenScale uses the following formulas to calculate feature drift:
Jensen Shannon distance
Jensen Shannon distance is a symmetric, normalized form of Kullback-Leibler (KL) divergence that measures how much one probability distribution differs from a second probability distribution. Unlike KL divergence, Jensen Shannon distance is symmetric and always has a finite value.
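Watson OpenScale uses the following formula to calculate the Jensen Shannon distance for two probability distributions, baseline (B) and production (P):
$$JSD(B \parallel P) = \sqrt{\tfrac{1}{2} D_{KL}(B \parallel M) + \tfrac{1}{2} D_{KL}(P \parallel M)}, \qquad M = \tfrac{1}{2}(B + P)$$
where $M$ is the pointwise average of the two distributions and $D_{KL}$ is the KL divergence.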
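For a categorical feature, Jensen Shannon distance can be sketched in Python by comparing category frequencies: average the two distributions into a mixture M, average the KL divergence of each distribution from M, and take the square root. This is an illustrative sketch, not the Watson OpenScale implementation. The function names are hypothetical, and the base-2 logarithm (which bounds the distance between 0 and 1) is an assumption, because the log base is not stated here.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def _proportions(values: Sequence[Hashable], categories: List[Hashable]) -> List[float]:
    # Relative frequency of each category; categories absent from `values` get 0.
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def _kl_divergence(p: List[float], q: List[float]) -> float:
    # KL(p || q) in bits. Terms with p_i == 0 contribute 0 by convention.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(
    baseline: Sequence[Hashable], production: Sequence[Hashable]
) -> float:
    # Shared category order covering every value seen in either data set.
    categories = list(dict.fromkeys([*baseline, *production]))
    b = _proportions(baseline, categories)
    p = _proportions(production, categories)
    # The mixture M is positive wherever B or P is positive, so neither
    # KL term divides by zero; this is why the score is always finite.
    m = [(bi + pi) / 2 for bi, pi in zip(b, p)]
    divergence = 0.5 * _kl_divergence(b, m) + 0.5 * _kl_divergence(p, m)
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(divergence, 0.0))


baseline = ["red"] * 50 + ["green"] * 30 + ["blue"] * 20
production = ["red"] * 20 + ["green"] * 30 + ["blue"] * 50
print(round(jensen_shannon_distance(baseline, production), 3))  # 0.31
```

Swapping the arguments gives the same result, which reflects the symmetry of the score.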
Total variation distance
Total variation distance measures the maximum difference between the probabilities that two probability distributions, baseline (B) and production (P), assign to the same event, as shown in the following formula:
$$\delta(B, P) = \sup_{A} \lvert B(A) - P(A) \rvert$$
where $A$ ranges over all events (sets of feature values). If the two distributions are equal, the total variation distance between them is 0.
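For a small discrete distribution, the supremum in this definition can be checked by brute force over every event. The sketch below uses hypothetical helper names and illustrative probabilities. It shows the standard identity that the largest difference over events equals half the sum of the absolute per-outcome differences, which is why a formula that sums absolute differences divides by a total area of about 2.

```python
from itertools import combinations
from typing import Dict


def tvd_by_events(b: Dict[str, float], p: Dict[str, float]) -> float:
    # Definition: largest |B(A) - P(A)| over every event A (subset of outcomes).
    outcomes = sorted(set(b) | set(p))
    best = 0.0
    for size in range(len(outcomes) + 1):
        for event in combinations(outcomes, size):
            b_mass = sum(b.get(o, 0.0) for o in event)
            p_mass = sum(p.get(o, 0.0) for o in event)
            best = max(best, abs(b_mass - p_mass))
    return best


def tvd_half_l1(b: Dict[str, float], p: Dict[str, float]) -> float:
    # Equivalent form: half the summed absolute per-outcome difference.
    outcomes = set(b) | set(p)
    return 0.5 * sum(abs(b.get(o, 0.0) - p.get(o, 0.0)) for o in outcomes)


baseline = {"low": 0.5, "medium": 0.3, "high": 0.2}
production = {"low": 0.2, "medium": 0.3, "high": 0.5}
print(round(tvd_by_events(baseline, production), 6),
      round(tvd_half_l1(baseline, production), 6))  # 0.3 0.3
```

Enumerating events grows exponentially with the number of outcomes, so the half-sum form is the practical one.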
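Watson OpenScale uses the following formula to calculate total variation distance:
$$\text{TVD}(B, P) = \frac{\sum_{x} \lvert f_P(x) - f_B(x) \rvert \, \Delta x}{\sum_{x} f_P(x) \, \Delta x + \sum_{x} f_B(x) \, \Delta x}$$
where: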
- $x$ is a series of equidistant samples that span the domain of the density functions, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
- $\Delta x$ is the difference between two consecutive $x$ samples.
- $f_P(x)$ is the value of the density function for production data at an $x$ sample.
- $f_B(x)$ is the value of the density function for baseline data at an $x$ sample.
- The $\sum_{x} f_P(x) \, \Delta x + \sum_{x} f_B(x) \, \Delta x$ denominator represents the total area under the density function plots for production and baseline data. These summations approximate the integrals over the domain, so each term should be about 1 and their total about 2. Dividing by this total therefore halves the summed absolute difference.
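The summation can be sketched in Python as follows: the numerator sums the absolute difference between the production and baseline densities at each $x$ sample times $\Delta x$, and the denominator is the total area described above. This is a minimal sketch under stated assumptions, not the Watson OpenScale implementation. It estimates each density with a histogram on shared equal-width bins (an assumed estimator, because the density estimator is not specified here), treats each bin center as an $x$ sample, and uses hypothetical function names.

```python
from typing import List, Sequence, Tuple


def histogram_densities(
    baseline: Sequence[float], production: Sequence[float], num_samples: int = 20
) -> Tuple[float, List[float], List[float]]:
    """Hypothetical density estimate: one histogram per data set on shared,
    equal-width bins from the combined minimum to the combined maximum.
    Each value stands for the density at its bin center (an x sample)."""
    lo = min(min(baseline), min(production))
    hi = max(max(baseline), max(production))
    dx = (hi - lo) / num_samples if hi > lo else 1.0

    def density(data: Sequence[float]) -> List[float]:
        counts = [0] * num_samples
        for v in data:
            # Clamp so that the combined maximum falls into the last bin.
            counts[min(int((v - lo) / dx), num_samples - 1)] += 1
        # Normalize counts so that the area under the histogram is 1.
        return [c / (len(data) * dx) for c in counts]

    return dx, density(production), density(baseline)


def total_variation_distance(
    baseline: Sequence[float], production: Sequence[float], num_samples: int = 20
) -> float:
    dx, f_p, f_b = histogram_densities(baseline, production, num_samples)
    # Numerator: summed absolute difference between the two densities.
    numerator = sum(abs(p - b) * dx for p, b in zip(f_p, f_b))
    # Denominator: total area under both density plots (about 1 + 1 = 2).
    denominator = sum(p * dx for p in f_p) + sum(b * dx for b in f_b)
    return numerator / denominator


baseline = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0]
production = [x + 1.0 for x in baseline]
print(total_variation_distance(baseline, production))  # approximately 0.333
```

With histograms each area is 1 up to rounding, so the denominator is essentially 2. A smoothing estimator, such as a kernel density estimate evaluated only between the combined minimum and maximum, leaves part of each density's area outside that range, which is where normalizing by the sampled total area matters.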
Overlap coefficient
Watson OpenScale calculates the overlap coefficient by measuring the total area of the intersection between two probability distributions. To measure dissimilarity between the distributions, the intersection, or overlap area, is subtracted from 1 to calculate the amount of drift. Watson OpenScale uses the following formula to calculate the overlap coefficient:
$$\text{OVL}(B, P) = \sum_{x} \min\big(f_P(x), f_B(x)\big) \, \Delta x, \qquad \text{drift} = 1 - \text{OVL}(B, P)$$
where:
- $x$ is a series of equidistant samples that span the domain of the density functions, ranging from the combined minimum of the baseline and production data to the combined maximum of the baseline and production data.
- $\Delta x$ is the difference between two consecutive $x$ samples.
- $f_P(x)$ is the value of the density function for production data at an $x$ sample.
- $f_B(x)$ is the value of the density function for baseline data at an $x$ sample.
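The overlap calculation can be sketched in Python by summing the smaller of the two density values at each $x$ sample times $\Delta x$, then subtracting the result from 1 to get drift. As with any sketch here, the histogram density estimate on shared equal-width bins, with bin centers as the $x$ samples, is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def histogram_densities(
    baseline: Sequence[float], production: Sequence[float], num_samples: int = 20
) -> Tuple[float, List[float], List[float]]:
    """Hypothetical density estimate: one histogram per data set on shared,
    equal-width bins from the combined minimum to the combined maximum."""
    lo = min(min(baseline), min(production))
    hi = max(max(baseline), max(production))
    dx = (hi - lo) / num_samples if hi > lo else 1.0

    def density(data: Sequence[float]) -> List[float]:
        counts = [0] * num_samples
        for v in data:
            # Clamp so that the combined maximum falls into the last bin.
            counts[min(int((v - lo) / dx), num_samples - 1)] += 1
        # Normalize counts so that the area under the histogram is 1.
        return [c / (len(data) * dx) for c in counts]

    return dx, density(production), density(baseline)


def overlap_coefficient(
    baseline: Sequence[float], production: Sequence[float], num_samples: int = 20
) -> float:
    dx, f_p, f_b = histogram_densities(baseline, production, num_samples)
    # Area of the intersection: the smaller density at each x sample.
    return sum(min(p, b) * dx for p, b in zip(f_p, f_b))


baseline = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0]
production = [x + 1.0 for x in baseline]
overlap = overlap_coefficient(baseline, production)
drift = 1.0 - overlap
print(round(overlap, 3), round(drift, 3))  # 0.667 0.333
```

In this histogram sketch each area is exactly 1 up to rounding, so the drift score equals the total variation distance on the same bins; that equality does not hold in general when the sampled areas differ from 1.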