This notebook demonstrates how to set up automated metrics that help you measure, monitor, and understand the behavior of your Watson Assistant system. As described in Watson Assistant Continuous Improvement Best Practices, this is the first step of your continuous improvement process. The goal of this step is to understand where your assistant is doing well and where it isn't, and to focus your improvement effort on the problem areas identified. We define two measures to achieve this goal: Coverage and Effectiveness.
Coverage is the portion of total user messages your assistant is attempting to respond to.
Effectiveness refers to how well your assistant is handling the conversations it is attempting to respond to.
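As a toy illustration (hypothetical data, not drawn from this notebook), both measures reduce to simple ratios over per-message and per-conversation flags:

```python
# Hypothetical data: per-message coverage flags and per-conversation
# escalation flags, to illustrate how the two measures are computed.
messages_covered = [True, True, False, True, True]
conversations_escalated = [False, False, True, False]

# Coverage: portion of messages the assistant attempted to respond to
coverage = sum(messages_covered) / len(messages_covered) * 100

# Effectiveness: portion of conversations that were not escalated
effectiveness = sum(1 for e in conversations_escalated if not e) / len(conversations_escalated) * 100

print(f'Coverage: {coverage:.1f}%')            # Coverage: 80.0%
print(f'Effectiveness: {effectiveness:.1f}%')  # Effectiveness: 75.0%
```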
The prerequisite for running this notebook is Watson Assistant (formerly Watson Conversation). This notebook assumes familiarity with Watson Assistant and concepts such as skills, workspaces, intents, and training examples.
Some familiarity with Python is recommended. This notebook runs in a Python 3.7+ environment.
In this section, we install and import the required libraries and functions and add a project access token.
# Import and apply global CSS styles
from IPython.display import HTML
!curl -O https://raw.githubusercontent.com/watson-developer-cloud/assistant-improve-recommendations-notebook/master/src/main/css/custom.css
HTML(open('custom.css', 'r').read())
!pip install --user --upgrade "assistant-improve-toolkit";
# Import required libraries
import pandas as pd
import json
import re
from pandas import json_normalize
from ibm_watson import AssistantV1, AssistantV2
from IPython.display import display
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_cloud_sdk_core.authenticators import BasicAuthenticator
# Import the visualization related functions
from assistant_improve_toolkit.visualize_func import make_pie, coverage_barh, width_bar, show_coverage_over_time
# Import Cloud Object Storage related functions
from assistant_improve_toolkit.cos_op import generate_link, generate_excel_measure, export_result_excel
# Import Watson Assistant related functions
from assistant_improve_toolkit.watson_assistant_func import get_logs, get_assistant_definition, load_logs_from_file
# Import Dataframe computation related functions
from assistant_improve_toolkit.computation_func import get_effective_df, get_coverage_df, chk_is_valid_node, format_data
This notebook uses the Watson Assistant v1 API to access the skill definition. To access message logs, the notebook uses both the v1 and v2 APIs. You authenticate to the API by using IBM Cloud Identity and Access Management (IAM).
You can access the values you need for this configuration from the Watson Assistant user interface. Go to the Skills page and select View API Details from the menu of a skill tile.
- The key passed to IAMAuthenticator is your API Key under Service Credentials.
- set_service_url takes the base URL of Watson Assistant. For example, for us-south, the endpoint is https://api.us-south.assistant.watson.cloud.ibm.com. This value differs depending on the location of your service instance. For more information, see Service Endpoint.
# Provide credentials to connect to assistant
authenticator = IAMAuthenticator('API_KEY')
# Initialize v1 SDK instance
sdk_v1_object = AssistantV1(version='2020-04-01', authenticator = authenticator)
sdk_v1_object.set_service_url('https://api.us-south.assistant.watson.cloud.ibm.com')
# Initialize v2 SDK instance
sdk_v2_object = AssistantV2(version='2020-09-24', authenticator = authenticator)
sdk_v2_object.set_service_url('https://api.us-south.assistant.watson.cloud.ibm.com')
Add the information of your assistant. To load the skill of an assistant in the next section, you need to provide either a Workspace ID or a Skill ID. To locate your assistant ID, open the assistant settings and click API Details. To locate your workspace ID or skill ID, go to the Skills page and select View API Details from the menu of a skill tile. If you are using versioning in Watson Assistant, this ID represents the Development version of your skill definition.
For more information about authentication and finding credentials in the Watson Assistant UI, please see Watson Assistant v1 API and Watson Assistant v2 API in the offering documentation.
assistant_information = {'workspace_id' : '',
'skill_id' : '',
'assistant_id' : '',
'environment_id': ''}
Fetch the assistant definition and load it into a dataframe. Note that the assistant definition will be saved to a cached file and loaded from that file on subsequent runs. Set overwrite to True to refresh the cached file.
df_assistant = get_assistant_definition(sdk_v2_object, assistant_information, overwrite=False)
if df_assistant is not None:
# Get all intents
assistant_intents = [intent['intent'] for intent in df_assistant['intents'].values[0]]
# Get all dialog nodes
assistant_nodes = pd.DataFrame(df_assistant['dialog_nodes'].values[0])
assistant_loaded = True
else:
assistant_loaded = False
Please provide a valid Workspace ID or Skill ID!
Fetch user-generated logs. By default, the notebook extracts message logs using the v1 APIs. Set version=2 to query message logs generated by the v2 APIs. See Assistant v2 List logs for more information.
You can apply filters while fetching logs, e.g.:
- meta.summary.input_text_length_i>0
- response_timestamp>=2018-09-18

See more examples in the Logs notebook.
Note that logs will be saved to a cached file and loaded from that file. Set overwrite to True to refresh the cached file.
# Define output filename
filename = 'logs'
# Create file name
if assistant_information['workspace_id'] is not None and len(assistant_information['workspace_id']) > 0:
filename += '_workspace_' + assistant_information['workspace_id']
if assistant_information['assistant_id'] is not None and len(assistant_information['assistant_id']) > 0:
filename += '_assistant_' + assistant_information['assistant_id']
if assistant_information['skill_id'] is not None and len(assistant_information['skill_id']) > 0:
filename += '_skill_' + assistant_information['skill_id']
# Remove all special characters from file name
filename = re.sub(r'[^a-zA-Z0-9_\- .]', '', filename) + '.json'
# Filter to be applied while fetching logs
filters = ['language::en',
'meta.summary.input_text_length_i>0']
# Fetch the logs, set `overwrite` to True to reload logs, set version=2 to use v2 log apis
log_raw_data = get_logs(sdk_v1_object=sdk_v1_object,
sdk_v2_object=sdk_v2_object,
assistant_info=assistant_information,
num_logs=20000,
filename=filename,
filters=filters,
overwrite=False,
version=2)
df_logs = pd.DataFrame(log_raw_data)
if log_raw_data is not None:
# Mark that logs have been loaded
logs_loaded = True
else:
logs_loaded = False
Please provide a valid Workspace ID, Assistant ID, or Skill ID!
if not assistant_loaded:
# The following code is for using demo workspace
import requests
print('Loading workspace data from Watson developer cloud Github repo ... ', end='')
workspace_data = requests.get("https://raw.githubusercontent.com/watson-developer-cloud/assistant-improve-recommendations-notebook/master/notebook/data/workspace.json").text
df_assistant = json_normalize(json.loads(workspace_data))
# # Specify assistant definition JSON file
# assistant_definition_file = 'SPECIFY_FILE_NAME'
# print('Loading assistant definition from {}'.format(assistant_definition_file))
# # Store assistant definition in a dataframe
# df_assistant = json_normalize(json.load(open(assistant_definition_file)))
# Get all intents
assistant_intents = [intent['intent'] for intent in df_assistant['intents'].values[0]]
# Get all dialog nodes
assistant_nodes = pd.DataFrame(df_assistant['dialog_nodes'].values[0])
print('completed!')
else:
print('Assistant definition has been loaded in Section 2.1.2.')
Loading workspace data from Watson developer cloud Github repo ... completed!
Another option is to load an existing log JSON file. Log JSON files can be produced by using the Logs notebook, or fetch_logs.
if not logs_loaded:
# The following code is for using demo logs
import requests
print('Loading demo log data from Watson developer cloud Github repo ... ', end='')
log_raw_data = requests.get("https://raw.githubusercontent.com/watson-developer-cloud/assistant-improve-recommendations-notebook/master/notebook/data/sample_logs.json").text
print('completed!')
logs = json.loads(log_raw_data)
# The following code is for loading your log file
# Specify a log JSON file
# logs = load_logs_from_file(filename='logs.json')
df_logs = pd.DataFrame(logs)
else:
print('Logs have been loaded in Section 2.1.3.')
Loading demo log data from Watson developer cloud Github repo ... completed!
The logs returned from the logs API are stored in a nested structure. In this step, we expand the nested structure and extract the fields used for analysis.
# Format the logs data from the workspace
df_formatted = format_data(df_logs)
Extracting request and response ... Extracting context and output ... Extracting intents ... Completed!
As described in Watson Assistant Continuous Improvement Best Practices, Effectiveness and Coverage are the two measures that provide a reliable understanding of your assistant's overall performance. Both measures are customizable based on your preferences. In this section, we provide guidelines for setting each of them.
Coverage measures your Watson Assistant system at the utterance level. You may include automated metrics that help identify utterances that your assistant is not answering. Example metrics include:
For Confidence threshold, you can set a threshold to include utterances with confidence values below this threshold. For more information regarding Confidence, see Absolute scoring.
For Dialog information, you can specify what the notebook should look for in your logs to determine that a message is not covered by your assistant.
Note that these lists are treated as "OR" conditions - any occurrence of any of them will signify that a message is not covered.
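As a minimal sketch of that OR logic (the dict keys below are illustrative assumptions, not the notebook's real column names; the actual computation is done later by get_coverage_df):

```python
# Sketch of the "OR" semantics: a message counts as not covered if ANY
# of the conditions fires (low confidence, or a visit to any of the
# listed node ids, node names, or node conditions).
def is_not_covered(message, node_ids, node_names, node_conditions, threshold):
    return (message['confidence'] < threshold
            or message['visited_node_id'] in node_ids
            or message['visited_node_name'] in node_names
            or message['visited_node_condition'] in node_conditions)

msg = {'confidence': 0.9,
       'visited_node_id': 'node_1_1467910920863',
       'visited_node_name': None,
       'visited_node_condition': 'anything_else'}
print(is_not_covered(msg, ['node_1_1467910920863'], [], ['anything_else'], 0.2))  # True
```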
Where can you find the node id, node name, and node condition?
You can find the values of these variables in your assistant definition JSON file based on the following mappings:
- node id: dialog_node
- node name: title
- node condition: conditions

You can also find the node name and node condition in your dialog editor. For more information, see Dialog Nodes.
Below we provide example code for identifying coverage based on confidence and dialog node.
# Specify the confidence threshold you want to look for in the logs
confidence_threshold = .20
# Add coverage node ids, if any, to list
node_ids = ['node_1_1467910920863', 'node_1_1467919680248']
# Add coverage node names, if any, to list
node_names = []
# Add coverage node conditions, if any, to list
node_conditions = ['#out_of_scope || #off_topic', 'anything_else']
# Check if the dialog nodes are present in assistant definition
df_coverage_nodes = chk_is_valid_node(node_ids, node_names, node_conditions, assistant_nodes)
df_coverage_nodes
| | Condition | Node ID | Node Name | Valid |
|---|---|---|---|---|
| 0 | true | node_1_1467910920863 | NaN | True |
| 1 | true | node_1_1467919680248 | NaN | True |
| 2 | #out_of_scope \|\| #off_topic | node_1_1467743415843 | NaN | True |
| 3 | anything_else | node_2_1487280430136 | NaN | True |
Effectiveness measures your Watson Assistant system at the conversation level. You may include automated metrics that help identify problematic conversations. Example metrics include:
Below we provide example code for identifying escalation based on intents and dialog information.
If you have specific intents that point to escalation or any other effectiveness measure, specify them in the chk_effective_intents list below.
Note: If you don't have specific intents to capture effectiveness, leave the chk_effective_intents list empty.
# Add your escalation intents to the list
chk_effective_intents=['connect_to_agent']
# Store the intents in a dataframe
df_chk_effective_intents = pd.DataFrame(chk_effective_intents, columns = ['Intent'])
# Add a 'valid' flag to the dataframe
df_chk_effective_intents['Valid']= True
# Add count column for selected intents
df_chk_effective_intents['Count']= 0
# Checking the validity of the specified intents. Look out for the `valid` column in the table displayed below.
# Iterate over a copy, since items may be removed from the list inside the loop
for intent in list(chk_effective_intents):
# Check if intent is present in assistant definition
if intent not in assistant_intents:
# If not present, mark it as 'not valid'
df_chk_effective_intents.loc[df_chk_effective_intents['Intent']==intent,['Valid']] = False
# Remove intent from the chk_effective_intents list
chk_effective_intents.remove(intent)
else:
# Calculate number of times each intent is hit
count = df_formatted.loc[df_formatted['response.top_intent_intent']==intent]['log_id'].nunique()
df_chk_effective_intents.loc[df_chk_effective_intents['Intent']==intent,['Count']] = count
# Display intents and validity
df_chk_effective_intents
| | Intent | Valid | Count |
|---|---|---|---|
| 0 | connect_to_agent | True | 73 |
If you have specific dialog nodes that point to escalation or any other effectiveness measure, you can automatically capture them based on three variables: node id, node name, and node condition.
Note that these lists are treated as "OR" conditions: any occurrence of any of them will mark a conversation as escalated.
Where can you find the node id, node name, and node condition?
You can find the values of these variables in your assistant definition JSON file based on the following mappings:
- node id: dialog_node
- node name: title
- node condition: conditions

You can also find the node name and node condition in your dialog editor. For more information, see Dialog Nodes.
Note: If your assistant does not incorporate escalations and you do not have any other automated conversation-level quality metrics to identify problematic conversations (e.g., poor NPS, task not completed), you can simply track coverage and average confidence over a recent sample of your entire production logs. Leave an empty list for node_ids, node_names and node_conditions.
# Add effectiveness node ids, if any, to list
node_ids = []
# Add effectiveness node names, if any, to list
node_names = ['not_trained']
# Add effectiveness node conditions, if any, to list
node_conditions = ['#connect_to_agent', '#answer_not_helpful']
# If your assistant does not incorporate escalations and you do not have any other automated conversation-level quality metrics, uncomment lines below
# node_ids = []
# node_names = []
# node_conditions = []
# Check if the dialog nodes are present in assistant definition
df_chk_effective_nodes = chk_is_valid_node(node_ids, node_names, node_conditions, assistant_nodes)
df_chk_effective_nodes
| | Condition | Node ID | Node Name | Valid |
|---|---|---|---|---|
| 0 | #connect_to_agent | node_2_1537212368188 | NaN | True |
| 1 | #answer_not_helpful | node_1_1537212185418 | NaN | True |
| 2 | #not_trained | node_1_1537297843450 | not_trained | True |
The combination of effectiveness and coverage is very powerful for diagnostics. If your effectiveness and coverage metrics are high, your assistant is responding to most inquiries and responding well. If either effectiveness or coverage is low, the metrics provide you with the information you need to start improving your assistant.
df_formatted_copy = df_formatted.copy(deep = True)
# Mark if a message is covered and store results in df_coverage dataframe
df_coverage = get_coverage_df(df_formatted_copy , df_coverage_nodes, confidence_threshold)
# Mark if a conversation is effective and store results in df_coverage dataframe
# Set filter_non_intent_node to True to filter out utterances whose last visited node does not contain any intents
df_effective = get_effective_df(df_formatted_copy, chk_effective_intents, df_chk_effective_nodes, filter_non_intent_node=False, assistant_nodes=assistant_nodes)
# Calculate average confidence
avg_conf = float("{0:.2f}".format(df_coverage[df_coverage['Covered']==True]['response.top_intent_confidence'].mean()*100))
# Calculate coverage
coverage = float("{0:.2f}".format((df_coverage['Covered'].value_counts().to_frame()['Covered'][True]/df_coverage['Covered'].value_counts().sum())*100))
# Calculate effectiveness
effective_perc = float("{0:.2f}".format((df_effective.loc[df_effective['Escalated_conversation']==False]['response.context.conversation_id'].nunique()/df_effective['response.context.conversation_id'].nunique())*100))
# Plot pie graphs for coverage and effectiveness
coverage_pie = make_pie(coverage, "Percent of total messages covered")
effective_pie = make_pie(effective_perc, 'Percent of non-escalated conversations')
# Messages to be displayed with effectiveness and coverage
coverage_msg = '<h2>Coverage</h2></br>A message that is not covered would either be a \
message your assistant responded to with some form \
of “I’m not trained” or that it immediately handed over \
to a human agent without attempting to respond'
effectiveness_msg = '<h2>Effectiveness</h2></br>This notebook provides a list of metrics customers \
can use to assess how effective their assistant is at \
responding to conversation and metrics '
# Display the coverage and effectiveness pie charts
HTML('<tr><th colspan="4"><div align="center"><h2>Coverage and Effectiveness<hr/></h2></div></th></tr>\
<tr>\
<td style="width:500px">{c_pie}</td>\
<td style="width:450px"><div align="left"> {c_msg} </div></td>\
<td style="width:500px">{e_pie}</td>\
<td style="width:450px"><div align="left"> {e_msg} </div></td>\
</tr>'
.format(c_pie=coverage_pie, c_msg = coverage_msg, e_pie = effective_pie, e_msg = effectiveness_msg))
Here, we can see our assistant's coverage and effectiveness. We will have to take a deeper look at both of these metrics to understand the nuances and decide where we should focus next.
Note the distinction between a user message and a conversation. A conversation in Watson Assistant represents a session of one or more messages from a user and the associated responses returned to the user by the assistant. A conversation includes a conversation id for the purposes of grouping a sequence of messages and responses.
# Compute the number of conversations in the log
convs = df_coverage['response.context.conversation_id'].nunique()
# Compute the number of messages in the log
msgs = df_coverage['response.context.conversation_id'].size
#Display the results
print('Overall messages\n', "=" * len('Overall messages'), '\nTotal Conversations: ', convs, '\nTotal Messages: ', msgs, '\n\n', sep = '')
#Display the coverage bar chart
display(coverage_barh(coverage, avg_conf, 'Coverage & Average confidence', False))
Overall messages ================ Total Conversations: 4319 Total Messages: 7706
Compare the coverage over time with any major updates to your assistant to see whether the changes affected performance. Use the interval parameter to set a time interval. You can choose from: {"minute", "5-minute", "15-minute", "30-minute", "hour", "day", "week", "month"}. Move your cursor over the bars to check the coverage value.
show_coverage_over_time(df_coverage, interval='day')
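Conceptually, the per-interval aggregation behind this chart can be sketched with pandas (the column names below are assumptions for illustration; show_coverage_over_time does the real work):

```python
import pandas as pd

# Toy frame: one row per message, with a timestamp and a Covered flag
df = pd.DataFrame({
    'response_timestamp': pd.to_datetime(['2020-01-01 09:00', '2020-01-01 17:00',
                                          '2020-01-02 10:00', '2020-01-02 11:00']),
    'Covered': [True, False, True, True],
})

# Resample into daily buckets; the mean of a boolean column is the coverage rate
daily = df.set_index('response_timestamp')['Covered'].resample('D').mean() * 100
print(daily.tolist())  # [50.0, 100.0]
```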
# Get the escalated conversations
df_effective_true = df_effective.loc[df_effective['Escalated_conversation']==True]
# Get the non-escalated conversations
df_not_effective = df_effective.loc[df_effective['Escalated_conversation']==False]
# Calculate percentage of escalated conversations
ef_escalated = float("{0:.2f}".format(100-effective_perc))
# Calculate coverage and non-coverage in escalated conversations
if len(df_effective_true) > 0:
escalated_covered = float("{0:.2f}".format((df_effective_true['Covered'].value_counts().to_frame()['Covered'][True]/df_effective_true['Covered'].value_counts().sum())*100))
escalated_not_covered = float("{0:.2f}".format(100- escalated_covered))
else:
escalated_covered = 0
escalated_not_covered = 0
# Calculate coverage and non-coverage in non-escalated conversations
if len(df_not_effective) > 0:
not_escalated_covered = float("{0:.2f}".format((df_not_effective['Covered'].value_counts().to_frame()['Covered'][True]/df_not_effective['Covered'].value_counts().sum())*100))
not_escalated_not_covered = float("{0:.2f}".format(100 - not_escalated_covered))
else:
not_escalated_covered = 0
not_escalated_not_covered = 0
# Calculate average confidence of escalated conversations
if len(df_effective_true) > 0:
esc_avg_conf = float("{0:.2f}".format(df_effective_true[df_effective_true['Covered']==True]['response.top_intent_confidence'].mean()*100))
else:
esc_avg_conf = 0
# Calculate average confidence of non-escalated conversations
if len(df_not_effective) > 0:
not_esc_avg_conf = float("{0:.2f}".format(df_not_effective[df_not_effective['Covered']==True]['response.top_intent_confidence'].mean()*100))
else:
not_esc_avg_conf = 0
# Set sampling size for conversations, set to -1 to disable sampling
SAMPLE_SIZE = 100
export_result_excel(df_effective, sample_size=SAMPLE_SIZE)
# Get the links to the excels
all_html_link = '<a href={} target="_blank">All.xlsx</a>'.format('All.xlsx')
escalated_html_link = '<a href={} target="_blank">Escalated_sample.xlsx</a>'.format('Escalated_sample.xlsx')
not_escalated_html_link = '<a href={} target="_blank">NotEscalated_sample.xlsx</a>'.format('NotEscalated_sample.xlsx')
# Embed the links in HTML table format
link_html = '<tr><th colspan="4"><div align="left"><a id="file_list"></a>View the lists here: {} {} {}</div></th></tr>'.format(all_html_link, escalated_html_link, not_escalated_html_link)
if 100-effective_perc > 0:
escalated_bar = coverage_barh(escalated_covered, esc_avg_conf, '', True, 15, width_bar(100-effective_perc))
else:
escalated_bar = ''
if effective_perc > 0:
non_escalated_bar = coverage_barh(not_escalated_covered, not_esc_avg_conf, '' , True , 15,width_bar(effective_perc))
else:
non_escalated_bar = ''
# Plot the results
HTML('<tr><th colspan="4"><div align="left"><h2>Breakdown by effectiveness<hr/></h2></div></th></tr>\
'+ link_html + '<tr><td style= "border-right: 1px solid black; border-bottom: 1px solid black; width : 400px"><div align="left"><strong>Effectiveness (Escalated) </br>\
<font size="5">{ef_escalated}%</strong></font size></br></div></td>\
<td style="width:1000px; height=100;">{one}</td></tr>\
<tr><td style= "border-right: 1px solid black; border-bottom: 1px solid black; width : 400px;"><div align="left"><strong>Effectiveness (Not escalated) </br>\
<font size="5">{effective_perc}%</strong></font size></br></div></td>\
<td style="width:1000px; height=100;border-bottom: 1px solid black;">{two}</td>\
</tr>'.format(ef_escalated= ef_escalated,
one = escalated_bar,
effective_perc = effective_perc,
two = non_escalated_bar))
You can download all the analyzed data from All.xlsx. Samples of escalated and non-escalated conversations are available in Escalated_sample.xlsx and NotEscalated_sample.xlsx, respectively.
Let us take a look at the reasons why messages are not covered.
# Count the causes for non-coverage and store results in dataframe
not_covered = pd.DataFrame(df_coverage['Not Covered cause'].value_counts().reset_index())
# Name the columns in the dataframe
not_covered.columns = ['Messages', 'Total']
not_covered
| | Messages | Total |
|---|---|---|
| 0 | 'anything_else' node | 724 |
| 1 | '#out_of_scope \|\| #off_topic' node | 186 |
When users engage in a conversation session, an assistant identifies the intent of each message from the user. Based on the logic flow defined in a dialog tree, the assistant communicates with users and performs actions. The assistant may succeed or fail to satisfy users' intent. One way to identify patterns of success or failure is by analyzing which intents most often lead to a dialog node associated with resolution, and which intents most often lead to users abandoning the session. Analyzing resolved and abandoned intents can help you identify issues in your assistant to improve, such as a problematic dialog flow or imprecise intents. In this section, we demonstrate a method of conducting intent analysis using context variables.
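The resolved-versus-abandoned idea can be previewed with toy data (hypothetical conversation ids and intents): an intent that was started and completed in the same conversation is resolved, while one that was started but never completed is abandoned.

```python
# Toy (conversation_id, intent) pairs; hypothetical data for illustration
started   = {('conv1', 'locate_amenity'), ('conv2', 'turn_on'), ('conv3', 'turn_on')}
completed = {('conv2', 'turn_on')}

resolved  = started & completed   # started and completed in the same conversation
abandoned = started - completed   # started but never completed
print(sorted(abandoned))  # [('conv1', 'locate_amenity'), ('conv3', 'turn_on')]
```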
We introduce two context variables: response_context_IntentStarted and response_context_IntentCompleted. You will need to modify your dialog skill definition (workspace) to introduce these variables in your dialog flow. After you modify your dialog skill definition, your logs will be marked such that when users trigger a conversation with an intent, the assistant uses response_context_IntentStarted to record the intent. During the conversation, the assistant uses response_context_IntentCompleted to record whether the intent was satisfied. Follow the steps below to add the context variables for an intent in your dialog skill definition.
- At the node where the intent is triggered, add response_context_IntentStarted as a variable and [intent_name] as the value.
- At the node where the intent is satisfied, add response_context_IntentCompleted as a variable and [intent_name] as the value.

Then repeat the above steps for every intent you want to analyze in this way.
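As an illustration (a hypothetical, heavily simplified log fragment; real log entries contain many more fields), a response context might then carry the two variables like this:

```python
# Hypothetical, simplified response context after the dialog changes above.
# The nested names mirror the flattened column names used in this notebook.
log_entry = {
    'response': {
        'context': {
            'conversation_id': 'abc-123',
            'IntentStarted': 'locate_amenity',
            'IntentCompleted': 'locate_amenity',  # present only once satisfied
        }
    }
}

ctx = log_entry['response']['context']
# An intent counts as resolved when IntentCompleted matches IntentStarted;
# started without completed means the intent was abandoned.
print(ctx['IntentStarted'] == ctx['IntentCompleted'])  # True
```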
After completing the above steps, run the following code for intent analysis. Note that the analysis requires logs generated after the above changes. You will need to reload the updated assistant definition and logs.
# Define context variables
start_intent_variable = 'response_context_IntentStarted'
if start_intent_variable in df_formatted:
# Group dataframe by conversation_id and start_intent_variable
df_intent_started = df_formatted.groupby(['response.context.conversation_id', start_intent_variable]).count().reset_index()
# Refactors data to show only columns of conversation_id and start_intent_variable
df_intent_started = df_intent_started[['response.context.conversation_id', start_intent_variable]]
# Count the number of conversation_ids with each start_intent_variable
intent_started = df_intent_started[start_intent_variable].value_counts().reset_index()
intent_started.columns = ['Intent', 'Count']
display(HTML(intent_started.to_html()))
else:
print('Cannot find \'response_context_IntentStarted\' and \'response_context_IntentCompleted\' in logs. Please check step 4 and make sure updated logs are reloaded.')
| | Intent | Count |
|---|---|---|
| 0 | locate_amenity | 7 |
| 1 | turn_on | 6 |
| 2 | phone | 3 |
| 3 | turn_up | 2 |
| 4 | turn_down | 1 |
| 5 | turn_off | 1 |
end_intent_variable = 'response_context_IntentCompleted'
if end_intent_variable in df_formatted:
# Group dataframe by conversation_id and end_intent_variable
df_intent_completed = df_formatted.groupby(['response.context.conversation_id',end_intent_variable]).count().reset_index()
# Refactor data to show columns of conversation_id and end_intent_variable only
df_intent_completed = df_intent_completed[['response.context.conversation_id',end_intent_variable]]
# Count the number of conversation_ids with each end_intent_variable
intent_completed = df_intent_completed[end_intent_variable].value_counts().reset_index()
intent_completed.columns = ['Intent', 'Count']
# Show counts of resolved intents
intent_completed_title = '\nCount of resolved intents in all conversations\n'
print(intent_completed_title, "=" * len(intent_completed_title),'', sep = '')
display(HTML(intent_completed.to_html()))
# Convert dataframe to a list
res_intent_list = intent_completed.values.tolist()
# Get list of started intents
all_intent = df_intent_started[start_intent_variable].value_counts().reset_index().values.tolist()
# Loop over resolved intents list. Each element contains a pair of intent and count
data = []
for pair_ab in res_intent_list:
# Loop over each row of started intents. Each row contains a pair of intent and count
for pair_all in all_intent:
# Check if the intent name matches in started and resolved intents
if pair_ab[0] == pair_all[0]:
# Then access the count from the matched intent, and calculate the percentage
perc = (pair_ab[1]/pair_all[1])*100
# Add the matched intent name and percentage to data list
data.append([pair_ab[0],perc])
# Create a new dataframe with data list
resolved_percentage = pd.DataFrame(data=data).reset_index(drop=True)
if len(resolved_percentage) > 0:
# Format the dataframe and sort rows in descending order (highest percentage first)
resolved_percentage.columns = ['Intent','Percentage']
resolved_percentage.sort_values(ascending=False,inplace=True, by='Percentage')
# Format the data in the percentage column to include '%', and 1 decimal point
resolved_percentage['Percentage'] = resolved_percentage['Percentage'].apply(lambda x: "{0:.1f}%".format(x))
resolved_percentage.reset_index(drop=True, inplace=True)
# Show most resolved intents
most_resolved_intents = "\nMost resolved intents (%)\n"
print(most_resolved_intents, "=" * len(most_resolved_intents),'', sep = '')
display(HTML(resolved_percentage.to_html()))
else:
print('No resolved intents detected')
else:
print('Cannot find \'response_context_IntentStarted\' and \'response_context_IntentCompleted\' in logs. Please check step 4 and make sure updated logs are reloaded.')
Count of resolved intents in all conversations ================================================
| | Intent | Count |
|---|---|---|
| 0 | turn_on | 5 |
| 1 | locate_amenity | 3 |
| 2 | phone | 3 |
Most resolved intents (%) ===========================
| | Intent | Percentage |
|---|---|---|
| 0 | phone | 100.0% |
| 1 | turn_on | 83.3% |
| 2 | locate_amenity | 42.9% |
if start_intent_variable in df_formatted and end_intent_variable in df_formatted:
# Create lists of started and end_intent_variable
intent_complete_list = df_intent_completed.values.tolist()
intent_started_list = df_intent_started.values.tolist()
# Loop over the completed intents list. Each element contains a pair of conversation id and end_intent_variable
for pair in intent_complete_list:
# Check if the element is found in the list of started intents
if pair in intent_started_list:
# If found, remove that pair from the list of started intents
intent_started_list.remove(pair)
# Create a new dataframe with updated dataset.
# This updated dataset contains intents that have been started but not completed, thus categorised as abandoned
df_intent_abandoned = pd.DataFrame(data=intent_started_list)
if len(df_intent_abandoned) > 0:
# Count the occurrences of each abandoned intent
final_intent_abandoned = df_intent_abandoned[1].value_counts().reset_index()
final_intent_abandoned.columns = ['Intent','Count']
# Show counts of abandoned intents
intent_abandoned_title = '\nCount of abandoned intents in all conversations\n'
print(intent_abandoned_title, "=" * len(intent_abandoned_title),'', sep = '')
display(HTML(final_intent_abandoned.to_html()))
# Convert dataframe to a list
aban_intent_list = final_intent_abandoned.values.tolist()
# Get list of started intents
all_intent = df_intent_started[start_intent_variable].value_counts().reset_index().values.tolist()
# Loop over resolved intents list. Each element contains a pair of intent and count
data = []
for pair_ab in aban_intent_list:
# Loop over each row of started intents. Each row contains a pair of intent and count
for pair_all in all_intent:
# Check if the intent name matches in started and resolved intents
if pair_ab[0] == pair_all[0]:
# Then access the count from the matched intent, and calculate the percentage
perc = (pair_ab[1]/pair_all[1])*100
# Add the matched intent name and percentage to data list
data.append([pair_ab[0],perc])
# Create a new dataframe with data list
abandoned_percentage = pd.DataFrame(data=data).reset_index(drop=True)
# Format the dataframe and sort rows in descending order (highest percentage first)
abandoned_percentage.columns = ['Intent','Percentage']
abandoned_percentage.sort_values(ascending=False,inplace=True, by='Percentage')
abandoned_percentage.reset_index(drop=True, inplace=True)
# Format the data in the percentage column to include '%', and 1 decimal point
abandoned_percentage['Percentage'] = abandoned_percentage['Percentage'].apply(lambda x: "{0:.1f}%".format(x))
# Show most abandoned intents
most_abandoned_intents = "\nMost abandoned intents (%)\n"
print(most_abandoned_intents, "=" * len(most_abandoned_intents),'', sep = '')
display(HTML(abandoned_percentage.to_html()))
else:
print('No abandoned intents detected')
else:
print('Cannot find \'response_context_IntentStarted\' and \'response_context_IntentCompleted\' in logs. Please check step 4 and make sure updated logs are reloaded.')
Count of abandoned intents in all conversations =================================================
| | Intent | Count |
|---|---|---|
| 0 | locate_amenity | 4 |
| 1 | turn_on | 2 |
| 2 | turn_up | 2 |
| 3 | turn_down | 1 |
| 4 | turn_off | 1 |
Most abandoned intents (%) ============================
| | Intent | Percentage |
|---|---|---|
| 0 | turn_up | 100.0% |
| 1 | turn_down | 100.0% |
| 2 | turn_off | 100.0% |
| 3 | locate_amenity | 57.1% |
| 4 | turn_on | 33.3% |
Finally, we generate an Excel file that lists all conversations for which there are abandoned and resolved intents for further analysis.
if 'df_intent_abandoned' in locals() and df_intent_abandoned is not None and df_intent_completed is not None:
if len(df_intent_abandoned) == 0:
df_intent_abandoned = pd.DataFrame(columns = ['Conversation_id','Intent'])
if len(df_intent_completed) == 0:
df_intent_completed = pd.DataFrame(columns = ['Conversation_id','Intent'])
# Rename columns
df_intent_abandoned.columns = ['Conversation_id','Intent']
df_intent_completed.columns = ['Conversation_id','Intent']
# Generate excel file
file_name = 'Abandoned_Resolved.xlsx'
generate_excel_measure([df_intent_abandoned,df_intent_completed], ['Abandoned', 'Resolved'], filename= file_name, project_io=None)
link_html = 'Abandoned and resolved intents: <b><a href={} target="_blank">Abandoned_Resolved.xlsx</a></b>'.format(file_name)
display(HTML(link_html))
else:
print('Cannot find \'response_context_IntentStarted\' and \'response_context_IntentCompleted\' in logs. Please check step 4 and make sure updated logs are reloaded.')
The metrics described above help you narrow your immediate focus of improvement. We suggest the following two strategies:
Toward improving Effectiveness
We suggest focusing on a group of problematic conversations, e.g., escalated conversations, then performing a deeper analysis on these conversations as follows.
Toward improving Coverage
For utterances where an intent was found but no response was given, we suggest performing a deeper analysis to identify root causes, e.g., missing entities or missing dialog logic.
For utterances where no intent was found, we suggest expanding intent coverage as follows.
For more information, please check Watson Assistant Continuous Improvement Best Practices.
Zhe Zhang, Ph.D. in Computer Science, is a Data Scientist for IBM Watson AI. Zhe has a research background in Natural Language Processing, Sentiment Analysis, Text Mining, and Machine Learning. His research has been published at leading conferences and journals including ACL and EMNLP.
Sherin Varughese is a Data Scientist for IBM Watson AI. Sherin has her graduate degree in Business Intelligence and Data Analytics and has experience in Data Analysis, Warehousing and Machine Learning.
The authors would like to thank the following members of the IBM Research and Watson Assistant teams for their contributions and reviews of the notebook: Matt Arnold, Adam Benvie, Kyle Croutwater, Eric Wayne.
Copyright © 2021 IBM. This notebook and its source code are released under the terms of the MIT License.