Analyzing Emotion and Tone of Financial Complaints with Watson NLP

This notebook demonstrates how to analyze financial customer complaints using Watson NLP.

The data that is used in this notebook is taken from the Consumer Complaint Database that is published by the Consumer Financial Protection Bureau (CFPB), a U.S. government agency. The Consumer Complaint Database is a collection of complaints about consumer financial products and services that the CFPB sent to companies for response. A complaint includes the consumer’s narrative description of their experience only if the consumer opts to share it publicly, and only after the CFPB has taken steps to remove all personal information. In this notebook, you will focus on complaints that contain narrative descriptions to show how to use Watson NLP.

The data is publicly available at https://www.consumerfinance.gov/data-research/consumer-complaints/.

What you'll learn in this notebook

Watson NLP offers so-called blocks for various NLP tasks. This notebook shows:

Tone classification with the Tone classification model for English (ensemble_classification-workflow_en_tone-stock). This workflow model classifies the tone of a document as excited, frustrated, sad, polite, impolite, satisfied and sympathetic.
Emotion classification with the Emotion classification model for English (ensemble_classification-workflow_en_emotion-stock). This workflow model classifies the emotion of a document into anger, disgust, fear, joy or sadness.

Before you start

You can step through the notebook cell by cell by pressing Shift-Enter, or you can execute the entire notebook by selecting Cell -> Run All from the menu.

Note: If you have other notebooks currently running in the NLP Environment, stop their kernels before running this notebook. All these notebooks share the same runtime environment, and if they run in parallel, you may encounter memory issues. To stop the kernel of another notebook, open that notebook and select File > Stop Kernel.

Begin by importing and initializing some helper libraries that are used throughout the notebook.

In [1]:

import os
import pandas as pd
# we want to show large text snippets to be able to explore the relevant text
pd.options.display.max_colwidth = 400

In [2]:

import watson_nlp

Load the complaints

The data can be downloaded via an API from https://www.consumerfinance.gov/data-research/consumer-complaints/. For this notebook, you download the complaints for one month, restricted to those that contain a consumer narrative. The data is exported in CSV format. The URL to retrieve this data is:

In [3]:

url = "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?date_received_max=2021-03-30&date_received_min=2021-02-28&field=all&format=csv&has_narrative=true&no_aggs=true&size=18102"
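The long URL above is just a base endpoint plus a handful of query parameters. As a minimal sketch, the same string can be composed with the standard library, which makes it easier to change the date range or the result size; the function name build_complaints_url is hypothetical, and the parameter values are copied verbatim from the URL above (the reading of size as the maximum number of returned complaints is an assumption).

```python
from urllib.parse import urlencode

# Base endpoint, copied from the URL in the cell above.
BASE = "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/"

def build_complaints_url(date_min, date_max, size):
    """Hypothetical helper: compose the CFPB search API URL from its query parameters."""
    params = {
        "date_received_max": date_max,
        "date_received_min": date_min,
        "field": "all",
        "format": "csv",            # export as CSV, as in the original URL
        "has_narrative": "true",    # only complaints that include a consumer narrative
        "no_aggs": "true",          # copied verbatim from the original URL
        "size": size,               # assumed: maximum number of complaints returned
    }
    # urlencode keeps the dict order and leaves "-" and "_" unescaped.
    return BASE + "?" + urlencode(params)

url = build_complaints_url("2021-02-28", "2021-03-30", 18102)
print(url)
```

With these arguments the helper reproduces the original URL character for character.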

Read the data into a dataframe.

You can find a detailed explanation of the available columns here: https://www.consumerfinance.gov/complaint/data-use/#:~:text=Types%20of%20complaint%20data%20we%20publish .

In the analysis, you will focus on the Product column and on the Consumer complaint narrative column, which holds the complaint text.

In [4]:

df_all = pd.read_csv(url)
text_col = 'Consumer complaint narrative'

# In this example, we take only the first 1000 complaints in the dataset for further analysis. 
# Set df to df_all to run on the complete dataset.
df_small = df_all.head(1000)
df = df_small
df.head(3)

Out[4]:

	Date received	Product	Sub-product	Issue	Sub-issue	Consumer complaint narrative	Company public response	Company	State	ZIP code	Tags	Consumer consent provided?	Submitted via	Date sent to company	Company response to consumer	Timely response?	Consumer disputed?	Complaint ID
0	03/10/21	Credit reporting, credit repair services, or other personal consumer reports	Credit reporting	Incorrect information on your report	Account status incorrect	In XX/XX/XXXX I moved with a current XXXX account and service ( so, I thought ) I transferred my account to the new apartment, in fact I went into the store and had no problem not at any time was I told I had a past due balance from my old apartment and all of my mail was forwarded. My account and service set up was seamless.. Skip to late XXXX of XXXX and I received a phone call from Sequium ...	Company believes it acted appropriately as authorized by contract or law	Sequium Asset Solutions, LLC	CO	80112	None	Consent provided	Web	03/10/21	Closed with explanation	Yes	NaN	4201725
1	03/10/21	Credit reporting, credit repair services, or other personal consumer reports	Credit reporting	Improper use of your report	Credit inquiries on your report that you don't recognize	Upon reviewing my Equifax Credit Report I noticed a hard inquiry for XXXX XXXX XXXX XXXX XXXX which I did not authorize or was aware of. \n\nInquiry Date : XX/XX/2020 Company XXXX XXXX XXXX,	None	EQUIFAX, INC.	CA	90064	None	Consent provided	Web	03/10/21	Closed with explanation	Yes	NaN	4201710
2	03/10/21	Debt collection	Credit card debt	Written notification about debt	Didn't receive enough information to verify debt	XXXX XXXX XXXX XXXX XXXX XXXX, NY XXXX Social Security # XXXX DOB : XX/XX/1955 XXXX XXXX XXXXXXXX XXXX XXXX XXXX XXXX, XXXX, Texas XXXX XXXX XXXX XXXX XXXX, XXXX XXXXXXXX XXXX XXXX, XXXX, GA XXXX XXXX XXXX XXXX, XXXX XXXX. XXXX XXXX, XXXX, PA XXXX DISCLOSURE : THIS IS NOT AN IDENTITY THEFT DISPUTE, PLEASE REFRAIN FROM TAKING ANY POSITION OF IDENTITY THEFT EITHER WITH ANY CREDIT REPORTING A...	None	PORTFOLIO RECOVERY ASSOCIATES INC	NY	11550	None	Consent provided	Web	03/10/21	Closed with explanation	Yes	NaN	4200781

Plot the number of complaints for each product to see which product groups are available in the data set for further analysis.

In [5]:

df['Product'].value_counts().sort_values().plot(kind='barh') 

Out[5]:

<Axes: >

Tone classification

The tone classification model predicts the most prevalent tones of a document. Available tones are excited, frustrated, sad, polite, impolite, satisfied and sympathetic. Each tone is assigned a confidence score, so you can either use the highest-rated tone or assign several tones to a document, for example by taking all tones whose confidence exceeds a certain threshold.
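The two selection strategies can be sketched in plain Python. This is a minimal illustration, assuming each class is a dictionary with class_name and confidence keys (the keys that the helper functions in this notebook read after calling to_dict()); the function names are hypothetical and the scores are made up, not real model output.

```python
def top_tone(classes):
    """Strategy 1: return the single class name with the highest confidence."""
    return max(classes, key=lambda c: c["confidence"])["class_name"]

def tones_above(classes, threshold):
    """Strategy 2: return every class name whose confidence exceeds the threshold, best first."""
    ranked = sorted(classes, key=lambda c: c["confidence"], reverse=True)
    return [c["class_name"] for c in ranked if c["confidence"] > threshold]

# Hypothetical scores for one complaint (not real model output).
example = [
    {"class_name": "polite", "confidence": 0.35},
    {"class_name": "sad", "confidence": 0.30},
    {"class_name": "frustrated", "confidence": 0.20},
    {"class_name": "excited", "confidence": 0.05},
]
print(top_tone(example))           # polite
print(tones_above(example, 0.14))  # ['polite', 'sad', 'frustrated']
```

The threshold variant keeps more information per complaint, at the cost of having to pick a cutoff; the top-1 variant always yields exactly one label.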

In customer complaints, you would expect the tone to be sad or frustrated. Let's see if the analysis confirms this assumption.

Start by loading the tone workflow model for English:

In [6]:

tone_model = watson_nlp.load('ensemble_classification-workflow_en_tone-stock')

Create a helper function to run the tone analysis on a single complaint. It returns all tones with a confidence higher than 0.14, which is roughly 1/7 (there are seven tones).

In [7]:

def classify_tone(complaint_text):
    # run the tone model 
    tone_result = tone_model.run(complaint_text)
    tone_classes = [c.to_dict() for c in tone_result.classes]
    tone_conf = [c['class_name'] for c in tone_classes if c['confidence'] > 0.14]
    return tone_conf
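Because the helper above calls the global tone_model, its filtering logic is hard to check without loading Watson NLP. A minimal sketch that passes the model and threshold in as parameters is shown below. FakeToneModel, FakeResult and FakeClass are hypothetical stand-ins that only mimic the .run(), .classes and .to_dict() calls the helper uses; they are not the real Watson NLP types. The sketch also shows that a cutoff of 0.14 and one of exactly 1/7 (about 0.1429) can give different results for borderline scores.

```python
from dataclasses import dataclass, asdict

@dataclass
class FakeClass:
    # Hypothetical stand-in for one entry of tone_result.classes.
    class_name: str
    confidence: float

    def to_dict(self):
        return asdict(self)

@dataclass
class FakeResult:
    # Hypothetical stand-in for the object returned by tone_model.run().
    classes: list

class FakeToneModel:
    """Hypothetical stand-in for the loaded tone model: returns fixed scores."""
    def __init__(self, scores):
        self.scores = scores

    def run(self, text):
        return FakeResult([FakeClass(name, conf) for name, conf in self.scores.items()])

def classify_tone(complaint_text, model, threshold=0.14):
    # Same filtering as the notebook helper, with model and threshold passed in.
    result = model.run(complaint_text)
    classes = [c.to_dict() for c in result.classes]
    return [c["class_name"] for c in classes if c["confidence"] > threshold]

fake = FakeToneModel({"sad": 0.41, "frustrated": 0.142, "polite": 0.10})
print(classify_tone("any text", fake))                 # ['sad', 'frustrated']
print(classify_tone("any text", fake, threshold=1/7))  # ['sad']
```

In the notebook itself you would call classify_tone(text, tone_model) with the real model instead of the fake one.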

Run the tone classification on the dataframe and show the tones together with the product and the complaint text. Note: This cell runs for several minutes.
For better progress feedback, the cell uses progress_apply from the tqdm library. You can also call apply directly, that is, df[text_col].apply(...).

In [8]:

from tqdm.notebook import tqdm
tqdm.pandas(colour='green')

In [9]:

# run tone classification and create a dataframe holding the tones
tone = df[text_col].progress_apply(lambda text: classify_tone(text))
tone_df = pd.DataFrame(tone)
tone_df.rename(inplace=True, columns={text_col:'Tones'})
# combine with our complaint dataframe
text_tone_df = df[["Product", text_col]].merge(tone_df, how='left', left_index=True, right_index=True)
text_tone_df.head()


Out[9]:

	Product	Consumer complaint narrative	Tones
0	Credit reporting, credit repair services, or other personal consumer reports	In XX/XX/XXXX I moved with a current XXXX account and service ( so, I thought ) I transferred my account to the new apartment, in fact I went into the store and had no problem not at any time was I told I had a past due balance from my old apartment and all of my mail was forwarded. My account and service set up was seamless.. Skip to late XXXX of XXXX and I received a phone call from Sequium ...	[sad, polite, frustrated]
1	Credit reporting, credit repair services, or other personal consumer reports	Upon reviewing my Equifax Credit Report I noticed a hard inquiry for XXXX XXXX XXXX XXXX XXXX which I did not authorize or was aware of. \n\nInquiry Date : XX/XX/2020 Company XXXX XXXX XXXX,	[polite]
2	Debt collection	XXXX XXXX XXXX XXXX XXXX XXXX, NY XXXX Social Security # XXXX DOB : XX/XX/1955 XXXX XXXX XXXXXXXX XXXX XXXX XXXX XXXX, XXXX, Texas XXXX XXXX XXXX XXXX XXXX, XXXX XXXXXXXX XXXX XXXX, XXXX, GA XXXX XXXX XXXX XXXX, XXXX XXXX. XXXX XXXX, XXXX, PA XXXX DISCLOSURE : THIS IS NOT AN IDENTITY THEFT DISPUTE, PLEASE REFRAIN FROM TAKING ANY POSITION OF IDENTITY THEFT EITHER WITH ANY CREDIT REPORTING A...	[sad, polite]
3	Debt collection	XXXX XXXX XXXX XXXX XXXX XXXX, NY XXXX Social Security # XXXX DOB : XX/XX/XXXX XXXX XXXX XXXX, P. O. Box XXXX, XXXX, Texas XXXX XXXX XXXX XXXX XXXX, XXXX XXXX. Box XXXX, XXXX, GA XXXX XXXX XXXX XXXX, P. O. Box XXXX, XXXX, PA XXXX DISCLOSURE : THIS IS NOT AN IDENTITY THEFT DISPUTE, PLEASE REFRAIN FROM TAKING ANY POSITION OF IDENTITY THEFT EITHER WITH ANY CREDIT REPORTING AGENCY OR ANY SUBSCRIBE...	[frustrated]
4	Debt collection	failed to validate and delete inaccurate information on my credit report after months of me disputing the inaccurate Information	[sad, frustrated]

Display the tones of the complaints by product

Use the pandas explode function to transform each list of tones into separate rows, one row per tone. That way, you can count the occurrences of each tone in a subsequent step.
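To make clear what explode('Tones') followed by groupby('Product')['Tones'].value_counts(normalize=True) computes, here is a minimal plain-Python sketch of the same aggregation. The function name is hypothetical; the input rows reuse the tone lists of the five complaints shown in the previous output, with product names shortened.

```python
from collections import Counter, defaultdict

def tone_shares_by_product(rows):
    """rows: iterable of (product, list_of_tones) pairs.

    Every (product, tone) pair counts once (the explode step), and the counts
    are divided by the number of tone rows for that product (normalize=True).
    Complaints with an empty tone list contribute nothing, like the NaN rows
    that value_counts drops by default.
    """
    counts = defaultdict(Counter)
    for product, tones in rows:
        for tone in tones:            # one entry per tone, as explode does
            counts[product][tone] += 1
    shares = {}
    for product, counter in counts.items():
        total = sum(counter.values())
        shares[product] = {tone: n / total for tone, n in counter.items()}
    return shares

# Tone lists of the five complaints shown above, product names shortened.
rows = [
    ("Credit reporting", ["sad", "polite", "frustrated"]),
    ("Credit reporting", ["polite"]),
    ("Debt collection", ["sad", "polite"]),
    ("Debt collection", ["frustrated"]),
    ("Debt collection", ["sad", "frustrated"]),
]
print(tone_shares_by_product(rows)["Credit reporting"])
# {'sad': 0.25, 'polite': 0.5, 'frustrated': 0.25}
```

Note that the denominator is the number of tone labels per product, not the number of complaints.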

In [10]:

exp_tones = text_tone_df.explode('Tones')
# Count tone occurrences and use the relative frequency. unstack() creates a column for each tone.
unstacked = exp_tones.groupby('Product')['Tones'].value_counts(normalize=True).unstack()
# Plot a horizontal bar chart
unstacked.plot.barh(stacked=True).legend(loc='center left', bbox_to_anchor=(1.0, 0.5))

Out[10]:

<matplotlib.legend.Legend at 0x7f2b662f1750>

As expected, most complaints are classified as sad or frustrated, although many of them still use a polite tone. There is no strong indication that some products have a higher frustration rate than others.
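The stacked bars above normalize by the number of tone labels per product, so they do not directly give a per-complaint frustration rate. A minimal plain-Python sketch of that rate, the fraction of complaints whose tone list contains "frustrated", is shown below; the function name frustration_rate is hypothetical, and the input reuses the five classified complaints shown earlier with shortened product names. Five rows are far too few to draw any conclusion; the point is only the choice of denominator.

```python
from collections import Counter

def frustration_rate(rows, tone="frustrated"):
    """Fraction of complaints per product whose tone list contains `tone`."""
    totals, hits = Counter(), Counter()
    for product, tones in rows:
        totals[product] += 1
        if tone in tones:
            hits[product] += 1
    return {product: hits[product] / totals[product] for product in totals}

# Tone lists of the five complaints shown earlier, product names shortened.
rows = [
    ("Credit reporting", ["sad", "polite", "frustrated"]),
    ("Credit reporting", ["polite"]),
    ("Debt collection", ["sad", "polite"]),
    ("Debt collection", ["frustrated"]),
    ("Debt collection", ["sad", "frustrated"]),
]
print(frustration_rate(rows)["Credit reporting"])  # 0.5
```

Comparing these rates across products, ideally on the full data set, is a more direct test of whether some products attract more frustration than others.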

Emotion classification

The emotion classification model classifies the emotion of a document. Available emotions are anger, disgust, fear, joy and sadness. As with tones, each emotion is assigned a confidence score. This time, you will concentrate on the emotion with the highest confidence score.

You would expect anger and sadness to be the most prevalent emotions in the complaint data set.

Start by loading the emotion workflow model for English:

In [11]:

emotion_model = watson_nlp.load('ensemble_classification-workflow_en_emotion-stock')

Again, use a helper function to run the model on a single complaint. The classes are ordered by descending confidence score, so you can use the first class as the prevalent emotion.

In [12]:

def classify_emotion(complaint_text):
    # run the emotion model 
    emotion_result = emotion_model.run(complaint_text)
    # get the first emotion as the one with the highest confidence
    top_emotion = emotion_result.classes[0].to_dict()['class_name']
    return top_emotion
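The helper above relies on the classes being sorted by confidence. If you prefer not to depend on that ordering, or want to know how close the runner-up was, a small sketch that sorts explicitly and also reports the margin to the second-ranked emotion could look like this. It assumes the same class_name/confidence dictionaries that to_dict() returns in the helpers; the function name is hypothetical and the scores are made up.

```python
def top_emotion_with_margin(classes):
    """Return (top class name, its confidence, gap to the runner-up's confidence)."""
    ranked = sorted(classes, key=lambda c: c["confidence"], reverse=True)
    best = ranked[0]
    runner_up = ranked[1]["confidence"] if len(ranked) > 1 else 0.0
    return best["class_name"], best["confidence"], best["confidence"] - runner_up

# Hypothetical, deliberately unsorted scores (not real model output).
scores = [
    {"class_name": "anger", "confidence": 0.31},
    {"class_name": "sadness", "confidence": 0.44},
    {"class_name": "fear", "confidence": 0.12},
]
name, confidence, margin = top_emotion_with_margin(scores)
print(name, round(margin, 2))  # sadness 0.13
```

A small margin flags complaints where, for example, sadness and anger are nearly tied, which a top-1 label alone hides.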

Run the emotion classification on the dataframe and show the highest-ranked emotion together with the product and the complaint text. Note: This cell runs for several minutes.
For better progress feedback, the cell uses progress_apply from the tqdm library. You can also call apply directly, that is, df[text_col].apply(...).

In [13]:

# run emotion classification and create a dataframe holding the results
emotion = df[text_col].progress_apply(lambda text: classify_emotion(text))
emotion_df = pd.DataFrame(emotion)
emotion_df.rename(inplace=True, columns={text_col:'Emotion'})
# combine with our complaint dataframe
text_emotion_df = df[["Product", text_col]].merge(emotion_df, how='left', left_index=True, right_index=True)
text_emotion_df.head(3)


Out[13]:

	Product	Consumer complaint narrative	Emotion
0	Credit reporting, credit repair services, or other personal consumer reports	In XX/XX/XXXX I moved with a current XXXX account and service ( so, I thought ) I transferred my account to the new apartment, in fact I went into the store and had no problem not at any time was I told I had a past due balance from my old apartment and all of my mail was forwarded. My account and service set up was seamless.. Skip to late XXXX of XXXX and I received a phone call from Sequium ...	sadness
1	Credit reporting, credit repair services, or other personal consumer reports	Upon reviewing my Equifax Credit Report I noticed a hard inquiry for XXXX XXXX XXXX XXXX XXXX which I did not authorize or was aware of. \n\nInquiry Date : XX/XX/2020 Company XXXX XXXX XXXX,	sadness
2	Debt collection	XXXX XXXX XXXX XXXX XXXX XXXX, NY XXXX Social Security # XXXX DOB : XX/XX/1955 XXXX XXXX XXXXXXXX XXXX XXXX XXXX XXXX, XXXX, Texas XXXX XXXX XXXX XXXX XXXX, XXXX XXXXXXXX XXXX XXXX, XXXX, GA XXXX XXXX XXXX XXXX, XXXX XXXX. XXXX XXXX, XXXX, PA XXXX DISCLOSURE : THIS IS NOT AN IDENTITY THEFT DISPUTE, PLEASE REFRAIN FROM TAKING ANY POSITION OF IDENTITY THEFT EITHER WITH ANY CREDIT REPORTING A...	sadness

Display the emotion classification for each product group

In [14]:

unstacked = text_emotion_df.groupby('Product')['Emotion'].value_counts(normalize=True).unstack()
unstacked.plot.barh(stacked=True).legend(loc='center left',bbox_to_anchor=(1.0, 0.5))

Out[14]:

<matplotlib.legend.Legend at 0x7f2c978b6ce0>

As expected, the most prevalent emotions in the complaints are sadness and anger. In contrast to the tone classification, only the emotion with the highest confidence score was picked here, not every emotion with a score above a certain threshold. Sadness appears to be the top-ranked emotion more often than anger overall; note that the chart shows how often each emotion ranks first, not how high its confidence scores are. Companies might want to examine the products and complaints where anger dominates, because the customers who wrote those complaints are likely the most upset.
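The stacked bars count how often each emotion is ranked first; they say nothing about how confident the model is. To compare both views, you would need to keep the confidence scores as well. A minimal sketch, assuming you have one emotion-to-confidence dictionary per complaint (the data and the function name below are hypothetical), shows that frequency and confidence can diverge:

```python
from collections import defaultdict
from statistics import mean

def summarize_top_emotions(per_complaint):
    """per_complaint: list of dicts mapping emotion -> confidence, one per complaint.

    Returns, for each emotion that is ranked first at least once:
    (share of complaints where it ranks first, mean confidence when it ranks first).
    """
    top_conf = defaultdict(list)
    for scores in per_complaint:
        emotion = max(scores, key=scores.get)   # top-ranked emotion for this complaint
        top_conf[emotion].append(scores[emotion])
    n = len(per_complaint)
    return {e: (len(c) / n, mean(c)) for e, c in top_conf.items()}

# Hypothetical scores for four complaints (not real model output).
data = [
    {"sadness": 0.50, "anger": 0.30},
    {"sadness": 0.40, "anger": 0.35},
    {"sadness": 0.45, "anger": 0.20},
    {"sadness": 0.10, "anger": 0.70},
]
summary = summarize_top_emotions(data)
print(summary["sadness"][0], summary["anger"][0])  # 0.75 0.25
```

In this made-up example sadness ranks first three times as often, yet anger has the higher confidence whenever it wins, so "most frequent" and "strongest" are different claims.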

Summary

This notebook showed you how to use the Watson NLP library and how quickly you can get started with it by running the pretrained models for tone and emotion classification.

Authors

Simone Zerfass IBM, Germany

Alexander Lang IBM, Germany