Watson Natural Language Processing library
The Watson Natural Language Processing library provides basic natural language processing functions for syntax analysis and out-of-the-box pre-trained models for a wide variety of text processing tasks, such as sentiment analysis, keyword extraction and vectorization. The Watson Natural Language Processing library is available for Python only.
With Watson Natural Language Processing, you can turn unstructured data into structured data, making the data easier to understand and transferable, in particular if you are working with a mix of unstructured and structured data. Examples of such data are call center records, customer complaints, social media posts, or problem reports. The unstructured data is often part of a larger data record which includes columns with structured data. Extracting meaning and structure from the unstructured data and combining this information with the data in the columns of structured data, gives you a deeper understanding of the input data and can help you to make better decisions.
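To picture how a structured column can be derived from unstructured text, here is a minimal plain-Python sketch. The trivial keyword lookup below is a hypothetical stand-in for a real NLP model (such as a sentiment block), and all names in it are invented for this illustration:

```python
# Illustrative sketch only: a trivial keyword lookup stands in for a real
# NLP model such as a sentiment classification block.
def classify_sentiment(text):
    """Toy placeholder: label text by matching a few keywords."""
    lowered = text.lower()
    if any(word in lowered for word in ("great", "thanks", "resolved")):
        return "positive"
    if any(word in lowered for word in ("broken", "refund", "complaint")):
        return "negative"
    return "neutral"

# Records that mix structured columns (id, product) with unstructured text.
records = [
    {"id": 1, "product": "credit card", "text": "My card is broken and I want a refund."},
    {"id": 2, "product": "mortgage", "text": "Great service, my issue was resolved."},
]

# Derive a new structured column from the unstructured text, so it can be
# analyzed together with the existing structured columns.
for record in records:
    record["sentiment"] = classify_sentiment(record["text"])

print([(r["id"], r["sentiment"]) for r in records])
# -> [(1, 'negative'), (2, 'positive')]
```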
Watson Natural Language Processing provides pre-trained models in over 20 languages. They are curated by a dedicated team of experts, and evaluated for quality on each specific language. These pre-trained models can be used in production environments without you having to worry about license or intellectual property infringements.
Although you can create your own models, the easiest way to get started with Watson Natural Language Processing is to run the pre-trained models on unstructured text to perform language processing tasks.
Here are some examples of language processing tasks available in Watson Natural Language Processing pre-trained models:
- Syntax: tokenization, lemmatization, part of speech tagging, and dependency parsing
- Entity extraction: find mentions of entities (like person, organization, or date)
- Keywords extraction: extract noun phrases that are relevant in the input text
- Text classification: analyze text and then assign a set of pre-defined tags or categories based on its content
- Sentiment classification: is the input document positive, negative or neutral?
- Tone classification: classify the tone in the input document (like excited, frustrated, or sad)
- Emotion classification: classify the emotion of the input document (like anger or disgust)
- Concepts: find concepts from DBPedia in the input text
- Relations: detect relations between two entities
- Hierarchical categories: assign individual nodes within a hierarchical taxonomy to the input document
- Embeddings: map individual words or larger text snippets into a vector space
Using Watson Natural Language Processing in a notebook
You can run your Python notebooks using the Watson Natural Language Processing library in any of the following provided environments. The GPU environment templates include the Watson Natural Language Processing library.
DO + NLP: Indicates that the environment template includes both the CPLEX and the DOcplex libraries to model and solve decision optimization problems, and the Watson Natural Language Processing library.
~ : Indicates that the environment template requires the Watson Studio Professional plan. See Offering plans.
* : Indicates that the environment is deprecated.
| Name | Hardware configuration | CUH rate per hour |
|------|------------------------|-------------------|
| DO + NLP Runtime 22.2 on Python 3.10 | 2 vCPU and 8 GB RAM | 6 |
| DO + NLP Runtime 22.1 on Python 3.9 | 2 vCPU and 8 GB RAM | 6 |
| GPU V100 Runtime 22.2 on Python 3.10 ~ | 40 vCPU + 172 GB + 1 NVIDIA Tesla V100 (1 GPU) | 68 |
| GPU V100 Runtime 22.1 on Python 3.9 ~ | 40 vCPU + 172 GB + 1 NVIDIA Tesla V100 (1 GPU) | 68 |
| GPU K80 Runtime 22.1 on Python 3.9 * | 4 vCPU + 24 GB + 0.5 NVIDIA Tesla K80 (1 GPU) | 6 |
The DO + NLP Runtime 22.2 on Python 3.10 or DO + NLP Runtime 22.1 on Python 3.9 environments should be large enough to run notebooks that use the pre-trained models. If you need a larger environment, for example to train your own models, you can use a GPU V100 environment.
If you don't use a provided environment template, you can create a custom template that includes the Watson Natural Language Processing library. See Creating your own environment template.
- Create a custom template without GPU by selecting the engine type Default, the hardware configuration size that you need, and DO + NLP Runtime 22.2 on Python 3.10 or DO + NLP Runtime 22.1 on Python 3.9 as the software version.
- Create a custom template with GPU by selecting the engine type GPU, the hardware configuration size that you need, and GPU Runtime 22.1 on Python 3.9 as the software version.
Working with the pre-trained models
Watson Natural Language Processing encapsulates natural language functionality in blocks, where each block supports these functions:
- load(): load a block model
- run(): run the block on input argument(s)
- train(): train the block on your own data (not all blocks support training)
- save(): save the block model that you trained on your own data
There are two types of blocks:
Blocks that operate directly on the input document
An example of a block that operates directly on the input document is the Syntax block, which performs natural language processing operations such as tokenization, lemmatization, part of speech tagging or dependency parsing.
This block can be loaded and run on the input document directly. For example:
```python
import watson_nlp

# Load the syntax model for English
syntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))

# Run the syntax model and print the result
syntax_prediction = syntax_model.run('Welcome to IBM!')
print(syntax_prediction)
```
Blocks that depend on other blocks
Blocks that depend on other blocks cannot be applied to the input document directly, and must be linked with one or more other blocks in order to process the input document. In general, machine learning models such as classifiers or entity extractors that require preprocessing of the input text fall into this category. For example, the Entity Mention block depends on the Syntax block.
These blocks can be loaded but can only be run in a particular order on the input document. For example:
```python
import watson_nlp

# Load the Syntax and an Entity Mention model for English
syntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))
entity_model = watson_nlp.load(watson_nlp.download('entity-mentions_bert_multi_stock'))

# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM announced new advances in quantum computing')

# Now run the entity mention model on the result of syntax
entity_mentions = entity_model.run(syntax_prediction)
print(entity_mentions)
```
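The dependency between the two block types can be pictured with a small mock. The classes and field names below are invented for this sketch and are not part of the watson_nlp API; the point is that the first block accepts raw text, while the second accepts the first block's analysis object instead of raw text:

```python
# Illustrative mock of the two block types; all names here are invented
# for this sketch and are not part of the watson_nlp API.
class MockSyntaxBlock:
    """Operates directly on the input document."""
    def run(self, text):
        # A real syntax block would tokenize, lemmatize, tag, and parse.
        return {"text": text, "tokens": text.split()}

class MockEntityBlock:
    """Depends on the output of a syntax block, not on raw text."""
    def run(self, syntax_result):
        # A real entity block would use the full syntax analysis; here we
        # just flag capitalized tokens as entity candidates.
        return [t for t in syntax_result["tokens"] if t[:1].isupper()]

syntax_block = MockSyntaxBlock()
entity_block = MockEntityBlock()

# The blocks must run in order: syntax first, then entities on its result.
syntax_result = syntax_block.run("IBM announced advances in quantum computing")
entities = entity_block.run(syntax_result)
print(entities)  # -> ['IBM']
```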
Loading and running a model
Watson Natural Language Processing provides the download() function so that you can quickly load pre-trained models in your notebook. To download a model, you first need to know its name. Model names follow a standard convention that encodes the type of model (like classification or entity extraction), the type of algorithm (like BERT or SVM), the language code, and details of the type system.
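As a rough illustration of this convention, a name such as 'entity-mentions_bert_multi_stock' can be split at underscores into its parts. The field labels below are an informal reading of the pattern, not an official schema:

```python
def parse_model_name(name):
    """Split a model name of the form <task>_<algorithm>_<language>_<variant>.

    The field labels are an informal reading of the naming convention
    described above, not an official schema.
    """
    task, algorithm, language, variant = name.split("_")
    return {"task": task, "algorithm": algorithm,
            "language": language, "variant": variant}

print(parse_model_name("syntax_izumo_en_stock"))
# -> {'task': 'syntax', 'algorithm': 'izumo', 'language': 'en', 'variant': 'stock'}
print(parse_model_name("entity-mentions_bert_multi_stock"))
# -> {'task': 'entity-mentions', 'algorithm': 'bert', 'language': 'multi', 'variant': 'stock'}
```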
To find the right block to use, use the block catalog. See Watson NLP block catalog.
You can find the expected input for a given block class (for example, the Entity Mentions model) by using help() on the block class's run function:

```python
import watson_nlp

help(watson_nlp.blocks.entity_mentions.BERT.run)
```
Sample project and notebooks
To help you get started with the Watson Natural Language Processing library, you can download a sample project and notebooks from the Cloud Pak for Data as a Service Gallery. The notebooks demonstrate how to use the different Watson Natural Language Processing blocks and how to train your own models.
You can access the Gallery by selecting Gallery from the Cloud Pak for Data navigation menu.
This notebook shows you how to analyze financial customer complaints using Watson Natural Language Processing. It uses data from the Consumer Complaint Database published by the Consumer Financial Protection Bureau (CFPB). The notebook teaches you to use the Tone classification and Emotion classification models.
This notebook demonstrates how to analyze car complaints using Watson Natural Language Processing. It uses publicly available complaint records from car owners stored by the National Highway Traffic Safety Administration (NHTSA) of the US Department of Transportation. This notebook shows you how to use syntax analysis to extract the most frequently used nouns, which typically describe the problems that review authors talk about, and how to combine these results with structured data using association rule mining.
Complaint classification with Watson Natural Language processing
This notebook demonstrates how to train different text classifiers using Watson Natural Language Processing. The classifiers predict the product group from the text of a customer complaint. This could be used, for example, to route a complaint to the appropriate staff member. The data that is used in this notebook is taken from the Consumer Complaint Database, which is published by the Consumer Financial Protection Bureau (CFPB), a U.S. government agency, and is publicly available. You will learn how to train a custom CNN model and a VotingEnsemble model, and how to evaluate their quality.
Entity extraction on Financial Complaints with Watson Natural Language Processing
This notebook demonstrates how to extract named entities from financial customer complaints using Watson Natural Language Processing. It uses data from the Consumer Complaint Database published by the Consumer Financial Protection Bureau (CFPB). In the notebook, you will learn how to do dictionary-based term extraction, how to train a custom extraction model based on given dictionaries, and how to extract entities using the BERT model.
If you don't want to download the sample notebooks to your project individually, you can download the entire sample project Text Analysis with Watson Natural Language Processing from the Cloud Pak for Data as a Service Gallery.
The sample project contains the sample notebooks listed in the previous section, as well as:
Analyzing hotel reviews using Watson Natural Language Processing
This notebook shows you how to use syntax analysis to extract the most frequently used nouns from the hotel reviews, classify the sentiment of the reviews and use aspect-oriented sentiment analysis for the most frequently extracted aspects. The data file that is used by this notebook is included in the project as a data asset.
You can run all of the sample notebooks with the DO + NLP Runtime 22.1 on Python 3.9 environment, except for the Complaint classification with Watson Natural Language processing notebook. To run that notebook, you need to create an environment template that is large enough to complete the training of the classification models on the training data.
Parent topic: Libraries and scripts for notebooks