Watson Natural Language Processing library (beta)
The Watson Natural Language Processing library provides basic natural language processing functions for syntax analysis and out-of-the-box pre-trained models for a wide variety of text processing tasks, such as sentiment analysis, keyword extraction and vectorization. The Watson Natural Language Processing library is available for Python only.
Note: This tool is provided as a beta release and is not supported for use in production environments.
With Watson Natural Language Processing, you can turn unstructured data into structured data, making the data easier to understand and to transfer, particularly if you are working with a mix of unstructured and structured data. Examples of such data are call center records, customer complaints, social media posts, and problem reports. The unstructured data is often part of a larger data record that includes columns of structured data. Extracting meaning and structure from the unstructured data and combining this information with the data in the structured columns gives you a deeper understanding of the input data and can help you to make better decisions.
Watson Natural Language Processing provides pre-trained models in over 20 languages. They are curated by a dedicated team of experts, and evaluated for quality on each specific language. These pre-trained models can be used in production environments without you having to worry about license or intellectual property infringements.
Although you can create your own models, the easiest way to get started with Watson Natural Language Processing is to run the pre-trained models on unstructured text to perform language processing tasks.
Here are some examples of language processing tasks available in Watson Natural Language Processing pre-trained models:
- Syntax: tokenization, lemmatization, part of speech tagging, and dependency parsing
- Entity extraction: find mentions of entities (like person, organization, or date)
- Text classification: analyze text and then assign a set of pre-defined tags or categories based on its content
- Sentiment classification: is the input document positive, negative or neutral?
- Tone classification: classify the tone in the input document (like excited, frustrated, or sad)
- Emotion classification: classify the emotion of the input document (like anger or disgust)
- Keywords extraction: extract noun phrases that are relevant in the input text
- Embeddings: map individual words or larger text snippets into a vector space
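As a conceptual illustration of the last task, embeddings place words in a vector space where semantically similar words end up close together. The toy sketch below uses hand-made 3-dimensional vectors and cosine similarity purely for illustration; it is not the watson_nlp embeddings API, and real models learn vectors with hundreds of dimensions:

```python
import math

# Hand-made toy "embeddings" for illustration only; real embedding
# models learn these vectors from large text corpora.
toy_embeddings = {
    "bank":  [0.9, 0.1, 0.0],
    "money": [0.8, 0.2, 0.1],
    "river": [0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# In this toy space, "bank" is closer to "money" than to "river"
print(cosine_similarity(toy_embeddings["bank"], toy_embeddings["money"]))
print(cosine_similarity(toy_embeddings["bank"], toy_embeddings["river"]))
```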
Using Watson Natural Language Processing in a notebook
You can use the Watson Natural Language Processing library in Python notebooks in Watson Studio by selecting the
Default Python 3.8 + Watson NLP XS (beta) environment at the time you open the notebook. The Watson Natural Language
Processing library is pre-installed for you when the runtime is instantiated. See Compute resource options for the notebook editor in projects.
The Default Python 3.8 + Watson NLP XS (beta) environment should be large enough to run notebooks that use the prebuilt models. If you need a larger environment, for example to train your own models, you can create your own environment template. To create your own environment template, select the engine type Default, the hardware configuration size that you need, and choose Default Python 3.8 + Watson NLP as the software version to include the Watson Natural Language Processing library. For details, see Creating your own environment template.
Working with the pre-trained models
Watson Natural Language Processing encapsulates natural language functionality in blocks, where each block supports these functions:
- load(): load a block model.
- run(): run the block on input argument(s).
- train(): train the block on your own data. Not all blocks support training.
- save(): save the block model that you trained on your own data.
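The shape of this interface can be sketched in plain Python. The class below is a hypothetical stand-in to illustrate the load/run/save lifecycle only; it is not the actual watson_nlp block implementation, and the trivial whitespace tokenizer stands in for a real model:

```python
import json
import pathlib

class SketchBlock:
    """Hypothetical stand-in illustrating the block lifecycle;
    not a real watson_nlp block class."""

    def __init__(self, config):
        self.config = config

    @classmethod
    def load(cls, path):
        # load(): restore a block model from disk
        config = json.loads(pathlib.Path(path).read_text())
        return cls(config)

    def run(self, text):
        # run(): apply the block to input text; a trivial
        # whitespace tokenizer stands in for a real model here
        return text.split()

    def save(self, path):
        # save(): persist the block model that you trained
        pathlib.Path(path).write_text(json.dumps(self.config))
```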
There are two types of blocks:
Blocks that operate directly on the input document
An example is the Syntax block, which performs natural language processing operations such as tokenization, lemmatization, part of speech tagging or dependency parsing.
This block can be loaded and run on the input document directly. For example:
```python
import watson_nlp

# Load the syntax model for English
syntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))

# Run the syntax model and print the result
syntax_prediction = syntax_model.run('Welcome to IBM!')
print(syntax_prediction)
```
Blocks that depend on other blocks
These blocks cannot be applied on the input document directly, and must be linked with one or more blocks in order to process the input document. In general, machine learning models such as classifiers or entity extractors that require preprocessing the input text fall into this category. For example, the Entity Mention block depends on the Syntax block.
These blocks can be loaded but can only be run in a particular order on the input document. For example:
```python
import watson_nlp

# Load the Syntax and an Entity Mention model for English
syntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))
entity_model = watson_nlp.load(watson_nlp.download('entity-mentions_bert_multi_stock'))

# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM announced new advances in quantum computing')

# Now run the entity mention model on the result of syntax
entity_mentions = entity_model.run(syntax_prediction)
print(entity_mentions)
```
Loading and running a model
Watson Natural Language Processing provides the download() function, which lets you quickly load pre-trained models in your notebook. To download a model, you first need to know its name. Model names follow a standard convention that encodes the type of model (like classification or entity extraction), the type of algorithm (like BERT or SVM), the language code, and details of the type system.
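As an illustration of this convention, a stock model name such as 'syntax_izumo_en_stock' splits on underscores into those fields. The helper below is a hypothetical sketch of that decomposition based on the convention described above; it is not part of the watson_nlp API, and the field labels are assumptions:

```python
def parse_model_name(name):
    # Split a stock model name into its four conventional fields:
    # <task>_<algorithm>_<language>_<type system detail>
    # (field labels are assumed from the naming convention)
    task, algorithm, language, type_system = name.split("_")
    return {
        "task": task,
        "algorithm": algorithm,
        "language": language,
        "type_system": type_system,
    }

print(parse_model_name("syntax_izumo_en_stock"))
```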
To find the right block to use, browse the block catalog. See Watson NLP block catalog.
You can find the expected input for a given block class (for example, the Entity Mentions block) by calling
help() on the block class:
```python
import watson_nlp

help(watson_nlp.blocks.entity_mentions.BERT.run)
```
Sample project and notebooks
To help you get started with the Watson Natural Language Processing library, you can download a sample project and notebooks from the Cloud Pak for Data as a Service Gallery. The notebooks demonstrate how to use the different Watson Natural Language Processing blocks and how to train your own models.
You can access the Gallery by selecting Gallery from the Cloud Pak for Data navigation menu.
This notebook shows you how to analyze financial customer complaints using Watson Natural Language Processing. It uses data from the Consumer Complaint Database published by the Consumer Financial Protection Bureau (CFPB). The notebook teaches you to use the Tone classification and Emotion classification models.
This notebook demonstrates how to analyze car complaints using Watson Natural Language Processing. It uses publicly available complaint records from car owners stored by the National Highway Traffic Safety Administration (NHTSA) of the US Department of Transportation. This notebook shows you how to use syntax analysis to extract the most frequently used nouns, which typically describe the problems that review authors talk about, and how to combine these results with structured data using association rule mining.
This notebook demonstrates how to train different text classifiers using Watson Natural Language Processing. The classifiers predict the product group from the text of a customer complaint. This could be used, for example, to route a complaint to the appropriate staff member. The data that is used in this notebook is taken from the Consumer Complaint Database, which is published by the Consumer Financial Protection Bureau (CFPB), a U.S. government agency, and is publicly available. You will learn how to train a custom CNN model and a VotingEnsemble model and evaluate their quality.
This notebook demonstrates how to extract named entities from financial customer complaints using Watson Natural Language Processing. It uses data from the Consumer Complaint Database published by the Consumer Financial Protection Bureau (CFPB). In the notebook you will learn how to use dictionary-based term extraction, how to train a custom extraction model based on given dictionaries, and how to extract entities using the BERT model.
If you don't want to download the sample notebooks to your project individually, you can download the entire sample project Text Analysis with Watson Natural Language Processing from the Cloud Pak for Data as a Service Gallery.
The sample project contains the sample notebooks listed in the previous section, as well as:
Analyzing hotel reviews using Watson Natural Language Processing
This notebook shows you how to use syntax analysis to extract the most frequently used nouns from the hotel reviews, classify the sentiment of the reviews and use aspect-oriented sentiment analysis for the most frequently extracted aspects. The data file that is used by this notebook is included in the project as a data asset.
You can run all of the sample notebooks with the Default Python 3.8 + Watson NLP XS (beta) environment except for the Complaint classification with Watson Natural Language Processing notebook. To run that notebook, you need to create an environment template that is large enough to complete the training of the classification models on the training data.
Parent topic: Libraries and scripts for notebooks