The Watson Natural Language Processing Categories block assigns individual nodes within a hierarchical taxonomy to an input document. For example, for the text "IBM announces new advances in quantum computing", examples of extracted categories are technology and computing/hardware/computer and technology and computing/operating systems. These categories are level 3 and level 2 nodes of the taxonomy, respectively.
This block differs from the Classification block in that training starts from a set of seed phrases associated with each node in the taxonomy, and does not require labeled documents.
Note that the Categories block can be used only in a notebook that is started in an environment that includes the Watson Natural Language Processing library.
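To make the notion of taxonomy levels concrete, the following minimal sketch computes the level of a category from its path. It assumes that levels are separated by a forward slash, as in the example categories in the introduction. The helper category_level is hypothetical and is not part of the Watson Natural Language Processing library.

```python
def category_level(path: str) -> int:
    """Return the depth of a slash-separated category path (hypothetical helper)."""
    # Drop empty segments so that stray leading or trailing slashes are ignored.
    segments = [segment.strip() for segment in path.split("/") if segment.strip()]
    return len(segments)


# Category paths taken from the example in the introduction.
examples = [
    "technology and computing/hardware/computer",
    "technology and computing/operating systems",
]

levels = {path: category_level(path) for path in examples}
for path, level in levels.items():
    print(f"level {level}: {path}")
```

The first path has three segments and the second has two, which matches the level 3 and level 2 nodes described in the introduction.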
Block name
categories_esa_en_stock
Supported languages
The Categories block is available for the English language.
Capabilities
Use this block to determine the topics of documents on the web by categorizing web pages into a taxonomy of general domain topics, for ad placement and content recommendation. The model was tested on data from news reports and general web pages.
For a list of the categories that can be returned, see Category types.
Dependencies on other blocks
The following block must run before you can run the Categories block:
syntax_izumo_<language>_stock
Code sample
import watson_nlp
# Load Syntax and a Categories model for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
categories_model = watson_nlp.load('categories_esa_en_stock')
# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM announced new advances in quantum computing')
# Run the categories model on the result of syntax
categories = categories_model.run(syntax_prediction)
print(categories)
Output of the code sample:
{
  "categories": [
    {
      "labels": [
        "technology & computing",
        "computing"
      ],
      "score": 0.992489,
      "explanation": []
    },
    {
      "labels": [
        "science",
        "physics"
      ],
      "score": 0.945449,
      "explanation": []
    }
  ],
  "producer_id": {
    "name": "ESA Hierarchical Categories",
    "version": "1.0.0"
  }
}
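The following sketch shows one possible way to post-process a result shaped like the sample output above. It keeps the categories whose score meets a threshold and joins each list of labels into a single path. It works on a plain dictionary copied from the sample output, so it runs without the library. The helper name top_categories, the 0.95 threshold, and the reading of the labels list as a top-down path are illustrative assumptions, not part of the Watson Natural Language Processing API.

```python
import json
from typing import List, Tuple

# Result copied from the sample output above.
sample_output = json.loads("""
{
  "categories": [
    {"labels": ["technology & computing", "computing"], "score": 0.992489, "explanation": []},
    {"labels": ["science", "physics"], "score": 0.945449, "explanation": []}
  ],
  "producer_id": {"name": "ESA Hierarchical Categories", "version": "1.0.0"}
}
""")


def top_categories(result: dict, min_score: float = 0.95) -> List[Tuple[str, float]]:
    """Return (path, score) pairs with score >= min_score, highest score first.

    Hypothetical helper: assumes each entry lists its labels from the top
    level of the taxonomy downward, so joining them with "/" yields a path.
    """
    selected = [
        ("/".join(category["labels"]), category["score"])
        for category in result.get("categories", [])
        if category["score"] >= min_score
    ]
    return sorted(selected, key=lambda item: item[1], reverse=True)


for path, score in top_categories(sample_output):
    print(f"{score:.3f}  {path}")
```

With the default threshold, only the first category in the sample output is kept. Lowering min_score to 0.9 keeps both.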