Hierarchical text categorization
The Watson Natural Language Processing Categories block assigns individual nodes within a hierarchical taxonomy to an input document. For example, in the text "IBM announces new advances in quantum computing", examples of extracted categories are technology and computing/hardware/computer and technology and computing/operating systems. These categories represent level 3 and level 2 nodes in a hierarchical taxonomy.
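The level of a category follows from the number of slash-separated segments in its path. The following is a minimal sketch of that relationship in plain Python. The helper name category_level is hypothetical and is not part of the Watson Natural Language Processing library.

```python
def category_level(path: str) -> int:
    """Return the depth of a slash-delimited category path (top-level node = 1)."""
    # Ignore empty segments so a stray leading or trailing slash does not add a level
    return len([part for part in path.split("/") if part.strip()])


# The two example categories from the paragraph above
paths = [
    "technology and computing/hardware/computer",
    "technology and computing/operating systems",
]

# Map each path to its level in the taxonomy
levels = {path: category_level(path) for path in paths}

for path, level in levels.items():
    print(f"level {level}: {path}")
```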
This block differs from the Classification block in that training starts from a set of seed phrases associated with each node in the taxonomy, and does not require labeled documents.
Note: The Hierarchical text categorization block can be used only in a notebook that is started in an environment that includes the Watson Natural Language Processing library.
Block name
categories_esa_en_stock
Supported languages
The Categories block is available for the following languages. For a list of the language codes and the corresponding language, see Language codes.
de, en
Capabilities
Use this block to determine the topics of documents on the web by categorizing web pages into a taxonomy of general domain topics, for ad placement and content recommendation. The model was tested on data from news reports and general web pages.
For a list of the categories that can be returned, see Category types.
Dependencies on other blocks
The following block must run before you can run the hierarchical categorization block:
syntax_izumo_<language>_stock
Code sample
import watson_nlp
# Load Syntax and a Categories model for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
categories_model = watson_nlp.load('categories_esa_en_stock')
# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM announced new advances in quantum computing')
# Run the categories model on the result of syntax
categories = categories_model.run(syntax_prediction)
print(categories)
Output of the code sample:
{
  "categories": [
    {
      "labels": [
        "technology & computing",
        "computing"
      ],
      "score": 0.992489,
      "explanation": []
    },
    {
      "labels": [
        "science",
        "physics"
      ],
      "score": 0.945449,
      "explanation": []
    }
  ],
  "producer_id": {
    "name": "ESA Hierarchical Categories",
    "version": "1.0.0"
  }
}
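Each entry in the output lists its labels from the top of the taxonomy downward, together with a confidence score. The following sketch shows one way to turn such a result into ranked category paths. It assumes that the prediction is already available as a plain Python dictionary with the structure shown above. The dictionary here is copied from the sample output. The helper name top_categories and the 0.95 threshold are illustrative choices and are not part of the library.

```python
# Result copied from the sample output above, as a plain Python dictionary
prediction = {
    "categories": [
        {"labels": ["technology & computing", "computing"], "score": 0.992489, "explanation": []},
        {"labels": ["science", "physics"], "score": 0.945449, "explanation": []},
    ],
    "producer_id": {"name": "ESA Hierarchical Categories", "version": "1.0.0"},
}


def top_categories(result, min_score=0.95):
    """Return (path, score) pairs at or above min_score, highest score first.

    Hypothetical helper: joins each category's labels into a slash-delimited path.
    """
    ranked = sorted(result["categories"], key=lambda c: c["score"], reverse=True)
    return [("/".join(c["labels"]), c["score"]) for c in ranked if c["score"] >= min_score]


# Keep only the categories that pass the (arbitrary) threshold
selected = top_categories(prediction)

for path, score in selected:
    print(f"{score:.3f}  {path}")
```

Lowering min_score returns more of the taxonomy nodes. A suitable cutoff depends on the application and is not prescribed by the block.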