Hierarchical text categorization
The Watson Natural Language Processing Categories block assigns individual nodes within a hierarchical taxonomy to an input document. For example, in the text "IBM announces new advances in quantum computing", examples of extracted categories are technology and computing/hardware/computer and technology and computing/operating systems. These categories represent level 3 and level 2 nodes in a hierarchical taxonomy.
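The level numbering counts the nodes in a category path. The following minimal Python sketch treats each category as a "/"-separated path and counts its nodes. The helper names `category_levels` and `category_depth` are hypothetical and are not part of the watson_nlp library. The "/" separator is taken from the example category strings in this section.

```python
# Hypothetical helpers (not part of watson_nlp): a category is treated as a
# "/"-separated path from the taxonomy root down to the assigned node.

def category_levels(path: str) -> list[str]:
    """Split a category path into its taxonomy nodes, ordered root to leaf."""
    return [node.strip() for node in path.split("/") if node.strip()]


def category_depth(path: str) -> int:
    """Return the taxonomy level of the deepest node in the path."""
    return len(category_levels(path))


for path in [
    "technology and computing/hardware/computer",
    "technology and computing/operating systems",
]:
    # Prints the level number followed by the nodes on the path.
    print(category_depth(path), category_levels(path))
```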
This block differs from the Classification block in that training starts from a set of seed phrases associated with each node in the taxonomy, and does not require labeled documents.
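Seed phrases can stand in for labeled documents. As a rough illustration, the sketch below scores each taxonomy node by the fraction of its seed phrases that appear in the input text. This is a toy stand-in, not the model inside categories_esa_en_stock. The taxonomy, the seed phrases, and the plain substring matching are all assumptions made for illustration.

```python
# Toy illustration only: this is NOT how categories_esa_en_stock works internally.
# The nodes and seed phrases below are invented for the example.
SEEDS = {
    "technology and computing/hardware/computer": ["processor", "laptop", "quantum computing"],
    "technology and computing/operating systems": ["linux", "windows", "kernel"],
    "science/physics": ["quantum", "particle", "relativity"],
}


def score_nodes(text: str, seeds: dict[str, list[str]]) -> dict[str, float]:
    """Return, per node, the fraction of its seed phrases found in the text."""
    lowered = text.lower()
    return {
        node: sum(phrase in lowered for phrase in phrases) / len(phrases)
        for node, phrases in seeds.items()
    }


scores = score_nodes("IBM announces new advances in quantum computing", SEEDS)
# Highest score first; sorted() is stable, so ties keep their SEEDS order.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for node, score in ranked:
    if score > 0:
        print(f"{score:.2f}  {node}")
```

No labeled training documents appear anywhere in this sketch. The only supervision is the list of phrases attached to each node, which is the property the paragraph above contrasts with the Classification block.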
Note that the Hierarchical text categorization block can only be used in a notebook that is started in an environment that includes the Watson Natural Language Processing library.
Block name
categories_esa_en_stock
Supported languages
The Categories block is available for English.
Capabilities
Use this block to determine the topics of documents on the web by categorizing web pages into a taxonomy of general domain topics, for ad placement and content recommendation. The model was tested on data from news reports and general web pages.
For a list of the categories that can be returned, see Category types.
Dependencies on other blocks
The following block must run before you can run the hierarchical categorization block:
syntax_izumo_<language>_stock
Code sample
import watson_nlp
# Load Syntax and a Categories model for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
categories_model = watson_nlp.load('categories_esa_en_stock')
# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM announced new advances in quantum computing')
# Run the categories model on the result of syntax
categories = categories_model.run(syntax_prediction)
print(categories)
Output of the code sample:
{
  "categories": [
    {
      "labels": ["technology & computing", "computing"],
      "score": 0.992489,
      "explanation": []
    },
    {
      "labels": ["science", "physics"],
      "score": 0.945449,
      "explanation": []
    }
  ],
  "producer_id": {
    "name": "ESA Hierarchical Categories",
    "version": "1.0.0"
  }
}
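You may capture the printed output as JSON text. If you do, a short sketch like the following can join each entry's labels into a single path and keep only confident predictions. It assumes the output has exactly the shape shown for this code sample, with labels ordered from root to leaf, as the example paths in this section suggest. The helper name `top_paths` and the 0.95 threshold are hypothetical choices, not watson_nlp settings.

```python
import json

# The printed output of the code sample, copied as JSON text.
OUTPUT = """{ "categories": [ { "labels": [ "technology & computing", "computing" ], "score": 0.992489, "explanation": [] }, { "labels": [ "science", "physics" ], "score": 0.945449, "explanation": [] } ], "producer_id": { "name": "ESA Hierarchical Categories", "version": "1.0.0" } }"""


def top_paths(result: dict, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Join each category's labels into a "/" path, keeping scores >= min_score."""
    return [
        ("/".join(category["labels"]), category["score"])
        for category in result["categories"]
        if category["score"] >= min_score
    ]


result = json.loads(OUTPUT)
# Hypothetical threshold: only the first category (0.992489) passes 0.95.
for path, score in top_paths(result, min_score=0.95):
    print(f"{path}  ({score:.3f})")
```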