Hierarchical text categorization
Last updated: Jan 21, 2025

The Watson Natural Language Processing Categories block assigns nodes from a hierarchical taxonomy to an input document. For example, for the text "IBM announces new advances in quantum computing", extracted categories include "technology and computing/hardware/computer" and "technology and computing/operating systems". These categories are level 3 and level 2 nodes of the taxonomy, respectively.
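As a rough illustration of what "level" means here, the sketch below counts the segments of a category path. It assumes the "/" separator used in the example above; the helper name node_level is hypothetical and not part of the Watson NLP library.

```python
def node_level(path: str) -> int:
    """Return the depth of a taxonomy node, counting '/'-separated segments.

    Assumption: category paths use '/' between levels, as in the example text.
    """
    return len([segment for segment in path.split("/") if segment.strip()])


# Category paths taken from the example above
for path in (
    "technology and computing/hardware/computer",
    "technology and computing/operating systems",
):
    print(f"level {node_level(path)}: {path}")
```

The first path has three segments (level 3) and the second has two (level 2).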

This block differs from the Classification block in that training starts from a set of seed phrases associated with each node in the taxonomy, and does not require labeled documents.

Note: The Hierarchical text categorization block can be used only in a notebook that is started in an environment that includes the Watson Natural Language Processing library.

Block name

categories_esa_en_stock

Supported languages

The Categories block is available for the English language.

Capabilities

Use this block to determine the topics of web documents by categorizing web pages into a taxonomy of general-domain topics, for use in ad placement and content recommendation. The model was tested on data from news reports and general web pages.

For a list of the categories that can be returned, see Category types.

Dependencies on other blocks

The following block must run before you can run the hierarchical categorization block:

  • syntax_izumo_<language>_stock

Code sample

import watson_nlp

# Load Syntax and a Categories model for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
categories_model = watson_nlp.load('categories_esa_en_stock')

# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM announced new advances in quantum computing')

# Run the categories model on the result of syntax
categories = categories_model.run(syntax_prediction)
print(categories)

Output of the code sample:

{
  "categories": [
    {
      "labels": [
        "technology & computing",
        "computing"
      ],
      "score": 0.992489,
      "explanation": []
    },
    {
      "labels": [
        "science",
        "physics"
      ],
      "score": 0.945449,
      "explanation": []
    }
  ],
  "producer_id": {
    "name": "ESA Hierarchical Categories",
    "version": "1.0.0"
  }
}
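The output above can be post-processed with plain Python. The following is a minimal sketch that works on a dictionary shaped like the sample output: it joins each "labels" list into a single path and keeps only categories at or above a score threshold. The helper name category_paths and the 0.95 threshold are illustrative assumptions, and the sketch assumes that the prediction has already been converted to a dictionary with this structure.

```python
import json

# Sample output copied from the code sample above.
# Assumption: the prediction is available as a dict with this shape.
sample_output = json.loads("""
{
  "categories": [
    {"labels": ["technology & computing", "computing"],
     "score": 0.992489, "explanation": []},
    {"labels": ["science", "physics"],
     "score": 0.945449, "explanation": []}
  ],
  "producer_id": {"name": "ESA Hierarchical Categories", "version": "1.0.0"}
}
""")


def category_paths(result, min_score=0.0):
    """Return (path, score) pairs with score >= min_score, highest first."""
    pairs = [
        ("/".join(category["labels"]), category["score"])
        for category in result["categories"]
        if category["score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Keep only high-confidence categories (threshold chosen for illustration)
for path, score in category_paths(sample_output, min_score=0.95):
    print(f"{path}: {score:.3f}")
```

With the sample data and a threshold of 0.95, only the "technology & computing/computing" category is kept, because the score of "science/physics" (0.945449) falls below the threshold.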

Parent topic: Watson Natural Language Processing task catalog
