Hierarchical text categorization | IBM Cloud Pak for Data as a Service

Hierarchical text categorization

Last updated: July 29, 2024

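The Watson Natural Language Processing Categories block assigns individual nodes within a hierarchical taxonomy to an input document. For example, for the text "IBM announces new advances in quantum computing", the extracted categories might include technology and computing/hardware/computer and technology and computing/operating systems. These categories are level 3 and level 2 nodes, respectively, in the hierarchical taxonomy.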
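The level of a category can be read directly off its slash-separated path. The following is a minimal sketch in plain Python; the helper name category_level is hypothetical and is not part of the Watson Natural Language Processing library.

```python
def category_level(path: str) -> int:
    """Return the depth of a slash-separated taxonomy path (1 = top level)."""
    # Hypothetical helper: count the non-empty segments between slashes.
    return len([part for part in path.split("/") if part.strip()])


examples = [
    "technology and computing/hardware/computer",
    "technology and computing/operating systems",
]
for path in examples:
    print(f"level {category_level(path)}: {path}")
```

The first path has three segments and the second has two, which matches the level 3 and level 2 nodes named in the example.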

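This block differs from the Classification block in that training starts from a set of seed phrases associated with each node in the taxonomy, and does not require labeled documents.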

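Note that the Hierarchical text categorization block can be used only in a notebook that is started in an environment that includes the Watson Natural Language Processing library.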

Block name

categories_esa_en_stock

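Supported languages

The Categories block is available for the following languages. For a list of the language codes and the corresponding languages, see Language codes.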

de, en

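Capabilities

Use this block to determine the topics of documents on the web by categorizing web pages into a taxonomy of general-domain topics, for ad placement and content recommendation. The model was tested on data from news reports and general web pages.

For a list of the categories that can be returned, see Category types.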

Dependencies on other blocks

The following block must run before you can run the Hierarchical categorization block:

syntax_izumo_<language>_stock
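The <language> placeholder in the block name stands for a language code. As a minimal sketch, the name can be filled in as follows; the helper syntax_block_name is hypothetical, and the resulting string is what would be passed to watson_nlp.load in a notebook.

```python
def syntax_block_name(language: str) -> str:
    """Fill the <language> placeholder in syntax_izumo_<language>_stock."""
    # Hypothetical helper: normalize the code and reject anything
    # that is not purely alphabetic (including the empty string).
    code = language.strip().lower()
    if not code.isalpha():
        raise ValueError(f"not a language code: {language!r}")
    return f"syntax_izumo_{code}_stock"


print(syntax_block_name("en"))
```

For the code "en", this yields the syntax block name that the code sample in this topic loads.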

Code sample

import watson_nlp

# Load Syntax and a Categories model for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
categories_model = watson_nlp.load('categories_esa_en_stock')

# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM announced new advances in quantum computing')

# Run the categories model on the result of syntax
categories = categories_model.run(syntax_prediction)
print(categories)

Output of the code sample:

{
  "categories": [
    {
      "labels": [
        "technology & computing",
        "computing"
      ],
      "score": 0.992489,
      "explanation": []
    },
    {
      "labels": [
        "science",
        "physics"
      ],
      "score": 0.945449,
      "explanation": []
    }
  ],
  "producer_id": {
    "name": "ESA Hierarchical Categories",
    "version": "1.0.0"
  }
}
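Each entry in "categories" lists its labels from the top of the taxonomy downward, together with a score. The following sketch shows one way to turn such a result into slash-separated paths and keep only the higher-scoring categories. It assumes the prediction is already available as a plain dictionary with the same shape as the output above; how to obtain that dictionary from the prediction object is not covered here. The function name category_paths and the min_score threshold are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Copied from the sample output above.
sample_output: Dict[str, Any] = {
    "categories": [
        {"labels": ["technology & computing", "computing"],
         "score": 0.992489, "explanation": []},
        {"labels": ["science", "physics"],
         "score": 0.945449, "explanation": []},
    ],
    "producer_id": {"name": "ESA Hierarchical Categories", "version": "1.0.0"},
}


def category_paths(result: Dict[str, Any],
                   min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Return (path, score) pairs with score >= min_score, highest first."""
    # Hypothetical helper: join each label list into one taxonomy path.
    paths = [
        ("/".join(category["labels"]), category["score"])
        for category in result.get("categories", [])
        if category["score"] >= min_score
    ]
    return sorted(paths, key=lambda item: item[1], reverse=True)


for path, score in category_paths(sample_output, min_score=0.95):
    print(f"{score:.3f}  {path}")
```

With a threshold of 0.95, only the technology & computing/computing category is kept; lowering the threshold to 0.0 returns both categories.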
