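Keyword extraction and ranking
Last updated: November 7, 2024
The Watson Natural Language Processing Keyword extraction with ranking block extracts noun phrases from the input text and ranks them by their relevance.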
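Supported languages
Keyword extraction with ranking is available for the following languages:
ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pt, ro, ru, sk, sv, tr, zh-cn
For a list of the language codes and the corresponding languages, see Language codes.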
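Capabilities
The keywords and text rank block ranks the noun phrases extracted from an input document according to how relevant they are within that document.
Capability | Example |
---|---|
Ranks extracted noun phrases by relevance | "Anna went to school at University of California Santa Cruz. Anna joined the university in 2015." -> Anna, University of California Santa Cruz |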
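To make the ranking idea concrete, the following is a minimal sketch of embedding-based ranking: each noun phrase is scored by the cosine similarity between its embedding and the document embedding, and scores are scaled so that the best phrase gets 1.0. This is an illustration only, not the implementation inside the Watson block. The function names `cosine` and `rank_phrases`, the three-dimensional toy vectors, and the scaling step are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_phrases(doc_vec, phrase_vecs, limit=None):
    """Sort phrases by similarity to the document, scaled so the top score is 1.0.

    Hypothetical helper: phrase_vecs maps phrase text to its embedding vector.
    """
    scores = {phrase: cosine(doc_vec, vec) for phrase, vec in phrase_vecs.items()}
    top = max(scores.values(), default=0.0)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if top > 0:
        ranked = [(phrase, score / top) for phrase, score in ranked]
    return ranked[:limit]


# Toy embeddings, invented for illustration (real embeddings have hundreds of dimensions).
doc_vec = [1.0, 1.0, 0.0]
phrase_vecs = {
    "Anna": [1.0, 0.0, 0.0],
    "University of California Santa Cruz": [2.0, 2.0, 0.0],
}

for phrase, relevance in rank_phrases(doc_vec, phrase_vecs, limit=2):
    print(f"{phrase}: {relevance:.3f}")
```

With real models, the embeddings would come from an embedding block rather than hand-written vectors, and the actual block may apply additional logic that this sketch does not attempt to reproduce.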
Keyword extraction
Block name
keywords_embed-rank_multi_stock
Dependencies on other blocks
The following blocks must run before you can run keyword extraction with ranking:
syntax_izumo_<language>_stock
noun-phrases_rbr_<language>_stock
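The `<language>` placeholder in these block names stands for one of the supported language codes. As a small sketch, the helper below fills in the placeholder and rejects codes that are not in the supported list. The function name `prerequisite_blocks` is hypothetical, and the sketch assumes that plain substitution of the code yields a valid model name for every language, which should be checked against the model catalog.

```python
# Language codes listed under "Supported languages".
SUPPORTED_LANGUAGES = {
    "ar", "cs", "da", "de", "en", "es", "fi", "fr", "he", "hi", "it", "ja",
    "ko", "nb", "nl", "nn", "pt", "ro", "ru", "sk", "sv", "tr", "zh-cn",
}


def prerequisite_blocks(language):
    """Return the names of the blocks that must run before keyword ranking.

    Hypothetical helper: assumes the <language> placeholder is replaced
    verbatim by the language code.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    return [
        f"syntax_izumo_{language}_stock",
        f"noun-phrases_rbr_{language}_stock",
    ]


print(prerequisite_blocks("en"))
```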
Code sample
import watson_nlp
from watson_nlp import data_model as dm
text = "Anna went to school at University of California Santa Cruz. \
        Anna joined the university in 2015."
# Load Noun Phrases, Embedding and Keywords models for English
noun_phrases_model = watson_nlp.load('noun-phrases_rbr_en_stock')
use_model = watson_nlp.load('embedding_use_en_stock')
keywords_model = watson_nlp.load('keywords_embed-rank_multi_stock')
# Run the Noun Phrases model
noun_phrases = noun_phrases_model.run(text)
# Get document embeddings
# No need to run any Syntax model since the 'raw_text' embed style will be used for doc embedding
syntax_analysis = dm.SyntaxPrediction(text=text)
doc_embeddings = use_model.run(syntax_analysis, doc_embed_style='raw_text')
# Get embeddings for noun phrases
noun_phrases_analysis = [dm.SyntaxPrediction(text=c.span.text) for c in noun_phrases.noun_phrases]
noun_phrase_embeddings = use_model.run_batch(noun_phrases_analysis, doc_embed_style='raw_text')
# Run the keywords model
keywords = keywords_model.run(doc_embeddings, noun_phrases, noun_phrase_embeddings, limit=2, beta=0.5)
print(keywords)
Output of the code sample:
{
  "keywords": [
    {
      "text": "University of California Santa Cruz",
      "relevance": 1.0,
      "mentions": [
        {
          "begin": 23,
          "end": 58,
          "text": "University of California Santa Cruz"
        }
      ],
      "count": 1
    },
    {
      "text": "Anna",
      "relevance": 0.6883336359588481,
      "mentions": [
        {
          "begin": 0,
          "end": 4,
          "text": "Anna"
        },
        {
          "begin": 68,
          "end": 72,
          "text": "Anna"
        }
      ],
      "count": 2
    }
  ],
  "producer_id": {
    "name": "Embed Rank Keywords",
    "version": "0.0.2"
  }
}
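Each keyword in the output carries its text, a relevance score, the mentions with character offsets, and a mention count. In the sample, `end - begin` equals the length of each mention text, so the end offset appears to be exclusive. The sketch below shows one way to post-process a dictionary shaped like this output; the function name `summarize_keywords` and the `min_relevance` threshold are hypothetical, and how the dictionary is obtained from the prediction object is not covered here.

```python
def summarize_keywords(result, min_relevance=0.0):
    """Return (text, relevance, count, spans) tuples for keywords at or above a threshold."""
    rows = []
    for keyword in result["keywords"]:
        if keyword["relevance"] >= min_relevance:
            spans = [(m["begin"], m["end"]) for m in keyword["mentions"]]
            rows.append((keyword["text"], keyword["relevance"], keyword["count"], spans))
    return rows


# Dictionary copied from the sample output above (producer_id omitted).
result = {
    "keywords": [
        {
            "text": "University of California Santa Cruz",
            "relevance": 1.0,
            "mentions": [
                {"begin": 23, "end": 58, "text": "University of California Santa Cruz"}
            ],
            "count": 1,
        },
        {
            "text": "Anna",
            "relevance": 0.6883336359588481,
            "mentions": [
                {"begin": 0, "end": 4, "text": "Anna"},
                {"begin": 68, "end": 72, "text": "Anna"},
            ],
            "count": 2,
        },
    ]
}

for text, relevance, count, spans in summarize_keywords(result, min_relevance=0.5):
    print(f"{text}: relevance={relevance:.2f}, count={count}, spans={spans}")
```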