Keyword extraction and ranking
The Watson Natural Language Processing Keyword extraction with ranking block extracts noun phrases from input text and ranks them by their relevance to the document.
Supported languages
Keyword extraction with text ranking is available for the following languages:
ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pt, ro, ru, sk, sv, tr, zh-cn
For a list of language codes and corresponding languages, see Language codes.
Capabilities
The Keyword extraction with ranking block ranks the noun phrases extracted from an input document by how relevant each phrase is to the document as a whole.
Capabilities | Examples |
---|---|
Ranks extracted noun phrases based on relevance | "Anna went to school at University of California Santa Cruz. Anna joined the university in 2015." -> Anna, University of California Santa Cruz |
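The block name (embed-rank) and the code sample on this page suggest that each noun phrase embedding is compared with the embedding of the whole document. The following is a minimal sketch of that general idea, not the Watson NLP implementation: the toy three-dimensional vectors, the candidate phrase "school", the helper names `cosine_similarity` and `rank_phrases`, and the choice of cosine similarity as the score are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_phrases(doc_vector, phrase_vectors, limit=None):
    """Sort candidate phrases by similarity to the document, best first."""
    scored = [
        (phrase, cosine_similarity(doc_vector, vector))
        for phrase, vector in phrase_vectors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if limit is None else scored[:limit]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions
doc_vector = [1.0, 1.0, 0.0]
phrase_vectors = {
    "Anna": [1.0, 0.2, 0.0],
    "school": [0.1, 0.2, 1.0],
    "University of California Santa Cruz": [1.0, 0.9, 0.1],
}

ranked = rank_phrases(doc_vector, phrase_vectors, limit=2)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

In this sketch the phrase whose vector points most nearly in the same direction as the document vector is ranked first, and `limit` simply truncates the sorted list.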
Keyword extraction
Block name
keywords_embed-rank_multi_stock
Dependencies on other blocks
The following blocks must run before you can run the Keyword extraction with ranking block:
syntax_izumo_<language>_stock
noun-phrases_rbr_<language>_stock
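As a convenience, the prerequisite block names for a given language can be built from the two templates above. The helper below is a hypothetical sketch, not part of the watson_nlp package. It assumes that `<language>` is replaced by the language code exactly as listed under Supported languages, which the code sample on this page confirms only for `en`; the substitution for other codes such as `zh-cn` is unverified.

```python
# Language codes listed under "Supported languages" on this page
SUPPORTED_LANGUAGES = {
    "ar", "cs", "da", "de", "en", "es", "fi", "fr", "he", "hi", "it", "ja",
    "ko", "nb", "nl", "nn", "pt", "ro", "ru", "sk", "sv", "tr", "zh-cn",
}


def prerequisite_blocks(language):
    """Return the names of the blocks that must run first for a language.

    Hypothetical helper: it only formats the name templates from this page
    and does not check that the models are actually installed.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Keyword extraction is not listed for '{language}'")
    return [
        f"syntax_izumo_{language}_stock",
        f"noun-phrases_rbr_{language}_stock",
    ]


print(prerequisite_blocks("en"))
```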
Code sample
import watson_nlp
from watson_nlp import data_model as dm
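# The leading spaces on the continuation line are part of the string,
# so they count toward the character offsets reported in the output
text = "Anna went to school at University of California Santa Cruz. \
        Anna joined the university in 2015."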
# Load Noun Phrases, Embedding and Keywords models for English
noun_phrases_model = watson_nlp.load('noun-phrases_rbr_en_stock')
use_model = watson_nlp.load('embedding_use_en_stock')
keywords_model = watson_nlp.load('keywords_embed-rank_multi_stock')
# Run the Noun Phrases model
noun_phrases = noun_phrases_model.run(text)
# Get document embeddings
# No need to run any Syntax model since the 'raw_text' embed style will be used for doc embedding
syntax_analysis = dm.SyntaxPrediction(text=text)
doc_embeddings = use_model.run(syntax_analysis, doc_embed_style='raw_text')
# Get embeddings for noun phrases
noun_phrases_analysis = [dm.SyntaxPrediction(text=c.span.text) for c in noun_phrases.noun_phrases]
noun_phrase_embeddings = use_model.run_batch(noun_phrases_analysis, doc_embed_style='raw_text')
# Run the keywords model; limit=2 caps the result at the two highest-ranked keywords
keywords = keywords_model.run(doc_embeddings, noun_phrases, noun_phrase_embeddings, limit=2, beta=0.5)
print(keywords)
Output of the code sample:
{
  "keywords": [
    {
      "text": "University of California Santa Cruz",
      "relevance": 1.0,
      "mentions": [
        {
          "begin": 23,
          "end": 58,
          "text": "University of California Santa Cruz"
        }
      ],
      "count": 1
    },
    {
      "text": "Anna",
      "relevance": 0.6883336359588481,
      "mentions": [
        {
          "begin": 0,
          "end": 4,
          "text": "Anna"
        },
        {
          "begin": 68,
          "end": 72,
          "text": "Anna"
        }
      ],
      "count": 2
    }
  ],
  "producer_id": {
    "name": "Embed Rank Keywords",
    "version": "0.0.2"
  }
}
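The output can be post-processed like any other JSON document. The sketch below assumes that the result is available as JSON text with the structure shown above. The values are copied from the sample output, with `producer_id` left out for brevity. The helper name `summarize_keywords` and the 0.7 relevance threshold are illustrative choices, not part of the library.

```python
import json

# Values copied from the sample output on this page (producer_id omitted)
output_json = """
{"keywords": [
  {"text": "University of California Santa Cruz", "relevance": 1.0,
   "mentions": [{"begin": 23, "end": 58,
                 "text": "University of California Santa Cruz"}],
   "count": 1},
  {"text": "Anna", "relevance": 0.6883336359588481,
   "mentions": [{"begin": 0, "end": 4, "text": "Anna"},
                {"begin": 68, "end": 72, "text": "Anna"}],
   "count": 2}
]}
"""


def summarize_keywords(result, min_relevance=0.0):
    """Keep keywords at or above a relevance threshold, with their spans."""
    rows = []
    for keyword in result["keywords"]:
        if keyword["relevance"] < min_relevance:
            continue
        rows.append({
            "text": keyword["text"],
            "relevance": keyword["relevance"],
            "count": keyword["count"],
            # (begin, end) character offsets of every mention in the input
            "spans": [(m["begin"], m["end"]) for m in keyword["mentions"]],
        })
    return rows


result = json.loads(output_json)
for row in summarize_keywords(result, min_relevance=0.7):
    print(row["text"], row["relevance"], row["spans"])
```

With a threshold of 0.7 only the first keyword is kept, because the relevance of "Anna" in the sample output is about 0.688.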