Entity extraction
The Watson Natural Language Processing Entity extraction blocks extract entities from input text.
Block name
The Watson Natural Language Processing library offers two entity extraction blocks:
- For machine-learning-based extraction:
entity-mentions_bert_multi_stock
- For rule-based extraction:
entity-mentions_rbr_xx_stock
(where xx is the language code)
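The naming pattern above can be sketched as a small helper that fills in the language code. This is a minimal illustration only: the function name rbr_entity_block_name and the two constants are hypothetical and are not part of the watson_nlp library, and the helper does not check that a model actually exists for the given code.

```python
# Block names as given in the text above.
ML_ENTITY_BLOCK = "entity-mentions_bert_multi_stock"
RBR_ENTITY_BLOCK_TEMPLATE = "entity-mentions_rbr_{lang}_stock"


def rbr_entity_block_name(lang_code: str) -> str:
    """Hypothetical helper: substitute a language code for 'xx'."""
    code = lang_code.strip().lower()
    if not code:
        raise ValueError("language code must not be empty")
    return RBR_ENTITY_BLOCK_TEMPLATE.format(lang=code)


print(ML_ENTITY_BLOCK)
print(rbr_entity_block_name("en"))
```

The resulting string is what would be passed to watson_nlp.download, as in the rule-based code sample later on this page.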
Supported languages
Entity extraction is available for the following languages. For a list of the language codes and their corresponding languages, see Language codes.
ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pt, ro, ru, sk, sv, tr, zh-cn, zh-tw (rbr only)
Machine-learning-based extraction
The machine-learning-based extraction model entity-mentions_bert_multi_stock
is trained on labeled data for the more complex entity types such as person, organization and location.
Capabilities
The entity block extracts entities from the input text. The following types of entities are recognized:
- Date
- Duration
- Facility
- Geographic feature
- Job title
- Location
- Measure
- Money
- Ordinal
- Organization
- Person
- Time
Capabilities | Examples |
---|---|
Extracts entities from the input text. | 'IBM\'s CEO Arvind Krishna is based in the US' -> 'IBM'\Organization, 'CEO'\JobTitle, 'Arvind Krishna'\Person, 'US'\Location |
Dependencies on other blocks
The following block must run before you can run the Entity extraction block:
syntax_izumo_<language>_stock
Code sample
import watson_nlp
# Load Syntax Model for English, and the multilingual BERT Entity model
syntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))
bert_entity_model = watson_nlp.load(watson_nlp.download('entity-mentions_bert_multi_stock'))
# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM\'s CEO Arvind Krishna is based in the US')
# Run the entity mention model on the result of syntax model
bert_entity_mentions = bert_entity_model.run(syntax_prediction)
print(bert_entity_mentions)
Output of the code sample:
{
"mentions": [
{
"span": {
"begin": 0,
"end": 3,
"text": "IBM"
},
"type": "Organization",
"producer_id": {
"name": "BERT Entity Mentions",
"version": "0.0.1"
},
"confidence": 0.9944692850112915,
"mention_type": "MENTT_UNSET",
"mention_class": "MENTC_UNSET",
"role": ""
},
{
"span": {
"begin": 6,
"end": 9,
"text": "CEO"
},
"type": "JobTitle",
"producer_id": {
"name": "BERT Entity Mentions",
"version": "0.0.1"
},
"confidence": 0.9871304631233215,
"mention_type": "MENTT_UNSET",
"mention_class": "MENTC_UNSET",
"role": ""
},
{
"span": {
"begin": 10,
"end": 24,
"text": "Arvind Krishna"
},
"type": "Person",
"producer_id": {
"name": "BERT Entity Mentions",
"version": "0.0.1"
},
"confidence": 0.9988446235656738,
"mention_type": "MENTT_UNSET",
"mention_class": "MENTC_UNSET",
"role": ""
},
{
"span": {
"begin": 41,
"end": 43,
"text": "US"
},
"type": "Location",
"producer_id": {
"name": "BERT Entity Mentions",
"version": "0.0.1"
},
"confidence": 0.9911670088768005,
"mention_type": "MENTT_UNSET",
"mention_class": "MENTC_UNSET",
"role": ""
}
],
"producer_id": {
"name": "BERT Entity Mentions",
"version": "0.0.1"
}
}
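The output above can be post-processed with plain Python. The following is a minimal sketch that assumes the result is already available as a dictionary with the shape shown above; the dictionary here is a trimmed copy of that output, and the helper names mentions_by_type and spans_match_text are hypothetical, not part of the watson_nlp library.

```python
from collections import defaultdict

text = "IBM's CEO Arvind Krishna is based in the US"

# Trimmed copy of the output above (only the fields used in this sketch).
result = {
    "mentions": [
        {"span": {"begin": 0, "end": 3, "text": "IBM"},
         "type": "Organization", "confidence": 0.9944692850112915},
        {"span": {"begin": 6, "end": 9, "text": "CEO"},
         "type": "JobTitle", "confidence": 0.9871304631233215},
        {"span": {"begin": 10, "end": 24, "text": "Arvind Krishna"},
         "type": "Person", "confidence": 0.9988446235656738},
        {"span": {"begin": 41, "end": 43, "text": "US"},
         "type": "Location", "confidence": 0.9911670088768005},
    ]
}


def mentions_by_type(result, min_confidence=0.0):
    """Group mention texts by entity type, skipping low-confidence mentions."""
    grouped = defaultdict(list)
    for mention in result.get("mentions", []):
        if mention["confidence"] >= min_confidence:
            grouped[mention["type"]].append(mention["span"]["text"])
    return dict(grouped)


def spans_match_text(text, result):
    """Check that each span's begin/end offsets slice out the span text."""
    return all(
        text[m["span"]["begin"]:m["span"]["end"]] == m["span"]["text"]
        for m in result.get("mentions", [])
    )


print(mentions_by_type(result))
print(mentions_by_type(result, min_confidence=0.99))
print(spans_match_text(text, result))
```

In this example the begin offset is inclusive and the end offset is exclusive, so the offsets line up with ordinary Python string slicing. Whether that holds for every input, for example text with non-ASCII characters, is not verified here.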
Rule-based extraction
The rule-based model entity-mentions_rbr_xx_stock
identifies syntactically regular entities.
Capabilities
Rule-based extraction handles syntactically regular entity types. The entity block extracts entities from the input text. The following types of entities are recognized:
- PhoneNumber
- EmailAddress
- Number
- Percent
- IPAddress
- HashTag
- TwitterHandle
- URLDate
Capabilities | Examples |
---|---|
Extracts syntactically regular entity types from the input text. | 'My email is [email protected]' -> '[email protected]'\EmailAddress |
Dependencies on other blocks
None
Code sample
import watson_nlp
# Load a rule-based Entity Mention model for English
rbr_entity_model = watson_nlp.load(watson_nlp.download('entity-mentions_rbr_en_stock'))
# Run the entity model on the input text
rbr_entity_mentions = rbr_entity_model.run('My email is [email protected]')
print(rbr_entity_mentions)
Output of the code sample:
{
"mentions": [
{
"span": {
"begin": 12,
"end": 27,
"text": "[email protected]"
},
"type": "EmailAddress",
"producer_id": {
"name": "RBR mentions",
"version": "0.0.1"
},
"confidence": 0.8,
"mention_type": "MENTT_UNSET",
"mention_class": "MENTC_UNSET",
"role": ""
}
],
"producer_id": {
"name": "RBR mentions",
"version": "0.0.1"
}
}
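Because the two blocks recognize different entity types, it can be useful to combine their results into one list. The following is a minimal sketch under stated assumptions: both results are plain dictionaries with the "mentions" shape shown in the outputs on this page, the helper merge_mentions is hypothetical and not part of the watson_nlp library, and the two input dictionaries are hand-written for illustration rather than actual model output.

```python
def merge_mentions(*results):
    """Combine mentions from several results, ordered by position in the text.

    Mentions with the same begin offset, end offset, and type are kept once.
    """
    seen = set()
    merged = []
    for result in results:
        for mention in result.get("mentions", []):
            span = mention["span"]
            key = (span["begin"], span["end"], mention["type"])
            if key not in seen:
                seen.add(key)
                merged.append(mention)
    return sorted(merged, key=lambda m: (m["span"]["begin"], m["span"]["end"]))


# Hand-written illustrative inputs for the text "IBM grew 5% in the US".
ml_result = {"mentions": [
    {"span": {"begin": 0, "end": 3, "text": "IBM"}, "type": "Organization"},
    {"span": {"begin": 19, "end": 21, "text": "US"}, "type": "Location"},
]}
rbr_result = {"mentions": [
    {"span": {"begin": 9, "end": 11, "text": "5%"}, "type": "Percent"},
]}

for m in merge_mentions(ml_result, rbr_result):
    print(m["span"]["begin"], m["span"]["text"], m["type"])
```

Sorting by span offsets keeps the merged list in reading order, which makes it easier to compare against the input text.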