Entity extraction

The Watson Natural Language Processing Entity extraction blocks extract entities from input text.

Block name

The Watson Natural Language Processing library offers two entity extraction blocks:

  • For machine-learning-based extraction: entity-mentions_bert_multi_stock
  • For rule-based extraction: entity-mentions_rbr_xx_stock (where xx is the language code)

Supported languages

Entity extraction is available for the following languages. For a list of the language codes and the corresponding language, see Language codes.

ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pt, ro, ru, sk, sv, tr, zh-cn, zh-tw (rbr only)
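
The naming pattern and the language list above can be combined into a small helper. This is a minimal sketch: `SUPPORTED_LANGUAGES` and `rbr_block_name` are hypothetical names, not part of the watson_nlp library, and the helper only builds the block name without checking that the model is installed. How codes that contain a hyphen (such as zh-cn) are written inside a block name is not stated here, so the sketch assumes the code is inserted unchanged.

```python
# Language codes copied from the list above.
SUPPORTED_LANGUAGES = frozenset({
    "ar", "cs", "da", "de", "en", "es", "fi", "fr", "he", "hi", "it", "ja",
    "ko", "nb", "nl", "nn", "pt", "ro", "ru", "sk", "sv", "tr", "zh-cn", "zh-tw",
})


def rbr_block_name(language_code: str) -> str:
    """Build the rule-based block name for a language code (hypothetical helper)."""
    code = language_code.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language_code!r}")
    # Assumption: the code replaces 'xx' in the pattern exactly as listed.
    return f"entity-mentions_rbr_{code}_stock"


print(rbr_block_name("en"))  # entity-mentions_rbr_en_stock
```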

Machine-learning-based extraction

The machine-learning-based extraction model entity-mentions_bert_multi_stock is trained on labeled data for the more complex entity types such as person, organization and location.

Capabilities

The entity block extracts entities from the input text. The following entity types are recognized:

  • Date
  • Duration
  • Facility
  • Geographic feature
  • Job title
  • Location
  • Measure
  • Money
  • Ordinal
  • Organization
  • Person
  • Time

Capabilities of machine-learning-based extraction based on an example

Capability: Extracts entities from the input text.
Example: 'IBM\'s CEO Arvind Krishna is based in the US' -> 'IBM'\Organization, 'CEO'\JobTitle, 'Arvind Krishna'\Person, 'US'\Location

Dependencies on other blocks

The following block must run before you can run the Entity extraction block:

  • syntax_izumo_<language>_stock

Code sample

import watson_nlp

# Load Syntax Model for English, and the multilingual BERT Entity model 
syntax_model = watson_nlp.load(watson_nlp.download('syntax_izumo_en_stock'))
bert_entity_model = watson_nlp.load(watson_nlp.download('entity-mentions_bert_multi_stock'))

# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM\'s CEO Arvind Krishna is based in the US')

# Run the entity mention model on the result of syntax model
bert_entity_mentions = bert_entity_model.run(syntax_prediction)
print(bert_entity_mentions)

Output of the code sample:

{
  "mentions": [
    {
      "span": {
        "begin": 0,
        "end": 3,
        "text": "IBM"
      },
      "type": "Organization",
      "producer_id": {
        "name": "BERT Entity Mentions",
        "version": "0.0.1"
      },
      "confidence": 0.9944692850112915,
      "mention_type": "MENTT_UNSET",
      "mention_class": "MENTC_UNSET",
      "role": ""
    },
    {
      "span": {
        "begin": 6,
        "end": 9,
        "text": "CEO"
      },
      "type": "JobTitle",
      "producer_id": {
        "name": "BERT Entity Mentions",
        "version": "0.0.1"
      },
      "confidence": 0.9871304631233215,
      "mention_type": "MENTT_UNSET",
      "mention_class": "MENTC_UNSET",
      "role": ""
    },
    {
      "span": {
        "begin": 10,
        "end": 24,
        "text": "Arvind Krishna"
      },
      "type": "Person",
      "producer_id": {
        "name": "BERT Entity Mentions",
        "version": "0.0.1"
      },
      "confidence": 0.9988446235656738,
      "mention_type": "MENTT_UNSET",
      "mention_class": "MENTC_UNSET",
      "role": ""
    },
    {
      "span": {
        "begin": 41,
        "end": 43,
        "text": "US"
      },
      "type": "Location",
      "producer_id": {
        "name": "BERT Entity Mentions",
        "version": "0.0.1"
      },
      "confidence": 0.9911670088768005,
      "mention_type": "MENTT_UNSET",
      "mention_class": "MENTC_UNSET",
      "role": ""
    }
  ],
  "producer_id": {
    "name": "BERT Entity Mentions",
    "version": "0.0.1"
  }
}
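
In the output above, each span's begin and end values are character offsets into the input text, with end exclusive. The following sketch checks this and reduces the result to (text, type) pairs. It assumes the result is already available as a plain Python dictionary with the structure of the printed output; the dictionary below is an abridged copy of that output, and `mention_pairs` is a hypothetical helper, not part of the watson_nlp library.

```python
text = "IBM's CEO Arvind Krishna is based in the US"

# Abridged copy of the output above: only span and type are kept.
result = {
    "mentions": [
        {"span": {"begin": 0, "end": 3, "text": "IBM"}, "type": "Organization"},
        {"span": {"begin": 6, "end": 9, "text": "CEO"}, "type": "JobTitle"},
        {"span": {"begin": 10, "end": 24, "text": "Arvind Krishna"}, "type": "Person"},
        {"span": {"begin": 41, "end": 43, "text": "US"}, "type": "Location"},
    ]
}


def mention_pairs(result, text):
    """Return (surface text, entity type) pairs, checking offsets against the input."""
    pairs = []
    for mention in result["mentions"]:
        span = mention["span"]
        # begin is inclusive and end is exclusive, like a Python slice.
        if text[span["begin"]:span["end"]] != span["text"]:
            raise ValueError(f"Span offsets do not match text: {span}")
        pairs.append((span["text"], mention["type"]))
    return pairs


for surface, entity_type in mention_pairs(result, text):
    print(f"{surface}\t{entity_type}")
```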


Rule-based extraction

The rule-based model entity-mentions_rbr_xx_stock identifies syntactically regular entities.

Capabilities

The rule-based entity block extracts entities from the input text. The following entity types are recognized:

  • PhoneNumber
  • EmailAddress
  • Number
  • Percent
  • IPAddress
  • HashTag
  • TwitterHandle
  • URL
  • Date

Capabilities of rule-based extraction based on an example

Capability: Extracts syntactically regular entity types from the input text.
Example: 'My email is [email protected]' -> '[email protected]'\EmailAddress

Dependencies on other blocks

None

Code sample

import watson_nlp

# Load a rule-based Entity Mention model for English
rbr_entity_model = watson_nlp.load(watson_nlp.download('entity-mentions_rbr_en_stock'))

# Run the entity model on the input text
rbr_entity_mentions = rbr_entity_model.run('My email is [email protected]')
print(rbr_entity_mentions)

Output of the code sample:

{
  "mentions": [
    {
      "span": {
        "begin": 12,
        "end": 27,
        "text": "[email protected]"
      },
      "type": "EmailAddress",
      "producer_id": {
        "name": "RBR mentions",
        "version": "0.0.1"
      },
      "confidence": 0.8,
      "mention_type": "MENTT_UNSET",
      "mention_class": "MENTC_UNSET",
      "role": ""
    }
  ],
  "producer_id": {
    "name": "RBR mentions",
    "version": "0.0.1"
  }
}
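
Every mention in both outputs on this page carries a type and a confidence value, so results can be narrowed after extraction. The following is a minimal sketch, assuming the mentions are available as plain dictionaries; the list is abridged from the two outputs on this page, and `filter_mentions` is a hypothetical helper, not part of the watson_nlp library.

```python
# Abridged mentions taken from the two outputs on this page.
mentions = [
    {"type": "Organization", "confidence": 0.9944692850112915},
    {"type": "JobTitle", "confidence": 0.9871304631233215},
    {"type": "Person", "confidence": 0.9988446235656738},
    {"type": "Location", "confidence": 0.9911670088768005},
    {"type": "EmailAddress", "confidence": 0.8},
]


def filter_mentions(mentions, min_confidence=0.0, types=None):
    """Keep mentions at or above a confidence threshold, optionally limited to some types."""
    wanted = None if types is None else set(types)
    return [
        m for m in mentions
        if m["confidence"] >= min_confidence
        and (wanted is None or m["type"] in wanted)
    ]


high = filter_mentions(mentions, min_confidence=0.99)
print([m["type"] for m in high])  # ['Organization', 'Person', 'Location']
```

A single threshold treats both blocks alike; because the rule-based mention here reports a confidence of 0.8, filtering by type may be the more useful option for rule-based results.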
