Entity extraction
The Watson Natural Language Processing Entity extraction models extract entities from input text.
For details on the available extraction types, refer to these sections:
- Machine-learning-based extraction for general entities
- Rule-based extraction for general entities
- Rule-based extraction for PII entities
Machine-learning-based extraction for general entities
The machine-learning-based extraction models are trained on labeled data for the more complex entity types such as person, organization and location.
Capabilities
The entity models extract entities from the input text. The following types of entities are recognized:
- Date
- Duration
- Facility
- Geographic feature
- Job title
- Location
- Measure
- Money
- Ordinal
- Organization
- Person
- Time
Capabilities | Examples |
---|---|
Extracts entities from the input text. | IBM's CEO Arvind Krishna is based in the US -> IBM\Organization , CEO\JobTitle , Arvind Krishna\Person , US\Location |
Available workflows and blocks differ, depending on the runtime used.
Workflow names
- entity-mentions_transformer-workflow_multilingual_slate.153m.distilled: this workflow can be used on both CPUs and GPUs.
- entity-mentions_transformer-workflow_multilingual_slate.153m.distilled-cpu: this workflow is optimized for CPU-based runtimes.
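As a minimal sketch, the choice between the two workflow names can be wrapped in a small helper. The function name `pick_entity_workflow` and the `gpu_available` flag are hypothetical and not part of the Watson NLP library; how you detect a GPU depends on your runtime.

```python
# Workflow name taken from the list above; the "-cpu" suffix selects the CPU-optimized variant.
BASE_WORKFLOW = "entity-mentions_transformer-workflow_multilingual_slate.153m.distilled"


def pick_entity_workflow(gpu_available: bool) -> str:
    """Return a workflow name for the runtime (hypothetical helper, not a library call).

    The base workflow runs on both CPUs and GPUs, so this sketch only switches
    to the CPU-optimized variant when no GPU is available.
    """
    return BASE_WORKFLOW if gpu_available else BASE_WORKFLOW + "-cpu"


print(pick_entity_workflow(gpu_available=False))
```

The returned string can then be passed to `watson_nlp.load` as in the code sample in this section.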
Supported languages
Entity extraction is available for the following languages.
ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pt, ro, ru, sk, sv, tr, zh-cn
For a list of language codes and corresponding languages, see Language codes.
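Before running the workflow, it can help to check that the language of the input is in the supported list. The following is a sketch only: the set simply mirrors the codes listed above, and `is_supported_language` is a hypothetical helper, not a Watson NLP function.

```python
# Language codes copied from the supported-languages list for the ML-based workflow.
ML_ENTITY_LANGUAGES = frozenset(
    "ar cs da de en es fi fr he hi it ja ko nb nl nn pt ro ru sk sv tr zh-cn".split()
)


def is_supported_language(language_code: str) -> bool:
    """Return True if the code appears in the list above (case-insensitive)."""
    return language_code.strip().lower() in ML_ENTITY_LANGUAGES


print(is_supported_language("en"))     # a code from the list
print(is_supported_language("zh-tw"))  # not in the list for this workflow
```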
Code sample
import watson_nlp
# Load the workflow model
entities_workflow = watson_nlp.load('entity-mentions_transformer-workflow_multilingual_slate.153m.distilled')
# Run the entity extraction workflow on the input text
entities = entities_workflow.run('IBM\'s CEO Arvind Krishna is based in the US', language_code="en")
print(entities.get_mention_pairs())
Output of the code sample:
[('IBM', 'Organization'), ('CEO', 'JobTitle'), ('Arvind Krishna', 'Person'), ('US', 'Location')]
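The list of (text, type) pairs shown above is easy to post-process with plain Python. As an illustration, the following sketch groups mentions by entity type. The helper name `group_mentions_by_type` is an assumption for this example, and the input is the sample output copied from above rather than a live model call.

```python
from collections import defaultdict


def group_mentions_by_type(mention_pairs):
    """Group (text, entity_type) pairs into a dict of entity_type -> list of texts."""
    grouped = defaultdict(list)
    for text, entity_type in mention_pairs:
        grouped[entity_type].append(text)
    return dict(grouped)


# Sample output of get_mention_pairs(), copied from the code sample above.
pairs = [('IBM', 'Organization'), ('CEO', 'JobTitle'),
         ('Arvind Krishna', 'Person'), ('US', 'Location')]

grouped = group_mentions_by_type(pairs)
print(grouped["Person"])
```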
Rule-based extraction for general entities
The rule-based model entity-mentions_rbr_xx_stock identifies syntactically regular entities.
Block name
entity-mentions_rbr_xx_stock
Capabilities
Rule-based extraction handles syntactically regular entity types. The entity block extracts entities from the input text. The following types of entities are recognized:
- PhoneNumber
- EmailAddress
- Number
- Percent
- IPAddress
- HashTag
- TwitterHandle
- URL
- Date
Capabilities | Examples |
---|---|
Extracts syntactically regular entity types from the input text. | My email is [email protected] -> [email protected]\EmailAddress |
Supported languages
Entity extraction is available for the following languages. For a list of the language codes and the corresponding language, see Language codes.
ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pt, ro, ru, sk, sv, tr, zh-cn, zh-tw
Dependencies on other blocks
None
Code sample
import watson_nlp
# Load a rule-based Entity Mention model for English
rbr_entity_model = watson_nlp.load('entity-mentions_rbr_en_stock')
# Run the entity model on the input text
rbr_entity_mentions = rbr_entity_model.run('My email is [email protected]')
print(rbr_entity_mentions)
Output of the code sample:
{
  "mentions": [
    {
      "span": {
        "begin": 12,
        "end": 27,
        "text": "[email protected]"
      },
      "type": "EmailAddress",
      "producer_id": {
        "name": "RBR mentions",
        "version": "0.0.1"
      },
      "confidence": 0.8,
      "mention_type": "MENTT_UNSET",
      "mention_class": "MENTC_UNSET",
      "role": ""
    }
  ],
  "producer_id": {
    "name": "RBR mentions",
    "version": "0.0.1"
  }
}
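Each mention in the output carries a span with begin and end character offsets into the input text. The sketch below walks a result of that shape and checks that every span matches the input. It assumes the result is already available as a plain Python dict shaped like the output above (how to obtain that dict from the result object is not shown here), and it uses a hypothetical email address, `user@example.com`, as sample data.

```python
def iter_mentions(result: dict):
    """Yield (text, type, begin, end) for each mention in a result dict."""
    for mention in result.get("mentions", []):
        span = mention["span"]
        yield span["text"], mention["type"], span["begin"], span["end"]


def spans_match_text(text: str, result: dict) -> bool:
    """Return True if every span's offsets select exactly the span's text."""
    return all(text[begin:end] == span_text
               for span_text, _, begin, end in iter_mentions(result))


# Hypothetical input and result, shaped like the output above.
sample_text = "My email is user@example.com"
sample_result = {
    "mentions": [
        {"span": {"begin": 12, "end": 28, "text": "user@example.com"},
         "type": "EmailAddress"}
    ]
}

for mention_text, mention_type, begin, end in iter_mentions(sample_result):
    print(mention_type, mention_text, begin, end)
print(spans_match_text(sample_text, sample_result))
```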
Rule-based extraction for PII entities
The rule-based model entity-mentions_rbr_multi_pii handles most PII entity types by identifying their common formats and, where applicable, applying the checksum or other validation that is appropriate for each entity type. For example, credit card number candidates are validated using the Luhn algorithm.
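To illustrate the kind of validation involved, here is a minimal, standalone sketch of the Luhn check. It is not the model's internal implementation, and the function name `luhn_is_valid` is chosen for this example only.

```python
def luhn_is_valid(candidate: str) -> bool:
    """Check a digit string with the Luhn algorithm, ignoring spaces and dashes."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


# The 15-digit number used in the code sample of this section passes the check.
print(luhn_is_valid("378282246310005"))
# Changing the last digit breaks the checksum.
print(luhn_is_valid("378282246310006"))
```

A candidate that matches a card number format but fails this check can be discarded, which reduces false positives on arbitrary digit sequences.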
Block name
entity-mentions_rbr_multi_pii
Capabilities
The entity block entity-mentions_rbr_multi_pii recognizes the following types of entities:
Entity type name | Description | Supported languages |
---|---|---|
BankAccountNumber.CreditCardNumber.Amex | Credit card number for card type AMEX (15 digits). Checked through the Luhn algorithm. | All |
BankAccountNumber.CreditCardNumber.Master | Credit card number for card type Mastercard (16 digits). Checked through the Luhn algorithm. | All |
BankAccountNumber.CreditCardNumber.Other | Credit card number for the remaining card types. Checked through the Luhn algorithm. | All |
BankAccountNumber.CreditCardNumber.Visa | Credit card number for card type VISA (16 to 19 digits). Checked through the Luhn algorithm. | All |
EmailAddress | Email addresses, for example: [email protected] | ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sv, tr, zh-cn |
IPAddress | IPv4 and IPv6 addresses, for example, 10.142.250.123 | ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sv, tr, zh-cn |
PhoneNumber | Any specific phone number, for example, 0511-123-456 | ar, cs, da, de, en, es, fi, fr, he, hi, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sv, tr, zh-cn |
Some PII entity type names are country-specific. The _ in the following entity types is a placeholder for a country code.
- BankAccountNumber.BBAN._: These national bank account numbers are more variable, and the extraction is mostly language-specific without a general checksum algorithm.
- BankAccountNumber.IBAN._: Highly standardized IBANs are supported in a language-independent way and with a checksum algorithm.
- NationalNumber.NationalID._: These national IDs don't have a (published) checksum algorithm and are extracted on a language-specific basis.
- NationalNumber.Passport._: Checksums are implemented only for the countries where a checksum algorithm exists. These are extracted on a language-specific basis with additional context restrictions.
- NationalNumber.TaxID._: These IDs don't have a (published) checksum algorithm and are extracted on a language-specific basis.
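IBANs can be checked in a language-independent way because the standard defines a mod-97 checksum over the whole number. The following is a minimal sketch of that check, written for illustration and not taken from the model. The function name `iban_is_valid` is hypothetical, and the sketch deliberately skips the country-specific length rules.

```python
def iban_is_valid(iban: str) -> bool:
    """Validate the mod-97 checksum of an IBAN (country-specific length not checked)."""
    compact = "".join(iban.split()).upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits (first four characters) to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers: A=10, B=11, ..., Z=35 (digits stay as they are).
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


# Widely published example IBAN in the GB format.
print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))
# Altering the check digits invalidates it.
print(iban_is_valid("GB83 WEST 1234 5698 7654 32"))
```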
The following table lists which entity types are available for which languages, and which country code to use.
Country | Entity Type Name | Description | Supported Languages |
---|---|---|---|
Austria | BankAccountNumber.BBAN.AT | Basic bank account number | de |
Austria | BankAccountNumber.IBAN.AT | International bank account number | All |
Austria | NationalNumber.Passport.AT | Passport number | de |
Austria | NationalNumber.TaxID.AT | Tax identification number | de |
Belgium | BankAccountNumber.BBAN.BE | Basic bank account number | fr, nl |
Belgium | BankAccountNumber.IBAN.BE | International bank account number | All |
Belgium | NationalNumber.NationalID.BE | National identification number | fr, nl |
Belgium | NationalNumber.Passport.BE | Passport number | fr, nl |
Bulgaria | BankAccountNumber.BBAN.BG | Basic bank account number | bg |
Bulgaria | BankAccountNumber.IBAN.BG | International bank account number | All |
Bulgaria | NationalNumber.NationalID.BG | National identification number | bg |
Canada | NationalNumber.SocialInsuranceNumber.CA | Social insurance number. Checksum algorithm is implemented. | en, fr |
Croatia | BankAccountNumber.BBAN.HR | Basic bank account number | hr |
Croatia | BankAccountNumber.IBAN.HR | International bank account number | All |
Croatia | NationalNumber.NationalID.HR | National identification number | hr |
Croatia | NationalNumber.TaxID.HR | Tax identification number | hr |
Cyprus | BankAccountNumber.BBAN.CY | Basic bank account number | el |
Cyprus | BankAccountNumber.IBAN.CY | International bank account number | All |
Cyprus | NationalNumber.TaxID.CY | Tax identification number | el |
Czechia | BankAccountNumber.BBAN.CZ | Basic bank account number | cs |
Czechia | BankAccountNumber.IBAN.CZ | International bank account number | cs |
Czechia | NationalNumber.NationalID.CZ | National identification number | cs |
Czechia | NationalNumber.TaxID.CZ | Tax identification number | cs |
Denmark | BankAccountNumber.BBAN.DK | Basic bank account number | da |
Denmark | BankAccountNumber.IBAN.DK | International bank account number | All |
Denmark | NationalNumber.NationalID.DK | National identification number | da |
Estonia | BankAccountNumber.BBAN.EE | Basic bank account number | et |
Estonia | BankAccountNumber.IBAN.EE | International bank account number | All |
Estonia | NationalNumber.NationalID.EE | National identification number | et |
Finland | BankAccountNumber.BBAN.FI | Basic bank account number | fi |
Finland | BankAccountNumber.IBAN.FI | International bank account number | All |
Finland | NationalNumber.NationalID.FI | National identification number | fi |
Finland | NationalNumber.Passport.FI | Passport number | fi |
France | BankAccountNumber.BBAN.FR | Basic bank account number | fr |
France | BankAccountNumber.IBAN.FR | International bank account number | All |
France | NationalNumber.Passport.FR | Passport number | fr |
France | NationalNumber.SocialInsuranceNumber.FR | Social insurance number. Checksum algorithm is implemented. | fr |
Germany | BankAccountNumber.BBAN.DE | Basic bank account number | de |
Germany | BankAccountNumber.IBAN.DE | International bank account number | All |
Germany | NationalNumber.Passport.DE | Passport number | de |
Germany | NationalNumber.SocialInsuranceNumber.DE | Social insurance number. Checksum algorithm is implemented. | de |
Greece | BankAccountNumber.BBAN.GR | Basic bank account number | el |
Greece | BankAccountNumber.IBAN.GR | International bank account number | All |
Greece | NationalNumber.Passport.GR | Passport number | el |
Greece | NationalNumber.TaxID.GR | Tax identification number | el |
Greece | NationalNumber.NationalID.GR | National identification number | el |
Hungary | BankAccountNumber.BBAN.HU | Basic bank account number | hu |
Hungary | BankAccountNumber.IBAN.HU | International bank account number | All |
Hungary | NationalNumber.NationalID.HU | National identification number | hu |
Hungary | NationalNumber.TaxID.HU | Tax identification number | hu |
Iceland | BankAccountNumber.BBAN.IS | Basic bank account number | is |
Iceland | BankAccountNumber.IBAN.IS | International bank account number | All |
Iceland | NationalNumber.NationalID.IS | National identification number | is |
Ireland | BankAccountNumber.BBAN.IE | Basic bank account number | en |
Ireland | BankAccountNumber.IBAN.IE | International bank account number | All |
Ireland | NationalNumber.NationalID.IE | National identification number | en |
Ireland | NationalNumber.Passport.IE | Passport number | en |
Ireland | NationalNumber.TaxID.IE | Tax identification number | en |
Italy | BankAccountNumber.BBAN.IT | Basic bank account number | it |
Italy | BankAccountNumber.IBAN.IT | International bank account number | All |
Italy | NationalNumber.NationalID.IT | National identification number | it |
Italy | NationalNumber.Passport.IT | Passport number | it |
Latvia | BankAccountNumber.BBAN.LV | Basic bank account number | lv |
Latvia | BankAccountNumber.IBAN.LV | International bank account number | All |
Latvia | NationalNumber.NationalID.LV | National identification number | lv |
Liechtenstein | BankAccountNumber.BBAN.LI | Basic bank account number | de |
Liechtenstein | BankAccountNumber.IBAN.LI | International bank account number | All |
Lithuania | BankAccountNumber.BBAN.LT | Basic bank account number | lt |
Lithuania | BankAccountNumber.IBAN.LT | International bank account number | All |
Lithuania | NationalNumber.NationalID.LT | National identification number | lt |
Luxembourg | BankAccountNumber.BBAN.LU | Basic bank account number | de, fr |
Luxembourg | BankAccountNumber.IBAN.LU | International bank account number | All |
Luxembourg | NationalNumber.TaxID.LU | Tax identification number | de, fr |
Malta | BankAccountNumber.BBAN.MT | Basic bank account number | mt |
Malta | BankAccountNumber.IBAN.MT | International bank account number | All |
Netherlands | BankAccountNumber.BBAN.NL | Basic bank account number | nl |
Netherlands | BankAccountNumber.IBAN.NL | International bank account number | All |
Netherlands | NationalNumber.NationalID.NL | National identification number | nl |
Netherlands | NationalNumber.Passport.NL | Passport number | nl |
Norway | BankAccountNumber.BBAN.NO | Basic bank account number | no |
Norway | BankAccountNumber.IBAN.NO | International bank account number | All |
Norway | NationalNumber.NationalID.NO | National identification number | no |
Norway | NationalNumber.NationalID.NO.Old | National identification number, obsolete format | no |
Norway | NationalNumber.Passport.NO | Passport number | no |
Poland | BankAccountNumber.BBAN.PL | Basic bank account number | pl |
Poland | BankAccountNumber.IBAN.PL | International bank account number | All |
Poland | NationalNumber.NationalID.PL | National identification number | pl |
Poland | NationalNumber.Passport.PL | Passport number | pl |
Poland | NationalNumber.TaxID.PL | Tax identification number | pl |
Portugal | BankAccountNumber.IBAN.PT | International bank account number | All |
Portugal | BankAccountNumber.BBAN.PT | Basic bank account number | pt |
Portugal | NationalNumber.NationalID.PT | National identification number | pt |
Portugal | NationalNumber.NationalID.PT.Old | National identification number, obsolete format | pt |
Portugal | NationalNumber.TaxID.PT | Tax identification number | pt |
Romania | BankAccountNumber.BBAN.RO | Basic bank account number | ro |
Romania | BankAccountNumber.IBAN.RO | International bank account number | All |
Romania | NationalNumber.NationalID.RO | National identification number | ro |
Romania | NationalNumber.TaxID.RO | Tax identification number | ro |
Slovakia | BankAccountNumber.IBAN.SK | International bank account number | All |
Slovakia | BankAccountNumber.BBAN.SK | Basic bank account number | sk |
Slovakia | NationalNumber.TaxID.SK | Tax identification number | sk |
Slovakia | NationalNumber.NationalID.SK | National identification number | sk |
Slovenia | BankAccountNumber.IBAN.SI | International bank account number | All |
Spain | BankAccountNumber.IBAN.ES | International bank account number | All |
Spain | BankAccountNumber.BBAN.ES | Basic bank account number | es |
Spain | NationalNumber.NationalID.ES | National identification number | es |
Spain | NationalNumber.Passport.ES | Passport number | es |
Spain | NationalNumber.TaxID.ES | Tax identification number | es |
Sweden | BankAccountNumber.IBAN.SE | International bank account number | All |
Sweden | BankAccountNumber.BBAN.SE | Basic bank account number | sv |
Sweden | NationalNumber.NationalID.SE | National identification number | sv |
Sweden | NationalNumber.Passport.SE | Passport number | sv |
Switzerland | BankAccountNumber.IBAN.CH | International bank account number | All |
Switzerland | BankAccountNumber.BBAN.CH | Basic bank account number | de, fr, it |
Switzerland | NationalNumber.NationalID.CH | National identification number | de, fr, it |
Switzerland | NationalNumber.Passport.CH | Passport number | de, fr, it |
Switzerland | NationalNumber.NationalID.CH.Old | National identification number, obsolete format | de, fr, it |
United Kingdom of Great Britain and Northern Ireland | BankAccountNumber.IBAN.GB | International bank account number | All |
United Kingdom of Great Britain and Northern Ireland | NationalNumber.SocialSecurityNumber.GB.NHS | National Health Service number | All |
United Kingdom of Great Britain and Northern Ireland | NationalNumber.SocialSecurityNumber.GB.NINO | National Insurance number (NINO) | All |
United Kingdom of Great Britain and Northern Ireland | NationalNumber.NationalID.GB.Old | National identification number, obsolete format | All |
United Kingdom of Great Britain and Northern Ireland | NationalNumber.Passport.GB | Passport number. A checksum algorithm is not implemented, so extraction comes with additional context restrictions. | All |
United States | NationalNumber.SocialSecurityNumber.US | Social Security number. A checksum algorithm is not implemented, so extraction comes with additional context restrictions. | en |
United States | NationalNumber.Passport.US | Passport number. A checksum algorithm is not implemented, so extraction comes with additional context restrictions. | en |
Dependencies on other blocks
None
Code sample
import watson_nlp
# Load the RBR PII model. Note that this is a multilingual model supporting multiple languages.
rbr_entity_model = watson_nlp.load('entity-mentions_rbr_multi_pii')
# Run the RBR model. Note that language code of the input text is passed as a parameter to the run method.
rbr_entity_mentions = rbr_entity_model.run('Please find my credit card number here: 378282246310005. Thanks for the payment.', language_code='en')
print(rbr_entity_mentions)
Output of the code sample:
{
  "mentions": [
    {
      "span": {
        "begin": 40,
        "end": 55,
        "text": "378282246310005"
      },
      "type": "BankAccountNumber.CreditCardNumber.Amex",
      "producer_id": {
        "name": "RBR mentions",
        "version": "0.0.1"
      },
      "confidence": 0.8,
      "mention_type": "MENTT_UNSET",
      "mention_class": "MENTC_UNSET",
      "role": ""
    }
  ],
  "producer_id": {
    "name": "RBR mentions",
    "version": "0.0.1"
  }
}
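A common follow-up step is to mask the detected PII in the original text by using the span offsets. The sketch below assumes the result is available as a plain Python dict shaped like the output above (converting the result object to a dict is not shown here). The `redact` helper and its placeholder format are hypothetical choices for this example.

```python
def redact(text: str, result: dict, placeholder: str = "[{type}]") -> str:
    """Replace each mention span in text with a placeholder naming its entity type."""
    # Process spans from right to left so earlier offsets stay valid after each replacement.
    mentions = sorted(result.get("mentions", []),
                      key=lambda m: m["span"]["begin"], reverse=True)
    for mention in mentions:
        span = mention["span"]
        label = placeholder.format(type=mention["type"])
        text = text[:span["begin"]] + label + text[span["end"]:]
    return text


# Input text and (trimmed) result copied from the code sample above.
pii_text = ("Please find my credit card number here: 378282246310005. "
            "Thanks for the payment.")
pii_result = {
    "mentions": [
        {"span": {"begin": 40, "end": 55, "text": "378282246310005"},
         "type": "BankAccountNumber.CreditCardNumber.Amex"}
    ]
}

print(redact(pii_text, pii_result))
```

Working from the last span to the first avoids having to recompute offsets when the placeholder length differs from the length of the original text.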