Working with pre-trained models

Last updated: Nov 07, 2024

Watson Natural Language Processing provides pre-trained models in over 20 languages. They are curated by a dedicated team of experts and evaluated for quality on each specific language. You can use these pre-trained models in production environments without having to worry about license or intellectual property infringement.

Loading and running a model

To load a model, you first need to know its name. Model names follow a standard convention encoding the type of model (like classification or entity extraction), type of algorithm (like SVM or transformers), language code, and details of the type system.
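As a rough illustration of that convention, the sketch below splits a model name into four fields. It assumes the fields are separated by underscores, which is inferred only from the model names used on this page (for example `syntax_izumo_en_stock`). The `ModelName` tuple and `parse_model_name` helper are hypothetical and not part of `watson_nlp`; the task catalog remains the authoritative reference for model names.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Hypothetical breakdown of a model name (assumed field order)."""
    task: str       # e.g. "syntax", "keywords"
    algorithm: str  # e.g. "izumo", "text-rank"
    language: str   # e.g. "en", "multilingual"
    details: str    # e.g. "stock"


def parse_model_name(name: str) -> ModelName:
    # Split on the first three underscores only; anything after that
    # stays together in the last field.
    parts = name.split("_", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {name!r}")
    return ModelName(*parts)


for name in ("syntax_izumo_en_stock",
             "keywords_text-rank_en_stock",
             "entity-mentions_transformer-workflow_multilingual_slate.153m.distilled"):
    print(parse_model_name(name))
```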

To find the model that matches your needs, see the Watson NLP task catalog.

You can find the expected input for a given block class (for example, the keywords TextRank block) by calling help() on the block class's run() method:

import watson_nlp

help(watson_nlp.blocks.keywords.TextRank.run)
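If you only need the parameter list rather than the full docstring, Python's standard `inspect.signature` should work on any plain Python method, including a block's `run()`. The sketch below uses a hypothetical stand-in class so that it runs without `watson_nlp` installed; its parameters echo the keywords call in the example later on this page, but the default `limit=10` is invented and is not the real block's signature.

```python
import inspect


class HypotheticalKeywordsBlock:
    """Stand-in class; not part of watson_nlp."""

    def run(self, syntax_prediction, noun_phrases, limit=10):
        # A real block would compute keywords here.
        return []


# Accessed through the class, the signature includes 'self'.
print(inspect.signature(HypotheticalKeywordsBlock.run))
# (self, syntax_prediction, noun_phrases, limit=10)
```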

Watson Natural Language Processing encapsulates natural language functionality through blocks and workflows. Each block or workflow supports functions to:

load(): load a model
run(): run the model on input arguments
train(): train the model on your own data (not all blocks and workflows support training)
save(): save the model that has been trained on your own data
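To make that lifecycle concrete, here is a minimal sketch of a base class with the same four functions. Everything in it (`ToyBlock`, `ToyUppercaseBlock`, `ToyStopwordBlock`, the JSON file format) is hypothetical and is not how `watson_nlp` implements blocks; it only shows how `train()` can be optional while `load()`, `run()`, and `save()` are always available.

```python
import json
from abc import ABC, abstractmethod


class ToyBlock(ABC):
    """Hypothetical stand-in for the load/run/train/save lifecycle (not the watson_nlp API)."""

    def __init__(self, params=None):
        self.params = dict(params or {})

    @classmethod
    def load(cls, path):
        # Rebuild a block from parameters previously written by save().
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))

    def save(self, path):
        # Persist whatever state the block holds, e.g. after train().
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.params, f)

    @abstractmethod
    def run(self, text):
        """Apply the block to its input."""

    def train(self, data):
        # Not every block supports training, so the default refuses.
        raise NotImplementedError(f"{type(self).__name__} does not support training")


class ToyUppercaseBlock(ToyBlock):
    """A block with no trainable state; train() keeps the refusing default."""

    def run(self, text):
        return text.upper()


class ToyStopwordBlock(ToyBlock):
    """A trainable block: learns the words shared by every training document."""

    def run(self, text):
        stopwords = set(self.params.get("stopwords", []))
        return [w for w in text.split() if w.lower() not in stopwords]

    def train(self, data):
        vocabularies = [set(doc.lower().split()) for doc in data]
        self.params["stopwords"] = sorted(set.intersection(*vocabularies))
        return self
```

Putting the refusing `train()` on the base class mirrors the note that not all blocks and workflows support training.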

Blocks

Two types of blocks exist:

Blocks that operate directly on the input document
Blocks that depend on other blocks

Workflows run one or more blocks on the input document, in a pipeline.

Blocks that operate directly on the input document

An example of a block that operates directly on the input document is the Syntax block, which performs natural language processing operations such as tokenization, lemmatization, part-of-speech tagging, and dependency parsing.

Example: running syntax analysis on a text snippet:

import watson_nlp

# Load the syntax model for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')

# Run the syntax model and print the result
syntax_prediction = syntax_model.run('Welcome to IBM!')
print(syntax_prediction)
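To show what tokenization means in isolation, the sketch below splits raw text into word and punctuation tokens with their character offsets. It is a naive regular-expression tokenizer written for illustration only; it is not the Izumo syntax algorithm, and the `toy_tokenize` name and its `(token, begin, end)` output format are hypothetical rather than the structure that `syntax_model.run()` returns.

```python
import re


def toy_tokenize(text):
    """Return (token, begin, end) spans; runs of word characters and single
    punctuation marks become separate tokens, whitespace is dropped."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


print(toy_tokenize("Welcome to IBM!"))
# [('Welcome', 0, 7), ('to', 8, 10), ('IBM', 11, 14), ('!', 14, 15)]
```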

Blocks that depend on other blocks

Blocks that depend on other blocks cannot be applied to the input document directly. Instead, they are applied to the output of one or more preceding blocks. For example, the Keyword Extraction block depends on the Syntax and Noun Phrases blocks.

These blocks can be loaded like any other block, but they must be run in a particular order: first the blocks they depend on, then the dependent block. For example:

import watson_nlp
# Implicit string concatenation keeps the next line's indentation out of the text
text = ("Anna went to school at University of California Santa Cruz. "
        "Anna joined the university in 2015.")

# Load Syntax, Noun Phrases and Keywords models for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
noun_phrases_model = watson_nlp.load('noun-phrases_rbr_en_stock')
keywords_model = watson_nlp.load('keywords_text-rank_en_stock')

# Run the Syntax and Noun Phrases models
syntax_prediction = syntax_model.run(text, parsers=('token', 'lemma', 'part_of_speech'))
noun_phrases = noun_phrases_model.run(text)

# Run the keywords model
keywords = keywords_model.run(syntax_prediction, noun_phrases, limit=2)
print(keywords)
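The dependency pattern can be sketched without the library: two toy blocks read the raw text, and a third block accepts only their outputs. All names here are hypothetical, and the keyword scoring is a simple word-frequency heuristic, not the TextRank algorithm that the `keywords_text-rank_en_stock` model uses.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToySyntax:
    tokens: list  # lowercased word tokens


@dataclass
class ToyNounPhrases:
    phrases: list  # runs of capitalized words


def toy_syntax(text):
    # Operates directly on the raw text.
    return ToySyntax([t.lower() for t in re.findall(r"\w+", text)])


def toy_noun_phrases(text):
    # Operates directly on the raw text; a crude capitalized-run heuristic.
    return ToyNounPhrases(re.findall(r"[A-Z]\w*(?: [A-Z]\w*)*", text))


def toy_keywords(syntax, noun_phrases, limit=2):
    # Depends on both upstream outputs and rejects anything else, such as raw text.
    if not isinstance(syntax, ToySyntax) or not isinstance(noun_phrases, ToyNounPhrases):
        raise TypeError("run the syntax and noun-phrase blocks first")
    counts = Counter(syntax.tokens)

    def score(phrase):
        # Sum of document frequencies of the phrase's words.
        return sum(counts[w.lower()] for w in phrase.split())

    # Highest score first; ties broken alphabetically.
    ranked = sorted(set(noun_phrases.phrases), key=lambda p: (-score(p), p))
    return ranked[:limit]


text = ("Anna went to school at University of California Santa Cruz. "
        "Anna joined the university in 2015.")
print(toy_keywords(toy_syntax(text), toy_noun_phrases(text)))
# ['California Santa Cruz', 'Anna']
```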

Workflows

Workflows are predefined end-to-end pipelines from a raw document to a final block, where all necessary blocks are chained as part of the workflow pipeline.

The following sample shows how to call the Entity Mentions workflow:

import watson_nlp

# Load the workflow model
mentions_workflow = watson_nlp.load('entity-mentions_transformer-workflow_multilingual_slate.153m.distilled')

# Run the entity extraction workflow on the input text and print the result
mentions = mentions_workflow.run('IBM announced new advances in quantum computing', language_code="en")
print(mentions)
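Conceptually, a workflow hides the chaining shown in the previous section: it runs each block after the blocks it depends on and returns only the final output. The sketch below models that with the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later). `ToyWorkflow` and its capitalized-token "mentions" step are hypothetical illustrations, not how the Entity Mentions workflow is built.

```python
from graphlib import TopologicalSorter


class ToyWorkflow:
    """Hypothetical workflow: runs blocks in dependency order, returns the final block's output."""

    def __init__(self, blocks, final):
        # blocks maps name -> (dependencies, fn); fn receives the raw text
        # followed by the outputs of its dependencies, in the declared order.
        self.blocks = blocks
        self.final = final

    def run(self, text):
        graph = {name: deps for name, (deps, _) in self.blocks.items()}
        outputs = {}
        # static_order() yields every block after all of its dependencies.
        for name in TopologicalSorter(graph).static_order():
            deps, fn = self.blocks[name]
            outputs[name] = fn(text, *(outputs[d] for d in deps))
        return outputs[self.final]


workflow = ToyWorkflow(
    blocks={
        # Declared out of order on purpose; the sorter still runs "tokens" first.
        "mentions": (("tokens",), lambda text, tokens: [t for t in tokens if t[:1].isupper()]),
        "tokens": ((), lambda text: text.split()),
    },
    final="mentions",
)
print(workflow.run("IBM announced new advances in quantum computing"))  # ['IBM']
```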
