Certain algorithms in Watson Natural Language Processing can be trained with your own data. For example, you can create custom models based on your own data for entity extraction, data classification, sentiment extraction, and target sentiment extraction.
You can use a built-in transformer-based IBM foundation model called Slate to create your own models. The Slate model has been trained on a very large data set that was preprocessed to filter hate, bias, and profanity.
To create your own classification, entity extraction, or sentiment model, you can fine-tune the Slate model on your own data. To train the model in a reasonable time, use a GPU-based environment.
You can create custom models and use the following pretrained dictionary and classification models for the languages shown.
Supported languages for pretrained dictionary and classification models

| Custom model | Supported language codes |
|---|---|
| Dictionary models | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw (all languages supported in the Syntax part of speech tagging) |
| Regexes | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw (all languages supported in the Syntax part of speech tagging) |
| SVM classification with TFIDF | af, ar, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw |
| Transformer model | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw |
| Stopword lists | ar, de, en, es, fr, it, ja, ko |
For a list of language codes and corresponding languages, see Language codes.
Saving and loading custom models
If you want to use your custom model in another notebook, save it as a Data Asset to your project. This way, you can export the model as part of a project export.
Use the ibm-watson-studio-lib library to save and load custom models.
To save a custom model in your notebook as a data asset to export and use in another project:
Ensure that you have an access token on the Access control page on the Manage tab of your project. Only project admins can create access tokens. The access token can have viewer or editor access permissions.
Only editors can inject the token into a notebook.
Add the project token to a notebook by clicking More > Insert project token from the notebook action bar and then run the cell. When you run the inserted hidden code cell, a wslib object is created that you can use for functions in the ibm-watson-studio-lib library. For details on the available ibm-watson-studio-lib functions, see Using ibm-watson-studio-lib for Python.
Run the train() method to create a custom dictionary, regular expression, or classification model and assign this custom model to a variable. For example:
If you want to save a custom dictionary or regular expression model, convert it to an RBRGeneric block. This conversion is useful if you want to load and run the model using the API for Watson Natural Language Processing for Embed. To date, Watson Natural Language Processing for Embed supports running dictionary and regular expression models only as RBRGeneric blocks. To convert a model to an RBRGeneric block, run the following commands:
import os
import watson_nlp

# Create the custom regular expression model
# (module_folder and regexes must already be defined in your notebook)
custom_regex_block = watson_nlp.resources.feature_extractor.RBR.train(module_folder, language='en', regexes=regexes)
# Save the model to the local file system
custom_regex_model_path = 'some/path'
custom_regex_block.save(custom_regex_model_path)
# The model was saved in a file "executor.zip" in the provided path, in this case "some/path/executor.zip"
model_path = os.path.join(custom_regex_model_path, 'executor.zip')
# Re-load the model as a RBRGeneric block
custom_block = watson_nlp.blocks.rules.RBRGeneric(watson_nlp.toolkit.rule_utils.RBRExecutor.load(model_path), language='en')
Save the model as a Data Asset to your project using ibm-watson-studio-lib:
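The save step can be sketched as follows. This is a minimal sketch under two assumptions: that the custom model exposes an `as_bytes()` method that returns the serialized model, and that the `wslib` object created by the project token cell provides `save_data(asset_name, data, overwrite=...)`. The helper name `save_model_as_asset` is hypothetical.

```python
def save_model_as_asset(wslib, model, asset_name):
    """Serialize a custom model and store it as a project data asset.

    Assumes model.as_bytes() returns the serialized model and that
    wslib.save_data accepts (asset_name, data, overwrite=...).
    """
    model_bytes = model.as_bytes()
    # overwrite=True replaces an existing data asset with the same name
    return wslib.save_data(asset_name, model_bytes, overwrite=True)
```

In a notebook, the call might look like `save_model_as_asset(wslib, custom_block, "<model name>")`, where `custom_block` is the model you trained.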
When saving transformer models, you can save the model in CPU format. If you plan to use the model only in CPU environments, this format makes your custom model run more efficiently. To do that, set the CPU format option as follows:
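A minimal sketch of saving in CPU format, assuming that the transformer model's `as_bytes()` method accepts a `cpu_format` keyword argument and that `wslib.save_data(asset_name, data, overwrite=...)` is available from the project token cell. Both the keyword and the helper name `save_transformer_for_cpu` are assumptions made for illustration, so verify them against your installed library version.

```python
def save_transformer_for_cpu(wslib, model, asset_name):
    """Store a transformer model as a data asset in CPU format.

    Assumes model.as_bytes accepts cpu_format=True (an assumption to
    verify against your watson_nlp version).
    """
    model_bytes = model.as_bytes(cpu_format=True)
    # overwrite=True replaces an existing data asset with the same name
    return wslib.save_data(asset_name, model_bytes, overwrite=True)
```

A model saved this way is intended for CPU environments only, so keep a standard copy if you might also run it on GPUs.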
To load a custom model to a notebook that was imported from another project:
Ensure that you have an access token on the Access control page on the Manage tab of your project. Only project admins can create access tokens. The access token can have viewer or editor access permissions.
Only editors can inject the token into a notebook.
Add the project token to a notebook by clicking More > Insert project token from the notebook action bar and then run the cell. When you run the inserted hidden code cell, a wslib object is created that you can use for functions in the ibm-watson-studio-lib library. For details on the available ibm-watson-studio-lib functions, see Using ibm-watson-studio-lib for Python.
Load the model using ibm-watson-studio-lib and watson-nlp:
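The load step can be sketched as follows. This sketch assumes that `wslib.load_data(asset_name)` returns the saved model data as a byte buffer and that `watson_nlp.load` accepts that buffer. The helper name `load_model_from_asset` and its `loader` parameter are hypothetical; the parameter exists only so that the function can be exercised outside a Watson Natural Language Processing runtime.

```python
def load_model_from_asset(wslib, asset_name, loader=None):
    """Load a custom model that was saved as a project data asset.

    Assumes wslib.load_data returns the model bytes as a buffer and that
    the loader (watson_nlp.load by default) accepts that buffer.
    """
    if loader is None:
        # Imported lazily: watson_nlp exists only in Watson NLP runtimes
        import watson_nlp
        loader = watson_nlp.load
    model_buffer = wslib.load_data(asset_name)
    return loader(model_buffer)
```

In a notebook, the call might look like `custom_block = load_model_from_asset(wslib, "<model name>")`.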