Linguistic resources used in Text Analytics (SPSS Modeler)

Linguistic resources

SPSS Modeler uses an extraction process that relies on linguistic resources. These resources determine how the text data is processed and how concepts, types, and sometimes patterns are extracted from it.

Linguistic resources are divided into the following types:

Category sets
Categories are groups of closely related ideas and patterns to which text data is assigned through a scoring process.
Libraries
Libraries are used as building blocks for both templates and text analysis packages (TAPs). Each library is made up of several dictionaries, which are used to define and manage terms, synonyms, and exclude lists. Libraries are delivered individually and are also prepackaged in templates and TAPs.
Templates
Templates are made up of a set of libraries and some advanced linguistic and nonlinguistic resources. These resources form a specialized set that is adapted to a particular domain or context, such as product opinions.
Text analysis packages (TAPs)
A text analysis package is a predefined template that is bundled with one or more predefined category sets. A TAP stores the categories together with the resources that were used to generate them, so that both are reusable.
Note: During extraction, some compiled internal resources are also used. They contain many definitions that complement the types in the Core library, and they cannot be edited.
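
The containment relationships described above can be sketched as plain data classes. This is only an illustration of how the resource types relate to each other; the class names, field names, and the `library_names` helper are hypothetical and are not part of any SPSS Modeler API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Library:
    # A library groups dictionaries that manage terms, synonyms, and exclude lists.
    name: str
    dictionaries: List[str] = field(default_factory=list)


@dataclass
class Template:
    # A template is a set of libraries adapted to a domain or context.
    name: str
    libraries: List[Library] = field(default_factory=list)


@dataclass
class CategorySet:
    # A category set holds the categories that text data is assigned to.
    name: str
    categories: List[str] = field(default_factory=list)


@dataclass
class TextAnalysisPackage:
    # A TAP bundles a template with one or more category sets.
    name: str
    template: Template
    category_sets: List[CategorySet] = field(default_factory=list)


def library_names(tap: TextAnalysisPackage) -> List[str]:
    """Return the names of the libraries reachable through the TAP's template."""
    return [lib.name for lib in tap.template.libraries]
```

In this sketch a library can exist on its own or inside a template, which mirrors the statement that libraries are delivered individually and are also prepackaged in templates and TAPs.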

Project assets for Text Analytics

You can save Text Analytics assets as project assets to create your own custom linguistic resources. You can reuse these assets to work more efficiently in your flows or share them to collaborate with others.

You can save the following types of Text Analytics assets as project assets:
  • Templates
  • Libraries
  • Text analysis packages (TAPs)

Category sets are not saved as project assets. To save changes that you make to a category set, download and save the category set, or save it as part of a TAP.

For more information about assets and projects, see Assets in Cloud Pak for Data.

Downloading linguistic resources

You can download linguistic resources to manage them directly or to share them across teams.

The following types of linguistic resources can be saved locally:
  • Templates (.lrt)
  • Libraries (.lib)
  • Text analysis packages (.tap)
  • Category sets (.xlsx)
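
When you manage downloaded files in a shared folder, a small helper can identify each file's resource type from its extension. The following is a minimal sketch that assumes the extensions listed above; the function and constant names are hypothetical. An .xlsx file is not necessarily a category set, so treat the result as a hint only.

```python
from pathlib import Path
from typing import Optional

# Extension-to-resource mapping taken from the list above.
RESOURCE_TYPES = {
    ".lrt": "template",
    ".lib": "library",
    ".tap": "text analysis package",
    ".xlsx": "category set",
}

# Category sets are not saved as project assets; the other three types are.
PROJECT_ASSET_TYPES = {"template", "library", "text analysis package"}


def resource_type(filename: str) -> Optional[str]:
    """Return the resource type for a file name, or None if it is not recognized."""
    return RESOURCE_TYPES.get(Path(filename).suffix.lower())


def can_be_project_asset(filename: str) -> bool:
    """Return True if the file's resource type can be saved as a project asset."""
    return resource_type(filename) in PROJECT_ASSET_TYPES
```

For example, `resource_type("opinions.lrt")` returns `"template"`, and `can_be_project_asset("categories.xlsx")` returns `False` because category sets are downloaded or saved as part of a TAP instead.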
You can download any library, template, or TAP saved as a project asset:
  1. Within your project, go to the Assets tab and expand SPSS Modeler Components.

    The project assets for Text Analytics are sorted by type.

  2. Find the project asset that you want to download.
  3. Click the Options icon and select Download.

Custom resources

SPSS Modeler has a default set of specialized resources. You can use these resources to benefit from research and fine-tuning for specific languages and specific applications. However, these resources might not be optimized for your context or your data. You can edit these resources and save your changes to optimize the extraction process for your flow.

You can also create and import custom resources that are uniquely fine-tuned to your organization's data. You can use local files to share these linguistic resources between users and projects. You can add a template, library, or TAP as a project asset from a local file.

You can upload libraries and templates while you are working in the Text Analytics Workbench:

  1. Go to the Resource Editor tab.
  2. Click the Options icon and select Load library or Change template.
  3. Click Import, and then browse to or drag-and-drop a library or template.
  4. Enter details about the asset, and click Add.
  5. Click Apply.

You must upload a custom TAP within the Text Mining node before you run your flow. For more information, see Uploading a custom asset in a Text Mining node.

You can also upload a custom category set within the Text Analytics Workbench. For more information, see Reusing custom category sets.
