Linguistic resources used in SPSS Modeler for Text Analytics
SPSS Modeler uses an extraction process that relies on
linguistic resources. These resources determine how the text data is processed and how
information is extracted to produce concepts, types, and sometimes patterns.
The linguistic resources can be divided into the following types:
Category sets
Categories are groups of closely related ideas and patterns to which the text data is
assigned through a scoring process.
Libraries
Libraries are used as building blocks for both TAPs and templates. Each library is made
up of several dictionaries, which are used to define and manage terms, synonyms, and
exclude lists. While libraries are also delivered individually, they are prepackaged
together in templates and TAPs.
Templates
Templates consist of a set of libraries and some advanced linguistic and nonlinguistic
resources. These resources form a specialized set that is adapted to a particular domain
or context, such as product opinions.
Text analysis packages (TAPs)
A text analysis package is a predefined template that is bundled with one or more
category sets. Because a TAP stores the categories together with the resources that were
used to generate them, you can reuse it to apply the same categories and resources to
other flows.
Note: During extraction, some compiled internal linguistic resources are also used. These
compiled resources contain many definitions that complement the types in the Core library,
and they cannot be edited.
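To make the containment relationships among the resource types described above more concrete, the following is a minimal Python sketch. It is not the SPSS Modeler API: every class name, field, and sample value here is a hypothetical illustration, and the several dictionaries inside a library are flattened into three plain fields for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical building block holding terms, synonyms, and excludes."""
    name: str
    terms: set[str] = field(default_factory=set)
    synonyms: dict[str, str] = field(default_factory=dict)  # variant -> target term
    excludes: set[str] = field(default_factory=set)


@dataclass
class Template:
    """A set of libraries adapted to a domain (advanced resources omitted)."""
    name: str
    libraries: list[Library] = field(default_factory=list)


@dataclass
class CategorySet:
    """A named group of categories (represented here only by their names)."""
    name: str
    categories: list[str] = field(default_factory=list)


@dataclass
class TextAnalysisPackage:
    """A template bundled with one or more category sets."""
    template: Template
    category_sets: list[CategorySet]

    def __post_init__(self) -> None:
        # The text says "one or more" category sets, so reject an empty list.
        if not self.category_sets:
            raise ValueError("A TAP needs at least one category set")


# Illustrative sample data only; these values are assumptions.
core = Library("Core", terms={"price"}, synonyms={"cost": "price"}, excludes={"n/a"})
opinions = Template("Product opinions", libraries=[core])
tap = TextAnalysisPackage(opinions, [CategorySet("Satisfaction", ["Pricing"])])

print([lib.name for lib in tap.template.libraries])
```

In this sketch, reusing a TAP simply means passing the same `tap` object to another consumer, which mirrors the idea that the categories and the resources used to generate them travel together.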
Custom linguistic resources
SPSS Modeler provides a default set of specialized linguistic
resources that reflect research and fine-tuning for specific languages and specific
applications. However, these resources might not be optimized for your context or your
data. You can edit them and save your changes to optimize the extraction process for your
flow.
You can also create and import custom linguistic resources that are uniquely fine-tuned to
your organization's data. You can use local files to share these linguistic resources
between users and projects. You can add a template, library, or TAP as a project asset from
a local file.