Mining text data with Text Analytics in SPSS Modeler
SPSS Modeler offers nodes that are
specialized for handling text. The Text Analytics nodes offer powerful text analytics capabilities
that use advanced linguistic technologies and Natural Language Processing (NLP). They can rapidly
process a large variety of unstructured text data and extract the key concepts. Text Analytics can
also organize and group these concepts into categories.
Around 80% of data held within an organization is in the form of text documents—for
example, reports, web pages, e-mails, and call center notes. Text is a key factor in enabling an
organization to gain a better understanding of its customers' behavior. A system that incorporates
NLP can intelligently extract concepts, including compound phrases. Moreover, knowledge of the
underlying language allows classification of terms into related groups, such as products,
organizations, or people, using meaning and context. As a result, you can quickly determine the
relevance of the information to your needs. These extracted concepts and categories can be combined
with existing structured data, such as demographics, and applied to modeling in SPSS Modeler to yield better and more-focused decisions.
Linguistic systems
are knowledge sensitive—the more information contained in their dictionaries, the higher the quality
of the results. Text Analytics provides a set of linguistic resources, such as dictionaries for
terms and synonyms, libraries, and templates. These nodes also let you develop and refine
these linguistic resources to fit your context. Fine-tuning of the linguistic resources is often an
iterative process and is necessary for accurate concept retrieval and categorization. Custom
templates, libraries, and dictionaries for specific domains, such as CRM and genomics, are also
included.
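To make the role of term and synonym dictionaries concrete, here is a minimal Python sketch. It is not how Text Analytics works internally and does not use any SPSS Modeler API; the dictionary contents, the function names (extract_concepts, add_concept_flags), and the record fields are all hypothetical. It only illustrates the idea that a richer synonym dictionary finds more concepts, and that the extracted concepts can sit alongside structured fields such as demographics.

```python
import re

# Hypothetical mini-dictionary mapping a concept to its synonyms.
# Illustrative only; this is not an SPSS Modeler resource format.
SYNONYMS = {
    "refund": {"refund", "refunds", "money back", "reimbursement"},
    "delivery": {"delivery", "shipping", "shipment"},
}


def extract_concepts(text, synonyms=SYNONYMS):
    """Return the sorted list of concepts whose synonyms occur in text."""
    lowered = text.lower()
    found = set()
    for concept, terms in synonyms.items():
        for term in terms:
            # Match whole words or whole phrases only.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(concept)
                break
    return sorted(found)


def add_concept_flags(record, text_field="note", synonyms=SYNONYMS):
    """Copy a structured record and add one boolean flag per concept."""
    concepts = extract_concepts(record[text_field], synonyms)
    flags = {f"concept_{name}": name in concepts for name in synonyms}
    return {**record, **flags}


# A made-up record that mixes structured fields with free text.
record = {
    "customer_id": 17,
    "age": 42,
    "note": "Shipping was late, I want my money back.",
}
enriched = add_concept_flags(record)
```

In this sketch, adding a term to one of the synonym sets is the analogue of refining a dictionary: text that previously produced no concept starts to match, and the resulting flags can be used next to the structured fields in later modeling.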
In general, anyone who routinely needs to review large volumes of documents to
identify key elements for further exploration can benefit from using Text Analytics. Examples of
some specific applications include:
Scientific and medical research. Explore secondary
research materials, such as patent reports, journal articles, and protocol publications. Identify
associations that were previously unknown (such as a doctor associated with a particular product),
presenting avenues for further exploration. Minimize the time spent in the drug discovery process.
Use as an aid in genomics research.
Investment research. Review daily analyst reports,
news articles, and company press releases to identify key strategy points or market shifts. Trend
analysis of such information reveals emerging issues or opportunities for a firm or industry over a
period of time.
Fraud detection. Use in banking and health-care fraud investigations
to detect anomalies and discover red flags in large amounts of text.
Market research. Use in market research endeavors to
identify key topics in open-ended survey responses.
Blog and Web feed analysis. Explore and build models
using the key ideas found in news feeds, blogs, etc.
CRM. Build models using data from all customer touch
points, such as e-mail, transactions, and surveys.
Nodes
Along with the many standard nodes in SPSS Modeler, you can also work
with text mining nodes to incorporate the power of text analysis into your flows. These
nodes are available on the node palette, under Text Analytics:
The Language Identifier node is a process node that
scans source text to determine which human language it is written in and then records the result in a new
field. Designed primarily for large amounts of data, this node is particularly useful
when you have more than one language in your data sources and want to process just one language.
The Text Link Analysis node extracts concepts and
also identifies relationships between concepts based on known patterns within the text. You can use
pattern extraction to discover relationships between your concepts, as well as any opinions or
qualifiers attached to these concepts. The Text Link Analysis (TLA) node offers a more direct way to
identify and extract patterns from your text and then add the pattern results to the dataset in the
flow. You can also perform TLA in a Text Analytics Workbench session through the Text Mining
modeling node.
The Text Mining node uses linguistic methods to
extract key concepts from the text, allows you to create categories with these concepts and other
data, and offers the ability to identify relationships and associations between concepts based on
known patterns (called text link analysis). You can use this node to explore the text data contents
or to produce either a concept model or category model. The concepts and categories can be combined
with existing structured data, such as demographics, and applied to modeling.
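As a rough illustration of the Language Identifier idea (tag each record with a language in a new field, then keep only one language for further processing), the following Python sketch uses a toy stopword count. This is an assumption made for illustration, not the method the node actually uses, and the stopword lists, function names, and field names are hypothetical.

```python
import re

# Tiny illustrative stopword lists; a real identifier relies on far
# richer linguistic resources than this.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "was", "with"},
    "es": {"el", "la", "y", "es", "de", "que", "con"},
    "de": {"der", "die", "und", "ist", "das", "nicht", "mit"},
}


def identify_language(text):
    """Return the language code with the most stopword hits, or 'unknown'."""
    # Split into runs of letters (Unicode-aware), ignoring digits and underscores.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def tag_language(records, text_field="text", out_field="language"):
    """Return copies of the records with the detected language in a new field."""
    return [{**r, out_field: identify_language(r[text_field])} for r in records]


# Made-up sample records in three languages.
records = [
    {"id": 1, "text": "The delivery was late and the box was damaged."},
    {"id": 2, "text": "El paquete es de muy buena calidad y llegó con rapidez."},
    {"id": 3, "text": "Die Lieferung ist gut, und der Preis ist nicht hoch."},
]
tagged = tag_language(records)

# Keep a single language for downstream processing.
english_only = [r for r in tagged if r["language"] == "en"]
```

Writing the language to its own field, rather than filtering immediately, mirrors the description above: the flow can then select whichever language it needs in a later step.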