When you create category models in Text Analytics, you can choose from several techniques to build categories. Because every dataset is unique, the number of techniques you use and the order in which you apply them may vary.
Because your interpretation of the results may differ from someone else's, you may need to experiment with the techniques to see which ones produce the best results for your text data. In Text Analytics, you can create category models in a workbench session, where you can explore and fine-tune your categories further.
In this documentation, category building refers to the generation of category definitions and classification through one or more built-in techniques. Categorization refers to the scoring, or labeling, process whereby unique identifiers (name/ID/value) are assigned to the category definitions for each record or document.
During category building, the concepts and types that were extracted are used as the building blocks for your categories. When you build categories, the records or documents are automatically assigned to categories if they contain text that matches an element of a category's definition.
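As a rough illustration of how records end up in categories, the following Python sketch labels each record with every category whose definition shares at least one concept with the concepts extracted from that record. It is not the Text Analytics implementation: the function `categorize`, the category names, and the sample records are hypothetical, and concept extraction is assumed to have already happened.

```python
# Hypothetical sketch: assign records to categories by matching extracted
# concepts against each category's definition (its set of descriptor concepts).

def categorize(records: dict[str, set[str]],
               categories: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, for each record ID, the sorted names of matching categories."""
    labels: dict[str, list[str]] = {}
    for record_id, concepts in records.items():
        # A record matches a category if any descriptor concept was extracted from it.
        labels[record_id] = sorted(
            name for name, descriptors in categories.items()
            if descriptors & concepts
        )
    return labels


# Illustrative category definitions and pre-extracted concepts per record.
categories = {
    "apples": {"granny smith apple", "gala apple", "winesap apple"},
    "car safety": {"safety seat", "seat belt", "seat belt buckle"},
}
records = {
    "r1": {"gala apple", "price"},
    "r2": {"seat belt", "granny smith apple"},
    "r3": {"weather"},
}

print(categorize(records, categories))
```

Note that a single record (here `r2`) can fall into more than one category, and a record with no matching concepts (`r3`) stays uncategorized.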
Text Analytics offers you several automated category building techniques to help you categorize your documents or records quickly.
Grouping techniques
Each of the available techniques is well suited to certain types of data and situations, but it is often helpful to combine techniques in the same analysis to capture the full range of documents or records. As a result, you may see a concept in multiple categories or find redundant categories.
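To make that overlap concrete, here is a hypothetical Python sketch that pools the categories produced by two techniques and reports concepts that appear in more than one category, plus categories whose definitions are identical (redundant). The helper `find_overlaps` and all category names are illustrative assumptions, not part of the product.

```python
from collections import defaultdict


def find_overlaps(categories: dict[str, set[str]]) -> tuple[dict[str, list[str]], list[list[str]]]:
    """Return (concept -> categories sharing it, groups of categories with identical definitions)."""
    # Map every concept to the categories whose definitions use it.
    concept_to_cats: dict[str, set[str]] = defaultdict(set)
    for name, descriptors in categories.items():
        for concept in descriptors:
            concept_to_cats[concept].add(name)
    shared = {c: sorted(names) for c, names in concept_to_cats.items() if len(names) > 1}

    # Categories with exactly the same set of descriptors are redundant.
    by_definition: dict[frozenset[str], list[str]] = defaultdict(list)
    for name, descriptors in categories.items():
        by_definition[frozenset(descriptors)].append(name)
    redundant = [sorted(names) for names in by_definition.values() if len(names) > 1]
    return shared, redundant


# Illustrative output of two techniques, pooled into one category set.
semantic = {"semantic: apple": {"granny smith apple", "gala apple", "winesap apple"}}
inclusion = {
    "inclusion: apple": {"granny smith apple", "gala apple", "winesap apple"},
    "inclusion: seat": {"seat", "safety seat", "seat belt"},
    "inclusion: belt": {"belt", "seat belt"},
}
combined = {**semantic, **inclusion}

shared, redundant = find_overlaps(combined)
print(shared)
print(redundant)
```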
Semantic Network. This technique begins by identifying the possible senses of each concept from its extensive index of word relationships and then creates categories by grouping related concepts. This technique is best when the concepts are known to the semantic network and are not too ambiguous. It is less helpful when text contains specialized terminology or jargon unknown to the network. For example, the concept granny smith apple could be grouped with gala apple and winesap apple because they are siblings of granny smith apple (all are kinds of apple). In another example, the concept animal might be grouped with cat and kangaroo because they are hyponyms (more specific kinds) of animal. This technique is available for English text only.
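The grouping idea can be sketched in Python with a tiny, hand-built hypernym index standing in for the semantic network. This is only an assumption-laden illustration: the `HYPERNYMS` table and `group_by_semantics` function are hypothetical, the real network is far larger and also resolves word senses, and concepts missing from the index (such as jargon) are simply left ungrouped.

```python
from collections import defaultdict

# Hypothetical toy index: concept -> its broader term (hypernym).
HYPERNYMS = {
    "granny smith apple": "apple",
    "gala apple": "apple",
    "winesap apple": "apple",
    "cat": "animal",
    "kangaroo": "animal",
}


def group_by_semantics(concepts: list[str]) -> dict[str, list[str]]:
    """Group siblings under their shared hypernym; include the hypernym if it was extracted too."""
    present = set(concepts)
    groups: dict[str, set[str]] = defaultdict(set)
    for concept in concepts:
        parent = HYPERNYMS.get(concept)
        if parent is None:
            continue  # unknown to the index (e.g. jargon): not grouped
        groups[parent].add(concept)      # siblings share the same parent
        if parent in present:
            groups[parent].add(parent)   # the broader concept joins its hyponyms
    return {parent: sorted(members) for parent, members in groups.items() if len(members) > 1}


concepts = ["granny smith apple", "gala apple", "winesap apple",
            "animal", "cat", "kangaroo", "flux capacitor"]
groups = group_by_semantics(concepts)
print(groups)
```

Here the three apple varieties are grouped as siblings, animal is grouped with its hyponyms cat and kangaroo, and the unknown term flux capacitor is left out.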
Concept Inclusion. This technique builds categories by grouping multiterm concepts (compound words) according to whether the words in one concept are a subset or superset of the words in another. For example, the concept seat would be grouped with safety seat, seat belt, and seat belt buckle.
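A minimal Python sketch of the subset/superset test, assuming concepts are compared as plain sets of whitespace-separated words (the product may also normalize word forms). The function `concept_inclusion` is hypothetical: for each concept, it collects every other concept whose words form a strict superset of its own.

```python
def concept_inclusion(concepts: list[str]) -> dict[str, list[str]]:
    """Map each concept to the concepts that contain all of its words (and more)."""
    words = {c: frozenset(c.split()) for c in concepts}
    categories: dict[str, list[str]] = {}
    for head in concepts:
        # A strict-superset check excludes the head concept itself.
        members = sorted(c for c in concepts if words[head] < words[c])
        if members:
            categories[head] = members
    return categories


result = concept_inclusion(["seat", "safety seat", "seat belt", "seat belt buckle", "weather"])
print(result)
```

With these inputs, seat collects safety seat, seat belt, and seat belt buckle, matching the example above; the sketch also yields a narrower group for seat belt, while weather shares no words and stays ungrouped.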