Parsing elements (PRAGMA)
The standardization process begins by identifying tokens within the incoming data. A token can be a single character, a word, or multiple words that are not separated by spaces.
The parsing parameters of the table in the pattern-action file define the tokens. For example, for Latin-based languages, 123-456 has three tokens: 123, the hyphen (-), and 456. A hyphen separates words and is considered to be a token in itself.
Spaces are separate tokens. They are also stripped from the input. For example, 123 MAIN ST consists of three tokens: 123, MAIN, and ST.